New working paper. Title above, links, abstract, TOC, and introduction below.
Links:
Academia.edu: https://www.academia.edu/169390494/Notes_on_the_Collective_Valuation_of_Thick_Objects_Financial_Assets_Movies_and_Novels
ResearchGate: https://www.researchgate.net/publication/408219138_Notes_on_the_Collective_Valuation_of_Thick_Objects_Financial_Assets_Movies_and_Novels
Abstract: Machine learning is creating a methodological bridge between disciplines that previously seemed far apart, especially economics and literary criticism. The bridge is the analysis of how populations deal with “thick objects.” A thick object is not exhausted by a few visible traits. It gathers interpretation, expectation, memory, value, narrative, and social response. A toaster is usually a thin object. A firm that manufactures toasters is thick: it has assets, debt, brands, patents, management, supply chains, analyst coverage, market expectations, and future promises. Scott Galloway’s remark that stocks are like brands, part promise and part performance, links stocks, movies, and novels. Each is a thick object moving through a field of collective judgment, and its value reflects both measurable performance and imagined future promise. They are thus neighboring cases of a general problem: how populations perceive, classify, value, and transform thick objects. Machine learning constructs object-spaces from the traces minds leave behind. The task now is to learn how to interpret those spaces without mistaking the model for the world.
High-dimensional asset-pricing models start with many stock characteristics: price, returns, volume, profitability, leverage, liquidity, analyst revisions, momentum, volatility, investment, and so on. These characteristics are traces of firm activity, accounting conventions, analyst judgment, and trader behavior. New models then generate hundreds of thousands of nonlinear transformations of those characteristics in order to approximate the market’s pricing kernel, the structure through which future payoffs are priced under uncertainty. The individual factors are analytic constructs. Taken together, they approximate the valuation geometry produced by collective market activity.
That sounds strange in economics, but it is familiar from Matthew Jockers’ work on nineteenth-century Anglophone novels. Jockers created a high-dimensional design space from thousands of novels, using stylistic features and topic models. His topics are not literal thoughts in anyone’s mind. They are model-derived approximations to recurrent regions of culturally circulating thought. Yet the model revealed historical direction: novels arranged by similarity formed a temporal diagonal, a computationally disciplined proxy for population-level cultural cognition.
Arthur De Vany’s model of Hollywood adds the dynamic bridge. Movies are thick expressive-market objects. Their success cannot be predicted simply from stars, director, budget, genre, or advertising. Once released, they enter an audience field where word of mouth, imitation, and nonlinear cascades determine their fate. Most fail, some profit, a few become blockbusters. The dynamics are heavy-tailed, interactive, and collective.
Contents
Introduction: Using ChatGPT for focused intellectual exploration across disciplines 3
Thick Objects: Ground Shared by Economics and Cultural Analysis [Summary] 9
AIPT, Large Factor Models [First Session] 17
Hollywood Economics 23
Macroanalysis 27
The emerging triad 30
Direction over time 31
Doing a Jockers style analysis for financial assets 38
Thinking about thick objects 40
Stocks are like brands [Session Two] 42
Algorithmic and Causal models [Session Three] 52
Those empirical APT models [Session Four] 56
Decision space 63
A bridge between disparate disciplines 67
Introduction: Using ChatGPT for focused intellectual exploration across disciplines
This document serves two purposes. It presents a specific argument leading to the following provisional formulation:
High-dimensional models of novels, movies, and assets disclose the population-level geometry of collective interpretation around thick objects, turning literary criticism and economics into neighboring sciences of modeled valuation.
How I arrived at the speculation, however, is as important as the idea itself, perhaps more so. I did not arrive at it unaided; ChatGPT helped me. In fact, the words of that formulation aren’t mine. They’re ChatGPT’s. I know a great deal about literary criticism and about movies, but not much about economics. I needed ChatGPT to bridge the conceptual distance between the humanities (literary criticism) and the social sciences (economics).
Methodological curiosity
Fortunately, the peculiar circumstances of my career have forced me to be interested in method and epistemology: How is it that we can come to know about the world, and what methods can we use to arrive at that knowledge? When I entered Johns Hopkins as a freshman in 1965, the discipline of literary criticism was in a state of crisis, though I didn’t know that. How could I? I’d only just graduated from high school, and I still pretty much accepted knowledge as it was handed to me.
That soon changed. The details of just how, when, and why don’t matter much at the moment; that it happened is sufficient for my present purposes. The upshot is that I became interested in Coleridge’s “Kubla Khan” in my senior year. I investigated the poem with standard interpretive methods augmented by avant-garde structuralism and found patterns I could not explain. But they “smelled” of the nested loops I had learned about in a course in computer programming.
That sent me to the English Department at SUNY Buffalo, which had the best experimental program in the nation. I found a fellow graduate student, Ralph Henry Reese, who pointed me around a corner and down the hall to David Hays in Linguistics. Hays had been a first-generation researcher in machine translation at the RAND Corp. and, as such, was one of the founders of computational linguistics. While I wasn’t able to resolve my issues with “Kubla Khan” (they’re still hanging fire), I became hooked on cognitive science. Consequently, my dissertation in the English Department was also a quasi-technical exercise in knowledge representation, the discipline within cognitive science and artificial intelligence concerned with representing human knowledge in computable form.
Given that that is where I had arrived in the late 1970s, it is perhaps not so strange that now, decades later, I find myself staring down some pretty formidable economics despite never having studied the subject. For the last 15 years, however, I have been reading the Marginal Revolution blog hosted by Tyler Cowen and Alex Tabarrok, and I have been reading my way through Cowen’s recent monograph, The Marginal Revolution: Rise and Decline, and the Pending AI Revolution (2026). Cowen’s theme in the fourth (and last) chapter is that the economics he was trained in, the economics which followed from the Marginal Revolution, is rapidly being eclipsed by a more determinedly empirical discipline based on machine learning.
Bombed by 360,000 factors
Here is Cowen’s premier example. It’s from something called Arbitrage Pricing Theory (APT) (pp. 99-100):
There is a recent working paper which is perhaps more striking yet, by Antoine Didisheim, Shikun (Barry) Ke, Bryan T. Kelly, and Semyon Malamud. They pick up from Arbitrage Pricing Theory (APT), a well-established idea from financial economics. APT typically looks for “factors” in the data which predict excess returns, and a traditional APT model might have found five or six such factors. Are “inflation” or perhaps “the term structure of interest rates” useful factors? Well, that can be debated, but if so, those results sound pretty intuitive. But those intuitions seem to be disappearing. In a paper by these authors, they apply machine learning methods to look for more factors. As we know, machine learning is very good at finding non-obvious relationships in the data. The largest model they built has 360,000 (!) factors, and it reduces pricing errors by 54.8 percent relative to the classic six-factor model from Fama and French. Bravo to the authors, but what kinds of intuitions do you think possibly can be supported by those 360,000 factors?
When I read that, it “looked like Greek to me,” as the cliché has it. But I took a deep breath, thought carefully, step by step, and concluded that the assets in question are stocks. What you need to pay attention to is 1) the contrast between six factors and 360,000 factors, 2) the fact that one set of factors is intuitive while the other certainly is not, and 3) the fact that the unintelligible, unintuitive collection of factors nonetheless does a better job of pricing. That’s the new world toward which economics is moving. While the old intuitions are gasping for breath, the newfangled numbers are fit as a fiddle and ready for duty.
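The contrast between a few intuitive factors and a mass of unintuitive ones can be seen in miniature in the toy sketch below. It is an assumption-laden illustration, not the Didisheim et al. model, and it does not reproduce their 54.8 percent figure. The stocks, their six characteristics, and the nonlinear rule that generates their returns are all invented. The “few-factor” benchmark regresses returns linearly on the raw characteristics. The “many-factor” model builds 1,000 random nonlinear transformations of the same characteristics (random Fourier features) and fits them with ridge shrinkage. Both are scored on out-of-sample squared prediction error. The names and parameter values (`make_stocks`, `bandwidth`, `ridge`) are hypothetical choices for this sketch.

```python
import numpy as np

rng = np.random.default_rng(42)
N_CHARS = 6  # hypothetical number of stock characteristics


def make_stocks(n):
    """Synthetic stocks: standardized characteristics and next-period excess returns.

    The nonlinear 'true' return rule is an assumption made purely for illustration.
    """
    X = rng.normal(size=(n, N_CHARS))
    signal = np.sin(X[:, 0]) * X[:, 1] + 0.5 * np.tanh(X[:, 2] * X[:, 3])
    return X, signal + 0.1 * rng.normal(size=n)


def linear_factor_fit(X_tr, y_tr, X_te):
    """Few-factor benchmark: ordinary least squares on the raw characteristics."""
    A_tr = np.column_stack([np.ones(len(X_tr)), X_tr])
    A_te = np.column_stack([np.ones(len(X_te)), X_te])
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    return A_te @ coef


def random_feature_fit(X_tr, y_tr, X_te, n_features=1000, bandwidth=1.5, ridge=0.1):
    """Many-factor model: random Fourier features of the characteristics plus ridge shrinkage."""
    W = rng.normal(scale=1.0 / bandwidth, size=(X_tr.shape[1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)

    def features(X):
        # Each column is one random nonlinear transformation of all the characteristics
        return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

    F_tr = features(X_tr)
    beta = np.linalg.solve(F_tr.T @ F_tr + ridge * np.eye(n_features), F_tr.T @ y_tr)
    return features(X_te) @ beta


X_tr, y_tr = make_stocks(2000)
X_te, y_te = make_stocks(1000)
mse_linear = np.mean((y_te - linear_factor_fit(X_tr, y_tr, X_te)) ** 2)
mse_random = np.mean((y_te - random_feature_fit(X_tr, y_tr, X_te)) ** 2)
print(f"out-of-sample squared error: few factors {mse_linear:.3f}, many factors {mse_random:.3f}")
```

No single random feature means anything by itself. Whatever accuracy the many-factor model gains comes from the ensemble, not from any interpretable factor. The feature count, bandwidth, and ridge penalty here are arbitrary and untuned.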
I thought some more and realized that what’s really going on is that people are evaluating those stocks, communicating with one another directly about them, and making decisions about buying and selling, thereby communicating indirectly with one another. That’s what those 360,000 factors are capturing: the actions of a dispersed community of analysts and traders. “Could this be roughly similar to the decisions moviegoers make about the movies they see, based not only on their preferences but on information they get from reviews and, perhaps more importantly, from their friends?” “If so,” I conjectured, “then perhaps Cowen’s old colleague from Irvine, Arthur De Vany, can shed some light on the situation.” That is, perhaps De Vany could give me some intuitions that I could apply to the situation.
For De Vany had written a very interesting book, Hollywood Economics (2004), about the fate of movies once they have been released. Just as those intuitive “classical” models in economics aren’t as accurate as the new high-factor models, so you can’t predict the box-office performance of movies from such simple factors as the identities of the producer, screenwriters, or stars. Now, De Vany didn’t produce a high-factor model that improved matters. He did something quite different (discussed below, pp. 23 ff.), but that’s secondary at the moment. The point is that we seem to have a gross similarity: the behavior of some object that interests a lot of people, a stock or a movie, cannot be reliably predicted using a simple model.
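A toy simulation can show why listed attributes fail in a word-of-mouth market. The sketch below uses a hypothetical Pólya-urn cascade chosen for illustration. It is not De Vany’s own model. Fifty identical movies compete for 10,000 viewers, and each viewer picks a movie with probability proportional to one plus the admissions it has already drawn. Because the movies are identical by construction, any inequality in the outcome comes from the cascade itself. The same run without word of mouth serves as a baseline, and a Gini coefficient measures how concentrated admissions are. The function names and parameter values are assumptions made for this sketch.

```python
import numpy as np


def gini(x):
    """Gini coefficient: 0 when all movies draw equally, approaching 1 when one takes all."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    cum = np.cumsum(x)
    return (n + 1 - 2 * cum.sum() / cum[-1]) / n


def simulate_box_office(n_movies=50, n_viewers=10_000, word_of_mouth=True, seed=0):
    """Identical movies; each viewer picks exactly one.

    With word_of_mouth=True, a movie's chance of being picked is proportional to
    1 + its admissions so far (a hypothetical Polya-urn cascade). Otherwise, picks are uniform.
    """
    rng = np.random.default_rng(seed)
    admissions = np.zeros(n_movies)
    for _ in range(n_viewers):
        weights = 1.0 + admissions if word_of_mouth else np.ones(n_movies)
        choice = rng.choice(n_movies, p=weights / weights.sum())
        admissions[choice] += 1
    return admissions


cascade = simulate_box_office(word_of_mouth=True)
independent = simulate_box_office(word_of_mouth=False)
top_share = np.sort(cascade)[-5:].sum() / cascade.sum()
print(f"Gini with word of mouth: {gini(cascade):.2f}, without: {gini(independent):.2f}")
print(f"share of admissions taken by the top 5 of 50 movies: {top_share:.0%}")
```

In runs like this, the cascade spreads admissions far more unequally than the baseline, even though no movie has a better star, director, or budget than any other. A linear rule like this one does not by itself produce the heavy tails De Vany describes. It only illustrates that collective choice, rather than the object’s listed attributes, can drive the spread of outcomes.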
Meme stocks and novels
The similarity was reinforced when I heard a remark by Scott Galloway on the Pivot podcast: “Stocks are like brands and that is they’re part promise and part performance.” Consider the recent phenomenon of meme stocks, which Wikipedia glosses this way:
A meme stock is a stock that gains popularity among retail investors through social media. The popularity of meme stocks is generally based on internet memes shared among traders, on platforms such as Reddit's r/wallstreetbets. Investors in such stocks are often young and inexperienced investors. As a result of their popularity, meme stocks often trade at prices that are above their estimated value – as based on fundamental analysis – and are known for being extremely speculative and volatile.
Meme stocks are assets where promise overwhelms performance, more story than substance.
That’s what movies are. You are purchasing the story and the experience, not the seat in the theater, the DVD, or the stream; those are the vehicles that carry the story. Claude calls these things “thick” objects (perhaps borrowing from the anthropological concept of “thick” description?), as opposed to “thin” objects like toasters and drills. Novels are thick objects as well, which led me to Matthew Jockers’ 2013 book, Macroanalysis, where he uses machine learning to develop a high-dimensional model (a mere 600 dimensions rather than 360,000) of a corpus of 3,000 nineteenth-century Anglophone novels. Just as I read De Vany’s book quite closely, so I’ve written a series of posts about Jockers’ book. I bring his model into the mix as well (pp. 27 ff.).
Thus I am now in a position to take two models in subjects I know well, movies and novels, and bring them to bear on contemporary machine learning in financial economics, a subject I do not know at all. And, for that matter, still don’t. But I’ve got some intuitions. One of them led me to focus on the fact that, while Jockers’ model did not contain any dates, upon inspection it turned out to have a diagonal (p. 27) that is correlated with direction in time. Not only did nineteenth-century novels change in theme and motif over time, there is a direction to that change. The system seems to exhibit directional evolution. And so I directed Claude to explore the possibility that this might be a general characteristic of thick objects used by a large population of interested parties (pp. 31 ff.). Here is the conjecture Claude arrived at (p. 35):
In thick-object domains, low-dimensional intuitive factors often fail to explain individual outcomes. But high-dimensional representation can reveal population-level structure: outcome basins in movies, pricing kernels in finance, and temporal direction in novels. The next step is to ask whether all such artifact systems exhibit historical vectors in feature space, generated by a generational ratchet in which each cohort of producers is shaped by the artifact ecology inherited from its predecessors.
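A small synthetic example can illustrate how a model built without dates might still reveal temporal direction. Everything in it is assumed. Three hundred invented “novels” are feature vectors that drift along one hidden direction as their publication year advances, plus noise. The analysis sees only the features, never the years. Principal component analysis is used here as a simple stand-in for arranging works by similarity. It is not the procedure Jockers used. If the main axis recovered from the features correlates with the withheld years, then the feature space contains a temporal direction.

```python
import numpy as np

rng = np.random.default_rng(7)
n_novels, n_features = 300, 20                    # hypothetical corpus size and feature count
years = rng.integers(1800, 1900, size=n_novels)   # publication dates, withheld from the features
t = (years - 1800) / 99.0
drift = rng.normal(size=n_features)
drift /= np.linalg.norm(drift)                    # assumed direction of historical change
features = 3.0 * np.outer(t, drift) + rng.normal(scale=0.3, size=(n_novels, n_features))

# Principal component analysis via SVD of the centered feature matrix (dates are not used)
centered = features - features.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
pc1 = centered @ vt[0]                            # each novel's position on the main axis

r = np.corrcoef(pc1, years)[0, 1]
print(f"correlation between first principal axis and publication year: {r:+.2f}")
```

Because the sign of a principal component is arbitrary, the correlation may come out negative. Its magnitude is what matters. In a real corpus the drift would be neither linear nor one-dimensional, so this sketch shows only the logic of the inference, not its empirical outcome.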
Notice the territory we have traversed in conceptual space. We started with an undergraduate at Johns Hopkins (me) using interpretive methods to study a poem, “Kubla Khan.” That investigation led to problems that forced me to study computational semantics in graduate school, a distinctly different mode of intellectual work, one based on formulating an elaborate system of structural rules. We then zipped through time and over intellectual space to a social scientist, Tyler Cowen, who was trained in the use of causal models to generate statistically controlled observations about economic behavior. He is now confronted with multifactor machine learning models that have no intuitively discernible causal structure yet nonetheless have superior predictive power. Cowen got me interested in one of those models, and I, in turn, summoned Anthropic’s Claude to explain it to me.
The way I see it, and I’ve seen it this way for a long time, the human sciences (more a European notion than an American one, les sciences humaines) can be arranged into three camps according to methodological focus: interpretive or hermeneutic (roughly, the humanities), causal modeling (roughly, the social sciences), and structural rules (roughly, the “classical” cognitive sciences). We’ve spanned them all in the course of this introduction. What will the future bring?
Bonus: I leave it as an exercise for the reader to consider the relevance of Keynes’s talk of “animal spirits” and to incorporate Robert Shiller’s narrative economics into this picture.
What’s in this document
The rest of this document is devoted to the dialogs in which I used ChatGPT to work through the connections among these three models, two I knew quite well (De Vany on movies and Jockers on novels) and one I did not (Didisheim et al. on asset pricing). Claude knows them all, for some non-trivial meaning of “know,” and many others as well. The purpose of the dialog, then, is to link something I do not know to something that I do. The dialog took place in four sessions, from the end of May into June.
Rather than comment on each of the sections listed in the outline, I am commenting only on the sections that mark the beginning of a new session with ChatGPT, with one exception. For what it’s worth, those sections mark how the subject evolved in my mind. The exception? The summary. It was, obviously, the last thing ChatGPT did, but I moved it to first place.
Thick Objects and the New Common Ground of Economics and Cultural Analysis [Summary] – I had ChatGPT prepare this summary at the very end of the process, on June 22. I put it first in case some readers want to get the gist of the exercise without slogging through the details.
AIPT, Large Factor Models [First Session] – This is where I began, on May 26. I started by asking ChatGPT to explain asset pricing to me. Once I had some sense of that, I went on to the models I was familiar with, first De Vany on movies and then Jockers on nineteenth-century Anglophone novels.
Stocks are like brands [Session Two] – I initiated this session on May 30, after I heard Galloway’s remark about stocks being like brands. That crystallized things for me, so I needed to work back through the analysis. In the course of that discussion I focused on the concept of a brand as a distinct conceptual object, and ChatGPT’s response clarified the role of marginalism in clearing the way for asset models with a very large number of factors.
Algorithmic and Causal models [Session Three] – I don’t recall whether anything in particular prompted me to initiate this dialog. Perhaps mere methodological curiosity. This took place on June 2.
Those empirical APT models [Session Four] – It’s not entirely clear to me whether anything in particular prompted this session. But what I was thinking was that, while I’m familiar with novels and movies and the academic discourse about them, asset pricing is unfamiliar territory. So I wanted to nail down as well as I could just what “ground truth” is in this area. If movies start with eyeballs in theaters and novels start with eyeballs scanning pages, where does asset pricing start? Once ChatGPT had gone through this, I realized that I’d seen it earlier in the process. Still, I was happy to go through it again, this time coming at it after having thought about it. It was at the end of this session that I asked ChatGPT to summarize the discussion.