Tuesday, February 25, 2025

Search as an interface between informatics and economics: How is the distribution of good ideas like the distribution of gold deposits?

The Answer: Both distributions are highly irregular. What that means in the case of gold deposits is pretty obvious: Gold ore is a physical substance that is found in the earth, a huge mass of physical substance. But there is no obvious order to just where you can find deposits of ore. So you have to go looking for them, which is called prospecting.

Ideas, though, are not things. What does it mean to talk about their distribution? Is there some kind of abstract space where ideas exist? If so, how do we map and describe that space?

So, first I’m going to talk a bit about locating ideas in some abstract space. Then I’ll present the conversation I had with Claude 3.5, which starts with gold and ends with AI.

Ideas in space

Well, think of a library. Libraries contain books and books contain ideas. Books are physical objects and so have locations in physical space: library shelves. So, how are books placed on those shelves? Offhand, there seem to be two principles: 1.) alphabetically by author name, and 2.) according to subject matter. Fiction tends to be organized according to the first principle while non-fiction is organized by the second. This means that novels placed on the same shelf are likely to be very different in character. (Take a look at this alphabetized list in Wikipedia.) Non-fiction is arranged by subject matter, so books that occupy the same shelf will be generally about the same thing. But there is a limit to how far that principle takes us. For one thing, any given book can be about many different things. It can’t be on shelves with all of them. Still, you get the idea. We have ideas arranged in space, albeit a space limited to three dimensions.

Things get more interesting when we have a collection of electronic documents. In this case, actual physical space is irrelevant. What matters is, well, idea space. Back in the 1960s and 1970s Gerard Salton came up with the idea of representing each document as a vector of the terms it contains. Thus, each document occupies a position in some high-dimensional space. Documents located close together in that space will be about similar topics. Salton’s Vector Space Model is one of the foundations of machine learning – something I discuss with Claude 3.5 in this post, Memory in machines and minds, a conversation with Claude 3.5 Sonnet.
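As a rough illustration (not Salton’s actual weighting scheme, which used weighted term frequencies rather than raw counts), here is a minimal sketch of the vector space idea in Python: each document becomes a term-count vector, and similarity is the cosine of the angle between vectors. The documents and query are invented for the example:

```python
import math
from collections import Counter

def vectorize(text):
    """Represent a document as a bag-of-words term-count vector."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two term vectors: 1.0 means same direction."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

docs = [
    "gold deposits are found in quartz veins",
    "prospectors search for gold in river sediment",
    "the film industry is shaped by extreme uncertainty",
]
query = vectorize("searching for gold deposits")

# Documents "close" to the query in term space rank first.
ranked = sorted(docs, key=lambda d: cosine_similarity(query, vectorize(d)),
                reverse=True)
```

The two gold documents end up near the query in this space, while the film document, sharing no terms with it, sits at distance zero.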

OK, so that gets us a way of thinking about ideas as existing in an abstract space. But how do we motivate the idea that some ideas are closer to us than other ideas? Well, imagine that you occupy some position in idea space, a high dimensional space. There are a bunch of ideas you understand very well. We can say they are close to you. Other ideas are utterly unintelligible. By convention we will say that they are far from you. In order to reach one of those distant ideas you are going to have to learn the ideas that lie on a path between where you are now and where that idea is.
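To make the path metaphor concrete: if we model a purely hypothetical idea space as a graph whose edges connect ideas one learning step apart, then reaching a distant idea amounts to finding a path through the intermediate ones. A toy sketch, with a made-up prerequisite graph:

```python
from collections import deque

# A toy "idea space": an edge means the second idea is one learning
# step beyond the first. The graph is invented for illustration.
prereqs = {
    "arithmetic": ["algebra"],
    "algebra": ["calculus", "linear algebra"],
    "calculus": ["differential equations"],
    "linear algebra": ["machine learning"],
    "differential equations": [],
    "machine learning": [],
}

def learning_path(start, goal):
    """Breadth-first search: the shortest chain of ideas from start to goal."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in prereqs.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

Here the "distance" between you and an idea is just the number of intermediate ideas you would have to learn first.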

Yeah, I know, it’s fuzzy and indistinct. But you get the idea. I explore that notion in these working papers: Stagnation and Beyond: Economic growth and the cost of knowledge in a complex world, and, What economic growth and statistical semantics tell us about the structure of the world. The second one is shorter and more directly to the point.

It seems pretty clear to me that the further development of machine learning and A.I. is going to teach us quite a bit about the structure of idea space. Every LLM is a high-dimensional space jam-packed with billions and billions of ideas. We’ve just barely begun to map that space, something I talk about in my recent report, ChatGPT: Exploring the Digital Wilderness, Findings and Prospects, see especially pp. 34-38, 42-44.

* * * * *

That’s enough of that. The rest of this post consists of a dialog I had with Claude 3.5 about the relationship between technology and the search for irregularly distributed things. We start with gold, move on to oil and diamonds, then documents, drug discovery, movies and finally new AI technology. As always, my prompts are in boldface while Claude’s replies are in plain type.

Gold

How have gold prospecting methods changed from the 19th century through the 20th century?

Let me walk you through the key changes in gold prospecting methods across these two centuries.

In the 19th century, prospecting was largely manual and relied on simple tools and techniques:

Panning was the most iconic method - prospectors would use flat pans to swirl river sediment, letting water wash away lighter materials while heavier gold settled at the bottom. This was supplemented by sluice boxes, which were long wooden channels that used flowing water and riffles to separate gold from sediment on a larger scale.

Placer mining dominated early operations, focusing on loose gold deposits in streams and riverbanks. As surface deposits became depleted, hydraulic mining emerged - using high-pressure water cannons to blast away hillsides, though this was later restricted due to environmental damage.

The 20th century brought significant technological advances:

Early in the century, dredging became widespread - floating mechanical dredges could process huge volumes of river sediment. Hardrock mining expanded with better drilling and crushing equipment, allowing miners to extract gold from solid rock formations more efficiently.

Later developments included:

- Geochemical prospecting using sophisticated analysis of soil and rock samples
- Aerial surveys and remote sensing to identify promising deposits
- More precise drilling techniques for sampling
- Advanced recovery methods like heap leaching and carbon-in-pulp processing
- Mercury-free extraction techniques addressing environmental concerns

Modern prospecting now combines traditional methods with high-tech approaches like:

- GPS mapping and GIS software for precise location data
- Ground-penetrating radar to detect deposits
- Portable XRF analyzers for rapid mineral identification
- Environmental monitoring systems
- Computer modeling of deposits

This progression shows how gold prospecting evolved from relatively simple manual techniques to a sophisticated industry incorporating advanced technology while becoming more environmentally conscious.

OK, so we have roughly three technological eras: 1) manual, 2) chemical sample analysis and aerial and remote sensing, and 3) high-tech computer-based methods for detection and mapping.

Yes, that's a good way to break it down! Let me build on that framework:

Era 1 (Manual) - roughly 1800s to early 1900s:

Primary methods were direct physical observation and mechanical separation. Prospectors relied on their understanding of geology, visible gold indicators, and mechanical processes like panning, sluicing, and early dredging. Skills were often passed down through experience and apprenticeship.

Era 2 (Chemical/Remote) - roughly 1920s to 1970s:

Brought systematic scientific analysis into prospecting. Geochemical sampling could detect trace elements indicating gold deposits. Aerial photography and later satellite imagery allowed prospectors to identify promising geological formations from above. This era marked the shift from prospecting as an art to a more systematic science.

Era 3 (Digital) - 1980s onward:

Computerization transformed the field. GPS, digital mapping, and data analysis tools allowed prospectors to integrate multiple data sources and create detailed geological models. Ground-penetrating radar and portable analyzers provided immediate feedback in the field. Environmental monitoring became increasingly sophisticated and integrated into prospecting practices.

What's particularly interesting is how these eras overlap and build on each other - modern prospectors often combine techniques from all three eras, using traditional knowledge alongside the latest technology.

Carry-over skills

What kinds of skills carried over from one era to the next? For example, what did highly skilled 19th century prospectors know that remained valuable in the second era? And then from second to third.

This is a fascinating question about the evolution of expertise! Let me trace these skill transfers:

From Era 1 to Era 2:

- Understanding of geological formations and how they relate to gold deposits remained crucial. Early prospectors' knowledge of where gold typically accumulated (like placer deposits in river bends or specific rock formations) helped inform where to conduct chemical sampling and aerial surveys
- Reading the landscape for signs of mineralization - knowledge of indicator minerals, rock types, and geological structures remained valuable even when supplemented by chemical analysis
- Water flow dynamics understanding - experience with how gold moved and concentrated in waterways helped inform sampling strategies
- Assessment of ore quality - the ability to visually evaluate mineral-bearing rock remained important even as chemical assays became available

From Era 2 to Era 3:

- Interpretation of indirect evidence - the skill of making conclusions from multiple indicators (developed during the chemical/remote sensing era) became even more valuable when working with computer models and multiple data streams
- Sampling methodology - understanding of how to collect representative samples remained crucial for computer analysis
- Pattern recognition in geological features - this skill transferred well to interpreting digital mapping and radar data
- Knowledge of seasonal and weather effects on prospecting - this remained important for planning surveys and interpreting remote sensing data

What's particularly interesting is how these generations of prospectors had to learn to integrate their traditional knowledge with new technologies - for example, using their understanding of geology to know where to deploy expensive new sensing equipment most effectively.

The core skill that seems to have remained valuable across all eras is the ability to think in three dimensions about underground structures and visualize how geological processes might have created and moved gold deposits. This spatial thinking was valuable whether using a gold pan, analyzing aerial photographs, or interpreting 3D computer models.

Search in general

OK, let me think a bit. In any kind of search activity, whether searching for gold, or oil, or, for that matter, searching a document collection, we can think of at least two steps: 1) getting a rough lay of the land, 2) fine-grained high-intensity search. You don’t want to search the whole geographical area with the most accurate and intensive methods. Those take a long time and are expensive. You first want to narrow down your search area (1), and then really investigate (2). And then we have 3-dimensional thinking.

I’m thinking that an old prospector who’s highly skilled in 19th century methods might well have better luck than a phase 2 or even a phase 3 prospector who knows all about the tools, but has no sense of the land and can’t think in three dimensions. Does that make sense?

Yes, this is a really insightful observation! You've identified something fundamental about search processes and expertise that transcends the specific technologies.

The skilled 19th century prospector had developed what we might call "search heuristics" - efficient ways to eliminate vast areas of terrain and zero in on promising locations. They could look at a landscape and quickly assess:

- Overall geological patterns suggesting mineralization
- Signs of ancient river channels where placer deposits might have formed
- How underground water flows might have concentrated deposits
- The way erosion patterns could reveal or hide promising formations

This "eliminating the obvious non-starters" is a crucial skill that technology can't fully replace. A prospector with sophisticated chemical analysis tools or computer models but poor search heuristics might waste enormous time and resources doing detailed analysis of unpromising areas.

Your point about 3D thinking is particularly important. Gold deposits are formed by complex geological processes over millions of years - understanding how these processes work in three dimensions helps predict where gold might be found. An experienced prospector who can visualize:

- How ancient waterways might have flowed
- How tectonic forces could have twisted and folded rock layers
- How mineral-bearing solutions might have moved through rock formations
- Where erosion might have exposed or buried deposits

...has an advantage over someone who can only think in terms of surface features and current conditions, regardless of their technological tools.

This reminds me of how in other fields, like medicine, experienced practitioners often develop accurate "gut feelings" about where to look for problems, which then guide their use of sophisticated diagnostic tools. The technology doesn't replace the intuition - it helps confirm or refine it.

Oil and diamonds

I suppose the same kind of analysis could be applied to searching for oil deposits or looking for diamonds.

Exactly! These core principles of efficient search apply remarkably well across different resource exploration domains. Let me illustrate:

For oil exploration:

- The broad search phase involves identifying sedimentary basins and major geological structures that could trap oil
- Experienced prospectors learn to recognize surface features that hint at underground structures favorable for oil accumulation
- The 3D thinking is crucial for understanding how ancient seas created organic deposits, how those deposits were buried and transformed into oil, and how geological forces created traps to hold it

For diamond exploration:

- Initial search focuses on identifying kimberlite pipes - the volcanic formations where diamonds typically form
- Prospectors learn to spot indicator minerals that often accompany diamonds
- 3D thinking helps visualize how these volcanic pipes formed and where they might have been eroded or buried

In both cases, just like with gold, you see the same pattern:

1. Use broad search strategies to eliminate unlikely areas
2. Apply geological knowledge and 3D thinking to identify promising sites
3. Only then deploy expensive, detailed analysis tools

An experienced prospector might look at a landscape and quickly rule out 90% of it based on fundamental geological principles, saving enormous time and resources compared to someone who relies solely on technology and tries to search everywhere equally.

I imagine the most successful modern prospectors are those who combine deep traditional geological understanding with modern tools, rather than relying exclusively on either.

Documents

What about searching a document collection?

Ah, this is a fascinating parallel! The same principles of efficient search really do apply to document collections:

Broad search phase (terrain assessment):

- Understanding the overall "landscape" of the document collection - its major topics, time periods, sources
- Recognizing which sections or categories are unlikely to contain what you're looking for
- Identifying promising "formations" where your target information tends to cluster

Fine-grained search:

- Once you've identified promising areas, applying more intensive search methods
- Using detailed keyword searches, reading closely, cross-referencing
- Following citation trails in particularly relevant documents

The "3D thinking" parallel here might be understanding how information and ideas flow and connect:

- How documents relate to each other across time
- How ideas branch and influence each other
- How different topic areas intersect
- How information might be buried in unexpected places due to historical categorization choices

Just like a gold prospector might waste time doing detailed analysis of unpromising terrain, someone searching documents could waste enormous effort doing detailed reading in sections unlikely to contain relevant information. An experienced researcher develops intuition about where information tends to "deposit" and how it moves through academic or organizational ecosystems.

A modern researcher with powerful search tools but no sense of how information typically organizes and flows might be less effective than an experienced librarian from the card catalog era who deeply understands their field's information architecture.
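The two-phase pattern Claude describes (a cheap broad filter to eliminate unpromising territory, then expensive fine-grained analysis on what survives) can be sketched in a few lines of Python. The keyword filter and the density score are stand-ins for whatever cheap and expensive methods a real system would use; the documents are invented for the example:

```python
def coarse_filter(docs, keywords):
    """Phase 1: cheap elimination - keep only docs mentioning any keyword."""
    return [d for d in docs if any(k in d.lower() for k in keywords)]

def fine_score(doc, keywords):
    """Phase 2: costlier scoring, applied only to survivors.
    Here a stand-in: keyword density within the document."""
    words = doc.lower().split()
    return sum(words.count(k) for k in keywords) / len(words)

docs = [
    "gold was found in the river gravel near the old camp",
    "annual budget report for the finance committee",
    "panning for gold in mountain streams",
]
keywords = ["gold", "panning"]

candidates = coarse_filter(docs, keywords)  # the budget report is eliminated
best = max(candidates, key=lambda d: fine_score(d, keywords))
```

The point is the same as in prospecting: the expensive scoring function never touches the documents the cheap filter ruled out.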

Drug discovery

What about identifying new drugs?

Yes, drug discovery follows very similar search principles! Let me map it out:

Broad search phase (terrain assessment):

- Understanding broad chemical space and eliminating obviously unpromising regions
- Identifying promising chemical "families" based on known drug-like properties
- Looking for structural similarities to existing successful drugs
- Using knowledge of biological pathways to focus on likely mechanisms

Fine-grained search:

- Detailed molecular modeling of promising candidates
- Laboratory synthesis and testing
- Optimization of lead compounds
- Rigorous safety and efficacy testing

The "3D thinking" here is crucial in multiple ways:

- Understanding how molecules' 3D structures will interact with target proteins
- Visualizing how changes to molecular structure will affect binding
- Predicting how drugs might be metabolized in the body
- Understanding how different biological pathways interact

An experienced medicinal chemist often develops strong intuition about which chemical structures are likely to make good drugs - what will be stable, bioavailable, safe, etc. This expertise helps guide where to deploy expensive computational tools and laboratory resources.

Just like gold prospecting, success often comes from combining:

- Deep understanding of the fundamentals (chemistry, biology)
- Pattern recognition from experience
- Efficient search strategies to eliminate unlikely candidates
- Strategic use of advanced tools (computational modeling, high-throughput screening)

The parallel to gold prospecting is quite striking - in both cases, you can't exhaustively search everything, so you need good heuristics to focus your efforts on promising areas.

Movies

OK, now let’s swing for the fences, as the saying goes. What about looking for good movie projects. Films are very expensive to make. You don’t want to invest, say, $150 million only to have it bomb at the box office. And yet that does happen. On the other hand, no one expected “Jaws” to become a big hit.

This is a fascinating application of our search framework to creative/business decisions! Let me break it down:

Broad search phase in film development:

- Understanding audience demographics and trends
- Identifying genres that historically perform well
- Looking for source material with built-in audiences (books, games, etc.)
- Assessing economic conditions and cultural zeitgeist
- Tracking which stars and directors consistently draw audiences

Fine-grained analysis:

- Detailed script development
- Budget analysis
- Market research and test screenings
- Cast chemistry tests
- Detailed production planning

The "3D thinking" here involves understanding how multiple factors interact:

- How story elements resonate with different audiences
- How timing affects success (release date, cultural moment)
- How different elements (cast, director, marketing, etc.) multiply each other's effects
- How external events might impact reception

But here's where it gets really interesting - films like "Jaws" or "Star Wars" that become massive unexpected hits often break the conventional wisdom. They're like finding a gold deposit where geological principles say there shouldn't be one. This suggests that our search heuristics for films might be less reliable than for physical processes like gold deposits or chemical interactions.

Why? Because:

- Cultural response is less predictable than physical processes
- Audience tastes can shift rapidly
- Creative synergies can produce unexpected magic
- The "chemistry" between audience and film can be mysterious
- Innovation often succeeds precisely by breaking established patterns

So while experienced producers develop instincts about what might work (like prospectors with terrain), the "terrain" of audience response is far more fluid and unpredictable than actual geological terrain. This might explain why even experienced studios still have expensive failures - they're prospecting in a landscape that keeps changing under their feet.

De Vany

Yes, there’s an economist, Arthur De Vany, who wrote a good book on this process, “Hollywood Economics: How Extreme Uncertainty Shapes the Film Industry.”

That's a really apt reference! De Vany's work highlights why film is such a fascinating case study in search under uncertainty. The film industry has certain features that make it fundamentally different from physical prospecting:

1. No fixed probability distribution - unlike gold deposits, which follow geological principles, film success doesn't have a stable underlying distribution. Each film's success can change the landscape for future films.

2. Information cascades - the success of one film can trigger rapid shifts in audience behavior. A hit film might create a new genre or revive an old one. This would be like finding gold suddenly creating new gold deposits!

3. Non-linearity - small differences between similar films can lead to vastly different outcomes. Two romantic comedies with similar budgets, stars, and plots might have wildly different box office results.

This connects back to our earlier discussion about search expertise - while an experienced gold prospector can reliably use past patterns to predict future success, an experienced film producer faces a much more chaotic landscape. Their expertise might be more about managing uncertainty than reducing it.

The parallel with drug discovery is interesting here - both fields deal with high costs and high failure rates, but drug discovery at least has stable underlying chemical and biological principles to work from. Film success seems to be inherently more unpredictable.

I know De Vany also looked into the pharmaceutical business. I downloaded a PowerPoint presentation of his, but I don’t think it’s on the web any more.

That's interesting that De Vany analyzed both industries! It makes sense - both film and pharmaceuticals are industries characterized by:

- Very high development costs
- High failure rates
- A few massive successes that help fund the failures
- Complex decision-making about where to invest resources

The key difference seems to be that pharmaceutical success, while highly uncertain, is still grounded in relatively stable chemical and biological principles. A molecule either binds effectively to its target or it doesn't. Side effects either occur or they don't.

In contrast, film success depends on ever-shifting cultural dynamics and audience psychology. Whether a story resonates with audiences can depend on current events, social movements, or even what other films came out recently.

It would be fascinating to see De Vany's comparison of these industries. Did he find similar statistical patterns in both, despite their different underlying dynamics? Or did he find that pharmaceutical success follows more predictable patterns than film success?

Here's his presentation on the drug business.

This is fascinating - De Vany is basically arguing that both the film and pharmaceutical industries follow "wild" statistical distributions (Pareto/Lévy stable distributions) rather than normal (Gaussian) distributions. Let me highlight some key parallels he draws:

1. The "Nobody Knows" Principle applies to both industries:

- Both involve huge upfront costs
- Results are highly unpredictable
- A few massive hits fund many failures
- Success is often due to luck more than planning

2. The "Angel's Nightmare":

- In movies: If a film goes over budget, the expected final cost grows proportionally to what's already spent
- Similar dynamic likely applies in drug development - costs tend to expand once you're committed

3. Self-Similarity Patterns:

- In movies: The distribution of success looks similar across different scales, markets, and time periods
- In pharmaceuticals: "Big" and "small" drug projects follow similar statistical patterns

4. Information Flow Issues:

- De Vany notes how information about drugs (like statins) spreads through the medical community in non-linear ways
- Similar to how information about movies spreads through audience networks

5. Anti-Gaussian Nature:

- He argues that using normal distribution statistics misses the point in both industries
- These are "wild" distributions where extreme events (blockbusters/breakthrough drugs) are much more common than a normal distribution would predict
- As he says, "There would be no penicillin, Prozac, or Lipitor in a Gaussian world."

The broader implication seems to be that traditional business metrics and planning tools (based on normal distributions) may be fundamentally unsuited to these industries.
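De Vany’s Gaussian-versus-Pareto contrast is easy to see numerically. The sketch below, with an illustrative tail index of 1.5 and made-up parameters (not figures from De Vany), compares what share of the total the top 1% of outcomes captures under each distribution:

```python
import random

random.seed(42)
N = 100_000

# Gaussian "revenues": mean 10, sd 3 - extreme outcomes are vanishingly rare.
gaussian = [random.gauss(10, 3) for _ in range(N)]

# Pareto "revenues" with tail index alpha = 1.5 (illustrative): heavy-tailed,
# so a handful of draws dominate the total.
pareto = [random.paretovariate(1.5) for _ in range(N)]

def top_share(xs, frac=0.01):
    """Share of the total captured by the top `frac` of outcomes."""
    xs = sorted(xs, reverse=True)
    k = int(len(xs) * frac)
    return sum(xs[:k]) / sum(xs)

# In the Gaussian world the top 1% capture roughly 1-2% of the total;
# in the heavy-tailed world they capture a large multiple of that.
gaussian_share = top_share(gaussian)
pareto_share = top_share(pareto)
```

That gap is the statistical signature of a hit-driven industry: the blockbusters are not outliers to be trimmed away, they are where nearly all the value lives.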

Thank you. This has been a helpful discussion.

You're welcome. De Vany's work is particularly valuable because he goes beyond just observing that both industries are "hit-driven" to identify the deeper mathematical and statistical patterns that govern them. His insights about self-similarity and the limitations of Gaussian statistics could be relevant to other domains where we're searching for rare but valuable outcomes.

New AI tech

Come to think of it, the domain that most interests me is AI. I think contemporary LLMs are limited, though obviously quite remarkable. But where the next breakthrough will come, that’s anyone’s guess.

Ah, that's a fascinating application of De Vany's thinking to AI development! The similarities are quite striking:

1. Distribution of outcomes is likely "wild" rather than Gaussian:

- Most AI research efforts yield incremental improvements
- Occasionally there are massive breakthroughs (like transformers) that fundamentally change the field
- It's very hard to predict which research directions will yield breakthroughs

2. The "Nobody Knows" principle seems highly relevant:

- Despite vast resources and talent, major labs can't reliably predict which approaches will work
- Many promising directions turn out to be dead ends
- Some breakthroughs come from unexpected places or combinations of ideas

3. The "Angel's Nightmare" might apply to AI research programs:

- Once committed to a research direction, costs and complexity can expand dramatically
- The temptation to keep investing in a promising but difficult direction

4. Self-similarity pattern:

- Small research projects and massive corporate efforts might follow similar patterns of success/failure
- The distribution of breakthroughs might look similar at different scales

This framework suggests we should be skeptical of anyone claiming they know exactly where the next big AI breakthrough will come from. Like movies and drugs, it might emerge from unexpected directions or combinations of ideas that aren't obvious in advance.
