Superforecasting is a technique for making predictions about world events that originated in The Good Judgment Project. I’ve known about it for a while and read about it here and there. Yesterday (March 31, 2025) Malcolm Murray published A Superforecaster’s View on AGI in 3 Quarks Daily. I won’t try to summarize the article, but I do want to extract a couple of things from it.
First he distinguishes between input and output definitions of AGI. Judging from his overall usage, he means input to and output from the economy, though he doesn’t explicitly say that. Input definitions characterize AGI by the cognitive tasks the system can perform; most debates about AGI are framed this way. Output definitions, in contrast, characterize AGI by its effect on the economy: how much does productivity increase?
After giving his definitions of both, he offers us this:
So, the two distinct questions we end up with are:
- Will there exist by Dec 31, 2030, an AI that is able to do every cognitive digital task equivalently or better than the best human, in an equivalent or shorter time, and for an equivalent or cheaper cost?
- Will by Dec 31, 2030, the U.S. have seen year-on-year full-year GDP growth rate of 19% or higher?
Overall, what I would argue is the most important Superforecasting technique is to look at the historical base rates and weigh these heavily in the analysis. At a high level, this means looking to the past for answers about the future and assuming that the past either repeats itself or at least rhymes, as Mark Twain might have said. Not assuming that the future will be radically different than the past is something that has served forecasters well for a long time. However, there are some key nuances that must be kept in mind and important choices to make when establishing the base rate.
After explaining something of his reasoning process, he offers us this:
These two questions have separate base rates and will likely therefore resolve differently. My forecast, with 60% and 90% likelihood, respectively, is that the first one will resolve in the affirmative and the second one in the negative. I.e., by 2030, it seems that we might have “AGI” but not “TAI”.
Here AGI refers to his input definition and TAI (Transformative AI, “powerful enough that the economy changes significantly”) to his output definition. So, he thinks it is fairly likely that we’ll have something that looks and smells like AGI by 2030, but that it is unlikely to have much effect on the economy.
Here's my response:
I've got intellectual stakes in this game. On "AGI," "ASI," "PASTA," my basic impulse is "a pox on all your houses." But if we must use a term, I pick "AGI." My own prediction, with fairly high confidence (say >75%), is that we'll have substantial understanding of how LLMs work internally before AGI is reached. I make no prediction on when either of those will be achieved. I do think it possible that we'll achieve such understanding without ever reaching AGI, or that AGI will come much later than such understanding.
What do I mean by "substantial understanding"? Tricky question. However that understanding is expressed (e.g. a formal mathematical theory, engineering standards) it will allow us to explicitly engineer systems to meet specified performance objectives given specified operational capacities and input data for training.
Note that in arriving at the >75% figure I didn't undertake any explicit process. That number is pure spitballing.
One could reasonably say that my comment is not responsive to the argument he made. Why? It comes down to this: Murray seems to think that the causal process leading to AGI is homogeneous (my term), while I don’t think it is.
To see what that’s about I’m afraid you’re going to have to read a dialog I had with Claude 3.7 about the nature of superforecasting. As always in these conversations, my remarks are in bold face while Claude’s responses are in plain face. Note that some headings in Claude's responses will be in bold face, but that should be obvious from the context.
* * * * *
The Nature of Superforecasting
I understand that there is a formal discipline of forecasting associated with Philip E. Tetlock and the book Superforecasting: The Art and Science of Prediction. I wonder where it has been most successful, what kinds of phenomena? For example, we aren’t going to use it to predict the locations of the planets. Why not? Because that’s a well understood physical system where we can calculate planetary motion. Similarly, we’re not going to use it to predict the weather, and for much the same reason. Though in this case the system is complex, in a technical sense, and our ability to predict the weather is limited by the amount of data we have for estimating an initial state and the amount of computing power we have to run the prediction. I would also say that the causal processes in those systems are homogeneous in the sense that there is only one kind of causal process involved. OTOH, it’s not at all obvious to me that the stock market is homogeneous in that sense. I suppose the causal process is one involving the collective judgements of people trading in the markets. Those people have all kinds of reasons for making trades based on a wide variety of data available to them, including personal superstition. So, where has superforecasting been most useful?
I'll explore where superforecasting has been most successful based on the distinctions you've drawn between different types of predictive systems.
Superforecasting, as developed by Philip Tetlock and colleagues, has been most successful in domains with specific characteristics that fall between the deterministic physical systems you mentioned (planetary motion, weather) and completely chaotic or random systems.
The most successful applications of superforecasting tend to be in areas with these features:
- Geopolitical events and international relations: Superforecasting has shown particular success in predicting political stability, leadership changes, and diplomatic conflicts. The Good Judgment Project (GJP) demonstrated superior accuracy in forecasting geopolitical events compared to intelligence analysts with access to classified information.
- Economic trends and indicators: While not perfectly predicting specific stock movements (as you noted), superforecasters have demonstrated skill in predicting broader economic indicators like GDP growth, inflation rates, and unemployment figures.
- Public health outcomes: The COVID-19 pandemic saw superforecasting techniques applied to predict case numbers, hospitalization rates, and vaccine development timelines with varying degrees of success.
What makes these domains suitable for superforecasting approaches?
- They involve complex causal processes that combine both systematic patterns and human judgment/behavior
- They have enough stability that expertise and historical patterns matter, but enough uncertainty that probabilistic thinking is valuable
- They involve phenomena where updating forecasts based on new information improves accuracy
- They benefit from aggregating diverse perspectives and avoiding cognitive biases
Superforecasting is less useful in domains where:
- Physical laws completely determine outcomes (planetary motion)
- Pure randomness dominates (lottery numbers)
- The time horizon is extremely long
- There's complete information asymmetry
The most successful superforecasting applications typically involve what Tetlock calls "Goldilocks problems" - not too hard, not too easy, where careful thinking and information aggregation can outperform both simple models and expert intuition.
Would you like me to elaborate on any particular application domain or on the specific forecasting techniques that have proven most effective?
Superforecasting the future of AI
In your judgment, is predicting the future of AI a plausible domain for superforecasting? Why or why not? Perhaps you could imagine a range of answers to that question, with reasoning appropriate to each answer within the range.
I think predicting the future of AI is a mixed case for superforecasting, with some aspects being amenable to this approach and others being more challenging. Let me explore a range of possible answers with reasoning for each:
Position 1: AI is well-suited for superforecasting techniques
This position holds that AI development has enough structure and pattern to make superforecasting valuable. Supporting reasons include:
- AI development follows somewhat predictable research and deployment cycles
- There's a large community of experts with diverse perspectives who could contribute forecasts
- Short to medium-term developments (1-3 years) often build incrementally on existing capabilities
- Many AI milestones involve measurable benchmarks (e.g., performance on specific tasks)
- Historical patterns of technology adoption can inform diffusion predictions
Position 2: AI is moderately suited for superforecasting with significant limitations
This middle position acknowledges both strengths and weaknesses:
- Near-term technical developments may be forecastable (e.g., when certain performance thresholds will be reached)
- Commercial adoption patterns might follow historically similar technologies
- However, discontinuous breakthroughs ("AI winters" and sudden advances) introduce unpredictability
- The field involves interacting technical, economic, social, and regulatory factors
- The most important questions often involve complex social impacts rather than just technical capabilities
Position 3: AI is poorly suited for superforecasting
This position argues that AI's future is fundamentally too uncertain:
- Revolutionary breakthroughs can occur unexpectedly, creating discontinuities that break trend-based forecasting
- The interplay between AI capabilities, human adaptation, and regulatory responses creates too many feedback loops
- Long-term impacts involve deeply transformative possibilities that lack historical analogues
- The most important questions about AI's future may be precisely those where we have the least precedent
- The field suffers from both hype cycles and secrecy that distort information needed for accurate forecasting
My judgment is closest to Position 2. I believe superforecasting can be valuable for specific, well-defined questions about AI with moderate time horizons (1-5 years), particularly regarding technical capabilities, commercial adoption patterns, and regulatory developments. However, the farther out we look and the more transformative the potential impacts, the less reliable superforecasting becomes for AI.