Reference Class Identifier
Finding the right history to predict tomorrow.
Overview
This pillar identifies a class of similar past events and uses it to establish a base-rate probability for a current prediction. It grounds forecasts in historical data, correcting for common cognitive biases such as over-optimism and the tendency to treat a situation as unique.
What It Does
It systematically analyzes the key features of a market's question, then searches historical records for comparable situations. By grouping these precedents into a 'reference class', it calculates the historical frequency of a specific outcome. This provides an objective, data-driven starting point for any probability assessment.
Why It Matters
It counters the 'inside view', in which forecasters treat each situation as unique and are swayed by its narrative. By forcing an 'outside view', this pillar anchors predictions in what has actually happened before, which tends to improve calibration and long-term accuracy.
How It Works
First, the prediction problem is deconstructed into its core, measurable attributes. Next, historical data is scanned to find events that share these attributes. These events form the reference class, which is then analyzed to calculate the base rate, or the simple percentage of times a certain outcome occurred.
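The three steps above can be sketched in Python. This is a minimal illustration, assuming each historical event is stored as a dictionary of categorical attributes plus a yes/no outcome, and that "sharing attributes" means an exact match on every attribute of the current question. The Event type, the helper names, and the toy records are hypothetical and not part of PillarLab.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical historical record: descriptive attributes plus the observed outcome."""
    attributes: dict
    outcome: bool


def build_reference_class(history, target_attributes):
    """Keep past events that match every core attribute of the current question."""
    return [
        e for e in history
        if all(e.attributes.get(k) == v for k, v in target_attributes.items())
    ]


def base_rate(reference_class):
    """Fraction of events in the reference class where the outcome occurred."""
    if not reference_class:
        raise ValueError("empty reference class: no base rate can be computed")
    return sum(e.outcome for e in reference_class) / len(reference_class)


# Toy, made-up records for illustration only.
history = [
    Event({"category": "A", "stage": "early"}, True),
    Event({"category": "A", "stage": "early"}, False),
    Event({"category": "A", "stage": "late"}, True),
    Event({"category": "B", "stage": "early"}, True),
]
reference_class = build_reference_class(history, {"category": "A", "stage": "early"})
print(len(reference_class), base_rate(reference_class))  # 2 0.5
```

Exact matching is the simplest reading of "share these attributes"; it shrinks the reference class quickly as more attributes are added, which is one reason similarity scoring is often used instead.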
Methodology
The core calculation is the base rate: (Number of historical successes) / (Total number of relevant historical events). Event similarity is often determined by creating feature vectors for each event and using a distance metric, like cosine similarity, to find the closest matches. A minimum sample size, typically N > 10, is required before the base rate is treated as reliable, since in a smaller class a single event shifts the rate substantially.
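The methodology above can be sketched in Python, assuming every historical event has already been turned into a numeric feature vector paired with a yes/no outcome. Selecting the k most similar events, the value of k, the synthetic data, and all function names here are illustrative assumptions; only the base-rate formula, cosine similarity, and the N > 10 threshold come from the text.

```python
import math

MIN_SAMPLE_SIZE = 11  # the "N > 10" rule of thumb, i.e. at least 11 events


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def nearest_events(target, events, k):
    """Return the k (features, outcome) pairs whose features are most similar to target."""
    ranked = sorted(events, key=lambda ev: cosine_similarity(target, ev[0]), reverse=True)
    return ranked[:k]


def checked_base_rate(reference_class, min_n=MIN_SAMPLE_SIZE):
    """Successes / total events, refusing to answer below the minimum sample size."""
    n = len(reference_class)
    if n < min_n:
        raise ValueError(f"reference class too small: N={n}, need N >= {min_n}")
    successes = sum(1 for _, outcome in reference_class if outcome)
    return successes / n


# Toy data: 20 synthetic events whose second feature drifts away from the target.
events = [((1.0, float(i)), i % 4 == 0) for i in range(20)]
target = (1.0, 0.0)
closest = nearest_events(target, events, k=12)
print(checked_base_rate(closest))  # 0.25
```

Cosine similarity compares only the direction of the vectors, so features on very different scales would normally be standardized first. Whether to keep the top k matches or everything above a similarity cutoff is a tuning choice the text leaves open.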
Edge & Advantage
The edge comes from systematically stripping emotional and narrative-driven biases out of a forecast and replacing them with a statistical baseline that many other forecasters overlook.
Key Indicators
-
Historical Event Similarity Score
Importance: high. A metric showing how closely past events match the key attributes of the current situation.
-
Reference Class Sample Size
Importance: high. The total number of comparable historical events found. A larger sample increases confidence in the base rate.
-
Base Rate Outcome Frequency
Importance: medium. The percentage of times the outcome in question occurred within the identified reference class.
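One way to report the three indicators together is sketched below, assuming each member of the reference class already has a similarity score and a yes/no outcome. Averaging the per-event scores into a single similarity indicator is an assumption, as is the Wilson score interval, a standard binomial confidence interval swapped in here to show concretely why a larger sample increases confidence in the base rate. The class and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ReferenceClassIndicators:
    mean_similarity: float  # Historical Event Similarity Score, averaged over the class (assumption)
    sample_size: int        # Reference Class Sample Size
    base_rate: float        # Base Rate Outcome Frequency
    interval: tuple         # Wilson 95% interval (added assumption, not from the text)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion; it narrows as n grows."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (center - half, center + half)


def summarize(similarities, outcomes):
    """Compute the three key indicators from per-event similarity scores and outcomes."""
    if len(similarities) != len(outcomes) or not outcomes:
        raise ValueError("need equal-length, non-empty inputs")
    n = len(outcomes)
    successes = sum(1 for o in outcomes if o)
    return ReferenceClassIndicators(
        mean_similarity=sum(similarities) / n,
        sample_size=n,
        base_rate=successes / n,
        interval=wilson_interval(successes, n),
    )


# Same 30% base rate at two sample sizes (made-up outcomes).
small = summarize([0.9] * 10, [True] * 3 + [False] * 7)
large = summarize([0.9] * 100, [True] * 30 + [False] * 70)
print(small.interval, large.interval)  # the N=100 interval is much narrower
```

Both classes report the same base rate, but the interval for N=10 is roughly three times wider than for N=100, which is the practical meaning of "a larger sample size increases confidence".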
Data Sources
-
Financial databases such as CRSP or Compustat provide data on past IPOs, mergers, and economic cycles.
-
Sports databases such as Sports-Reference.com contain decades of game and player performance data.
-
Academic & Governmental Databases
Repositories of historical data on elections, legislation, scientific studies, and international conflicts.
Example Questions This Pillar Answers
- Will a first-term incumbent US President win re-election?
- Will a tech startup valued over $1B at IPO be profitable within 3 years?
- Will a movie with a production budget over $200M gross over $1B worldwide?
Use Reference Class Identifier on a real market
Run this analytical framework on any Polymarket or Kalshi event contract.