Examples

These public examples test whether a world model built from a dated record forecasts hidden futures better than GPT reading the same visible context.

The method is direct: build the dated record, hide later events, ask each system to forecast, and compare the result with what happened. Start with Bismarck if you want the clean showcase: one PDF, one fixed past, two forecasts, then harder forks from the same state.

Most stretches of any record simply continue, and every forecaster looks good there. The evidence page therefore scores calm windows and break windows separately; the interesting results are on the windows where the record breaks from trend.
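The hide-then-score loop and the calm/break split can be sketched in a few lines of Python. This is a minimal illustration, not the evidence page's scoring code: the function names, the threshold rule for calling a window a break, and the toy numbers are all assumptions made for the example.

```python
from datetime import date
from statistics import mean


def split_at_cutoff(events, cutoff):
    """Split (date, value) events into what is visible at the cutoff and what is hidden."""
    visible = [e for e in events if e[0] <= cutoff]
    hidden = [e for e in events if e[0] > cutoff]
    return visible, hidden


def label_window(last_visible, actual, threshold):
    """Assumed rule: a window is a 'break' when the outcome moves more than
    `threshold` away from the last visible value; otherwise it is 'calm'."""
    return "break" if abs(actual - last_visible) > threshold else "calm"


def score_by_window_type(windows, threshold=1.0):
    """Return {window_type: {forecaster: mean absolute error}}."""
    errors = {}
    for w in windows:
        kind = label_window(w["last_visible"], w["actual"], threshold)
        for name, forecast in w["forecasts"].items():
            errors.setdefault(kind, {}).setdefault(name, []).append(
                abs(forecast - w["actual"])
            )
    return {
        kind: {name: mean(errs) for name, errs in by_name.items()}
        for kind, by_name in errors.items()
    }


# Toy record with placeholder dates and values (not real data).
events = [
    (date(2000, 1, 1), 10.0),
    (date(2000, 2, 1), 10.0),
    (date(2000, 3, 1), 14.0),
]
visible, hidden = split_at_cutoff(events, cutoff=date(2000, 2, 1))

# Two toy windows: one that continues the trend, one that breaks from it.
windows = [
    {"last_visible": 10.0, "actual": 10.2,
     "forecasts": {"persistence": 10.0, "world_model": 10.5}},
    {"last_visible": 10.0, "actual": 14.0,
     "forecasts": {"persistence": 10.0, "world_model": 13.0}},
]
scores = score_by_window_type(windows)
```

In this toy data, persistence has the lower error on the calm window and the higher error on the break window, which is why a single pooled score would hide the difference.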

PDF to event stream

Bismarck

A dense historical PDF becomes dated events and semantic pressures. The scored test hides later history and compares the world-model forecast with GPT's forecast from the same visible past. The fresh run makes Bismarck the clearest public showcase, and custom forks cover harder alternate-history moves.

  • Published model-vs-GPT test
  • Scored against later historical entries
  • Custom forks show action-conditioned forecasts

Clinical-regulatory record

Project Confirm

FDA accelerated-approval outcomes, clinical trials, submissions, and label updates become a dated regulatory event stream. The model predicts the next regulatory gate, event class, timing, and withdrawal risk on held-out drug assets.

  • 17,703 public clinical-regulatory events
  • Held out by drug asset
  • Next-gate AUROC 0.857; withdrawal signal AUROC 0.754
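For readers unfamiliar with the two ideas in these bullets, here is a small sketch of holding out by asset and computing AUROC. It is an assumption-laden illustration, not the Project Confirm pipeline: the row layout, the asset names, and both helper functions are hypothetical, and the AUROC is the plain pairwise (Mann-Whitney) form.

```python
def split_by_asset(rows, held_out_assets):
    """Keep every row of a held-out asset in the test set, so no asset
    appears on both sides of the split."""
    train = [r for r in rows if r["asset"] not in held_out_assets]
    test = [r for r in rows if r["asset"] in held_out_assets]
    return train, test


def auroc(labels, scores):
    """Probability that a random positive is scored above a random negative
    (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy rows with made-up asset names, labels, and scores.
rows = [
    {"asset": "drug_a", "label": 1, "score": 0.9},
    {"asset": "drug_a", "label": 0, "score": 0.1},
    {"asset": "drug_b", "label": 1, "score": 0.4},
    {"asset": "drug_b", "label": 0, "score": 0.6},
    {"asset": "drug_c", "label": 0, "score": 0.2},
]
train, test = split_by_asset(rows, held_out_assets={"drug_a", "drug_b"})
test_auroc = auroc([r["label"] for r in test], [r["score"] for r in test])
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect ranking. Splitting by asset rather than by row keeps the model from scoring well by recognizing a drug it has already seen.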

Company archive

Enron

Internal email, market, and news data become replayable company state. Choose a cutoff, write an action as an Enron actor, and compare the world-model forecast with a GPT baseline.

  • Real company event record
  • Held-out rows and semantic branch cases
  • Good for action-conditioned consequence intuition

Public record

Public History

U.S. macro records are converted into dated state: inflation, labor, rates, GDP, Treasury yields, and public releases. Pick a cutoff and test a memo, warning, watch, or hold recommendation.

  • Current and analog macro windows
  • Open-ended branch text
  • Best for explaining the method on public data

Public news archive

Civil War-era public news

A 280-record public-news timeline from 1859 through 1865 becomes a historical state surface. Pick a date, read what was visible then, and test what a bulletin, memo, watch, or hold decision might change.

  • 1859-1865 public-news record
  • Public-history version of the fork test
  • Good for seeing how the method handles crisis context

Memorization control

Fictional worlds: Star Wars and Middle-earth

Synthetic event streams from invented worlds, with hidden futures that are not internet plot continuations. The world model and a live GPT baseline forecast the same hidden continuations from the same record.

  • Star Wars: of 4 windows sampled with GPT, the world model takes 3 and ridge takes 1
  • Middle-earth: GPT takes 0; ridge takes all the sampled windows
  • Direct check against memorization and momentum

What the examples show

Bismarck is the best first demo because it shows the whole loop without requiring company context: PDF to event spine, hidden future, world-model vs GPT comparison, then action forks from the same cutoff.

The strongest results come from dated records, consequence questions, and later evidence that can score the forecast. The other examples show the same machinery on different records: Project Confirm for clinical-regulatory gates, Enron for company forks, and macro history and Civil War-era news for public records.

The fictional-world rows are useful for a different reason: they are a memorization control, and Middle-earth is the honest reminder that a simple persistence baseline can beat everyone on calm stretches. The Enron and public-history demos run on the same trained model checkpoints that the evidence page reports, so the numbers there describe the demos here.

Custom forks extend the same machinery to possible actions from a fixed state. The scored tests come from hiding later rows and asking which system read the next pressures more accurately.
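The fork idea can be pictured as running every candidate action from an identical copy of one frozen state. The sketch below is only a schematic under stated assumptions: `compare_forks`, the state dictionary, and the toy forecaster are hypothetical stand-ins, not the demos' actual interface.

```python
from copy import deepcopy


def compare_forks(state, actions, forecast_fn):
    """Forecast each action from its own copy of the same cutoff state,
    so no fork can contaminate another."""
    results = {}
    for action in actions:
        frozen = deepcopy(state)  # every fork starts from the same past
        results[action] = forecast_fn(frozen, action)
    return results


def toy_forecast(state, action):
    """Illustrative stand-in for a forecaster: shifts one made-up pressure."""
    effects = {"hold": 0.0, "warning": -0.5}
    state["pressure"] += effects[action]  # mutation stays inside the copy
    return state["pressure"]


state = {"cutoff": "t0", "pressure": 2.0}
forks = compare_forks(state, ["hold", "warning"], toy_forecast)
```

Because the original `state` is never modified, the forecasts differ only through the action, which is the property that makes the forks comparable.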

GPT remains the better reader: give it the context and it writes you a clear, confident explanation. The world model earns its place when the job is the part the explanation skips: hold the state fixed, test the actions against each other, and let the next record say who was right.

Open the technical evidence