Agent Lab — a live forecast exhibit · Applied AI
Scorecard

How good is good?

A forecast is only honest if it's scored. We publish every number ahead of kickoff and grade it once the match is played — model and agent both. The hard part is knowing what a good score even is, so we anchor it explicitly.

The floor is a naive prior: always predict the long-run base rate. Beating it is the minimum bar. The sharp market is the practical ceiling — hard to beat once its odds are converted to fair probabilities. The model is judged by where it lands between the two; the agent is judged only by whether its calls improve the model's score. Why soccer is hard →
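To make the anchoring concrete, here is a minimal sketch of how a floor, a model and a ceiling could be compared. It assumes the multi-class form of the Brier score (squared error summed over home, draw and away, lower is better), which is consistent with published values above 0.667 but is not confirmed by this page. The function names, the example forecasts and the `position_between` helper are hypothetical.

```python
from typing import Sequence

Forecast = Sequence[float]  # probabilities for (home, draw, away), summing to 1


def brier(probs: Forecast, outcome: int) -> float:
    """Multi-class Brier score for one match (assumed form): 0 is perfect, 2 is worst."""
    return sum((p - (1.0 if i == outcome else 0.0)) ** 2 for i, p in enumerate(probs))


def mean_brier(forecasts: Sequence[Forecast], outcomes: Sequence[int]) -> float:
    """Average Brier score over all graded matches."""
    return sum(brier(p, o) for p, o in zip(forecasts, outcomes)) / len(outcomes)


def top_pick_rate(forecasts: Sequence[Forecast], outcomes: Sequence[int]) -> float:
    """Share of matches where the most probable outcome was the actual result."""
    hits = 0
    for probs, outcome in zip(forecasts, outcomes):
        pick = max(range(len(probs)), key=lambda i: probs[i])
        hits += pick == outcome
    return hits / len(outcomes)


def position_between(model: float, floor: float, ceiling: float) -> float:
    """Hypothetical summary: 0.0 = at the floor, 1.0 = at the ceiling, negative = below the floor."""
    if floor == ceiling:
        raise ValueError("floor and ceiling must differ")
    return (floor - model) / (floor - ceiling)


# Illustrative inputs only, not the exhibit's data.
example_forecasts = [(0.5, 0.3, 0.2), (0.2, 0.3, 0.5)]
example_outcomes = [0, 1]  # index of the actual result per match
print(mean_brier(example_forecasts, example_outcomes))
print(top_pick_rate(example_forecasts, example_outcomes))
```

The naive prior would be scored the same way, by passing the same base-rate triple for every match.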

The model, against its benchmarks

16 matches scored (Brier score, lower is better)

Floor · naive prior
Brier 0.704 · 43.8% top-pick

Always predict the base rate. The minimum bar.

Our model
Brier 0.727 · 37.5% top-pick

Not yet clear of the floor: 0.727 is worse than the floor's 0.704. Early days.

Sharp market · benchmark
Per-match prices pending

We publish the model's probabilities beside each venue's prices and score them together as results land. Model vs market →
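Before a market can be scored alongside the model, its odds have to be turned into probabilities that sum to one. The page does not say which method it uses. The sketch below assumes decimal odds and the simplest common approach, proportional normalisation, which divides each implied probability by their total. The function name and the example odds are hypothetical.

```python
from typing import List, Sequence


def fair_probabilities(decimal_odds: Sequence[float]) -> List[float]:
    """Strip the bookmaker margin by proportional normalisation (an assumed method).

    Each decimal price implies a probability of 1 / price. Those implied
    probabilities sum to more than 1 because of the margin, so each one is
    rescaled by the total.
    """
    if any(o <= 1.0 for o in decimal_odds):
        raise ValueError("decimal odds must be greater than 1.0")
    implied = [1.0 / o for o in decimal_odds]
    overround = sum(implied)  # greater than 1.0 when the book carries a margin
    return [q / overround for q in implied]


# Illustrative prices for (home, draw, away), not real market data.
print(fair_probabilities((1.9, 3.8, 3.8)))
```

Other margin-removal methods exist and can give noticeably different numbers for long shots, so the choice should be fixed before any scoring starts.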

The full record

The agent, scored

All calls →
Matches analysed: 72/72
Picks overturned: 4
Calls graded: 16
Agent top-pick: 37.5%

Over 16 graded calls, the agent's top pick was right 37.5% of the time (6 of 16). Each call also carries a signed score delta versus the model, shown on its match page, which says whether that specific move helped or hurt. Its biggest nudge so far: Uruguay v Cape Verde, +12.7 pp.
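One way a signed per-call delta could be computed is sketched below. It assumes the delta is the model's Brier score minus the agent's on the same match, so a positive value means the agent's adjustment helped. Both the sign convention and the use of the multi-class Brier score are assumptions, and the names and example numbers are hypothetical.

```python
from typing import Sequence


def brier(probs: Sequence[float], outcome: int) -> float:
    """Multi-class Brier score for one match (assumed form, lower is better)."""
    return sum((p - (1.0 if i == outcome else 0.0)) ** 2 for i, p in enumerate(probs))


def score_delta(model_probs: Sequence[float], agent_probs: Sequence[float], outcome: int) -> float:
    """Signed delta under the assumed convention: positive = agent beat the model."""
    return brier(model_probs, outcome) - brier(agent_probs, outcome)


# Illustrative forecasts for (home, draw, away), not the exhibit's data.
model = (0.50, 0.30, 0.20)
agent = (0.60, 0.25, 0.15)  # the agent leans further toward a home win

print(score_delta(model, agent, outcome=0))  # home win: the lean helped
print(score_delta(model, agent, outcome=2))  # away win: the lean hurt
```

The same adjustment is rewarded or penalised depending only on the result, which is why a per-match delta is more informative than the top-pick rate alone.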