Baseball Model v2: Fixing a Data Leak and Smarter Run Totals
Every model review starts with the same question: what is the model getting wrong that we can actually fix? For the baseball model, this season's review turned up two clear answers. The first was a data leak, a subtle one that inflated training accuracy in a way that could never carry over to production. The second was a totals problem that was generating real losing bets. Both are fixed in baseball v2, which is now live.
The data leak: training on the future
The original model was trained on seasonal pitcher and team statistics. On the surface that sounds fine: ERA, strikeout rate, bullpen ERA, offensive run production, all the inputs you would expect. The problem was in the timing. For every game in the training data, the model was looking at the player's end-of-season statistics, the final numbers accumulated across all 162 games. A game played in April was therefore described by numbers that already included everything that happened in the months after it, information nobody had on game day.
The fix is what is called point-in-time statistics: instead of the final seasonal total, the model now uses only what was known before first pitch — a season-to-date average, built from individual per-start game logs. For games early in the season, a prior-season fallback fills the gap. The model now sees exactly what a bettor and an oddsmaker saw on game day, nothing more.
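As a rough illustration of the idea, here is a minimal Python sketch of a point-in-time ERA. It is not our production code: the `StartLog` schema, the `point_in_time_era` name, and the `min_starts` threshold for falling back to the prior season are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class StartLog:
    """One start from a pitcher's game log (hypothetical schema)."""
    game_date: date
    innings_pitched: float  # true fractions, e.g. 6 1/3 innings = 6.333
    earned_runs: int


def point_in_time_era(
    logs: list[StartLog],
    game_date: date,
    prior_season_era: Optional[float],
    min_starts: int = 3,  # assumed threshold, chosen only for illustration
) -> Optional[float]:
    """Season-to-date ERA using only starts strictly before game_date."""
    known = [
        s for s in logs
        if s.game_date.year == game_date.year and s.game_date < game_date
    ]
    innings = sum(s.innings_pitched for s in known)
    # Early in the season there is too little data: use last year's number.
    if len(known) < min_starts or innings == 0:
        return prior_season_era
    return 9.0 * sum(s.earned_runs for s in known) / innings
```

The strict `<` comparison is the important part: the start being predicted, and everything after it, never reaches the feature.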
This is not the kind of fix that shows up dramatically in headline accuracy metrics — a look-ahead leak tends to make training numbers look better than they should, not obviously wrong. But it makes the model honest. You can track our live accuracy numbers as the season unfolds against a genuinely leakage-free baseline.
The totals problem: knowing when to defer
The second issue was more visible in the profit and loss. A mid-season review of over/under bets found the model was systematically under-predicting run totals. Not by a lot — but enough to consistently push it toward 'under' bets that were losing at a meaningful rate.
The harder finding was this: on the totals market specifically, the model was weaker than the sharp market. We measured the correlation between our predicted totals and the runs actually scored, then did the same for the implied total in Pinnacle's line. Our model's correlation with actual runs was 0.14; the Pinnacle implied total's was 0.25. On this particular question, how many runs will be scored tonight, the sharp market had substantially better information than our model.
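The comparison itself is simple to reproduce. The sketch below assumes a plain Pearson correlation, which may differ from the exact statistic used in our review; the function names `pearson` and `compare_signals` are illustrative only.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def compare_signals(
    model_totals: Sequence[float],
    market_totals: Sequence[float],
    actual_runs: Sequence[float],
) -> dict[str, float]:
    """Correlate each predictor with the runs actually scored."""
    return {
        "model": pearson(model_totals, actual_runs),
        "market": pearson(market_totals, actual_runs),
    }
```

Whichever predictor shows the higher correlation carries more information about the final score, which is the basis for deciding how far to defer.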
The fix is a technique we also use in the tennis model: shrink toward the sharp number. Where our model is demonstrably weaker than the sharp market, we blend our predicted probability toward Pinnacle's implied probability. For totals this means the over/under probability is not purely our model's output — it is anchored to the sharp market's line, weighted by how much we trust our own signal.
The goal is not to simply copy Pinnacle — if we did that, there would be no edge at all. The goal is to combine our signal with theirs in the right proportions. We keep more of our own signal on the markets where we have demonstrated an edge. On totals, we defer more heavily to the sharp line.
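A minimal sketch of that blend, under two assumptions that are not spelled out above: the sharp probability is obtained by proportionally removing the bookmaker margin from a two-way line, and the blend is a simple linear weighting. The names `devig_two_way`, `shrink_to_sharp`, and `trust` are hypothetical.

```python
def devig_two_way(odds_a: float, odds_b: float) -> tuple[float, float]:
    """Turn two-way decimal odds into probabilities that sum to 1.

    Uses proportional normalisation, one common way to strip the margin.
    """
    raw_a = 1.0 / odds_a
    raw_b = 1.0 / odds_b
    total = raw_a + raw_b
    return raw_a / total, raw_b / total


def shrink_to_sharp(p_model: float, p_sharp: float, trust: float) -> float:
    """Blend the model probability toward the sharp market probability.

    trust = 1.0 keeps the model's number, trust = 0.0 copies the market.
    """
    if not 0.0 <= trust <= 1.0:
        raise ValueError("trust must be between 0 and 1")
    return trust * p_model + (1.0 - trust) * p_sharp
```

With `trust` set low for totals and higher for markets where the model has shown an edge, the same function covers both cases described above.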
What was not changed
The moneyline and run-line markets (home/away winner and ±1.5 runs) were left untouched. A review of settled bets on those markets showed positive closing line value, which is the metric that matters most for long-term profitability. If the model is consistently identifying value that the closing market confirms, we do not fix what is not broken. The status of each market in v2:
- Moneyline (ML) — unchanged. Positive CLV maintained; model calibration solid on win/loss.
- Run Line (RL ±1.5) — unchanged. Same strong signal as moneyline with appropriate margin.
- First 5 innings (F5 ML + F5 Totals) — unchanged. Starter-heavy market where pitcher point-in-time stats now apply cleanly.
- Over/under totals — shrink-to-Pinnacle blending now active. Fewer bets, higher win rate.
This is the honest framing we apply across every market and every sport: bet less and bet better. A model that bets on everything is not a good model — it is a model that does not know its own limits. Baseball v2 knows where it is strong and where it should defer.
The right goal: matching the sharpest market
The bar we hold ourselves to is the same across all our sports: match the calibration of the sharpest market, then find soft bookmakers pricing above the fair line. We are not trying to beat Pinnacle's closing line in absolute accuracy terms — that is a very high bar that most quantitative teams never reach. We are trying to be calibrated enough that when a soft bookmaker is offering 2.20 on something we and Pinnacle agree is closer to 1.95, we can act on that gap with confidence.
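To put a number on that gap, here is a small sketch that treats 1.95 as the margin-free fair decimal price (an assumption for the example) and computes the expected return of taking 2.20. The function name `expected_value` is illustrative.

```python
def expected_value(fair_odds: float, offered_odds: float) -> float:
    """Expected profit per unit staked, given fair and offered decimal odds."""
    p_win = 1.0 / fair_odds  # win probability implied by the fair price
    # Win: collect offered_odds per unit staked. Lose: collect nothing.
    return p_win * offered_odds - 1.0


edge = expected_value(fair_odds=1.95, offered_odds=2.20)
print(f"{edge:.1%}")  # 12.8%
```

The calculation is only as good as the fair price going in, which is why calibration against the sharp market comes first.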
The leakage fix and the totals discipline both serve that goal. A model trained on future data will appear calibrated but is not. A model that ignores its own relative weakness on totals will bleed on under bets all season. Neither of those models is exploiting genuine edge; v2 is. You can follow the live performance on our model page.
We will continue publishing CLV figures on a rolling basis as settled bets accumulate. One hundred bets tells you very little — five hundred begins to mean something. We are tracking, and we will share what the data says.
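For readers who want to track the same figure on their own bets, one common way to express closing line value is the ratio of the odds taken to the closing odds, minus one. The sketch below assumes that definition, which may differ in detail from the one behind our published numbers (for example, in whether the closing price is first stripped of margin); `clv` and `mean_clv` are illustrative names.

```python
def clv(odds_taken: float, closing_odds: float) -> float:
    """Closing line value of one bet, as a fraction of the closing price."""
    return odds_taken / closing_odds - 1.0


def mean_clv(bets: list[tuple[float, float]]) -> float:
    """Average CLV over (odds_taken, closing_odds) pairs."""
    if not bets:
        raise ValueError("no settled bets to average")
    return sum(clv(taken, close) for taken, close in bets) / len(bets)
```

A positive average means the bets were, on the whole, placed at better prices than the market closed at.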