Hockey Model v2: Fixing a Data Leak and Adding the 2025 Season
We have just shipped hockey v2 — an upgrade driven not by a headline accuracy jump, but by something more fundamental: we found a data leak in the training pipeline, fixed it, and refreshed the model with the completed 2025 season. The goal-prediction accuracy is essentially unchanged. That is exactly the result we wanted.
What is data leakage?
Data leakage means the model saw information during training that it could not possibly have at prediction time. In practice, it produces a model that looks better on paper than it really is, because it was, in effect, trained on the future. For a betting model, that is a serious integrity problem even if live predictions are computed from inputs that contain no future information.
The specific leak here involved team strength ratings (Elo). Elo is a widely used technique for measuring team quality: every match shifts the rating slightly based on the result and the opponent's strength. A higher Elo means a stronger team, and over many matches the ratings converge toward a fair ranking. You can read more about how we use Elo — and how betting models work more generally — in our guides.
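For readers who want to see the mechanics, here is a minimal sketch of the textbook Elo update. It is not our production code: the K-factor of 20, the 400-point scale, and the function names are assumptions chosen for illustration, and it ignores refinements such as home advantage or special handling of overtime results.

```python
from __future__ import annotations


def expected_score(rating: float, opponent_rating: float) -> float:
    """Expected result (between 0 and 1) for the team rated `rating`."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def update_elo(home: float, away: float, home_result: float,
               k: float = 20.0) -> tuple[float, float]:
    """Return (new_home, new_away) after one match.

    home_result is 1.0 for a home win and 0.0 for a home loss.
    k (assumed value) controls how far one result moves the ratings.
    """
    delta = k * (home_result - expected_score(home, away))
    # Zero-sum update: what one team gains, the other loses.
    return home + delta, away - delta


# Two equally rated teams; the home side wins.
new_home, new_away = update_elo(1500.0, 1500.0, home_result=1.0)
print(new_home, new_away)  # 1510.0 1490.0
```

Beating a stronger opponent moves the rating more than beating a weaker one, because the expected score was lower going in.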
The specific leak: Elo stored as end-state, not point-in-time
Our database stores one Elo value per team: the current, up-to-date rating. When the trainer ran on historical matches, it fetched that current rating and used it as the team strength for every historical game, including games played years ago. A 2019 playoff game was therefore represented in training by a rating that already reflected every season from 2020 through 2025.
The fix is conceptually simple. Every time our collector updates Elo after a match, it now writes the pre-match Elo — the rating as it stood before the puck dropped — directly into the match record. The trainer reads that historical value. For live predictions on upcoming games, we still use the current Elo (which is correct: the latest rating is the best estimate of current strength). Only the training data was affected.
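The idea behind the fix can be sketched in a few lines. This is an illustrative sketch only: the `Match` record, the `home_elo_pre` / `away_elo_pre` field names, the starting rating of 1500, and the K-factor are all hypothetical, not our actual schema or settings.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Match:
    home: str
    away: str
    home_won: bool
    # Point-in-time ratings, written by the collector (hypothetical field names).
    home_elo_pre: Optional[float] = None
    away_elo_pre: Optional[float] = None


def process_matches(matches: List[Match], k: float = 20.0,
                    start: float = 1500.0) -> Dict[str, float]:
    """Walk matches in chronological order: snapshot first, then update."""
    current: Dict[str, float] = {}
    for m in matches:
        home = current.get(m.home, start)
        away = current.get(m.away, start)
        # Store the ratings as they stood before the game.
        # The trainer reads these, never the `current` table.
        m.home_elo_pre, m.away_elo_pre = home, away
        expected_home = 1.0 / (1.0 + 10 ** ((away - home) / 400.0))
        delta = k * ((1.0 if m.home_won else 0.0) - expected_home)
        current[m.home] = home + delta
        current[m.away] = away - delta
    return current


matches = [Match("A", "B", True), Match("A", "B", True)]
current = process_matches(matches)
print(matches[0].home_elo_pre)  # 1500.0, the rating before any game was played
```

The leaky version would have used `current["A"]` as the feature for both games. With the snapshot in place, the first game sees 1500.0 and only the second game sees the rating earned from the first.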
The result: same accuracy, clean model
After fixing the leak and retraining — also adding the full 2025 NHL season, which had been missing from the training data — goal-prediction error came in essentially flat: MAE ≈ 1.37 home / 1.36 away (lower is better). There was no accuracy gain to speak of.
We want to be direct about that. The improvement here is not a new predictive edge — it is trust. A model trained with leaked data is producing numbers we cannot fully explain or rely on. A leakage-free model whose numbers hold steady is one we can reason about honestly, and whose closing line value performance we can attribute to genuine signal rather than a training artefact.
- Goal-prediction error (MAE) — home ~1.37, away ~1.36 — essentially unchanged, as expected once leakage is removed.
- Elo feature importance dropped — previously Elo sat in the top features; after the fix it fell down the ranking, as it should, because its old high importance was inflated by the leaked future data.
- 2025 season included — training data was stale through 2024; v2 incorporates the full 2024–25 NHL season.
- Live predictions unaffected — upcoming-game predictions were always computed with the current Elo. Only the historical training path changed.
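For reference, MAE (mean absolute error) is the average distance, in goals, between the predicted and the actual goal count. A minimal sketch follows; the goal counts and predictions are made-up numbers for illustration, not output from our model.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all games."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Invented example data: home goals in four games and the model's predictions.
actual_home_goals = [3, 2, 5, 1]
predicted_home_goals = [2.8, 3.1, 3.0, 2.5]

mae_home = mean_absolute_error(actual_home_goals, predicted_home_goals)
print(round(mae_home, 2))  # 1.2
```

An MAE of about 1.37 therefore means the home-goal prediction is off by roughly 1.37 goals per game on average.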
Why integrity matters for betting
Our whole approach to value betting rests on one idea: the model's probability estimates should be as close to the true distribution of outcomes as possible. When we find a soft bookmaker offering odds above what our model says is fair, that gap needs to be a real pricing discrepancy, not an artefact of model errors we have not found yet.
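To make "odds above what our model says is fair" concrete, here is a small sketch using decimal odds. The probability and the offered price are invented for illustration, and the function names are our own shorthand, not part of any library.

```python
def fair_odds(probability: float) -> float:
    """Decimal odds at which a bet has zero expected value."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return 1.0 / probability


def expected_value(probability: float, decimal_odds: float) -> float:
    """Expected profit per unit staked, if `probability` is correct."""
    return probability * decimal_odds - 1.0


# Invented example: the model gives the home team a 55% chance,
# and a bookmaker offers decimal odds of 2.00.
model_probability = 0.55
offered_odds = 2.00

print(round(fair_odds(model_probability), 2))                     # 1.82
print(round(expected_value(model_probability, offered_odds), 2))  # 0.1
```

The positive expected value only means something if the 55% is trustworthy. If the probability is distorted by leaked training data, the apparent edge may be nothing more than model error.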
A leaked model creates a subtle but compounding problem. During training, the model learned to lean heavily on Elo because the leaked ratings already contained information about future results. At prediction time the current Elo carries no such information, so that inflated weight could, in principle, pull predictions in the wrong direction for teams whose current Elo differs substantially from their historical trajectory. Removing the leak does not dramatically change the average output, but it removes a hidden distortion. The model page shows the updated performance numbers, tracked live via closing line value against Pinnacle.
What is next for hockey
With the 2025 Stanley Cup playoffs now settled, the model carries a clean, up-to-date history through the end of the 2024–25 season. We will retrain again in October, once the 2025–26 season is underway and enough games have been played for the Elo ratings to settle after the off-season. Until then, v2 is the live model. You can see all active picks on our model page.
Past performance does not guarantee future results. Betting carries real financial risk; please bet responsibly, 18+ only.