How our AI model works
A machine-learning model plus two statistical models, combined into one ensemble confidence score — trained on 28,000+ matches and judged on calibration, not just headline accuracy.
Every tip on Daily Sport Pick is generated by an automated system that runs several independent prediction models on each match of the day. The models work separately, then their outputs are combined into a single ensemble confidence score between 0 and 100.
This reduces the risk of any single model’s blind spots driving the final call. When the models agree, the score is higher. When they disagree, the score is lower — and we are more cautious about publishing that tip.
| At a glance | |
|---|---|
| Models combined | Machine learning + Poisson + Dixon-Coles |
| Training data | 28,000+ historical matches (2024 → today) |
| ML test accuracy (1X2) | ~45% on a strict chronological test set (baseline 33%) |
| Features per prediction | 30 (form, goals, ELO, league strength) |
| Core leagues published | 14 (our strongest-coverage competitions) |
| Refresh | Models re-run every morning |
Football is hard to predict, and a three-way market (home/draw/away) gives a 33% baseline from guessing alone. Some sites quote accuracy above 60% by using a random train/test split, but that leaks future matches into the test set and flatters the number. We use a chronological split (learn from the past, test on the most recent matches), which mirrors real betting and gives around 45%. We’d rather show you a real figure, and we judge the model on something more useful than raw accuracy: calibration, meaning whether a stated “60% chance” really wins about 60% of the time.
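To make the difference concrete, here is a minimal sketch of a chronological split. The function name, the `date` field and the test fraction are illustrative assumptions, not our production code.

```python
from datetime import date

def chronological_split(matches, test_fraction=0.15):
    """Hold out the most recent matches as the test set (fraction is an assumption)."""
    ordered = sorted(matches, key=lambda m: m["date"])
    n_test = round(len(ordered) * test_fraction)
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]

# Toy data: ten matches on consecutive days (illustrative only).
matches = [{"date": date(2025, 1, day), "result": "H"} for day in range(1, 11)]
train_set, test_set = chronological_split(matches, test_fraction=0.2)

# Every training match is older than every test match, so no future
# information can reach the model during training.
latest_train = max(m["date"] for m in train_set)
earliest_test = min(m["date"] for m in test_set)
print(len(train_set), len(test_set), latest_train < earliest_test)
```

A random split would shuffle the list before cutting it, which is how matches from after the test period end up in training.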
Model 1 — Machine learning (60% weight)
Our primary model is a gradient-boosted decision-tree classifier trained on 28,000+ historical matches. It runs in two tiers: a newer histogram gradient-boosting model (v4) handles our core competitions, where data quality is highest, and an older gradient-boosting model (v3) acts as a fallback for lower-coverage leagues.
Accuracy: ~45% at predicting the correct 1X2 outcome on a held-out chronological test set of ~4,300 recent matches — well above the 33% three-way baseline. Accuracy varies by competition: it is stronger in leagues like Serie A and the Eredivisie (above 50%) and lower in less predictable ones.
The model uses 30 features per prediction, including:
- ELO ratings — a global rating plus separate home and away ratings per team. The home/away split is among the strongest signals in the model.
- Recent form — wins, draws, losses and a form score over the last 5 matches for both teams
- Goals averages — goals scored and conceded per game over recent matches
- League strength — the tier and competitiveness of the competition
The model outputs a probability for each of three outcomes — home win, draw or away win — and the most probable becomes the prediction. Crucially, every feature is computed point-in-time: only information available before kick-off is used, so the model is never trained on hindsight.
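As an illustration of what point-in-time means in practice, the sketch below builds a last-5 form feature using only matches played strictly before kick-off. The field names and the 3-1-0 points scheme for the form score are assumptions made for the example.

```python
from datetime import date

def recent_form(team, history, kickoff, window=5):
    """Count W/D/L for `team` over its last `window` matches before `kickoff`."""
    past = [m for m in history
            if m["date"] < kickoff and team in (m["home"], m["away"])]
    past.sort(key=lambda m: m["date"])
    form = {"W": 0, "D": 0, "L": 0}
    for m in past[-window:]:
        # Goals for / against from the point of view of `team`.
        gf, ga = (m["hg"], m["ag"]) if m["home"] == team else (m["ag"], m["hg"])
        form["W" if gf > ga else "D" if gf == ga else "L"] += 1
    # Hypothetical form score: 3 points per win, 1 per draw.
    form["points"] = 3 * form["W"] + form["D"]
    return form

history = [
    {"date": date(2025, 3, 1), "home": "A", "away": "B", "hg": 2, "ag": 0},
    {"date": date(2025, 3, 8), "home": "C", "away": "A", "hg": 1, "ag": 1},
    {"date": date(2025, 3, 15), "home": "A", "away": "D", "hg": 0, "ag": 1},
    # The match being predicted: its result must never feed its own features.
    {"date": date(2025, 3, 22), "home": "A", "away": "C", "hg": 3, "ag": 0},
]
form = recent_form("A", history, kickoff=date(2025, 3, 22))
print(form)
```

The strict `<` comparison on the date is the whole point: the fixture being predicted, and anything after it, is invisible to the feature.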
Model 2 — Poisson distribution (20% weight)
The Poisson model takes a purely mathematical approach. Rather than learning from labelled outcomes, it estimates how many goals each team is likely to score, then uses the Poisson distribution to build a full score matrix.
For each scoreline (e.g. 1-0, 2-1, 0-0) it computes an exact probability. From this matrix we derive:
- 1X2 probabilities (home win / draw / away win)
- Over/Under goal probabilities
- The most likely exact scores
It calculates attack and defence ratings per team per competition, corrected for home advantage and normalised to the league average. A strong attack facing a weak defence yields higher expected goals.
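The score-matrix idea can be sketched in a few lines. This example takes the two expected-goals figures as given (how they are derived from the attack and defence ratings is not shown), and the values 1.6 and 1.1, the function names and the 10-goal truncation are illustrative assumptions.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k goals when lam goals are expected."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def score_matrix(home_xg, away_xg, max_goals=10):
    """matrix[h][a] = P(home scores h and away scores a), assuming independence."""
    return [[poisson_pmf(h, home_xg) * poisson_pmf(a, away_xg)
             for a in range(max_goals + 1)]
            for h in range(max_goals + 1)]

def outcome_probs(matrix):
    """Collapse the matrix into 1X2 probabilities."""
    probs = {"home": 0.0, "draw": 0.0, "away": 0.0}
    for h, row in enumerate(matrix):
        for a, p in enumerate(row):
            probs["home" if h > a else "draw" if h == a else "away"] += p
    return probs

def over_prob(matrix, line=2.5):
    """Probability that total goals exceed the line."""
    return sum(p for h, row in enumerate(matrix)
               for a, p in enumerate(row) if h + a > line)

def top_scores(matrix, n=3):
    """The n most likely exact scorelines with their probabilities."""
    cells = [((h, a), p) for h, row in enumerate(matrix) for a, p in enumerate(row)]
    return sorted(cells, key=lambda c: c[1], reverse=True)[:n]

matrix = score_matrix(home_xg=1.6, away_xg=1.1)
probs = outcome_probs(matrix)
print({k: round(v, 3) for k, v in probs.items()})
print("Over 2.5:", round(over_prob(matrix), 3))
print("Top scores:", [(s, round(p, 3)) for s, p in top_scores(matrix)])
```

All three outputs listed above come from the same matrix, which is why one goals model can serve several markets at once.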
Model 3 — Dixon-Coles MLE (20% weight)
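The Dixon-Coles model (Dixon & Coles, 1997) extends Poisson to fix a known weakness: standard Poisson treats the two teams’ goal counts as independent, which misprices the four lowest-scoring results (0-0, 1-0, 0-1 and 1-1) and in particular tends to underestimate 0-0 and 1-1 draws.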
Dixon-Coles adds a correction factor called tau (τ) for those four scorelines. It fits the attack, defence and home-advantage parameters jointly using Maximum Likelihood Estimation (MLE) via scipy’s L-BFGS-B optimiser, rather than treating them as simple averages. The fit is time-weighted, so recent matches count for more than older ones and the model adapts faster to mid-season form.
Dixon-Coles is fitted separately for each competition, across hundreds of leagues — far beyond the handful we actively publish tips on.
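For readers who want to see the correction itself, here is a small sketch of the tau adjustment as defined in the 1997 paper, plus an exponential time weight. The parameter values (`lam`, `mu`, `rho`, `xi`) are illustrative rather than fitted, and the exponential decay is one common way to time-weight matches, assumed here for the example.

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

def tau(h, a, lam, mu, rho):
    """Dixon-Coles adjustment: differs from 1 only for 0-0, 0-1, 1-0 and 1-1.

    lam = expected home goals, mu = expected away goals, rho = dependence parameter.
    """
    if h == 0 and a == 0:
        return 1 - lam * mu * rho
    if h == 0 and a == 1:
        return 1 + lam * rho
    if h == 1 and a == 0:
        return 1 + mu * rho
    if h == 1 and a == 1:
        return 1 - rho
    return 1.0

def dc_prob(h, a, lam, mu, rho):
    """Scoreline probability: independent Poisson times the tau correction."""
    return tau(h, a, lam, mu, rho) * poisson_pmf(h, lam) * poisson_pmf(a, mu)

def time_weight(days_ago, xi=0.002):
    """Weight of a past match in the likelihood (xi is a hypothetical decay rate)."""
    return math.exp(-xi * days_ago)

lam, mu, rho = 1.4, 1.1, -0.1  # illustrative values, not fitted parameters
for h, a in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    plain = poisson_pmf(h, lam) * poisson_pmf(a, mu)
    print(f"{h}-{a}: Poisson {plain:.4f} -> Dixon-Coles {dc_prob(h, a, lam, mu, rho):.4f}")
```

With a negative `rho`, the 0-0 and 1-1 cells are scaled up and the 1-0 and 0-1 cells are scaled down, while the matrix as a whole still sums to the same total. In a full fit, each match's log-likelihood would be multiplied by its time weight before the optimiser runs.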
The ensemble: combining the models
After each model predicts a match, the ensemble layer blends them with fixed weights:
| Approach | Weight | Optimised for |
|---|---|---|
| Machine learning (v4 + v3 fallback) | 60% | Overall match outcome (1X2) |
| Poisson distribution | 20% | Over/Under and goals markets |
| Dixon-Coles MLE | 20% | Scorelines and low-score correction |
If a model isn’t available for a particular competition, the weights are re-normalised across the models that are — so a lower-tier match might be scored on Poisson and Dixon-Coles alone.
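The blending and re-normalisation step might look roughly like this. The weights come from the table above; the dictionary layout, the model keys and the use of `None` for an unavailable model are assumptions made for the sketch.

```python
WEIGHTS = {"ml": 0.60, "poisson": 0.20, "dixon_coles": 0.20}

def blend(predictions, weights=WEIGHTS):
    """Weighted average of 1X2 probabilities over the models that are available.

    `predictions` maps a model name to {"home", "draw", "away"} probabilities,
    or to None when that model has no output for the competition.
    """
    available = {name: p for name, p in predictions.items() if p is not None}
    if not available:
        raise ValueError("no model produced a prediction")
    total = sum(weights[name] for name in available)
    blended = {"home": 0.0, "draw": 0.0, "away": 0.0}
    for name, probs in available.items():
        share = weights[name] / total  # re-normalised weight
        for outcome in blended:
            blended[outcome] += share * probs[outcome]
    return blended

full = blend({
    "ml": {"home": 0.5, "draw": 0.3, "away": 0.2},
    "poisson": {"home": 0.4, "draw": 0.3, "away": 0.3},
    "dixon_coles": {"home": 0.4, "draw": 0.3, "away": 0.3},
})
no_ml = blend({
    "ml": None,  # e.g. a lower-tier league without ML coverage
    "poisson": {"home": 0.4, "draw": 0.3, "away": 0.3},
    "dixon_coles": {"home": 0.4, "draw": 0.3, "away": 0.3},
})
print(full, no_ml)
```

When the machine-learning model is missing, Poisson and Dixon-Coles each end up with half of the weight, and the blended probabilities still sum to 1.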
The confidence score is then built from four parts:
| Component | Range | What it rewards |
|---|---|---|
| Base score | 0–75 | The weighted probability of the predicted outcome, scaled from the 33% random baseline |
| Consensus bonus | 0–15 | +15 if all three models agree, +7 if two agree |
| Spread bonus | 0–10 | How clearly the top outcome leads the alternatives |
| Goals-consensus bonus | 0–15 | Added when the models strongly agree on Over/Under |
The four parts are summed and capped at 100 (their maximums add up to 115, so the cap can come into play). A high score means strong agreement and a clear leading outcome, but we deliberately apply an over-confidence cap so the system never overstates how certain it really is.
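Here is one way the four components could be put together. The ranges and the +15 / +7 consensus values come from the table; the linear scaling of the base score, the spread formula and the function signature are assumptions for illustration, and the separate over-confidence cap is not modelled.

```python
from collections import Counter

def confidence_score(blended, model_picks, goals_bonus=0.0):
    """Sum base, consensus, spread and goals-consensus parts, capped at 100.

    blended: ensemble 1X2 probabilities; model_picks: each model's predicted
    outcome; goals_bonus: a 0-15 value computed elsewhere (assumed input).
    """
    ranked = sorted(blended.values(), reverse=True)
    top, second = ranked[0], ranked[1]

    # Base (0-75): assumed linear from the 1/3 random baseline up to certainty.
    base = 75 * max(0.0, (top - 1 / 3) / (2 / 3))

    # Consensus (0-15): +15 if all three models agree, +7 if two agree.
    most_common = Counter(model_picks).most_common(1)[0][1]
    consensus = 15 if most_common >= 3 else 7 if most_common == 2 else 0

    # Spread (0-10): assumed to max out when the lead reaches 50 points.
    spread = 10 * min(1.0, (top - second) / 0.5)

    goals = min(15.0, max(0.0, goals_bonus))
    return min(100.0, base + consensus + spread + goals)

score = confidence_score(
    {"home": 0.60, "draw": 0.25, "away": 0.15},
    model_picks=["home", "home", "home"],
)
print(round(score, 1))
```

In this toy case the base contributes 30, consensus 15 and spread 7, which shows how a 60% favourite with full model agreement can still land in the middle of the scale.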
How we choose which tips to publish
Generating a prediction is only half the job. A model can be confident and still be wrong, so several filters sit between the raw prediction and a published tip:
- Selective publishing. We publish tips on a core set of competitions where our coverage and model quality are strongest — rather than spraying picks across every league we can technically score.
- Confidence floors and an over-confidence cap. Tips below a minimum confidence are held back, and unusually high scores are capped, because over-confidence is where tipping models lose money.
- Market-aware Asian Handicap lines. For handicap tips we don’t guess a line — we compute, from the Poisson score matrix, the line where the chosen side has roughly a 50% chance of covering (the “balanced line” bookmakers like Pinnacle use). This replaced an older rule-of-thumb that was systematically over-confident.
- Cross-division safety checks. Cup and play-off fixtures can pit teams from different divisions against each other. We detect those cases and skip them, so a second-tier side isn’t mistakenly rated as if it played in the top flight.
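The balanced-line idea from the Asian Handicap filter can be sketched as follows. To keep it short, the example only considers half-goal lines on the home side (whole and quarter lines would need push handling), and the candidate lines, expected-goals inputs and function names are assumptions.

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

def cover_prob(home_xg, away_xg, line, max_goals=10):
    """P(home margin + line > 0) from an independent-Poisson score matrix."""
    return sum(
        poisson_pmf(h, home_xg) * poisson_pmf(a, away_xg)
        for h in range(max_goals + 1)
        for a in range(max_goals + 1)
        if (h - a) + line > 0
    )

def balanced_line(home_xg, away_xg, lines=(-2.5, -1.5, -0.5, 0.5, 1.5, 2.5)):
    """Pick the half-goal line whose cover probability is closest to 50%."""
    return min(lines, key=lambda l: abs(cover_prob(home_xg, away_xg, l) - 0.5))

line = balanced_line(home_xg=2.0, away_xg=1.0)
print(line, round(cover_prob(2.0, 1.0, line), 3))
```

For a home side expected to score 2.0 against 1.0, the -0.5 line comes out closest to an even chance, whereas -1.5 would ask the favourite to cover far less than half the time.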
Calibration: the metric that matters
Accuracy tells you how often the single most likely outcome happens. Calibration tells you whether the probabilities themselves are trustworthy — and for betting, that’s what counts. If we label a batch of tips “70% likely”, roughly 70% of them should win. We track this continuously and surface it on each tip as an AI Confidence label (High / Medium / Low).
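A basic calibration check groups past tips by their stated probability and compares that with how often they actually won. The sketch below uses made-up results and an assumed ten-bin layout, purely to show the mechanics.

```python
from collections import defaultdict

def calibration_table(predictions, n_bins=10):
    """Return (mean stated probability, observed win rate, count) per bin.

    `predictions` is an iterable of (stated_probability, won) pairs.
    """
    bins = defaultdict(list)
    for prob, won in predictions:
        idx = min(int(prob * n_bins), n_bins - 1)  # 1.0 falls in the last bin
        bins[idx].append((prob, won))
    table = []
    for idx in sorted(bins):
        rows = bins[idx]
        mean_prob = sum(p for p, _ in rows) / len(rows)
        win_rate = sum(1 for _, w in rows if w) / len(rows)
        table.append((mean_prob, win_rate, len(rows)))
    return table

# Made-up history: 10 tips stated at 72% and 10 stated at 28%.
history = ([(0.72, True)] * 7 + [(0.72, False)] * 3
           + [(0.28, True)] * 3 + [(0.28, False)] * 7)
table = calibration_table(history)
for mean_prob, win_rate, count in table:
    print(f"stated {mean_prob:.2f}  observed {win_rate:.2f}  n={count}")
```

A well-calibrated model shows stated and observed values close together in every bin; large gaps in the high-probability bins are the sign of over-confidence.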
See the model in action
Check today’s tips and their confidence scores
Frequently asked questions
What prediction models does Daily Sport Pick use?
We combine three modelling approaches into one ensemble: a machine-learning model (gradient-boosted decision trees) trained on 28,000+ historical matches, a Poisson goals model, and a Dixon-Coles model that corrects for low-scoring results. Their outputs are merged into a single confidence score between 0 and 100.
What is the ensemble confidence score?
The ensemble score (0–100) blends the three approaches with fixed weights — machine learning 60%, Poisson 20% and Dixon-Coles 20% — then adds bonuses when the models agree and when one outcome clearly leads. Higher scores mean stronger agreement across models.
How accurate is the AI model?
On a strict chronological test set (train on the past, test on the most recent matches), our machine-learning model predicts the correct 1X2 result about 45% of the time — well above the 33% you’d expect from guessing a three-way market. We report this honest out-of-time figure rather than an inflated random-split number, and we focus on whether our probabilities are well calibrated.
Why is your accuracy lower than some other tipster sites?
Because we measure it honestly. A random train/test split can push reported football accuracy well above 60%, but it leaks information from the future into the test. We use a chronological split that mirrors real betting conditions, which gives roughly 45%. We’d rather publish a real number than a flattering one.
What does the score heatmap show?
The score heatmap shows the probability of every scoreline from 0-0 to 4-4, based on the Dixon-Coles model. Blue cells are home-win scenarios, grey cells are draws and orange cells are away wins. Darker colours mean higher probability.
Why do Poisson and Dixon-Coles have lower 1X2 accuracy than the machine-learning model?
They’re not built primarily to pick winners. They model how many goals each team is likely to score, which makes them strong for Over/Under and exact-score markets. Their role in the ensemble is complementary — they add goals and scoreline information the machine-learning model doesn’t capture directly.
