We believe in transparency. This page explains the logic behind our models — how data flows in, what the machine learns, and how we measure whether it's actually working. We won't tell you every feature we use (that's our edge), but we'll tell you everything you need to know to trust the output.
Most sports bettors lose — not because they're bad at sports, but because they're fighting an opponent (the sportsbook) that has priced every game using its own statistical models. Beating a sportsbook long-term requires finding games where your probability estimate is more accurate than theirs.
Sibyls Edge builds machine learning models trained on thousands of historical games. These models output a win probability for each team. When our probability is meaningfully higher than the probability implied by the sportsbook's odds, we flag it as an edge. Your stake is then sized according to that edge (and the odds offered) using the Kelly Criterion, a formula proven to maximize long-run bankroll growth when the win probabilities it is given are accurate.
This is the same framework used by professional sports bettors, quantitative hedge funds, and casino advantage players. Within it, what separates a winning strategy from a losing one is whether the underlying probability model is more accurate than the market's.
Every prediction starts with data. We collect, clean, and process several categories of information for each sport:
Season-level offensive and defensive efficiency metrics. Not just wins and losses — the underlying statistics that predict future outcomes better than records do.
We compute each team's opponent-adjusted rating, accounting for who they beat and who beat them. A 10-win team that beat bad opponents is rated very differently from one that beat good opponents.
Rolling performance windows over recent games — because a team's form over the last 5–10 games often predicts this week's result better than their full-season average.
Rest schedules, travel distances, back-to-back games, injury reports, and home/away splits. Sports outcomes are affected by more than talent alone.
17 to 44 seasons of game-by-game results per sport (see the table below), the foundation the model learns from. Patterns that held across decades are far more reliable than recent trends alone.
Real-time lines from licensed sportsbooks via The Odds API. We compare our model's probability against the implied market probability to identify edges.
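Two of the categories above, opponent-adjusted ratings and recent form, can be illustrated with a short sketch. Because our exact formulas are proprietary, the code below is an assumption-based stand-in rather than our method: it uses a Simple Rating System style adjustment (a team's rating equals its average point margin plus the average rating of its opponents) and a plain rolling average of point margin. The function names `rolling_form` and `adjusted_ratings`, the 5-game window, and the damping factor are all hypothetical.

```python
from collections import defaultdict


def rolling_form(margins, window=5):
    """Average point margin over the most recent `window` games (hypothetical form feature).

    `margins` is in chronological order; fewer games are used if history is short.
    """
    recent = margins[-window:]
    return sum(recent) / len(recent) if recent else 0.0


def adjusted_ratings(games, iterations=200, damping=0.5):
    """Opponent-adjusted ratings in the style of a Simple Rating System (an assumption).

    games: list of (team_a, team_b, margin), where margin = team_a points - team_b points.
    Solves rating = average margin + average opponent rating by damped fixed-point
    iteration, re-centering each pass so ratings average zero.
    """
    schedule = defaultdict(list)  # team -> [(opponent, margin from this team's view)]
    for a, b, margin in games:
        schedule[a].append((b, margin))
        schedule[b].append((a, -margin))

    ratings = {team: 0.0 for team in schedule}
    for _ in range(iterations):
        updated = {}
        for team, played in schedule.items():
            avg_margin = sum(m for _, m in played) / len(played)
            avg_opponent = sum(ratings[opp] for opp, _ in played) / len(played)
            # Damping avoids the oscillation plain iteration shows on some schedules
            target = avg_margin + avg_opponent
            updated[team] = (1 - damping) * ratings[team] + damping * target
        mean = sum(updated.values()) / len(updated)
        ratings = {team: r - mean for team, r in updated.items()}
    return ratings


games = [("A", "B", 10), ("B", "C", 10), ("A", "C", 20)]
print(adjusted_ratings(games))               # A ≈ 10, B ≈ 0, C ≈ -10
print(rolling_form([3, -7, 10, 14, -2, 6]))  # 4.2
```

In the example, B's raw record (one win, one loss) looks average, and its rating of about 0 reflects that it split games against the strongest and weakest opponents.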
What we don't publish: The specific combination of features, their relative weights, and the exact formulas we use to compute derived statistics are proprietary. Publishing them would allow others to replicate, and eventually arbitrage away, the edge we've built. Think of it the way a poker player doesn't reveal which tells they watch for in opponents.
We use an ensemble machine learning approach: multiple model types are trained on the same data, and their predictions are blended together. Ensembles typically outperform any single component because each model makes somewhat different errors, and those errors partially cancel out when averaged.
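As a minimal illustration of blending, the sketch below takes the win probabilities that several component models produce for one game and combines them with a weighted average. The component outputs and weights are hypothetical. We don't disclose our actual components or how they are weighted, and stacking (training a second model on the components' outputs) is a common alternative to fixed weights.

```python
def blend_probabilities(component_probs, weights=None):
    """Weighted average of several models' win probabilities for the same game."""
    if not component_probs:
        raise ValueError("need at least one component model")
    if weights is None:
        weights = [1.0] * len(component_probs)  # equal weighting by default
    if len(weights) != len(component_probs):
        raise ValueError("need exactly one weight per component model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * p for w, p in zip(weights, component_probs)) / total


# Hypothetical outputs from three different model types for one game
probs = [0.62, 0.55, 0.58]
print(round(blend_probabilities(probs), 4))             # 0.5833
print(round(blend_probabilities(probs, [2, 1, 1]), 4))  # 0.5925
```

A weighted average of probabilities always stays between 0 and 1, so the blend can be used directly wherever a single model's probability would be.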
This is the part most prediction services skip. Showing a model's performance on data it was trained on is meaningless — of course it looks good on what it already saw. The only test that matters is performance on data the model has never seen.
We use walk-forward cross-validation: the model is trained exclusively on past seasons, then tested on the immediately following season, which it has never seen. This is repeated across successive training/test splits spanning each sport's history, and the reported metrics are averages across all out-of-sample test periods. For the NFL, that means 20 splits:
Round 1: train on 2000–2004, test on 2005.
Round 2: train on 2000–2005, test on 2006.
…
Round 20: train on 2000–2023, test on 2024.
The NFL's reported AUC of 0.9447 is the average across all 20 test seasons, none of which the model saw during training. This is the number that matters.
AUC-ROC (area under the receiver operating characteristic curve) is our primary accuracy metric. It measures how well the model ranks outcomes: it equals the probability that a randomly chosen winning team receives a higher predicted win probability than a randomly chosen losing team, which is equivalent to summarizing performance across all possible probability thresholds. A random coin flip scores 0.50; a perfect model scores 1.00. Real-world predictive systems in competitive markets typically range from 0.55 to 0.75.
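The expanding-window schedule above can be generated mechanically. This is a minimal sketch, assuming each season is identified by a single year; `walk_forward_splits`, `mean_out_of_sample_score`, and the `fit_and_score` callback (which would train a model on the training seasons and return its score, such as AUC, on the test season) are hypothetical names, not part of our pipeline.

```python
def walk_forward_splits(first_season, first_test_season, last_season):
    """Expanding-window splits: each test season is preceded by all earlier seasons."""
    return [
        (list(range(first_season, test_season)), test_season)
        for test_season in range(first_test_season, last_season + 1)
    ]


def mean_out_of_sample_score(splits, fit_and_score):
    """Average a score over splits; fit_and_score(train_seasons, test_season) -> float."""
    scores = [fit_and_score(train, test) for train, test in splits]
    return sum(scores) / len(scores)


splits = walk_forward_splits(2000, 2005, 2024)
print(len(splits))    # 20
print(splits[0])      # ([2000, 2001, 2002, 2003, 2004], 2005)
print(splits[-1][1])  # 2024
```

The property that matters is that no test season ever appears in its own training set, which is what makes each score out-of-sample.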
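The ranking interpretation of AUC translates directly into code. This sketch compares every (win, loss) pair and counts ties as half. It is O(n²) and meant for illustration; libraries such as scikit-learn's `roc_auc_score` compute the same quantity efficiently. The labels and scores in the example are made up.

```python
def auc_roc(labels, scores):
    """Fraction of (win, loss) pairs where the win got the higher score; ties count half.

    labels: 1 for a win, 0 for a loss. scores: predicted win probabilities.
    """
    win_scores = [s for y, s in zip(labels, scores) if y == 1]
    loss_scores = [s for y, s in zip(labels, scores) if y == 0]
    if not win_scores or not loss_scores:
        raise ValueError("need at least one win and one loss")
    credit = 0.0
    for win_score in win_scores:
        for loss_score in loss_scores:
            if win_score > loss_score:
                credit += 1.0
            elif win_score == loss_score:
                credit += 0.5
    return credit / (len(win_scores) * len(loss_scores))


# 4 of the 6 (win, loss) pairs are ranked correctly, so AUC is about 0.667
print(auc_roc([1, 0, 1, 0, 1], [0.8, 0.3, 0.6, 0.7, 0.4]))
```

A model that gives every team the same probability scores exactly 0.50 here, matching the coin-flip baseline.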
The following metrics are derived from walk-forward cross-validation on held-out seasons only. They reflect the model as of the most recent training run.
| Sport | AUC-ROC | Accuracy | Training Data | AUC Lift vs. Chance (AUC − 0.50) |
|---|---|---|---|---|
| NFL | 0.9447 | 86.6% | 2000–2024 · 25 seasons | +0.4447 |
| NBA | 0.7510 | 68.2% | 1980–2024 · 44 seasons | +0.2510 |
| NHL | 0.6640 | 62.1% | 2007–2024 · 17 seasons | +0.1640 |
| MLB | 0.6480 | 59.4% | 2000–2024 · 25 seasons | +0.1480 |
| WNBA | 0.6820 | 63.8% | 2009–2025 · 17 seasons | +0.1820 |
Random chance baseline = 0.50 AUC / 50.0% accuracy. The NFL's high AUC reflects that NFL outcomes are disproportionately driven by quarterback quality, a measurable and persistent factor. MLB and NHL are the hardest sports to predict because single-game outcomes are highly variable: in baseball even strong teams lose often, which is why a 162-game season is needed to separate them, and in hockey low scoring lets a single goal decide many games. The WNBA model draws on 17 seasons of game logs with recent-form and home/away splits, and includes a dedicated totals (over/under) model. Metrics are updated with each model retraining cycle.
Why doesn't higher AUC always mean bigger profits? A model can have high accuracy but still find few betting edges if the sportsbook's lines are already pricing the same signals correctly. Profitability depends on the gap between your model's probability and the market's implied probability — not on accuracy alone. An 86% accurate NFL model is valuable only on the games where our estimate diverges meaningfully from the market.
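A short derivation makes the point concrete. Assume a one-unit stake at decimal odds $d$ (the total returned on a win, including the stake), a model win probability $p$, and, for simplicity, a price with no bookmaker margin, so the market's implied probability is $\pi = 1/d$. The numbers in the example are illustrative, not taken from our results.

$$
\mathrm{EV} = p\,(d - 1) - (1 - p) = p\,d - 1 = \frac{p}{\pi} - 1
$$

Expected value is positive only when $p > \pi$, so a highly confident pick can be worth less than a modest one:

$$
p = 0.86,\ \pi = 0.85:\quad \mathrm{EV} = \frac{0.86}{0.85} - 1 \approx 0.012
\qquad
p = 0.60,\ \pi = 0.52:\quad \mathrm{EV} = \frac{0.60}{0.52} - 1 \approx 0.154
$$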
A model probability alone isn't a pick. The workflow that turns our prediction into a recommended wager has three more steps:
We convert the sportsbook's moneyline odds into an implied probability. If our model says 60% and the book implies 52%, the edge is 8 percentage points. Games below our minimum edge threshold are skipped, regardless of which team the model favors.
Bet size is derived from the Kelly Criterion, which, when win probabilities are known exactly, grows a bankroll faster in the long run than any other fixed-fraction strategy. On its own, though, full Kelly can produce deep drawdowns along the way.
We apply a conservative fraction of full Kelly (never full Kelly). This reduces variance and protects against model miscalibration, accepting slightly lower expected growth in exchange for significantly shallower drawdowns.
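The three steps can be sketched end to end for American moneyline odds. This is an illustrative sketch under stated assumptions, not our production code: the 3-percentage-point minimum edge and the quarter-Kelly multiplier are hypothetical placeholders (our actual threshold and fraction are not published), and the implied probability is taken straight from the quoted price with the vig included, which makes the edge estimate slightly conservative. The full-Kelly fraction for net odds b (profit per unit staked) and win probability p is f* = (b·p − (1 − p)) / b.

```python
MIN_EDGE = 0.03          # hypothetical minimum edge (probability points)
KELLY_MULTIPLIER = 0.25  # hypothetical fraction of full Kelly


def american_to_implied(odds):
    """Implied win probability from American moneyline odds (vig not removed)."""
    if -100 < odds < 100:
        raise ValueError("American odds must be <= -100 or >= +100")
    if odds < 0:
        return -odds / (-odds + 100)
    return 100 / (odds + 100)


def net_odds(odds):
    """Profit per 1 unit staked if the bet wins (b in the Kelly formula)."""
    return odds / 100 if odds > 0 else 100 / -odds


def kelly_fraction(p, odds):
    """Full-Kelly fraction of bankroll: f* = (b*p - (1 - p)) / b."""
    b = net_odds(odds)
    return (b * p - (1 - p)) / b


def recommended_stake(p, odds, bankroll,
                      min_edge=MIN_EDGE, kelly_multiplier=KELLY_MULTIPLIER):
    """Edge check, then Kelly sizing, then scaling down to fractional Kelly."""
    edge = p - american_to_implied(odds)
    if edge < min_edge:
        return 0.0  # below threshold: no bet on this side
    return max(0.0, kelly_fraction(p, odds)) * kelly_multiplier * bankroll


# Model says 60%; the book offers -110 (implied about 52.4%)
print(round(recommended_stake(0.60, -110, bankroll=1000), 2))  # 40.0
print(recommended_stake(0.53, -110, bankroll=1000))            # 0.0, edge too small
```

In the first example full Kelly would stake 16% of the bankroll; the quarter-Kelly placeholder cuts that to 4%, which is the trade of growth for smaller drawdowns described above.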
We're deliberately transparent about how the system works but protective of exactly what it uses. This isn't unusual: DraftKings, FanDuel, and every professional sports betting operation protect their models for the same reason.
If we published the exact feature set, sophisticated users could identify which games our model is most confident on — and bet against us in cases where they believe the model is wrong. More importantly, as any signal becomes widely known, the market absorbs it and the edge disappears. Keeping the methodology proprietary is what keeps the edge alive.
What you do get is the output: calibrated win probabilities, edge estimates, and Kelly-sized picks delivered daily, built on a system whose out-of-sample performance is validated and openly disclosed on this page.
No model is perfect. Here's what ours doesn't handle well — and why that's okay to say out loud:
A late-breaking injury reported 30 minutes before tip-off may not be reflected in the model's features. We pull injury data daily, but real-time roster changes between updates are a known blind spot.
Early-season games, expansion teams, or unusual schedule circumstances have less historical context to draw from. Model confidence is lower in these cases.
MLB and NHL have intrinsically high outcome variance. Even a team with a 65% win probability loses 35% of the time. Variance is not model error — it's the nature of the sport.
Sports evolve. The NBA of 2010 plays differently from the NBA of 2024. We retrain models periodically to account for this, but there will always be some lag between structural changes in a sport and the model adapting to them.
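A quick calculation shows why losing nights are expected even when the probabilities are right. Assume five picks on the same night, each a perfectly calibrated 65% favorite, with outcomes independent of one another (an idealization):

$$
P(\text{all five win}) = 0.65^{5} \approx 0.116,
\qquad
P(\text{at least one loss}) = 1 - 0.65^{5} \approx 0.884
$$

Across 100 such picks, about 35 losses would be expected from variance alone.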
We disclose these limitations not to undermine confidence in the system, but because a user who understands them will use the picks more intelligently — and will experience fewer surprises on losing nights.
Predictions are for informational and entertainment purposes only. Past model performance does not guarantee future results. Please gamble responsibly.