Methodology

How Sibyls Edge Makes Its Predictions

We believe in transparency. This page explains the logic behind our models — how data flows in, what the machine learns, and how we measure whether it's actually working. We won't tell you every feature we use (that's our edge), but we'll tell you everything you need to know to trust the output.

25+ Years of Data: per sport, 2000–2024
5 Sports Covered: NFL · NBA · NHL · MLB · WNBA
0.94 Best AUC (NFL): vs. 0.50 random chance
20 CV Folds Per Model: walk-forward validated

The Core Idea

Most sports bettors lose — not because they're bad at sports, but because they're fighting an opponent (the sportsbook) that has priced every game using its own statistical models. Beating a sportsbook long-term requires finding games where your probability estimate is more accurate than theirs.

Sibyls Edge builds machine learning models trained on thousands of historical games. These models output a win probability for each team. When our probability is meaningfully higher than what the sportsbook's odds imply, we flag it as an edge. Each bet is then sized in proportion to that edge using the Kelly Criterion, a mathematically proven formula for optimal bet sizing under uncertainty.

This is the same framework used by professional sports bettors, quantitative hedge funds, and casino advantage players. The only thing that separates a losing strategy from a winning one is the accuracy of the underlying probability model.

What Goes Into the Model

Every prediction starts with data. We collect, clean, and process several categories of information for each sport:

Team Performance

Season-level offensive and defensive efficiency metrics. Not just wins and losses — the underlying statistics that predict future outcomes better than records do.

Strength of Schedule

We compute each team's adjusted rating accounting for who they beat and who beat them. A 10-win team that beat bad opponents is rated very differently from one that beat good opponents.
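To make the idea of a schedule-adjusted rating concrete, here is a minimal sketch of one well-known approach: each team's rating is its average point margin plus the average rating of the opponents it faced. This is an illustrative stand-in, not the proprietary formula; the function name `adjusted_ratings` and the sample games are hypothetical.

```python
from collections import defaultdict


def adjusted_ratings(games, iterations=100):
    """Schedule-adjusted ratings from a list of (team, opponent, margin).

    margin is the team's score minus the opponent's score. A rating is the
    team's average margin plus the average rating of its opponents, refined
    by repeated substitution. Assumes a well-connected schedule; the simple
    iteration used here is not guaranteed to converge for every schedule.
    """
    margins = defaultdict(list)    # team -> point margins, one per game
    opponents = defaultdict(list)  # team -> opponents faced, one per game
    for team, opp, margin in games:
        margins[team].append(margin)
        opponents[team].append(opp)
        margins[opp].append(-margin)
        opponents[opp].append(team)

    avg_margin = {t: sum(m) / len(m) for t, m in margins.items()}
    ratings = dict(avg_margin)  # start from the raw average margin
    for _ in range(iterations):
        ratings = {
            t: avg_margin[t]
            + sum(ratings[o] for o in opponents[t]) / len(opponents[t])
            for t in avg_margin
        }
    return ratings


# Hypothetical three-team round robin.
games = [("A", "B", 10), ("B", "C", 10), ("A", "C", 20)]
ratings = adjusted_ratings(games)
```

In this toy schedule team A's raw average margin is 15, but its adjusted rating settles near 10 because its two opponents rate below average between them.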

Recent Form

Rolling performance windows over recent games — because a team's form over the last 5–10 games often predicts this week's result better than their full-season average.
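A rolling window can be sketched as follows. The window length, the use of point margin as the input, and the name `rolling_mean` are assumptions for illustration; the actual windows and statistics are not published. The key detail is that each value uses only games played before the one being predicted.

```python
def rolling_mean(values, window):
    """Mean of the `window` values that precede each position.

    Position i uses only values[i - window:i], so the feature for a game
    never includes that game's own result. Positions with fewer than
    `window` prior games get None.
    """
    out = []
    for i in range(len(values)):
        if i < window:
            out.append(None)
        else:
            recent = values[i - window:i]
            out.append(sum(recent) / window)
    return out


# Hypothetical point margins for one team, in game order.
margins = [7, -3, 10, 14, -6, 3, 21]
form = rolling_mean(margins, window=5)
```

Here the first five entries are None, and the sixth game's form value is the mean of the first five margins.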

Contextual Factors

Rest schedules, travel distances, back-to-back games, injury reports, and home/away splits. Sports outcomes are affected by more than talent alone.

Historical Game Logs

25+ years of game-by-game results for each sport — the foundation the model learns from. Patterns that held across decades are far more reliable than recent trends alone.

Market Odds

Real-time lines from licensed sportsbooks via The Odds API. We compare our model's probability against the implied market probability to identify edges.

What we don't publish: The specific combination of features, their relative weights, and the exact formulas we use to compute derived statistics are proprietary. Publishing them would allow others to replicate, and eventually arbitrage away, the edge we've built. Think of it the way a poker player doesn't explain exactly how they read an opponent's hand.

How the Model Learns

We use an ensemble machine learning approach: multiple model types trained on the same data, whose predictions are blended together. Ensembles often outperform any single model because each component makes different errors, and those errors partially cancel out when averaged.

  1. Feature Engineering
    Raw statistics (points scored, yards gained, etc.) are transformed into differentials and ratios — the form that best captures the relative quality gap between two teams before they play.
  2. Gradient Boosting Models
    The primary model type. Gradient boosting builds decision trees sequentially, each one correcting the errors of the last. It excels at capturing non-linear relationships — for example, the interaction between a team's rest days and their road travel distance.
  3. Regularized Logistic Regression (Baseline)
    A deliberately simple model included in every ensemble. When the linear model matches or beats the complex one, it tells us the signal in the data is strong and clean. It also provides a stable floor that prevents the ensemble from overreacting to noise.
  4. Probability Calibration
    Raw model scores are transformed into true probabilities using isotonic regression, a technique that maps the model's output to observed win rates. This step is critical for Kelly sizing: a model that says 65% when the real rate is 58% will overbet and go broke.
  5. Ensemble Blending
    The calibrated outputs of each model are combined using weighted averaging. Weights are determined empirically — models that perform better in validation receive higher weight.
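Steps 4 and 5 above can be sketched in plain Python. This is a simplified illustration under stated assumptions: isotonic regression is implemented with the standard pool-adjacent-violators procedure, tied scores get no special handling, and the function names, sample data, and blend weights are hypothetical rather than the production values.

```python
from bisect import bisect_left


def fit_isotonic(scores, outcomes):
    """Fit a non-decreasing step function from raw scores to win rates.

    Uses pool-adjacent-violators: walk through games in score order and
    merge neighbouring blocks whenever the win rate would decrease.
    Returns (uppers, rates): each block's largest score and its win rate.
    """
    # Each block holds [sum of outcomes, count, largest score in block].
    blocks = []
    for score, outcome in sorted(zip(scores, outcomes)):
        blocks.append([float(outcome), 1, score])
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    uppers = [b[2] for b in blocks]
    rates = [b[0] / b[1] for b in blocks]
    return uppers, rates


def calibrate(score, uppers, rates):
    """Look up the calibrated probability for a raw score."""
    i = min(bisect_left(uppers, score), len(rates) - 1)
    return rates[i]


def blend(probabilities, weights):
    """Weighted average of calibrated probabilities from several models."""
    return sum(p * w for p, w in zip(probabilities, weights)) / sum(weights)


# Hypothetical validation data: raw scores and outcomes (1 = win).
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
outcomes = [0, 0, 1, 0, 1, 1, 0, 1]
uppers, rates = fit_isotonic(scores, outcomes)
calibrated = calibrate(0.35, uppers, rates)

# Hypothetical blend of two calibrated model outputs.
blended = blend([0.62, 0.58], weights=[0.7, 0.3])
```

The fitted rates never decrease as the raw score rises, which is the property that lets a raw score be read as an observed win rate.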

How We Know It's Actually Working

This is the part most prediction services skip. Showing a model's performance on data it was trained on is meaningless — of course it looks good on what it already saw. The only test that matters is performance on data the model has never seen.

We use walk-forward cross-validation: the model is trained exclusively on past seasons, then tested on the immediately following season it has never seen. This is repeated across 20 different training/test splits spanning the full history of each sport. The reported metrics are averages across all 20 out-of-sample test periods.

Example: NFL Walk-Forward Validation

Round 1 — Train on 2000–2004, test on 2005. Round 2 — Train on 2000–2005, test on 2006. … Round 20 — Train on 2000–2023, test on 2024.

The final reported AUC of 0.9447 is the average performance across all 20 test seasons — none of which the model saw during training. This is the number that matters.
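The NFL split schedule described above can be generated with a short expanding-window helper. The function name `walk_forward_splits` and its parameters are hypothetical; the season boundaries come from the example.

```python
def walk_forward_splits(first_season, last_season, min_train_seasons):
    """Yield (train_seasons, test_season) pairs with an expanding window.

    Every split trains on all seasons before the test season, so the
    model is never evaluated on a season it has already seen.
    """
    first_test = first_season + min_train_seasons
    for test_season in range(first_test, last_season + 1):
        train_seasons = list(range(first_season, test_season))
        yield train_seasons, test_season


splits = list(walk_forward_splits(2000, 2024, min_train_seasons=5))
first_train, first_test = splits[0]
last_train, last_test = splits[-1]
```

With these arguments the helper yields 20 splits, from train 2000–2004 / test 2005 through train 2000–2023 / test 2024.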

AUC-ROC (area under the receiver operating characteristic curve) is our primary accuracy metric. It measures the model's ability to rank a winning team above a losing team across all possible probability thresholds. A random coin flip scores 0.50. A perfect model scores 1.00. Real-world predictive systems in competitive markets typically range from 0.55 to 0.75.
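AUC has a direct ranking interpretation: it is the probability that a randomly chosen winner received a higher score than a randomly chosen loser. A minimal sketch of that definition follows, with ties counted as half. The function name and sample data are hypothetical, and the pairwise loop is written for clarity rather than speed.

```python
def auc_roc(labels, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical outcomes (1 = team won) and model scores.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.7, 0.3, 0.2, 0.6]
auc = auc_roc(labels, scores)
```

In this sample 7 of the 9 winner/loser pairs are ordered correctly, so the AUC is 7/9.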

Model Performance by Sport

The following metrics are derived from walk-forward cross-validation on held-out seasons only. They reflect the model as of the most recent training run.

Sport   AUC-ROC   Accuracy   Training Data             AUC vs. Chance
NFL     0.9447    86.6%      2000–2024 · 25 seasons    +44.5%
NBA     0.7510    68.2%      1980–2024 · 44 seasons    +25.1%
NHL     0.6640    62.1%      2007–2024 · 17 seasons    +16.4%
MLB     0.6480    59.4%      2000–2024 · 25 seasons    +14.8%
WNBA    0.6820    63.8%      2009–2025 · 17 seasons    +18.2%

Random chance baseline = 0.50 AUC / 50.0% accuracy. NFL's high AUC reflects that NFL outcomes are disproportionately driven by quarterback quality — a measurable, persistent factor. MLB and NHL are the hardest sports to predict due to high variance (baseball's ~162-game sample, hockey's low-scoring nature). WNBA's model leverages 17 seasons of game logs with recent-form and home/away splits, and includes a dedicated totals (O/U) model. Metrics are updated with each model retraining cycle.

Why doesn't higher AUC always mean bigger profits? A model can have high accuracy but still find few betting edges if the sportsbook's lines are already pricing the same signals correctly. Profitability depends on the gap between your model's probability and the market's implied probability — not on accuracy alone. An 86% accurate NFL model is valuable only on the games where our estimate diverges meaningfully from the market.

From Probability to Pick

A model probability alone isn't a pick. The workflow that turns our prediction into a recommended wager has three more steps:

1. Edge Calculation

We convert the sportsbook's moneyline odds into an implied probability. If our model says 60% and the book implies 52%, the edge is 8 percentage points. Games below our minimum edge threshold are ignored regardless of the predicted winner.
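The conversion from American moneyline odds to implied probability can be sketched as below. Two assumptions to flag: this raw conversion leaves the bookmaker's margin in place (so the two sides of a game sum to more than 100%), and the 0.05 threshold in the usage lines is purely illustrative, since the real minimum edge is not published. Function names are hypothetical.

```python
def implied_probability(american_odds):
    """Implied win probability from American moneyline odds.

    Negative odds (favorites): stake needed to win 100.
    Positive odds (underdogs): winnings on a stake of 100.
    The bookmaker's margin is not removed.
    """
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)


def edge(model_probability, american_odds):
    """Model probability minus the market's implied probability."""
    return model_probability - implied_probability(american_odds)


def has_edge(model_probability, american_odds, min_edge):
    """True when the edge meets the minimum threshold."""
    return edge(model_probability, american_odds) >= min_edge


# A -108 line implies about 51.9%, close to the 52% in the example above.
market = implied_probability(-108)
gap = edge(0.60, -108)
flagged = has_edge(0.60, -108, min_edge=0.05)  # illustrative threshold
```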

2. Kelly Sizing

Bet size starts from the Kelly Criterion, the staking formula that maximizes a bankroll's long-run growth rate when the win probability is accurate.
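For reference, the standard Kelly formula for a single bet is shown below. The symbols are introduced here for illustration: $p$ is the model's win probability, $q = 1 - p$, $b$ is the net payout per unit staked (decimal odds minus one), and $f^{*}$ is the fraction of bankroll to stake. The worked line reuses the 60% versus 52% example and ignores the bookmaker's margin.

```latex
f^{*} = \frac{bp - q}{b} = p - \frac{1 - p}{b}

% Worked example: p = 0.60, market-implied probability 0.52,
% so b = \frac{1 - 0.52}{0.52} = \frac{12}{13}
f^{*} = 0.60 - \frac{0.40}{12/13} = 0.60 - 0.4333 \approx 0.167
```

A negative $f^{*}$ means the bet has no positive expected value at those odds and should be skipped.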

3. Fractional Kelly

We apply a conservative fraction of full Kelly (never full Kelly). This reduces variance and protects against model miscalibration — accepting slightly lower expected growth in exchange for significantly smoother drawdowns.
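A minimal sketch of fractional Kelly staking follows. The multiplier of 0.25 and the bankroll figure are hypothetical placeholders, since the fraction actually applied is not disclosed; the function names are also illustrative.

```python
def kelly_fraction(win_probability, decimal_odds):
    """Full-Kelly fraction of bankroll for a single bet, floored at zero."""
    b = decimal_odds - 1.0  # net payout per unit staked
    if b <= 0:
        raise ValueError("decimal odds must be greater than 1")
    f = (b * win_probability - (1.0 - win_probability)) / b
    return max(f, 0.0)


def stake(bankroll, win_probability, decimal_odds, kelly_multiplier):
    """Stake in currency units using a fraction of full Kelly."""
    return bankroll * kelly_multiplier * kelly_fraction(win_probability, decimal_odds)


# Even-money bet (decimal odds 2.0) with a 60% model probability.
full = kelly_fraction(0.60, 2.0)
bet = stake(1000.0, 0.60, 2.0, kelly_multiplier=0.25)  # hypothetical multiplier

# Why calibration matters: sizing at a claimed 65% when the true rate is 58%.
claimed = kelly_fraction(0.65, 2.0)
actual = kelly_fraction(0.58, 2.0)
```

At even money, full Kelly for a claimed 65% is about 30% of bankroll, while the correct figure at a true 58% is about 16%. A fractional multiplier shrinks every stake and so leaves room for that kind of miscalibration.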

What We Don't Share — and Why

We're deliberately transparent about how the system works but protective of exactly what it uses. This isn't unusual: DraftKings, FanDuel, and every professional sports betting operation protect their models for the same reason.

Kept Proprietary

  • The complete feature list for each sport
  • Feature weights and importance rankings
  • Ensemble blend ratios between models
  • Specific derived statistics and transformation formulas
  • Minimum edge thresholds and Kelly fraction applied
  • Any post-processing or override logic

If we published the exact feature set, sophisticated users could identify which games our model is most confident on — and bet against us in cases where they believe the model is wrong. More importantly, as any signal becomes widely known, the market absorbs it and the edge disappears. Keeping the methodology proprietary is what keeps the edge alive.

What you do get is the output: calibrated win probabilities, edge estimates, and Kelly-sized picks delivered daily, built on a system whose out-of-sample performance is openly disclosed on this page.

Honest Limitations

No model is perfect. Here's what ours doesn't handle well — and why that's okay to say out loud:

Breaking News

A late-breaking injury reported 30 minutes before tip-off may not be reflected in the model's features. We pull injury data daily, but real-time roster changes between updates are a known blind spot.

Low-Data Matchups

Early-season games, expansion teams, or unusual schedule circumstances have less historical context to draw from. Model confidence is lower in these cases.

High-Variance Sports

MLB and NHL have intrinsically high outcome variance. Even a team with a 65% win probability loses 35% of the time. Variance is not model error — it's the nature of the sport.

Model Drift

Sports evolve. The NBA of 2010 plays differently from the NBA of 2024. We retrain models periodically to account for this, but there will always be some lag between structural changes in a sport and the model adapting to them.

We disclose these limitations not to undermine confidence in the system, but because a user who understands them will use the picks more intelligently — and will experience fewer surprises on losing nights.


Predictions are for informational and entertainment purposes only. Past model performance does not guarantee future results. Please gamble responsibly.