About the model

MLBPredict estimates the probability that the home team wins a given MLB game. It uses public data from the official MLB StatsAPI schedule endpoint: completed games update the team ratings, and scheduled games that have not yet been played receive static forecasts.
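
As a rough illustration of that split, the sketch below sorts schedule rows by date and separates finished games from unplayed ones. The `ScheduleRow` type, its field names, and `split_schedule` are hypothetical and are not taken from the StatsAPI response or from the project code.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ScheduleRow:
    # Hypothetical, simplified schedule row; the field names are assumptions.
    game_date: date
    home_team: str
    away_team: str
    home_score: Optional[int] = None
    away_score: Optional[int] = None
    is_final: bool = False


def split_schedule(rows: list[ScheduleRow]) -> tuple[list[ScheduleRow], list[ScheduleRow]]:
    """Return (completed, upcoming), each in chronological order."""
    ordered = sorted(rows, key=lambda r: r.game_date)
    # A game counts as completed only if it is final and both scores are present.
    completed = [
        r for r in ordered
        if r.is_final and r.home_score is not None and r.away_score is not None
    ]
    # Everything not marked final is treated as a game still to be forecast.
    upcoming = [r for r in ordered if not r.is_final]
    return completed, upcoming
```

Under this assumption, completed rows would feed the rating updates in date order, and upcoming rows would each get a single forecast from the ratings as they stand.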

Ratings

The first production model is intentionally simple and auditable: a chronological team Elo system with a tuned K-factor, a home-field rating bonus, a probability temperature, and season-to-season carryover. Every completed game is predicted before its result updates the two teams' ratings. Glicko state is also maintained for future uncertainty-aware model layers, but the displayed backtest is the Elo walk-forward baseline.
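
A minimal sketch of one predict-then-update step is shown below. It is an illustration under stated assumptions, not the project's code: the parameter values are placeholders rather than the tuned ones, the temperature is assumed to divide the rating difference before the standard 400-point logistic, and carryover is assumed to be a linear pull toward the league mean. All names (`EloParams`, `home_win_prob`, `predict_then_update`, `carry_over`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EloParams:
    # Placeholder values for illustration only; not the tuned parameters.
    k: float = 4.0
    home_bonus: float = 24.0
    temperature: float = 1.0
    carryover: float = 0.75
    mean: float = 1500.0


def home_win_prob(home: float, away: float, p: EloParams, neutral: bool = False) -> float:
    """Logistic Elo expectation for the home team."""
    bonus = 0.0 if neutral else p.home_bonus
    # Assumption: temperature > 1 flattens probabilities toward 0.5.
    diff = (home + bonus - away) / p.temperature
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def predict_then_update(ratings: dict[str, float], home: str, away: str,
                        home_won: bool, p: EloParams, neutral: bool = False) -> float:
    """Record the pre-game probability, then apply the result to both teams."""
    rh = ratings.get(home, p.mean)
    ra = ratings.get(away, p.mean)
    prob = home_win_prob(rh, ra, p, neutral)
    delta = p.k * ((1.0 if home_won else 0.0) - prob)
    ratings[home] = rh + delta  # zero-sum: the home team's gain is the away team's loss
    ratings[away] = ra - delta
    return prob


def carry_over(ratings: dict[str, float], p: EloParams) -> dict[str, float]:
    """Between seasons, keep a fraction of each team's distance from the mean."""
    return {team: p.mean + p.carryover * (r - p.mean) for team, r in ratings.items()}
```

Because `predict_then_update` returns the probability computed before the ratings change, collecting its return values over games in date order yields walk-forward predictions of the kind described above.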

Data coverage

The committed raw dataset covers MLB seasons 2021 through 2026. It stores schedule-level game results only (team IDs and names, venue, date, scores, and the final home/away outcome), not pitch-by-pitch data.

Performance

The current scored window contains 3,468 completed games, with a Brier score of 0.245, a log loss of 0.683, and pick accuracy of 55.6%. These are chronological walk-forward numbers: each game is scored using the probability issued before its result was known, not using in-sample ratings that have already seen the result.
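
For reference, the three metrics can be computed from a list of pre-game home-win probabilities and the matching outcomes as in the sketch below. The function name `score_forecasts` is hypothetical, and counting a probability of exactly 0.5 as a home pick is an assumption made for the example.

```python
import math


def score_forecasts(probs: list[float], outcomes: list[int]) -> tuple[float, float, float]:
    """Return (Brier score, log loss, pick accuracy).

    probs: forecast probability that the home team wins, one per game.
    outcomes: 1 if the home team won, 0 otherwise.
    """
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and the same length")
    n = len(probs)
    eps = 1e-15  # keeps log() finite if a forecast is exactly 0 or 1
    brier = sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / n
    log_loss = -sum(
        y * math.log(max(p, eps)) + (1 - y) * math.log(max(1.0 - p, eps))
        for p, y in zip(probs, outcomes)
    ) / n
    # Assumption: a forecast of exactly 0.5 counts as picking the home team.
    accuracy = sum((p >= 0.5) == (y == 1) for p, y in zip(probs, outcomes)) / n
    return brier, log_loss, accuracy
```

As a yardstick, a forecast of 0.5 for every game scores a Brier score of exactly 0.25 and a log loss of ln 2, about 0.693, so lower values on both metrics indicate an improvement over a coin flip.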

Limitations

This version does not yet adjust for probable pitchers, injuries, lineups, rest/travel, bullpen availability, park factors, or betting-market information. Neutral sites are inferred when the game venue differs from the home team's listed venue. Probabilities should be treated as a transparent ratings baseline for analysis and entertainment, not betting advice.