sportsdataverse-py 
See CHANGELOG.md for details.
sportsdataverse-py gives the community free, tidy, analysis-ready sports data in Python. It is the Python member of the SportsDataverse family and deliberately mirrors its R sisters — hoopR (NBA/MBB), wehoop (WNBA/WBB), cfbfastR (CFB), baseballr (MLB), and fastRhockey (NHL/PWHL) — so the function you know in R is the function you call in Python. The NFL module mirrors the nflverse's nflreadpy, and the package plays well with the wider PySport ecosystem. Beyond aggregation and tidying, the project also exists to make open-source expected-points and win-probability models reproducible and benchmarkable, especially for American football.
New here? Read Ecosystem & philosophy for the design philosophy, the full function-naming paradigm, and how the Python and R packages line up.
Quickstart
pip install sportsdataverse
# Today's NBA scoreboard as a polars DataFrame — no kwargs needed via parsed.*
from sportsdataverse.parsed.nba import espn_nba_scoreboard
df = espn_nba_scoreboard() # → polars
# Or via the original module, passing return_parsed=True explicitly (already the default for parser-backed wrappers in 0.0.54+):
from sportsdataverse.nba import espn_nba_scoreboard
df = espn_nba_scoreboard(return_parsed=True)
print(df.select(["event_id", "home_name", "away_name",
"home_score", "away_score"]).head())
# Aaron Judge's 2024 season stats from the official MLB API
from sportsdataverse.mlb import mlb_api_person_stats, parse_mlb_api_person_stats
judge = parse_mlb_api_person_stats(
mlb_api_person_stats(person_id=592450, stats="season", season=2024)
)
print(judge.select(["stats_group", "stat_home_runs", "stat_avg"]))
# Connor McDavid's 2024-25 EDGE skating speed profile
from sportsdataverse.nhl import nhl_edge_skater_detail, parse_edge_detail
mcdavid = parse_edge_detail(nhl_edge_skater_detail(8478402))
print(mcdavid.select(["player_first_name_default", "top_shot_speed_metric"]))
Parser-backed wrappers return a polars DataFrame by default (0.0.54+);
pass return_parsed=False for the raw Dict. For the NHL / MLB sibling
APIs, compose each wrapper with its matching parse_* function. See
Polars / pandas parser layer below.
Supported leagues and data sources
| League | Module | Surfaces covered |
|---|---|---|
| NBA | sportsdataverse.nba | ESPN (Site v2 + Web v3 + Core v2) — 118 wrappers |
| WNBA | sportsdataverse.wnba | ESPN — 124 wrappers |
| MBB (NCAA M) | sportsdataverse.mbb | ESPN + NCAA-only (rankings, recruits) — 121 wrappers |
| WBB (NCAA W) | sportsdataverse.wbb | ESPN + NCAA-only — 126 wrappers |
| CFB | sportsdataverse.cfb | ESPN + NCAA + football-only (QBR) — 123 wrappers |
| NFL | sportsdataverse.nfl | ESPN + football-only (QBR) — 119 wrappers |
| MLB | sportsdataverse.mlb | ESPN + MLB Stats API (statsapi.mlb.com) + Baseball Savant / Statcast — 175 wrappers |
| NHL | sportsdataverse.nhl | api-web.nhle.com/v1/ (game-feed) + NHL EDGE (player tracking) + Stats REST + Records site — 132 wrappers |
| Total | | ~1,030 wrappers |
Polars / pandas parser layer
Parser-backed wrappers return a polars DataFrame by default (0.0.54+).
The parser layer in
sportsdataverse._common_espn_parsers
(plus matching modules for the MLB and NHL sibling APIs) turns the raw
Dict payloads into tidy polars (or pandas) DataFrames.
For ESPN wrappers, pass return_parsed=False to recover the raw Dict,
or return_as_pandas=True to get a pandas DataFrame:
from sportsdataverse.nba import espn_nba_team_roster
df = espn_nba_team_roster(team_id=13) # → polars (default)
raw = espn_nba_team_roster(team_id=13, return_parsed=False) # → Dict
pdf = espn_nba_team_roster(team_id=13,
return_as_pandas=True) # → pandas
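The return_parsed switch can be pictured as a thin wrapper around a fetch step and a parse step. The sketch below only illustrates that pattern and is not the package's implementation: fetch_roster, parse_roster, team_roster, and the payload shape are all hypothetical, and plain lists of dicts stand in for DataFrames so it runs without dependencies.

```python
from typing import Any, Dict, List, Union

Raw = Dict[str, Any]
Rows = List[Dict[str, Any]]


def fetch_roster(team_id: int) -> Raw:
    """Hypothetical stand-in for the HTTP call; returns a canned payload."""
    return {
        "team": {"id": team_id},
        "athletes": [
            {"id": 1, "fullName": "Player One"},
            {"id": 2, "fullName": "Player Two"},
        ],
    }


def parse_roster(raw: Raw) -> Rows:
    """Hypothetical parser: one flat record per athlete."""
    team_id = raw["team"]["id"]
    return [
        {"team_id": team_id, "athlete_id": a["id"], "athlete_name": a["fullName"]}
        for a in raw["athletes"]
    ]


def team_roster(team_id: int, return_parsed: bool = True) -> Union[Raw, Rows]:
    """Parsed output by default; return_parsed=False hands back the raw dict."""
    raw = fetch_roster(team_id)
    return parse_roster(raw) if return_parsed else raw


rows = team_roster(13)
raw = team_roster(13, return_parsed=False)
print(rows[0])
print(sorted(raw))
```

In this sketch, team_roster(13) equals parse_roster(team_roster(13, return_parsed=False)), so the flag form and the explicit wrapper-plus-parser form yield the same records.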
For NHL / MLB sibling-API wrappers, compose the wrapper with its parser:
from sportsdataverse.nhl import nhl_web_pbp, parse_nhl_web_pbp
df = parse_nhl_web_pbp(nhl_web_pbp(2023030417)) # 331-row polars frame
See the Architecture and Parsers pages for full details.
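Column names such as player_first_name_default and stat_home_runs suggest that parsers flatten nested JSON keys into snake_case columns. The following is a minimal sketch of that idea under that assumption; to_snake, flatten, and the sample payload are invented for illustration and are not taken from the package.

```python
import re
from typing import Any, Dict


def to_snake(name: str) -> str:
    """Convert a camelCase key to snake_case, e.g. firstName -> first_name."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


def flatten(payload: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Recursively flatten nested dicts, joining key paths with underscores."""
    flat: Dict[str, Any] = {}
    for key, value in payload.items():
        column = f"{prefix}{to_snake(key)}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{column}_"))
        else:
            flat[column] = value
    return flat


# Invented payload; a real API response will have a different shape.
sample = {"player": {"firstName": {"default": "Connor"}, "sweaterNumber": 97}}
row = flatten(sample)
print(row)
```

A list of such flat records is the kind of input a DataFrame constructor accepts directly, which is why flattening is a common first step in a parser.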
Installation
sportsdataverse-py can be installed via pip:
pip install sportsdataverse
or from the repo (which may at times be more up to date):
git clone https://github.com/sportsdataverse/sportsdataverse-py
cd sportsdataverse-py
pip install -e .
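Because the parsed polars default applies from 0.0.54 onward, it can help to confirm which release is installed. The sketch below uses only the standard library; parse_release and installed_version are hypothetical helper names, and the version parsing is deliberately simple (it keeps the leading digits of each dot-separated piece).

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def parse_release(raw: str) -> Tuple[int, ...]:
    """Turn '0.0.54' (or '0.0.54rc1') into a comparable tuple of ints."""
    parts = []
    for piece in raw.split("."):
        digits = re.match(r"\d+", piece)
        if digits is None:
            break  # stop at the first non-numeric piece, e.g. 'dev3'
        parts.append(int(digits.group()))
    return tuple(parts)


def installed_version(dist: str = "sportsdataverse") -> Optional[Tuple[int, ...]]:
    """Return the installed release as a tuple, or None if not installed."""
    try:
        return parse_release(version(dist))
    except PackageNotFoundError:
        return None


found = installed_version()
if found is None:
    print("sportsdataverse is not installed")
elif found >= (0, 0, 54):
    print("parser-backed wrappers return polars by default")
else:
    print("older release; defaults may differ")
```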
Our Authors
Citations
To cite the sportsdataverse-py Python package in publications, use:
BibTeX Citation
@misc{gilani_sdvpy_2021,
author = {Gilani, Saiem},
title = {sportsdataverse-py: The SportsDataverse's Python Package for Sports Data.},
url = {https://py.sportsdataverse.org},
year = {2021}
}