Version: 0.0.54

sportsdataverse-py

See CHANGELOG.md for details.

sportsdataverse-py gives the community free, tidy, analysis-ready sports data in Python. It is the Python member of the SportsDataverse family and deliberately mirrors its R sisters — hoopR (NBA/MBB), wehoop (WNBA/WBB), cfbfastR (CFB), baseballr (MLB), and fastRhockey (NHL/PWHL) — so the function you know in R is the function you call in Python. The NFL module mirrors the nflverse's nflreadpy, and the package plays well with the wider PySport ecosystem. Beyond aggregation and tidying, the project also exists to make open-source expected-points and win-probability models reproducible and benchmarkable, especially for American football.

New here? Read Ecosystem & philosophy for the design philosophy, the full function-naming paradigm, and how the Python and R packages line up.

Quickstart

pip install sportsdataverse
# Today's NBA scoreboard as a polars DataFrame — no kwargs needed via parsed.*
from sportsdataverse.parsed.nba import espn_nba_scoreboard
df = espn_nba_scoreboard() # → polars

# Or via the original module; return_parsed=True is the default for
# parser-backed wrappers (0.0.54+) and is spelled out here for clarity:
from sportsdataverse.nba import espn_nba_scoreboard
df = espn_nba_scoreboard(return_parsed=True)
print(df.select(["event_id", "home_name", "away_name",
                 "home_score", "away_score"]).head())

# Aaron Judge's 2024 season stats from the official MLB API
from sportsdataverse.mlb import mlb_api_person_stats, parse_mlb_api_person_stats
judge = parse_mlb_api_person_stats(
    mlb_api_person_stats(person_id=592450, stats="season", season=2024)
)
print(judge.select(["stats_group", "stat_home_runs", "stat_avg"]))

# Connor McDavid's 2024-25 EDGE skating speed profile
from sportsdataverse.nhl import nhl_edge_skater_detail, parse_edge_detail
mcdavid = parse_edge_detail(nhl_edge_skater_detail(8478402))
print(mcdavid.select(["player_first_name_default", "top_shot_speed_metric"]))

Parser-backed wrappers return a polars DataFrame by default (0.0.54+); pass return_parsed=False for the raw Dict. For the NHL / MLB sibling APIs, pass the wrapper's output to the matching parse_* function. See Polars / pandas parser layer below.

Supported leagues and data sources

| League | Module | Surfaces covered | Wrappers |
| --- | --- | --- | --- |
| NBA | sportsdataverse.nba | ESPN (Site v2 + Web v3 + Core v2) | 118 |
| WNBA | sportsdataverse.wnba | ESPN | 124 |
| MBB (NCAA M) | sportsdataverse.mbb | ESPN + NCAA-only (rankings, recruits) | 121 |
| WBB (NCAA W) | sportsdataverse.wbb | ESPN + NCAA-only | 126 |
| CFB | sportsdataverse.cfb | ESPN + NCAA + football-only (QBR) | 123 |
| NFL | sportsdataverse.nfl | ESPN + football-only (QBR) | 119 |
| MLB | sportsdataverse.mlb | ESPN + MLB Stats API (statsapi.mlb.com) + Baseball Savant / Statcast | 175 |
| NHL | sportsdataverse.nhl | api-web.nhle.com/v1/ (game-feed) + NHL EDGE (player tracking) + Stats REST + Records site | 132 |
| Total | | | ~1,030 |

Polars / pandas parser layer

The parser layer in sportsdataverse._common_espn_parsers (plus matching modules for the MLB and NHL sibling APIs) turns raw Dict payloads into tidy polars (or pandas) DataFrames.
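To make "tidy" concrete, here is a minimal, standard-library-only sketch of the kind of flattening a parser might perform: nested camelCase keys become flat snake_case columns, in the spirit of column names such as stat_home_runs. The helper names (to_snake, flatten) and the sample payload are hypothetical illustrations, not the package's actual parser code or a real API response.

```python
import re
from typing import Any, Dict


def to_snake(name: str) -> str:
    """Convert a camelCase key to snake_case (hypothetical helper)."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


def flatten(payload: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into one row of prefixed snake_case columns.

    Lists are left untouched; a real parser would likely expand them
    into multiple rows.
    """
    row: Dict[str, Any] = {}
    for key, value in payload.items():
        column = f"{prefix}{to_snake(key)}"
        if isinstance(value, dict):
            # Recurse, carrying the parent key forward as a column prefix.
            row.update(flatten(value, prefix=f"{column}_"))
        else:
            row[column] = value
    return row


# Made-up payload shape, for illustration only.
sample = {"playerId": 1, "firstName": {"default": "Ada"}, "stat": {"homeRuns": 3}}
tidy_row = flatten(sample)
print(tidy_row)
```

A list of such rows is the sort of structure that polars.DataFrame or pandas.DataFrame can ingest directly.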

For ESPN wrappers, return_parsed=True is now the default for parser-backed endpoints. Pass return_parsed=False to recover the raw Dict, or return_as_pandas=True to get a pandas DataFrame:

from sportsdataverse.nba import espn_nba_team_roster

df = espn_nba_team_roster(team_id=13) # → polars (default)
raw = espn_nba_team_roster(team_id=13, return_parsed=False) # → Dict
pdf = espn_nba_team_roster(team_id=13,
                           return_as_pandas=True)  # → pandas

For NHL / MLB sibling-API wrappers, compose the wrapper with its parser:

from sportsdataverse.nhl import nhl_web_pbp, parse_nhl_web_pbp
df = parse_nhl_web_pbp(nhl_web_pbp(2023030417)) # 331-row polars frame
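If you compose wrappers and parsers often, a small helper can keep the pairing in one place. The sketch below is an assumption-laden illustration: fetch_then_parse is a hypothetical helper (not part of sportsdataverse), and the stand-in wrapper and parser return made-up data so the example runs without network access.

```python
from typing import Any, Callable, Dict, List


def fetch_then_parse(wrapper: Callable[..., Any],
                     parser: Callable[[Any], Any],
                     *args: Any, **kwargs: Any) -> Any:
    """Call a raw-payload wrapper, then hand its result to the parser."""
    return parser(wrapper(*args, **kwargs))


# Stand-ins for a real wrapper/parser pair (hypothetical, no network calls).
def fake_wrapper(game_id: int) -> Dict[str, Any]:
    return {"gameId": game_id, "plays": [{"eventId": 1}, {"eventId": 2}]}


def fake_parser(raw: Dict[str, Any]) -> List[Dict[str, Any]]:
    # One row per play, with the game id repeated on every row.
    return [dict(play, game_id=raw["gameId"]) for play in raw["plays"]]


rows = fetch_then_parse(fake_wrapper, fake_parser, 2023030417)
print(len(rows), rows[0])
```

With the package installed, fetch_then_parse(nhl_web_pbp, parse_nhl_web_pbp, 2023030417) would be equivalent to the nested call shown above.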

See the Architecture and Parsers pages for full details.

Installation

sportsdataverse-py can be installed via pip:

pip install sportsdataverse

or from the repo (which may at times be more up to date):

git clone https://github.com/sportsdataverse/sportsdataverse-py
cd sportsdataverse-py
pip install -e .
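After either install route, you may want to confirm which version Python actually sees. The following sketch uses only the standard library's importlib.metadata; the helper name installed_version is hypothetical, and the distribution name is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "sportsdataverse") -> Optional[str]:
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "sportsdataverse is not installed")
```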

Our Authors

Citations

To cite the sportsdataverse-py Python package in publications, use:

BibTeX Citation

@misc{gilani_sdvpy_2021,
  author = {Gilani, Saiem},
  title = {sportsdataverse-py: The SportsDataverse's Python Package for Sports Data.},
  url = {https://py.sportsdataverse.org},
  year = {2021}
}