๐Ÿ€ Men's college basketball with sportsdataverse-py

Welcome to Selection-Sunday-grade hoops data! 🎉 In a handful of lines of Python you're about to pull NCAA Division I men's basketball (full schedules, play-by-play, standings, rosters, statistical leaders and multi-season parquet archives) and get it all back as tidy polars DataFrames ready to model.

sportsdataverse.mbb leads with two premium sources:

  • 🟥 ESPN (espn_mbb_*): the site + core APIs behind ESPN.com, covering live scoreboards, schedules, standings, rankings, box scores, win probability and play-by-play.
  • 🦊 FoxSports (fox_mbb_*): FoxSports' league-leader, standings, roster, boxscore and odds feeds.

Plus 📦 release loaders (load_mbb_*) that hand you whole seasons of play-by-play, box scores, shots and schedules from the data repo in one call.

R user? The men's-basketball companion is hoopR (NBA + NCAA). Let's tip off! 🏀

🧰 The toolbox

Most accessors return a tidy polars DataFrame by default; pass return_as_pandas=True for pandas. Two exceptions used later on this page, espn_mbb_pbp and espn_mbb_summary, return a dict. The richest live surfaces are ESPN and Fox; the load_* loaders read pre-built parquet from the data release (rock-solid, no live API). Click any name for the full reference.

| Function | What it gives you | Source |
| --- | --- | --- |
| espn_mbb_teams | Every D-I team (grab team_ids) | 🟥 ESPN ⭐ |
| espn_mbb_schedule | Games + results for a date / window | 🟥 ESPN ⭐ |
| espn_mbb_scoreboard | Rich scoreboard for a date (status, lines, odds) | 🟥 ESPN ⭐ |
| espn_mbb_standings | Conference standings, one row per team | 🟥 ESPN ⭐ |
| espn_mbb_rankings | AP / Coaches poll (in-season) | 🟥 ESPN ⭐ |
| espn_mbb_summary | Full game summary: box, plays, win prob | 🟥 ESPN ⭐ |
| espn_mbb_team_roster | A team's roster | 🟥 ESPN ⭐ |
| espn_mbb_pbp | Event-level play-by-play for a game | 🟥 ESPN ⭐ |
| espn_mbb_game_rosters | Who dressed + started for one game | 🟥 ESPN ⭐ |
| espn_mbb_player_stats | A player's season stat line | 🟥 ESPN ⭐ |
| fox_mbb_league_leaders | Stat leaders (scoring, rebounds, …) | 🦊 Fox ⭐ |
| fox_mbb_standings | Fox conference standings for a team | 🦊 Fox ⭐ |
| fox_mbb_team_roster | Fox roster for a team | 🦊 Fox |
| espn_mbb_team_schedule | One team's full season schedule | 🟥 ESPN ⭐ |
| espn_mbb_conferences | Conference / group catalog | 🟥 ESPN ⭐ |
| load_mbb_schedule | Whole-season schedule parquet | 📦 loader |
| load_mbb_player_boxscore | Season player box scores | 📦 loader |
| load_mbb_team_boxscore | Season team box scores | 📦 loader |
| load_mbb_pbp | Season play-by-play parquet | 📦 loader |
| most_recent_mbb_season | Current season-year helper | 🛠️ helper |

⭐ = premium live source.

🔌 Setup

pip install sportsdataverse

No API key needed: ESPN, Fox and the parquet loaders are all open. 😊

import polars as pl
import sportsdataverse as sdv

pl.Config.set_tbl_rows(10)
print("most recent MBB season:", sdv.mbb.most_recent_mbb_season())

ESPN's live endpoints (scoreboard, rankings, standings, a single game's play-by-play) are seasonal: in the offseason a poll or scoreboard can come back empty. So we use a tiny safe() helper: you get the frame when the feed is up, and a friendly one-liner when it isn't, never a scary traceback. 🛟 The load_* parquet loaders are stable year-round, so we call those directly.

def safe(label, thunk):
    """Run a live call defensively; return its result or None with a note."""
    try:
        out = thunk()
        ok = out is not None and (not hasattr(out, "height") or out.height)
        print(f"{'✅' if ok else 'ℹ️ '} {label}{'' if ok else ' - no rows right now'}")
        return out
    except Exception as e:  # noqa: BLE001 -- demo resilience
        print(f"⏭️ {label}: unavailable right now ({type(e).__name__})")
        return None
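To see both branches without touching a live feed, here is a minimal sketch that restates the helper (with plain-text status labels) and runs it against two stand-in thunks. The fake_feed_up and fake_feed_down functions are hypothetical placeholders, not part of sportsdataverse.

```python
def safe(label, thunk):
    """Same idea as the helper above, restated so this sketch runs on its own."""
    try:
        out = thunk()
        ok = out is not None and (not hasattr(out, "height") or out.height)
        print(f"{'ok' if ok else 'empty'}: {label}")
        return out
    except Exception as e:  # broad on purpose: demo resilience
        print(f"skipped: {label} ({type(e).__name__})")
        return None


# Hypothetical stand-ins for live calls (not part of sportsdataverse).
def fake_feed_up():
    return [{"game_id": 1}]


def fake_feed_down():
    raise ConnectionError("feed offline")


up = safe("feed that answers", fake_feed_up)        # result is passed through
down = safe("feed that is offline", fake_feed_down)  # exception becomes None
```

Because a failed call comes back as None, every downstream cell only needs one check before touching the result.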

๐ŸŸ๏ธ Every team in Division Iโ€‹

Start with espn_mbb_teams โ€” one row per program, with the team_id you'll pass into roster, schedule and summary calls. This is a plain catalog fetch, so it's reliable year-round.

teams = sdv.mbb.espn_mbb_teams()
print("teams:", teams.shape)
teams.select(["team_id", "team_location", "team_name", "team_abbreviation", "team_is_active"]).head()

📅 Schedule & scores for a date window

espn_mbb_schedule takes a single dates=YYYYMMDD or a 'YYYYMMDD-YYYYMMDD' window and returns one row per game with final scores. Here's championship day of the 2024 tournament.

sched = safe(
    "schedule 2024-04-08",
    lambda: sdv.mbb.espn_mbb_schedule(dates=20240408),
)
(sched.select(["id", "home_display_name", "away_display_name", "home_score", "away_score"]).head()
 if sched is not None and sched.height else "schedule unavailable")
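If you would rather build the window string from real dates than type it by hand, a small formatter does the job. This is a sketch only: date_window is a hypothetical helper name, not something sportsdataverse ships.

```python
from datetime import date


def date_window(start: date, end: date) -> str:
    """Format two dates as a 'YYYYMMDD-YYYYMMDD' window string."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    return f"{start:%Y%m%d}-{end:%Y%m%d}"


# Three-day window ending on the 2024 championship day.
window = date_window(date(2024, 4, 6), date(2024, 4, 8))
print(window)
```

The resulting string has the same shape as the 'YYYYMMDD-YYYYMMDD' form described above, so it can be passed as dates=window.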

📊 The rich scoreboard

espn_mbb_scoreboard is the deluxe version: for a given date it returns status, broadcast, betting lines and team line scores, 50 columns wide. Defaults to polars; we peek at a tidy slice.

sb = safe(
    "scoreboard 2024-04-08",
    lambda: sdv.mbb.espn_mbb_scoreboard(dates=20240408, return_as_pandas=False),
)
if sb is not None and getattr(sb, "height", 0):
    keep = ["game_id", "short_name", "status_type_description",
            "home_team_short_display_name", "away_team_short_display_name"]
    out = sb.select([c for c in keep if c in sb.columns]).head()
else:
    out = "scoreboard empty right now (offseason)"
out

๐Ÿ† Conference standingsโ€‹

espn_mbb_standings returns one row per team for a season with wins, losses, win pct, point differential and conference grouping. Great for a quick power look across the league.

standings = safe(
    "standings 2024",
    lambda: sdv.mbb.espn_mbb_standings(season=2024, return_as_pandas=False),
)
if standings is not None and getattr(standings, "height", 0):
    keep = ["team_display_name", "group_name", "wins", "losses",
            "win_percent", "point_differential"]
    out = (standings.select([c for c in keep if c in standings.columns])
           .sort("win_percent", descending=True).head(10))
else:
    out = "standings unavailable"
out

๐Ÿณ Cookbook: common MBB tasksโ€‹

Now the fun part โ€” real tasks you'll reach for constantly, each built on a premium ESPN or Fox wrapper. Every recipe is guarded so a transient or offseason hiccup prints a note instead of breaking the page.

Recipe 1 โ€” National scoring leaders ๐Ÿฅ‡ (FoxSports)โ€‹

fox_mbb_league_leaders serves the leaderboard direct from FoxSports โ€” pick a category (scoring, rebounds, assists, โ€ฆ) and who (player or team). No IDs needed.

leaders = safe(
    "fox scoring leaders",
    lambda: sdv.mbb.fox_mbb_league_leaders(category="scoring", who="player"),
)
if leaders is not None and getattr(leaders, "height", 0):
    keep = ["players", "gp", "mpg", "ppg", "pts"]
    out = leaders.select([c for c in keep if c in leaders.columns]).head(10)
else:
    out = "Fox leaders unavailable right now"
out

Recipe 2: Look up a team's roster 👥 (ESPN)

Grab a team_id from espn_mbb_teams, then espn_mbb_team_roster returns the current roster. Here we resolve UConn (the 2024 champs) by abbreviation so the recipe is self-contained.

row = teams.filter(pl.col("team_abbreviation") == "CONN")
tid = int(row["team_id"][0]) if row.height else 41  # 41 = UConn fallback
roster = safe(
    f"roster team_id={tid}",
    lambda: sdv.mbb.espn_mbb_team_roster(team_id=tid, return_as_pandas=False),
)
if roster is not None and getattr(roster, "height", 0):
    keep = ["full_name", "jersey", "display_height", "display_weight"]
    out = roster.select([c for c in keep if c in roster.columns]).head(12)
else:
    out = "roster unavailable right now"
out

Recipe 3: Season scoring leaderboard from parquet 📦

The load_* loaders pull whole seasons from the data release, which is perfect for analysis that shouldn't depend on a live endpoint. load_mbb_player_boxscore gives every player-game; we aggregate to a per-player points-per-game board.

pbox = sdv.mbb.load_mbb_player_boxscore(seasons=[2024])
print("player box rows:", pbox.shape)
(pbox
 .filter(pl.col("points").is_not_null())
 .group_by(["athlete_display_name", "team_short_display_name"])
 .agg(
     pl.len().alias("g"),
     pl.col("points").cast(pl.Float64, strict=False).mean().round(1).alias("ppg"),
 )
 .filter(pl.col("g") >= 20)
 .sort("ppg", descending=True)
 .head(10))

Recipe 4: Play-by-play slice for one game 🎬 (ESPN)

espn_mbb_pbp returns a dict; its plays list is event-level. We frame it and pull just the scoring plays of the 2024 national championship (UConn vs. Purdue, game_id=401638636).

pbp = safe("pbp 401638636", lambda: sdv.mbb.espn_mbb_pbp(game_id=401638636))
if isinstance(pbp, dict) and pbp.get("plays"):
    plays = pl.DataFrame(pbp["plays"], infer_schema_length=None)
    keep = ["period.number", "clock.displayValue", "text", "scoringPlay",
            "homeScore", "awayScore"]
    out = (plays.select([c for c in keep if c in plays.columns])
           .filter(pl.col("scoringPlay") == True)  # noqa: E712
           .head(10))
else:
    out = "play-by-play unavailable right now"
out

Recipe 5: Best net scoring margin 📊 (parquet)

load_mbb_team_boxscore gives one row per team-game with the opponent's score attached, so a single group-by ranks every program by points scored minus points allowed, a clean one-number power proxy. Pure parquet, no live endpoint.

tbox = sdv.mbb.load_mbb_team_boxscore(seasons=[2024])
print("team box rows:", tbox.shape)
(tbox
 .group_by("team_display_name")
 .agg(
     pl.len().alias("g"),
     pl.col("team_score").cast(pl.Float64, strict=False).mean().round(1).alias("ppg"),
     pl.col("opponent_team_score").cast(pl.Float64, strict=False).mean().round(1).alias("opp_ppg"),
 )
 .with_columns((pl.col("ppg") - pl.col("opp_ppg")).round(1).alias("net_margin"))
 .filter(pl.col("g") >= 25)
 .sort("net_margin", descending=True)
 .head(10))

Recipe 6: Best 3-point shooting teams 🎯 (parquet)

Same team-box parquet, different question: sum makes and attempts across the season, then divide. A min attempts filter keeps small-sample flukes off the board so the leaders are real volume shooters.

(tbox
 .group_by("team_display_name")
 .agg(
     pl.col("three_point_field_goals_made")
     .cast(pl.Float64, strict=False).sum().alias("tpm"),
     pl.col("three_point_field_goals_attempted")
     .cast(pl.Float64, strict=False).sum().alias("tpa"),
 )
 .with_columns((pl.col("tpm") / pl.col("tpa") * 100).round(1).alias("three_pct"))
 .filter(pl.col("tpa") >= 500)
 .sort("three_pct", descending=True)
 .select(["team_display_name", "tpm", "tpa", "three_pct"])
 .head(10))
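Why sum first and divide second? A quick sketch with made-up numbers (two invented games, not real data) shows how averaging per-game percentages lets a tiny-sample game distort the season figure, while pooling makes and attempts weights every shot equally.

```python
# Made-up shooting lines for one team: (threes made, threes attempted).
games = [(1, 2), (10, 30)]

# Averaging per-game percentages weights a 2-attempt game like a 30-attempt game.
mean_of_pcts = sum(made / att for made, att in games) / len(games) * 100

# Summing first weights every attempt equally, which is what the recipe does.
pooled_pct = sum(made for made, _ in games) / sum(att for _, att in games) * 100

print(round(mean_of_pcts, 1), round(pooled_pct, 1))
```

Here the mean of percentages lands near 41.7 while the pooled rate is about 34.4, which is the number a box score would report for 11-of-32 shooting.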

Recipe 7: Most efficient scorers ⚡ (true shooting %)

Points-per-game rewards volume; true shooting % rewards efficiency. It folds threes and free throws into one rate via TS% = PTS / (2 · (FGA + 0.44 · FTA)). We compute it straight from load_mbb_player_boxscore, keeping only high-volume scorers (at least 25 games and 400 points).

pbox = sdv.mbb.load_mbb_player_boxscore(seasons=[2024])
(pbox
 .filter(pl.col("points").is_not_null())
 .group_by(["athlete_display_name", "team_abbreviation"])
 .agg(
     pl.len().alias("g"),
     pl.col("points").cast(pl.Float64, strict=False).sum().alias("pts"),
     pl.col("field_goals_attempted").cast(pl.Float64, strict=False).sum().alias("fga"),
     pl.col("free_throws_attempted").cast(pl.Float64, strict=False).sum().alias("fta"),
 )
 .with_columns(
     (pl.col("pts") / (2 * (pl.col("fga") + 0.44 * pl.col("fta"))) * 100)
     .round(1).alias("ts_pct"))
 .filter((pl.col("g") >= 25) & (pl.col("pts") >= 400))
 .sort("ts_pct", descending=True)
 .select(["athlete_display_name", "team_abbreviation", "g", "pts", "ts_pct"])
 .head(10))
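To make the formula concrete, here is the same arithmetic as a tiny standalone function applied to one invented stat line. The function name true_shooting_pct and the numbers are illustrative assumptions; only the formula comes from the recipe.

```python
from typing import Optional


def true_shooting_pct(pts: float, fga: float, fta: float) -> Optional[float]:
    """TS% = PTS / (2 * (FGA + 0.44 * FTA)), expressed as a percentage."""
    denom = 2 * (fga + 0.44 * fta)
    if denom == 0:
        return None  # no shot attempts, so the rate is undefined
    return round(pts / denom * 100, 1)


# Made-up line: 30 points on 18 field-goal attempts and 8 free-throw attempts.
# Denominator: 2 * (18 + 0.44 * 8) = 2 * 21.52 = 43.04
ts = true_shooting_pct(pts=30, fga=18, fta=8)
print(ts)
```

That works out to 30 / 43.04, or roughly 69.7%. The zero-denominator guard is an addition for the standalone sketch; the recipe's points filter already removes players without attempts.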

Recipe 8: One conference's power board 🏟️ (ESPN, join)

espn_mbb_conferences is the group catalog; espn_mbb_standings carries a group_name per team. Filter standings to a single league (here the Big 12) to get a clean intra-conference pecking order.

confs = safe("conferences", lambda: sdv.mbb.espn_mbb_conferences())
if confs is not None and getattr(confs, "height", 0):
    print("some conferences:",
          confs.filter(pl.col("is_conference"))["name"].to_list()[:8])
st = safe("standings 2024", lambda: sdv.mbb.espn_mbb_standings(season=2024))
if st is not None and getattr(st, "height", 0) and "group_name" in st.columns:
    keep = ["team_display_name", "wins", "losses", "win_percent", "point_differential"]
    out = (st.filter(pl.col("group_name").str.contains("Big 12"))
           .select([c for c in keep if c in st.columns])
           .sort("win_percent", descending=True)
           .head(12))
    # If no Big 12 rows matched, fall back to the overall top 12.
    out = out if out.height else st.select(
        [c for c in keep if c in st.columns]).sort(
        "win_percent", descending=True).head(12)
else:
    out = "standings unavailable right now"
out

Recipe 9: A team's full season schedule 🗓️ (ESPN)

espn_mbb_team_schedule returns every game on one team's slate for a season (matchup name, week and season type), perfect for building an opponent list. We use UConn's 2024 championship run.

tid_sched = int(row["team_id"][0]) if row.height else 41  # UConn fallback
tsched = safe(
    f"team schedule {tid_sched}",
    lambda: sdv.mbb.espn_mbb_team_schedule(team_id=tid_sched, season=2024),
)
if tsched is not None and getattr(tsched, "height", 0):
    keep = ["id", "short_name", "season_type_name", "week_text"]
    out = tsched.select([c for c in keep if c in tsched.columns]).head(12)
else:
    out = "team schedule unavailable right now"
out

Recipe 10: Top rebounding teams 🧲 (FoxSports)

fox_mbb_league_leaders isn't just a player board: flip who="team" and pick category="rebounds" to rank programs on the glass straight from FoxSports. No IDs needed.

team_reb = safe(
    "fox team rebounds",
    lambda: sdv.mbb.fox_mbb_league_leaders(category="rebounds", who="team"),
)
if team_reb is not None and getattr(team_reb, "height", 0):
    keep = ["teams", "gp", "w", "l", "ppg", "ppg_diff"]
    out = team_reb.select([c for c in keep if c in team_reb.columns]).head(10)
else:
    out = "Fox team leaders unavailable right now"
out

Recipe 11: Crunch-time buckets 🔥 (parquet PBP)

load_mbb_pbp is the whole season's play-by-play in one parquet, no live game needed. We slice it to scoring plays in the final minute of the second half or any overtime period: every late-game dagger across the year.

season_pbp = sdv.mbb.load_mbb_pbp(seasons=[2024])
print("season pbp rows:", season_pbp.shape)
(season_pbp
 .filter(
     (pl.col("scoring_play") == True)  # noqa: E712
     & (pl.col("period_number") >= 2)
     & (pl.col("end_period_seconds_remaining").cast(pl.Float64, strict=False) <= 60)
 )
 .select(["game_id", "period_display_value", "clock_display_value",
          "text", "home_score", "away_score"])
 .head(10))

Recipe 12: Double-double leaders (pandas interop)

Prefer pandas? Pass return_as_pandas=True to any loader and stay in your comfort zone. Here we count games where a player hit double digits in at least two of points / rebounds / assists (the classic double-double), entirely in pandas.

import pandas as pd

pbox_pd = sdv.mbb.load_mbb_player_boxscore(seasons=[2024], return_as_pandas=True)
for col in ["points", "rebounds", "assists"]:
    pbox_pd[col] = pd.to_numeric(pbox_pd[col], errors="coerce")
pbox_pd["is_dd"] = (pbox_pd[["points", "rebounds", "assists"]] >= 10).sum(axis=1) >= 2
(pbox_pd[pbox_pd["is_dd"]]
 .groupby(["athlete_display_name", "team_abbreviation"])
 .size()
 .reset_index(name="double_doubles")
 .sort_values("double_doubles", ascending=False)
 .head(10)
 .reset_index(drop=True))
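The is_dd flag boils down to a simple per-game rule. As a minimal sketch, here is that rule as a plain function checked against three invented stat lines; is_double_double is a hypothetical name used only for illustration.

```python
def is_double_double(points: int, rebounds: int, assists: int) -> bool:
    """True when at least two of the three categories reach 10."""
    return sum(stat >= 10 for stat in (points, rebounds, assists)) >= 2


# Made-up stat lines: (points, rebounds, assists).
print(is_double_double(22, 11, 3))   # two categories in double digits
print(is_double_double(25, 9, 9))    # only points reaches 10
print(is_double_double(10, 10, 10))  # a triple-double also satisfies the rule
```

Note that, like the pandas expression above, this counts triple-doubles as double-doubles because the test is "two or more".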

🧾 One call, the whole game: espn_mbb_summary

espn_mbb_summary is the Swiss army knife: a single event_id returns a dict with team & player box scores, play-by-play, win probability, leaders, officials and more. Let's grab the team box score from that 2024 title game.

summ = safe("summary 401638636", lambda: sdv.mbb.espn_mbb_summary(event_id=401638636))
if isinstance(summ, dict) and summ.get("boxscore_team") is not None:
    tb = summ["boxscore_team"]
    tb = tb if isinstance(tb, pl.DataFrame) else pl.DataFrame(tb)
    print("box score sections available:", list(summ)[:8])
    out = tb.head()
else:
    out = "summary unavailable right now"
out

🙌 Who suited up: game rosters

espn_mbb_game_rosters returns one row per dressed player for a game, flagging starters, which is handy for joining onto play-by-play or box scores.

gr = safe("game rosters 401638636", lambda: sdv.mbb.espn_mbb_game_rosters(game_id=401638636))
if gr is not None and getattr(gr, "height", 0):
    keep = ["athlete_display_name", "team_abbreviation", "starter"]
    out = gr.select([c for c in keep if c in gr.columns]).head(10)
else:
    out = "game rosters unavailable right now"
out

🔧 A multi-season pipeline: highest-scoring tournament games

The schedule loader is stable, so here's a pure-polars analysis with no live dependency. We load the 2024 season schedule and rank every game by combined points, so the wildest shootouts (March Madness included) float right to the top. The seasons argument takes a list, so adding more years extends the same pipeline across seasons.

schedule_2024 = sdv.mbb.load_mbb_schedule(seasons=[2024])
print("season schedule rows:", schedule_2024.shape)
(schedule_2024
 .with_columns(
     (pl.col("home_score").cast(pl.Int64, strict=False)
      + pl.col("away_score").cast(pl.Int64, strict=False)).alias("total"))
 .filter(pl.col("total").is_not_null())
 .sort("total", descending=True)
 .select(["game_date", "home_display_name", "away_display_name",
          "home_score", "away_score", "total"])
 .head(10))

🎉 Where to next

  • 🟥 ESPN wrappers (espn_mbb_*) cover the live site + core APIs: scoreboards, standings, rankings, summaries, play-by-play and more. See the additional and site reference pages.
  • 🦊 FoxSports wrappers (fox_mbb_*): leaders, standings, rosters, boxscores and odds in additional.
  • 📦 Loaders (load_mbb_*) read whole seasons of parquet; see loaders. Pass return_as_pandas=True anywhere for pandas instead of polars.
  • 🏀 R user? The same surface lives in hoopR (NBA + NCAA men's basketball).
  • 🚺 Women's hoops? Check out the WBB module and its companion wehoop.

Now go bracket something! 🏀🔥