🏀 NBA hoops with sportsdataverse-py

Welcome to the hardwood! 🎉 In just a few lines of Python you're about to pull a whole season of NBA data (teams, standings, rosters, play-by-play, box scores, schedules and statistical leaders) straight from ESPN and the SportsDataverse data releases. Nearly everything comes back as a tidy polars DataFrame that's ready to slice, model, and chart. 🚀

We lead with the richest surface in the package: the espn_nba_* family, backed by ESPN's site / web / core APIs. If you know the R package hoopR, these names will feel like home. For the raw NBA Stats endpoints, the neighboring Python package is nba_api. Let's lace 'em up! 👟

🧰 The toolbox

Here's the kit we'll reach for. The espn_nba_* wrappers (⭐ our premium source) hit ESPN live and parse the JSON into polars for you; the load_nba_* loaders pull pre-built season parquets from the sportsdataverse-data releases, which is fast and reliable. Click any name for the full reference.

Function | What it gives you | Source
espn_nba_teams | All 30 NBA teams (grab team_ids here) | ⭐ ESPN
espn_nba_scoreboard | A day's slate: scores, status, matchups | ⭐ ESPN
espn_nba_schedule | Schedule for a date / date-range | ⭐ ESPN
espn_nba_standings | Conference standings (W-L, win%, streak) | ⭐ ESPN
espn_nba_team_roster | A team's active roster | ⭐ ESPN
espn_nba_team_schedule | One team's full-season schedule | ⭐ ESPN
espn_nba_player_gamelog | A player's game-by-game log | ⭐ ESPN
espn_nba_leaders | League statistical leaders | ⭐ ESPN
espn_nba_pbp | Full game payload (play-by-play, win prob, box) | ⭐ ESPN
espn_nba_game_rosters | Both teams' rosters for one game | ⭐ ESPN
load_nba_schedule | Multi-season schedule parquet | 📦 release
load_nba_player_boxscore | Player box scores, every game | 📦 release
load_nba_standings | Historical standings | 📦 release
espn_nba_injuries | League-wide injury report, one row per team | ⭐ ESPN
load_nba_team_boxscore | Team box scores, every game (off/def, shooting) | 📦 release
load_nba_shots | Every made shot with court coordinates | 📦 release
most_recent_nba_season | Helper that returns the current season year | 🧮 util

🔌 Setup

pip install sportsdataverse

No API key needed: ESPN's public endpoints and the data releases are open. 😊

import polars as pl
import sportsdataverse as sdv
from sportsdataverse.nba import most_recent_nba_season

pl.Config.set_tbl_rows(8)
SEASON = most_recent_nba_season()
print('current NBA season:', SEASON)

ESPN endpoints are live and seasonal, so we'll route every network call through a tiny safe() helper. When the feed is up you get the frame; when it's mid-offseason or briefly rate-limited you get a friendly one-liner instead of a scary traceback. 🛟

def safe(label, thunk):
    try:
        out = thunk()
        n = out.height if isinstance(out, pl.DataFrame) else (len(out) if hasattr(out, '__len__') else '?')
        print(f'✅ {label}: {n} rows')
        return out
    except Exception as e:  # noqa: BLE001 -- demo resilience
        print(f'⏭️ {label}: unavailable right now ({type(e).__name__})')
        return None
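
To see both branches of the helper without touching the network, here is a small offline sketch. safe_demo is a hypothetical, dependency-free stand-in for safe() (it duck-types on .height instead of importing polars), and flaky_feed is a made-up thunk that simulates an outage.

```python
def safe_demo(label, thunk):
    """Offline stand-in for safe(): run thunk, report a row count or a one-line skip."""
    try:
        out = thunk()
        # polars frames expose .height; fall back to len() for lists and dicts
        n = getattr(out, 'height', None)
        if n is None:
            n = len(out) if hasattr(out, '__len__') else '?'
        print(f'ok   {label}: {n} rows')
        return out
    except Exception as e:  # noqa: BLE001 -- demo resilience, same as safe()
        print(f'skip {label}: unavailable right now ({type(e).__name__})')
        return None


def flaky_feed():
    # Hypothetical thunk: pretend the live endpoint timed out
    raise TimeoutError('simulated outage')


rows = safe_demo('fake teams', lambda: [{'team_id': 1}, {'team_id': 2}])
missing = safe_demo('flaky feed', flaky_feed)
```

Because a failed call returns None rather than raising, every cell that follows checks `is not None` before touching the result.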

🏟️ Teams

Start with espn_nba_teams, which returns one wide row per franchise. The team_id column is the key you'll pass into roster, schedule and standings calls everywhere else.

teams = safe('teams', sdv.nba.espn_nba_teams)
(teams.select(['team_id', 'team_location', 'team_name', 'team_abbreviation', 'team_color']).head(10)
 if teams is not None else 'teams feed unavailable')

📅 Today on the slate (scoreboard)

espn_nba_scoreboard returns a tidy frame of every game for a date: final scores, live status, and matchups. Pass dates='YYYYMMDD' for one day. Here's a slice of the 2024 NBA Finals opener.

sb = safe('scoreboard', lambda: sdv.nba.espn_nba_scoreboard(dates='20240606'))
keep = ['game_id', 'short_name', 'home_abbreviation', 'away_abbreviation',
        'home_score', 'away_score', 'status_type_detail']
(sb.select([c for c in keep if c in sb.columns]).head()
 if sb is not None and sb.height else 'no games on that date')
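
If you'd rather build the dates string from a real date than type it by hand, the standard library covers it. A minimal sketch; espn_date is a hypothetical helper name, not part of the package.

```python
from datetime import date, timedelta


def espn_date(d):
    """Format a date as the YYYYMMDD string described above."""
    return d.strftime('%Y%m%d')


finals_opener = date(2024, 6, 6)
one_day = espn_date(finals_opener)

# Three consecutive days, one string per scoreboard call
three_days = [espn_date(finals_opener + timedelta(days=i)) for i in range(3)]
print(one_day, three_days)
```

Each string can then be passed as dates= in its own safe() call.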

🏆 Standings

espn_nba_standings gives one row per team with wins, losses, win%, point differential and current streak. Pass season= as the end year of the season (2024 for 2023-24).

standings = safe('standings', lambda: sdv.nba.espn_nba_standings(season=SEASON))
cols = ['team_display_name', 'wins', 'losses', 'win_percent', 'games_behind',
        'point_differential', 'streak']
(standings.select([c for c in cols if c in standings.columns])
 .sort('win_percent', descending=True).head(10)
 if standings is not None and standings.height else 'standings unavailable')
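
The end-year convention is easy to get off by one, so here is a rough sketch of the mapping from a calendar date to a season value. It assumes a new season tips off in October; season_end_year and rollover_month are hypothetical names, and most_recent_nba_season may well use a different cutoff internally.

```python
from datetime import date


def season_end_year(d, rollover_month=10):
    """Map a calendar date to the end year of its NBA season.

    Assumption: a new season starts in `rollover_month` (October here).
    This is an illustration, not how most_recent_nba_season is implemented.
    """
    return d.year + 1 if d.month >= rollover_month else d.year


finals_2024 = season_end_year(date(2024, 6, 6))      # falls in the 2023-24 season
opening_night = season_end_year(date(2024, 10, 22))  # falls in the 2024-25 season
print(finals_2024, opening_night)
```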

🍳 Cookbook: common NBA tasks

Now the fun part: a handful of recipes you'll reach for again and again. Each one leans on the premium espn_nba_* wrappers.

Recipe 1: A team and its roster 👥

Grab a team_id from espn_nba_teams, then pull the active roster with espn_nba_team_roster.

tid = None
if teams is not None and teams.height:
    # Boston Celtics if present, else the first team
    row = teams.filter(pl.col('team_abbreviation') == 'BOS')
    tid = int((row if row.height else teams)['team_id'][0])

roster = safe(f'roster {tid}', lambda: sdv.nba.espn_nba_team_roster(team_id=tid)) if tid else None
cols = ['full_name', 'jersey', 'position_abbreviation', 'height', 'weight', 'age']
(roster.select([c for c in cols if c in roster.columns]).head(10)
 if roster is not None and roster.height else 'roster unavailable')

Recipe 2: One team's season schedule 📆

espn_nba_team_schedule returns every game on a team's calendar for a season, which is perfect for building a results table or a strength-of-schedule view.

tsched = safe(f'team schedule {tid}',
              lambda: sdv.nba.espn_nba_team_schedule(team_id=tid, season=SEASON)) if tid else None
cols = ['id', 'date', 'name', 'short_name', 'season_year']
(tsched.select([c for c in cols if c in tsched.columns]).head()
 if tsched is not None and tsched.height else 'team schedule unavailable')

Recipe 3: A player's game log ⛹️

espn_nba_player_gamelog returns a game-by-game stat line for one athlete. The stat_* columns are positional (the ordered ESPN box categories); pair them with the opponent and result columns to see how the night went. (1966 = LeBron James.)

gamelog = safe('LeBron gamelog',
               lambda: sdv.nba.espn_nba_player_gamelog(athlete_id=1966, season=SEASON))
cols = ['event_date', 'opponent_abbreviation', 'home_away', 'game_result', 'score',
        'stat_0', 'stat_1', 'stat_2']
(gamelog.select([c for c in cols if c in gamelog.columns]).head()
 if gamelog is not None and gamelog.height else 'gamelog unavailable')

Recipe 4: Top scorers from the box-score release 🥇

For a whole-season leaderboard the load_nba_player_boxscore release is your friend: it's a fast parquet download, no live API needed. Here we average points per game and rank the top 10 scorers among players with at least 20 games.

box = safe('player boxscore release', lambda: sdv.nba.load_nba_player_boxscore(seasons=[SEASON]))
if box is not None and box.height:
    leaders = (
        box.filter(pl.col('minutes') > 0)
        .group_by(['athlete_display_name', 'team_abbreviation'])
        .agg(pl.len().alias('gp'),
             pl.col('points').mean().round(1).alias('ppg'),
             pl.col('rebounds').mean().round(1).alias('rpg'),
             pl.col('assists').mean().round(1).alias('apg'))
        .filter(pl.col('gp') >= 20)
        .sort('ppg', descending=True)
        .head(10)
    )
    out = leaders
else:
    out = 'box-score release unavailable'
out

Recipe 5: Offense vs defense, every team 🛡️

The load_nba_team_boxscore release has one row per team-game with both team_score and opponent_team_score, so points for, points against and the per-game point differential (a rough stand-in for net rating, which is properly measured per 100 possessions) are a single group_by away.

tbox = safe('team boxscore release', lambda: sdv.nba.load_nba_team_boxscore(seasons=[SEASON]))
if tbox is not None and tbox.height:
    netrtg = (
        tbox.group_by('team_abbreviation')
        .agg(pl.len().alias('gp'),
             pl.col('team_score').mean().round(1).alias('off_ppg'),
             pl.col('opponent_team_score').mean().round(1).alias('def_ppg'))
        .with_columns((pl.col('off_ppg') - pl.col('def_ppg')).round(1).alias('net'))
        .sort('net', descending=True)
        .head(10)
    )
    out = netrtg
else:
    out = 'team box-score release unavailable'
out

Recipe 6: Who lived behind the arc? 🎯

Sum makes and attempts across the season to get each team's true three-point percentage (game-level percentages can't just be averaged). Reuses the tbox frame from Recipe 5, so there's no second download.

if tbox is not None and tbox.height:
    three_pt = (
        tbox.group_by('team_abbreviation')
        .agg(pl.col('three_point_field_goals_made').sum().alias('made'),
             pl.col('three_point_field_goals_attempted').sum().alias('att'))
        .with_columns((100 * pl.col('made') / pl.col('att')).round(1).alias('three_pt_pct'))
        .filter(pl.col('att') > 0)
        .sort('three_pt_pct', descending=True)
        .head(10)
    )
    out = three_pt
else:
    out = 'team box-score release unavailable'
out
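
Why can't game-level percentages just be averaged? A quick worked example on two made-up games shows the gap: averaging weights a 10-attempt night the same as a 30-attempt night, while pooling weights every shot equally.

```python
# Made-up two-game sample: (threes made, threes attempted)
games = [(2, 10), (15, 30)]

# Naive: average the per-game percentages (each game counts equally)
per_game_pct = [100 * made / att for made, att in games]
mean_of_pcts = sum(per_game_pct) / len(per_game_pct)

# Pooled: sum makes and attempts first, then divide once
total_made = sum(made for made, _ in games)
total_att = sum(att for _, att in games)
pooled_pct = 100 * total_made / total_att

print(round(mean_of_pcts, 1), round(pooled_pct, 1))
```

The naive average comes out at 35.0 while the pooled figure is 42.5 (17 of 40), which is why the recipe sums before dividing.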

Recipe 7: Double-double machines 💪

A double-double is double digits in two of the five major box-score categories (points, rebounds, assists, steals, blocks). Nearly all of them come from points, rebounds and assists, so this recipe checks only those three. Count the categories per player-game, keep the ones that cleared two, then tally them up, straight from load_nba_player_boxscore.

pbox = safe('player boxscore release', lambda: sdv.nba.load_nba_player_boxscore(seasons=[SEASON]))
if pbox is not None and pbox.height:
    dd = (
        pbox.filter(pl.col('minutes') > 0)
        .with_columns(
            ((pl.col('points') >= 10).cast(pl.Int8)
             + (pl.col('rebounds') >= 10).cast(pl.Int8)
             + (pl.col('assists') >= 10).cast(pl.Int8)).alias('cats10'))
        .filter(pl.col('cats10') >= 2)
        .group_by(['athlete_display_name', 'team_abbreviation'])
        .agg(pl.len().alias('double_doubles'))
        .sort('double_doubles', descending=True)
        .head(10)
    )
    out = dd
else:
    out = 'player box-score release unavailable'
out
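
The counting step is easier to check on a few hand-written stat lines. This plain-Python sketch mirrors the cats10 logic; cats_at_10 is a hypothetical helper and the stat lines are made up.

```python
def cats_at_10(points, rebounds, assists):
    """Count how many of the three categories reached double digits."""
    return sum(stat >= 10 for stat in (points, rebounds, assists))


# Made-up stat lines: (points, rebounds, assists)
lines = {
    'scorer only': (31, 4, 6),
    'double-double': (18, 12, 3),
    'triple-double': (22, 10, 11),
}
counts = {name: cats_at_10(*line) for name, line in lines.items()}
qualifies = [name for name, n in counts.items() if n >= 2]
print(counts, qualifies)
```

Note that a threshold of `>= 2` also counts triple-doubles; switch to `== 2` if you want double-doubles only.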

Recipe 8: A tidy standings table 🏆

The load_nba_standings release ships in long format (one row per team × stat). Pivot the stats you care about into columns to get a classic standings grid, sorted by win percentage.

stload = safe('standings release', lambda: sdv.nba.load_nba_standings(seasons=[SEASON]))
wanted = ['wins', 'losses', 'winPercent', 'playoffSeed', 'pointDifferential']
if stload is not None and stload.height and {'stat_name', 'value'}.issubset(stload.columns):
    table = (
        stload.filter(pl.col('stat_name').is_in(wanted))
        .select(['team_abbreviation', 'group_name', 'stat_name', 'value'])
        .pivot(values='value', index=['team_abbreviation', 'group_name'], on='stat_name')
        .sort('winPercent', descending=True)
        .head(12)
    )
    out = table
else:
    out = 'standings release unavailable'
out
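
If the long-to-wide reshape is new to you, here is the same idea in plain Python on made-up rows: the index key picks the output row, and each stat_name becomes a column. The team codes and values are invented for illustration.

```python
# Made-up long-format rows: one per (team, stat)
long_rows = [
    {'team': 'AAA', 'stat_name': 'wins', 'value': 50},
    {'team': 'AAA', 'stat_name': 'losses', 'value': 32},
    {'team': 'BBB', 'stat_name': 'wins', 'value': 41},
    {'team': 'BBB', 'stat_name': 'losses', 'value': 41},
]

# Pivot: the index key (team) picks the row, stat_name becomes the column
wide = {}
for row in long_rows:
    wide.setdefault(row['team'], {})[row['stat_name']] = row['value']

# Sort the wide rows by wins, best first
table = sorted(wide.items(), key=lambda item: item[1]['wins'], reverse=True)
print(table)
```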

Recipe 9: Built on threes (shot release + a join) 🧱

load_nba_shots is one row per made shot with a score_value. Tally points from twos vs threes per team, then join team abbreviations from the box-score release to find who leaned hardest on the long ball.

shots = safe('shots release', lambda: sdv.nba.load_nba_shots(seasons=[SEASON]))
if shots is not None and shots.height and tbox is not None and tbox.height:
    fg = shots.filter(pl.col('score_value').is_in([2, 3]))
    reliance = (
        fg.group_by('team_id')
        .agg(pl.col('score_value').filter(pl.col('score_value') == 3).len().alias('threes_made'),
             pl.col('score_value').sum().alias('points_from_fg'))
        .with_columns((3 * pl.col('threes_made')).alias('points_from_threes'))
        .with_columns((100 * pl.col('points_from_threes') / pl.col('points_from_fg'))
                      .round(1).alias('pct_pts_from_3'))
        .filter(pl.col('threes_made') >= 500)  # drop All-Star / special rosters
    )
    abbr = tbox.select(['team_id', 'team_abbreviation']).unique()
    out = (reliance.join(abbr, on='team_id', how='left')
           .select(['team_abbreviation', 'threes_made', 'pct_pts_from_3'])
           .sort('pct_pts_from_3', descending=True).head(10))
else:
    out = 'shots / team box-score release unavailable'
out
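
The share-of-points arithmetic is worth checking by hand once. This sketch runs it on a made-up score_value column for a single team; it assumes, as the is_in([2, 3]) filter suggests, that other values such as 1-point free throws can appear and should be dropped.

```python
# Made-up score_value column for one team's made shots
score_values = [2, 3, 2, 2, 3, 1, 3, 2]

# Keep field goals only; any other value is dropped, as in the recipe's filter
fg = [v for v in score_values if v in (2, 3)]

threes_made = sum(1 for v in fg if v == 3)
points_from_fg = sum(fg)
points_from_threes = 3 * threes_made
pct_pts_from_3 = round(100 * points_from_threes / points_from_fg, 1)

print(threes_made, points_from_fg, pct_pts_from_3)
```

Here three threes account for 9 of 17 field-goal points, or 52.9%.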

Recipe 10: Head-to-head, game by game 🤝

Filter the team box-score release to one matchup and you get the full season series: every meeting, the score, and who won. Swap the two abbreviations for any rivalry you like.

TEAM_A, TEAM_B = 'BOS', 'NY'
if tbox is not None and tbox.height and 'opponent_team_abbreviation' in tbox.columns:
    series = (
        tbox.filter((pl.col('team_abbreviation') == TEAM_A)
                    & (pl.col('opponent_team_abbreviation') == TEAM_B))
        .select([c for c in ['game_date', 'team_home_away', 'team_score',
                             'opponent_team_score', 'team_winner']
                 if c in tbox.columns])
        .sort('game_date')
    )
    out = series if series.height else f'no {TEAM_A} vs {TEAM_B} games in {SEASON}'
else:
    out = 'team box-score release unavailable'
out

Recipe 11: Who's banged up? 🩹 (pandas interop)

espn_nba_injuries hits ESPN live for the league-wide injury report. Ask for a pandas frame with return_as_pandas=True (a handy interop point), count the listed players per team, then hand the result back to polars for the final sort.

import ast

inj = safe('injuries', lambda: sdv.nba.espn_nba_injuries(return_as_pandas=True))
if inj is not None and getattr(inj, 'shape', (0,))[0] and 'injuries' in inj.columns:
    def _n_listed(s):
        try:
            v = ast.literal_eval(s) if isinstance(s, str) else s
            return len(v) if isinstance(v, list) else 0
        except Exception:
            return 0
    inj = inj.copy()
    inj['players_listed'] = inj['injuries'].apply(_n_listed)
    out = (pl.from_pandas(inj[['display_name', 'players_listed']])
           .filter(pl.col('players_listed') > 0)
           .sort('players_listed', descending=True)
           .head(12))
else:
    out = 'injury report unavailable (off-season or feed down)'
out
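
The _n_listed helper has to cope with cells that are real lists, stringified lists, or something else entirely. Here is a standalone sketch of the same parsing on made-up cells; n_listed is a renamed copy for illustration, and the cell contents are invented rather than real ESPN payloads.

```python
import ast


def n_listed(cell):
    """Count entries in a cell that is a list or a stringified list; else 0."""
    try:
        value = ast.literal_eval(cell) if isinstance(cell, str) else cell
        return len(value) if isinstance(value, list) else 0
    except (ValueError, SyntaxError):
        # literal_eval raises these on text that is not a Python literal
        return 0


# Made-up cells covering the cases the recipe guards against
cells = [
    "[{'athlete': 'A'}, {'athlete': 'B'}]",  # stringified list
    [{'athlete': 'C'}],                      # already a list
    'not a list',                            # unparsable text
    None,                                    # missing value
    '[]',                                    # empty stringified list
]
counts = [n_listed(c) for c in cells]
print(counts)
```

ast.literal_eval only evaluates literals, so it is a safer choice than eval for text that arrives from a live feed.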

🎬 Play-by-play & game rosters

Now for the granular stuff. espn_nba_pbp returns the whole game payload as a dict (play-by-play, win probability, box score, and header), fetched by game_id (an ESPN event id). Pair it with espn_nba_game_rosters for who actually suited up.

We'll use Game 1 of the 2024 Finals (game_id=401585660).

GAME_ID = 401585660
pbp = safe('pbp payload', lambda: sdv.nba.espn_nba_pbp(game_id=GAME_ID))
(list(pbp.keys())[:8] if isinstance(pbp, dict) else 'pbp unavailable')

plays = (pl.DataFrame(pbp['plays'], infer_schema_length=None)
         if isinstance(pbp, dict) and pbp.get('plays') else None)
cols = ['period.number', 'clock.displayValue', 'text', 'homeScore', 'awayScore', 'scoringPlay']
(plays.select([c for c in cols if c in plays.columns]).head()
 if plays is not None and plays.height else 'no plays parsed')

Slice it: every 3-pointer in the game 🎯

The plays frame is just polars, so a scoring slice is one filter away. Here we pull made three-pointers in chronological order.

if plays is not None and plays.height:
    threes = (
        plays.filter(pl.col('scoringPlay') == True)
        .filter(pl.col('text').str.contains('(?i)three point|3pt|three-point'))
        .select([c for c in ['period.number', 'clock.displayValue', 'text',
                             'homeScore', 'awayScore'] if c in plays.columns])
    )
    out = threes.head(10) if threes.height else 'no three-pointers matched the text filter'
else:
    out = 'no plays to slice'
out
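
To sanity-check the text pattern before pointing it at a real game, you can try it on a few strings. In this sketch Python's re module stands in for the regex engine polars uses (the two should agree on a pattern this simple), and the play descriptions are made up rather than real ESPN text.

```python
import re

# Same pattern as the filter above; (?i) makes the whole pattern case-insensitive
THREE_RE = re.compile(r'(?i)three point|3pt|three-point')

# Made-up play descriptions, not real ESPN text
texts = [
    'Player A makes 26-foot three point jumper',
    'Player B makes 3PT shot',
    'Player C makes driving layup',
    'Player D makes Three-Point pullup',
]
matched = [t for t in texts if THREE_RE.search(t)]
print(matched)
```

The pattern only looks at the wording, so a missed three would match as well; that is why the recipe filters on scoringPlay first.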

Who played? Game rosters 📋

espn_nba_game_rosters returns both teams' rosters for a single game, one row per athlete, including the starter flag and jersey number.

grosters = safe('game rosters', lambda: sdv.nba.espn_nba_game_rosters(game_id=GAME_ID))
cols = ['athlete_display_name', 'team_abbreviation', 'starter', 'jersey', 'position_name']
(grosters.select([c for c in cols if c in grosters.columns]).head(10)
 if grosters is not None and grosters.height else 'game rosters unavailable')

📦 Bulk season data with the loaders

When you want everything for a season at once, not one game at a time, the load_nba_* loaders pull pre-built parquet releases. They're fast, reliable, and don't depend on a live API being up.

Loader | Grain
load_nba_schedule | one row per game
load_nba_player_boxscore | one row per player-game
load_nba_standings | one row per team-season

sched = safe('schedule release', lambda: sdv.nba.load_nba_schedule(seasons=[SEASON]))
cols = ['id', 'date', 'home_display_name', 'away_display_name', 'home_score', 'away_score']
(sched.select([c for c in cols if c in sched.columns]).head()
 if sched is not None and sched.height else 'schedule release unavailable')

Pipeline: the highest-scoring games of the season 🔥

With the schedule release in hand, a combined-points leaderboard is a quick polars pipeline: cast the scores, sum them, sort descending.

if sched is not None and sched.height and {'home_score', 'away_score'}.issubset(sched.columns):
    hot = (
        sched.with_columns(
            (pl.col('home_score').cast(pl.Int64, strict=False)
             + pl.col('away_score').cast(pl.Int64, strict=False)).alias('total_points')
        )
        .filter(pl.col('total_points').is_not_null())
        .sort('total_points', descending=True)
        .select([c for c in ['date', 'home_display_name', 'away_display_name',
                             'home_score', 'away_score', 'total_points'] if c in sched.columns])
        .head(10)
    )
    out = hot
else:
    out = 'schedule release unavailable'
out

🎉 Where to next

You just toured the premium espn_nba_* surface plus the season loaders: teams, scoreboard, standings, rosters, schedules, player game logs, play-by-play, and bulk box scores. A few parting tips:

  • Pass return_as_pandas=True to any wrapper for a pandas frame instead of polars.
  • The espn_nba_* wrappers also accept return_parsed=False for the raw JSON dict.
  • Full reference lives in the NBA section of the sidebar: ESPN site API · ESPN web API · ESPN core API · additional functions · loaders
  • R user? The same surface lives in hoopR.
  • Need raw NBA Stats endpoints? See nba_api.

Now go break down some film, and may your jumper always find the bottom of the net! 🏀🔥