Women's basketball with sportsdataverse-py
Welcome! In just a few lines of Python you're about to pull WNBA teams, rosters, schedules, play-by-play, season stats, standings and the draft, all as tidy polars DataFrames that are ready to model.
sportsdataverse.wnba leads with ESPN's rich public API (the espn_wnba_* family) and tops it off with load_wnba_* parquet loaders that hand you whole seasons in one shot. No API key needed.
If you've used the R package wehoop, these names will feel right at home. Let's go hoop!
The toolbox
Accessors return a tidy polars DataFrame by default (a few, such as espn_wnba_pbp and espn_wnba_team_stats, return a dict of pieces instead). Pass return_as_pandas=True for pandas, or raw=True (where supported) for the untouched ESPN JSON. Here's the whole kit:
| Function | What it gives you | Source |
|---|---|---|
| espn_wnba_teams | One row per franchise (grab team_ids) | ⭐ ESPN |
| espn_wnba_team_roster | A team's active roster for a season | ⭐ ESPN |
| espn_wnba_schedule | Games + results for a date or date range | ⭐ ESPN |
| espn_wnba_pbp | Event-level play-by-play for one game | ⭐ ESPN |
| espn_wnba_player_stats | A player's season stat line (wide) | ⭐ ESPN |
| espn_wnba_team_stats | A team's season stats (Averages/Totals/Misc) | ⭐ ESPN |
| espn_wnba_standings | League standings, one row per team | ⭐ ESPN |
| espn_wnba_draft | Every draft pick for a season | ⭐ ESPN |
| espn_wnba_game_officials | The refs who worked a game | ⭐ ESPN |
| load_wnba_schedule | Whole-season schedule (parquet release) | 📦 loader |
| load_wnba_player_boxscore | Whole-season player box scores | 📦 loader |
| load_wnba_team_boxscore | Whole-season team box scores | 📦 loader |
| load_wnba_player_season_stats | Season-aggregated player stats | 📦 loader |
| load_wnba_pbp | Whole-season play-by-play | 📦 loader |
| load_wnba_shots | Shot-location data | 📦 loader |
| load_wnba_standings | Whole-season standings (long) | 📦 loader |
| load_wnba_rosters | Whole-season rosters | 📦 loader |
| load_wnba_draft | Whole-season draft picks | 📦 loader |
| most_recent_wnba_season | The latest season year | 🛠️ helper |

⭐ = the premium ESPN live API · 📦 = bulk parquet loaders · 🛠️ = helpers.
Setup
pip install sportsdataverse
That's it: the ESPN endpoints are public, so there's nothing to configure.
import polars as pl
import sportsdataverse as sdv
import sportsdataverse.wnba as wnba
SEASON = 2024 # a complete season, so every cell has data to show
print('most recent WNBA season:', wnba.most_recent_wnba_season())
ESPN's live endpoints are seasonal and occasionally rate-limited, so a tiny safe() helper runs each risky call defensively: you get the frame when the feed is up, and a friendly one-liner when it isn't (never a scary traceback). The load_wnba_* loaders read static parquet releases and are rock-solid, so we let those run bare.
def safe(label, thunk):
    """Run a live call; print a one-liner instead of raising on failure."""
    try:
        out = thunk()
        print(f'✅ {label}')
        return out
    except Exception as e:  # noqa: BLE001 -- demo resilience
        print(f'⚠️ {label}: unavailable right now ({type(e).__name__})')
        return None
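To see what safe() does without touching the network, here is a minimal sketch that repeats the helper (with plain-text messages) and feeds it two stand-in thunks. The flaky_feed function is hypothetical and only simulates a feed that is down; it is not part of sportsdataverse.

```python
def safe(label, thunk):
    """Run a call; print a one-liner instead of raising on failure."""
    try:
        out = thunk()
        print(f'ok: {label}')
        return out
    except Exception as e:  # broad on purpose: demo resilience
        print(f'{label}: unavailable right now ({type(e).__name__})')
        return None


def flaky_feed():
    """Hypothetical stand-in for a live endpoint that is down."""
    raise ConnectionError('feed is down')


good = safe('working call', lambda: [1, 2, 3])  # returns the thunk's value
bad = safe('broken call', flaky_feed)           # returns None, no traceback
print(good, bad)
```

Because a failure comes back as None, every downstream cell checks `is not None` before touching the result.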
Teams
espn_wnba_teams returns one row per franchise. The team_id, location, name and abbreviation are the keys you'll reuse to fetch rosters, schedules and stats.
teams = safe('WNBA teams', wnba.espn_wnba_teams)
print('shape:', None if teams is None else teams.shape)
(teams.select(['team_id', 'team_location', 'team_name',
               'team_abbreviation', 'team_display_name']).head(15)
 if teams is not None else 'teams unavailable')
Team roster: Las Vegas Aces
espn_wnba_team_roster lists active players for one team in a season. The back-to-back champion Aces are team_id=17. Player columns are unprefixed (athlete_id, full_name, jersey, position_abbreviation).
aces = safe('Aces roster', lambda: wnba.espn_wnba_team_roster(team_id=17, season=SEASON))
(aces.select(['athlete_id', 'full_name', 'jersey',
              'position_abbreviation', 'display_height', 'age']).head(12)
 if aces is not None else 'roster unavailable')
Schedule
espn_wnba_schedule takes dates=YYYYMMDD for a single day, or a 'YYYYMMDD-YYYYMMDD' string for a range. Team-name columns are home_display_name / away_display_name, and home_score / away_score come back as strings, so cast them before doing arithmetic.
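If your dates start life as date objects, the range string can be assembled with the standard library. This is a minimal sketch: espn_date_range is a hypothetical helper written for this example, not a sportsdataverse function.

```python
from datetime import date


def espn_date_range(start: date, end: date) -> str:
    """Format two dates as 'YYYYMMDD-YYYYMMDD' (hypothetical helper)."""
    if end < start:
        raise ValueError('end must not be before start')
    return f'{start:%Y%m%d}-{end:%Y%m%d}'


finals_window = espn_date_range(date(2024, 10, 16), date(2024, 10, 20))
print(finals_window)
```

The resulting string can then be passed as the dates argument.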
The range below (Oct 16–20, 2024) is the back half of the 2024 WNBA Finals. Let's cast the scores and derive a winning margin to show a small polars transform.
finals = safe('2024 Finals schedule',
              lambda: wnba.espn_wnba_schedule(dates='20241016-20241020'))
if finals is not None and finals.height:
    out = (finals
           .select(['id', 'home_display_name', 'away_display_name',
                    'home_score', 'away_score', 'status_type_description'])
           .with_columns([
               pl.col('home_score').cast(pl.Int64, strict=False).alias('home_pts'),
               pl.col('away_score').cast(pl.Int64, strict=False).alias('away_pts'),
           ])
           .with_columns((pl.col('home_pts') - pl.col('away_pts')).abs().alias('margin')))
else:
    out = 'schedule unavailable'
out
Play-by-play: 2024 Finals Game 5
espn_wnba_pbp returns a dict of component pieces (plays, boxscore, header, winprobability, ...). The plays entry is a list of raw ESPN dicts; build a frame with pl.DataFrame(..., infer_schema_length=None). Its columns use raw dot-notation (period.number, clock.displayValue, scoringPlay, type.text).
pbp = safe('Game 5 pbp', lambda: wnba.espn_wnba_pbp(game_id=401726992))
print('dict keys:', list(pbp.keys())[:8] if pbp is not None else None)
if pbp is not None and pbp.get('plays'):
    plays = pl.DataFrame(pbp['plays'], infer_schema_length=None)
    out = plays.select(['period.number', 'clock.displayValue',
                        'type.text', 'text', 'scoringPlay']).head(10)
else:
    plays = None
    out = 'pbp unavailable'
out
Filter to scoring plays only to watch the lead change down the stretch.
(plays
 .filter(pl.col('scoringPlay'))
 .select(['period.number', 'clock.displayValue', 'homeScore', 'awayScore', 'text'])
 .tail(8)
 if plays is not None else 'pbp unavailable')
Player season stats: Caitlin Clark
espn_wnba_player_stats returns a single wide row covering ESPN's general / offensive / defensive stat groups (averages and totals). The 2024 Rookie of the Year, Caitlin Clark, is athlete_id=4433403. Pass total=True for season totals instead of per-game averages.
cc = safe('Caitlin Clark stats',
          lambda: wnba.espn_wnba_player_stats(athlete_id=4433403, season=SEASON))
(cc.select(['full_name', 'team_abbreviation', 'general_games_played',
            'offensive_avg_points', 'offensive_avg_assists',
            'general_avg_rebounds', 'offensive_three_point_field_goal_pct'])
 if cc is not None else 'player stats unavailable')
Team season stats
espn_wnba_team_stats returns a dict keyed by category: 'Averages', 'Totals' and 'Misc'. Each value is a long frame of stat_name / display_value rows, so index into the dict rather than calling .head() on the return directly.
aces_stats = safe('Aces team stats',
                  lambda: wnba.espn_wnba_team_stats(team_id=17, season=SEASON))
print('categories:', list(aces_stats.keys()) if aces_stats is not None else None)
(aces_stats['Averages'].select(['stat_name', 'abbreviation', 'display_value']).head(10)
 if aces_stats is not None else 'team stats unavailable')
Cookbook: common WNBA tasks
Now for the fun part. These twelve recipes are the everyday tasks you'll reach for constantly. Each blends a premium ESPN call (or a parquet loader) with a few polars expressions. They're all correct, runnable Python. The ESPN-backed recipes wear the safe() seatbelt; the loader-backed ones are rock-solid and run bare.
Recipe 1: Standings table
espn_wnba_standings gives one row per team with wins, losses, win percentage and point differential. Sort by win percentage to get the playoff picture.
standings = safe('2024 standings', lambda: wnba.espn_wnba_standings(season=SEASON))
(standings
 .select(['team_display_name', 'wins', 'losses', 'win_percent', 'point_differential'])
 .sort('win_percent', descending=True)
 .head(8)
 if standings is not None else 'standings unavailable')
Recipe 2: Draft board
espn_wnba_draft lists every pick for a season. The 2024 draft was headlined by Caitlin Clark going first overall to the Indiana Fever.
draft = safe('2024 draft', lambda: wnba.espn_wnba_draft(season=SEASON))
(draft.select(['overall_pick', 'team_display_name', 'athlete_display_name',
               'athlete_position_abbreviation', 'school_name']).head(10)
 if draft is not None else 'draft unavailable')
Recipe 3: Top 10 scorers of the season
load_wnba_player_boxscore reads a whole season's player box scores from a parquet release (no per-game API calls). Drop did-not-play rows, then aggregate points and assists per player with polars. Loaders are reliable, so this one runs bare.
box = wnba.load_wnba_player_boxscore(seasons=[SEASON])
top_scorers = (
    box
    .filter(~pl.col('did_not_play'))
    .group_by(['athlete_display_name', 'team_abbreviation'])
    .agg([
        pl.len().alias('games'),
        pl.col('points').sum().alias('total_points'),
        pl.col('points').mean().round(1).alias('ppg'),
        pl.col('assists').mean().round(1).alias('apg'),
    ])
    .filter(pl.col('games') >= 20)
    .sort('ppg', descending=True)
    .head(10)
)
top_scorers
Recipe 4: Who worked the whistle?
espn_wnba_game_officials returns the referees assigned to a game, which is handy for officiating studies. Pair a game_id from the schedule with this call.
refs = safe('Game 5 officials',
            lambda: wnba.espn_wnba_game_officials(game_id=401726992, season=SEASON))
if refs is not None and refs.height:
    keep = [c for c in ['full_name', 'display_name', 'position', 'order'] if c in refs.columns]
    out = refs.select(keep) if keep else refs.head()
else:
    out = 'officials unavailable'
out
Recipe 5: Best net rating in the league
load_wnba_team_boxscore carries each team's score and its opponent's score per game. Average points for minus average points against gives a quick-and-dirty net rating (strictly an average point differential, since a true net rating is measured per 100 possessions), and it is a handy one-number summary of who's good. We require 20+ games to drop the All-Star exhibition noise.
team_box = wnba.load_wnba_team_boxscore(seasons=[SEASON])
net_rating = (
    team_box
    .group_by(['team_abbreviation', 'team_display_name'])
    .agg([
        pl.len().alias('games'),
        pl.col('team_score').mean().round(1).alias('pts_for'),
        pl.col('opponent_team_score').mean().round(1).alias('pts_against'),
    ])
    .filter(pl.col('games') >= 20)
    .with_columns((pl.col('pts_for') - pl.col('pts_against')).round(1).alias('net'))
    .sort('net', descending=True)
)
net_rating
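The arithmetic behind that number is easy to check by hand. The sketch below uses four invented games for one team and shows that subtracting the two averages matches the average per-game margin. It works in plain Python so nothing needs to be downloaded.

```python
from math import isclose
from statistics import mean

# Invented (team_score, opponent_team_score) pairs for a single team.
games = [(88, 80), (75, 79), (92, 85), (81, 81)]

pts_for = mean(g[0] for g in games)       # 336 / 4 = 84
pts_against = mean(g[1] for g in games)   # 325 / 4 = 81.25
net = pts_for - pts_against               # 84 - 81.25 = 2.75

# Same quantity, computed as the mean of each game's margin.
per_game_margin = mean(g[0] - g[1] for g in games)
print(net, per_game_margin)
```

Both routes agree on the unrounded values. In the recipe, the two averages are rounded before subtracting, so the displayed net can differ from the exact figure by about a tenth of a point.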
Recipe 6: Double-double machines
Count games where a player hit double digits in at least two of the five box-score categories (points, rebounds, assists, steals, blocks): the classic double-double, with triple-doubles counted for free. All from the player box-score loader and a little polars boolean arithmetic.
cats = ['points', 'rebounds', 'assists', 'steals', 'blocks']
double_doubles = (
    box
    .filter(~pl.col('did_not_play'))
    .with_columns(
        sum((pl.col(c) >= 10).cast(pl.Int8) for c in cats).alias('cats10')
    )
    .with_columns([
        (pl.col('cats10') >= 2).alias('is_dd'),
        (pl.col('cats10') >= 3).alias('is_td'),
    ])
    .group_by(['athlete_display_name', 'team_abbreviation'])
    .agg([
        pl.col('is_dd').sum().alias('double_doubles'),
        pl.col('is_td').sum().alias('triple_doubles'),
    ])
    .sort(['double_doubles', 'triple_doubles'], descending=True)
    .head(10)
)
double_doubles
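The counting rule is simple enough to sanity-check on a few stat lines in plain Python. This is a sketch on invented lines; double_digit_cats is a hypothetical helper that mirrors the cats10 column above.

```python
CATS = ('points', 'rebounds', 'assists', 'steals', 'blocks')


def double_digit_cats(line: dict) -> int:
    """Count how many of the five categories reached 10 or more."""
    return sum(line.get(cat, 0) >= 10 for cat in CATS)


# Invented stat lines for illustration.
toy_lines = [
    {'points': 22, 'rebounds': 11, 'assists': 4, 'steals': 1, 'blocks': 0},
    {'points': 14, 'rebounds': 10, 'assists': 12, 'steals': 2, 'blocks': 1},
    {'points': 31, 'rebounds': 6, 'assists': 3, 'steals': 0, 'blocks': 0},
]

counts = [double_digit_cats(line) for line in toy_lines]
n_double_doubles = sum(n >= 2 for n in counts)  # triple-doubles count here too
n_triple_doubles = sum(n >= 3 for n in counts)
print(counts, n_double_doubles, n_triple_doubles)
```

Note that the second line is tallied in both totals, which matches the `>= 2` and `>= 3` thresholds in the recipe.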
Recipe 7: Most efficient high-volume scorers
Raw points reward volume; true shooting % rewards efficiency. TS% = points / (2 × (FGA + 0.44 × FTA)). Aggregate points and shot attempts from the box-score loader, keep players with real workloads, and you've got the league's most efficient buckets.
true_shooting = (
    box
    .filter(~pl.col('did_not_play'))
    .group_by(['athlete_display_name', 'team_abbreviation'])
    .agg([
        pl.len().alias('games'),
        pl.col('points').sum().alias('pts'),
        pl.col('field_goals_attempted').sum().alias('fga'),
        pl.col('free_throws_attempted').sum().alias('fta'),
    ])
    .filter((pl.col('games') >= 20) & (pl.col('pts') >= 300))
    .with_columns(
        (pl.col('pts') / (2 * (pl.col('fga') + 0.44 * pl.col('fta'))) * 100)
        .round(1).alias('ts_pct')
    )
    .sort('ts_pct', descending=True)
    .head(10)
)
true_shooting
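For a feel of the formula, here is a worked example in plain Python with invented season totals: 600 points on 400 field-goal attempts and 150 free-throw attempts gives 600 / (2 × (400 + 66)) = 600 / 932, or about 64.4%. The true_shooting_pct function is a hypothetical helper for this sketch, and the zero-attempt guard is an addition that the recipe does not need because of its workload filter.

```python
def true_shooting_pct(points, fga, fta):
    """Return TS% on a 0-100 scale, or None when there are no attempts."""
    denominator = 2 * (fga + 0.44 * fta)
    if denominator == 0:
        return None  # avoid dividing by zero for players with no attempts
    return points / denominator * 100


# Invented season totals, not a real player's line.
ts = true_shooting_pct(points=600, fga=400, fta=150)
print(round(ts, 1))
```

The 0.44 weight is the conventional estimate of how many free-throw attempts use up a shooting possession.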