The Evolution of NBA Basketball
Introduction
Basketball is a sport that is continuously evolving. Ever since its invention in 1891, there have been many changes in both the rules and the style of play. Over the past decade, the league has undergone a rapid change due to both the increase in the use of analytics and changes in officiating.
NBA basketball in the 2010s will be remembered as the beginning of the “pace and space” era. Teams now play at a faster pace than in the previous two decades and shoot far more three-point shots per game. This stylistic change is due in part to the success of the Golden State Warriors, led by Stephen Curry, and the Houston Rockets, which forced other teams in the league to emphasize spacing the floor in order to draw out defenders.
The rise of analytics, exemplified by Daryl Morey and the Houston Rockets, has led to more layups and three-point attempts and a sharp decline in long two-pointers and other mid-range shots. The mid-range jump shot, a staple for superstars of the 1990s and 2000s such as Michael Jordan and Kobe Bryant, has been mostly abandoned. Combined with officiating that favors offense over defense, this means players now score more points, and score them more efficiently, than they did 10 years ago.
This blog uses two methods to observe the shift in strategy over time. The first is to analyze shot location data, looking for trends in where on the court players have taken their shots since the 2010-2011 season. The second is to compare present-day player seasons to historical ones. Finding the closest historical comparisons for today’s players is interesting because it shows whether any current players resemble historical greats. In addition, one can examine how the league has changed by comparing “old-school” players such as DeMar DeRozan with more modern players such as Stephen Curry.
Shot Data and Mapping
To obtain shot data, we use BallR (ballr), an R package available online that retrieves shot map data from the NBA Stats website and API.
Shot Data Using the ballr Package and the NBA API
The BallR package uses the NBA Stats API to help visualize shots taken by a player for a season. To run BallR, we have to run the following code in the console (taken from the BallR documentation).
packages = c("shiny", "tidyverse", "hexbin")
install.packages(packages, repos = "https://cran.rstudio.com/")
library(shiny)
runGitHub("ballr", "toddwschneider")
This runs the Shiny app locally on your computer and loads the following files and functions, which we need to generate a heat map of shot density:
- court_maps.csv holds all the points of the different zones of a basketball court, like a map
- plot_court.R and plot_theme.R contain the methods needed to draw the court
- generate_heatmap_chart is a function used to generate the heat map of shot data
- fetch_shots_by_player_id_and_season is a function used to get shot data from the NBA API using a player’s ID and the season they were playing in.
We used the last function to collect the shot map data for all players in all seasons. To do this, we first had to find the player IDs for everyone who played in the seasons between 2010 and 2021. We then iterated through every combination of player ID and season (if the player played in that season) and called fetch_shots_by_player_id_and_season to obtain the shot data. While running this code, we noticed that there seemed to be a hard limit of around 600, so we had to restart the loop each time we reached it. The example code below shows our method for extracting all the shot map data.
library(tidyverse)  # for %>% and mutate()

## player_stats is a data frame with one row per player-season,
## holding person_id and season columns
## df is the final data set we are outputting
df <- data.frame()
for (i in seq_len(nrow(player_stats))) {
  ## season_type is assumed to be the name of ballr's third argument
  stats <- fetch_shots_by_player_id_and_season(
    player_id = player_stats$person_id[i],
    season = player_stats$season[i],
    season_type = "Regular Season"
  )
  stats <- stats$player %>%
    mutate(season = player_stats$season[i])
  df <- rbind(df, stats)
  ## uncomment if you want to see the code running to make sure it isn't frozen
  # print(i)
  ## pause every 15 requests so the API does not cut us off
  if (i %% 15 == 0) {
    Sys.sleep(5)
    # print("Done Sleeping")
  }
}
write.csv(df, "Data/total_shot_data.csv")
Our final data set was a .csv file of over half a gigabyte, with more than 1.9 million observations. Collecting it took a few hours of running the code.
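Because the loop above has to be restarted whenever the limit is hit, it can help to make the collection resumable. The following is only a sketch of one way to do that, not the code we ran: `fetch_all_shots`, `fake_fetch`, and the `next_index` field are hypothetical names, and `fake_fetch` is a stub standing in for fetch_shots_by_player_id_and_season so the example runs without touching the API.

```r
# Hypothetical helper: fetch one row of `player_stats` at a time, keep each
# result in a list, and stop at the first failure so the run can be resumed.
fetch_all_shots <- function(player_stats, fetch_fn, start = 1,
                            pause_every = 15, pause_secs = 5) {
  results <- list()
  next_index <- start
  n <- nrow(player_stats)
  while (next_index <= n) {
    i <- next_index
    shots <- tryCatch(
      fetch_fn(player_stats$person_id[i], player_stats$season[i]),
      error = function(e) NULL
    )
    if (is.null(shots)) break  # resume later with start = next_index
    shots$season <- rep(player_stats$season[i], nrow(shots))
    results[[length(results) + 1]] <- shots
    next_index <- i + 1
    if (i %% pause_every == 0) Sys.sleep(pause_secs)
  }
  # Bind once at the end instead of growing a data frame inside the loop.
  combined <- if (length(results) > 0) do.call(rbind, results) else data.frame()
  list(shots = combined, next_index = next_index)
}

# Stub that fails on its third call, to imitate hitting a limit.
calls <- 0
fake_fetch <- function(player_id, season) {
  calls <<- calls + 1
  if (calls == 3) stop("simulated API limit")
  data.frame(player_id = player_id, loc_x = c(0, 10), loc_y = c(5, 20))
}

players <- data.frame(
  person_id = c(1, 2, 3, 4),
  season = c("2010-11", "2010-11", "2011-12", "2011-12")
)

first <- fetch_all_shots(players, fake_fetch, pause_secs = 0)
rest <- fetch_all_shots(players, fake_fetch, start = first$next_index, pause_secs = 0)
all_shots <- rbind(first$shots, rest$shots)
```

Collecting results in a list and binding them once also avoids repeatedly copying a growing data frame, which matters with close to two million rows.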
Player Season Data
The data used was scraped from Basketball Reference. For each player-season since 1982, the per-game and advanced statistics were collected. The original dataset contained over 17,000 player-seasons, each with 43 statistical variables. To cut down the number of players analyzed, we decided to examine only “good” seasons, which we defined as seasons where the player had a VORP (value over replacement player) of at least 1. For context, there are usually around 100 players per year with a VORP above 1. Because many players play for multiple teams in one season, we examine only per-game and advanced stats rather than total stats, and take the player’s average over the course of the season. This gives us a relatively complete statistical profile for almost every “good” player for each year of their career.
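The filtering and averaging described above might look roughly like the sketch below. It uses a tiny made-up data frame, and the column names (`player`, `season`, `team`, `pts`, `vorp`) are hypothetical rather than the ones in the scraped data. It also assumes that averaging happens before the VORP filter and that VORP is averaged like every other column; the text above does not settle either point.

```r
# Toy data: names and values are invented for illustration only.
seasons <- data.frame(
  player = c("A", "A", "B", "C"),
  season = c(2019, 2019, 2019, 2019),
  team   = c("X", "Y", "X", "Z"),
  pts    = c(20, 24, 8, 15),      # points per game
  vorp   = c(1.2, 1.6, 0.3, 1.0)
)

# Average each stat across a player's teams within a season.
per_player <- aggregate(cbind(pts, vorp) ~ player + season,
                        data = seasons, FUN = mean)

# Keep only "good" seasons, assuming the cutoff is a VORP of at least 1.
good <- per_player[per_player$vorp >= 1, ]
```

Base R's `aggregate` is used here only to keep the example free of extra packages; the same steps could be written with `group_by` and `summarise` from the tidyverse.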
Shot Type Graph and Table
In this graph we can see the trends in the percentage of shots taken from different zones for the league as a whole over the past decade. It is immediately apparent that the share of mid-range shots declined sharply over the course of the decade. Those attempts have been replaced by three-point shots as well as shots in the paint (near but not right at the basket). It is also noteworthy that the percentage of shots in the restricted area has declined over the past decade even though it is the most efficient shot possible. This is likely due to defenses focusing on limiting these efficient shots at the expense of giving up other shots.
Season | In The Paint (Non-RA) | Mid-Range | Restricted Area | Three-pointers | Total Shots |
---|---|---|---|---|---|
2010-11 | 27501 | 55532 | 57131 | 38614 | 178778 |
2011-12 | 21924 | 44199 | 47847 | 32744 | 146714 |
2012-13 | 26357 | 51477 | 59884 | 44001 | 181719 |
2013-14 | 28081 | 50758 | 61032 | 48829 | 188700 |
2014-15 | 27474 | 49125 | 60092 | 49804 | 186495 |
2015-16 | 27728 | 46497 | 62582 | 53458 | 190265 |
2016-17 | 27263 | 42295 | 61379 | 59626 | 190563 |
2017-18 | 29753 | 36098 | 60531 | 63451 | 189833 |
2018-19 | 31933 | 30331 | 66512 | 71861 | 200637 |
2019-20 | 27499 | 22795 | 55777 | 65162 | 171233 |
2020-21 | 27836 | 20151 | 47289 | 61198 | 156474 |
In the table above, it is also apparent that the total number of shots has increased over the past ten years (the 2019-20 and 2020-21 seasons were cut short, but shots per game are up). This is largely due to teams playing at a faster pace than they did at the start of the decade. While the graph captures trends at a high level, we would also like to see exactly where players today are taking more shots compared to ten years ago.
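The zone percentages discussed above can be recomputed directly from the table. The sketch below copies the counts from the table into a data frame and divides each zone by the season total; the short column names (`paint`, `mid_range`, `restricted`, `three`) are our own labels for this example.

```r
# Shot counts copied from the table above.
shots <- data.frame(
  season     = c("2010-11", "2011-12", "2012-13", "2013-14", "2014-15", "2015-16",
                 "2016-17", "2017-18", "2018-19", "2019-20", "2020-21"),
  paint      = c(27501, 21924, 26357, 28081, 27474, 27728,
                 27263, 29753, 31933, 27499, 27836),
  mid_range  = c(55532, 44199, 51477, 50758, 49125, 46497,
                 42295, 36098, 30331, 22795, 20151),
  restricted = c(57131, 47847, 59884, 61032, 60092, 62582,
                 61379, 60531, 66512, 55777, 47289),
  three      = c(38614, 32744, 44001, 48829, 49804, 53458,
                 59626, 63451, 71861, 65162, 61198)
)

zones <- c("paint", "mid_range", "restricted", "three")

# Season totals, then each zone as a percentage of that season's shots.
totals <- rowSums(shots[, zones])
shares <- round(100 * shots[, zones] / totals, 1)
shares$season <- shots$season
```

With these counts, the mid-range share falls from about 31% in 2010-11 to about 13% in 2020-21, while the three-point share rises from about 22% to about 39%.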
Heat Map
Using the BallR package, we can generate heat maps of the number of shots taken in different locations around the league. The heat map provides several advantages over the graph. First, not all shots taken in the same “zone” are of the same quality. For example, three-point shots in the corners are among the most efficient shots in basketball and are significantly more valuable than three-point shots “above the break”, where the line is curved. Second, heat maps provide a visual representation of trends over time in specific areas of the court, such as the elbows or the baseline.
Here is how the generate_heatmap_chart function is defined in the documentation.
generate_heatmap_chart = function(shots, base_court, court_theme = court_themes$dark) {
  base_court +
    stat_density_2d(
      data = shots,
      aes(x = loc_x, y = loc_y, fill = stat(density / max(density))),
      geom = "raster", contour = FALSE, interpolate = TRUE, n = 200
    ) +
    geom_path(
      data = court_points,
      aes(x = x, y = y, group = desc),
      color = court_theme$lines
    ) +
    scale_fill_viridis_c(
      "Shot Frequency ",
      limits = c(0, 1),
      breaks = c(0, 1),
      labels = c("low", "high "),
      option = "inferno",
      guide = guide_colorbar(barwidth = 10)
    ) +
    theme(legend.text = element_text(size = rel(0.6)))
}
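To make the `density / max(density)` fill in the function above more concrete, here is a small sketch of the same idea outside of ggplot2. It calls `MASS::kde2d`, the estimator that `stat_density_2d` uses, on simulated coordinates. The simulated `loc_x` and `loc_y` values are invented for illustration and are not real shot data.

```r
library(MASS)  # provides kde2d

set.seed(1)
# Simulated shot coordinates; real data would use loc_x and loc_y from the API.
loc_x <- rnorm(500, mean = 0, sd = 8)
loc_y <- rnorm(500, mean = 10, sd = 6)

# Estimate the 2-D density on a 200 x 200 grid, matching n = 200 above.
dens <- kde2d(loc_x, loc_y, n = 200)

# Rescale so the densest cell equals 1, as in density / max(density).
relative <- dens$z / max(dens$z)
```

Because every season is rescaled by its own maximum, the colors show where shots concentrate within a season rather than how many shots were taken, which is why the legend runs from "low" to "high" instead of showing counts.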
Using the other objects from the ballr package mentioned before, we can write a for loop to generate the heat maps for all of the desired seasons, and plot them below.
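## output is assumed to hold the shot data collected above, and
## shot_zone_basic_list the shot zones to plot
for (i in 0:10) {
  curr_season <- paste0(2010 + i, "-", 11 + i)
  p <- output %>%
    filter(season == curr_season, shot_zone_basic %in% shot_zone_basic_list) %>%
    generate_heatmap_chart(
      base_court = plot_court(court_themes$dark),
      court_theme = court_themes$dark
    ) +
    labs(
      title = "Heat Map of All Shots",
      subtitle = paste(curr_season, "Season")
    )
  ## pass the plot explicitly rather than relying on last_plot()
  ggsave(paste0(curr_season, ".png"), plot = p)
}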
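The `paste0(2010 + i, "-", 11 + i)` expression in the loop works for 2010-11 through 2020-21, but it would produce labels like "2008-9" for earlier seasons. If the range is ever extended, a small helper along these lines could build the labels instead. `season_label` is a hypothetical name, and the sketch assumes labels always follow the "YYYY-YY" pattern used in our data.

```r
# Build an NBA-style season label such as "2010-11" from the starting year.
# The second part is zero-padded and wraps around at the century.
season_label <- function(start_year) {
  sprintf("%d-%02d", as.integer(start_year), as.integer((start_year + 1) %% 100))
}

labels <- season_label(2010:2020)
```

Since `sprintf` is vectorized, the helper returns all eleven labels at once, and they can be looped over directly in place of the `0:10` index.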