Data is quickly overtaking every industry. Business decisions on everything from how much inventory to buy to how many employees to hire are being influenced by newly available data points and the continually decreasing cost of computing power. These same drivers are creating an “everywhere analytics” culture that touches every part of our lives, and that includes professional basketball. So in honor of the tip-off of the 2016-2017 NBA season, let’s take a look at how data is revolutionizing basketball.
For access to the reports used to make this post, click here.
Waking Up to A Revolution
Less than 10 years ago, stat-heads were few and far between in NBA front offices. For the most part, there was very little incentive to pay attention to them. The NBA was seen as a superstar league, where the team with the best player usually won (Larry Bird’s Celtics, Magic Johnson’s Lakers, Michael Jordan’s Bulls, Shaquille O’Neal/Kobe Bryant’s Lakers, etc.). Nowadays, APBRmetrics, basketball’s version of baseball’s sabermetrics, are a key tool for any well-run team. The industry, like so many others, is becoming more data driven, with data engineers and data scientists on the payroll not only of almost all 30 teams but also of major media organizations. It’s now almost impossible to hear pundits talk about the game without a mention of player efficiency rating (PER), true shooting percentage, or plus/minus.
The NBA’s Data Revolution did not happen overnight, though. It is the culmination of years of trying to answer a simple question that has plagued owners, general managers (GMs) and coaches since the birth of the game: what is a player’s value?
According to Malcolm Gladwell, basketball is traditionally a strong-link game, which means having one great player is better than having two, or even three, average players. There are only five players on the floor at a time, so a superstar’s impact cannot be overstated. History has shown us that having a LeBron James, Michael Jordan or Kareem Abdul-Jabbar on your team virtually guarantees a minimum level of success. This is reflected down to the way the NBA markets itself. Most highlight reels or SportsCenter recaps give the Steph Currys, Carmelo Anthonys and Russell Westbrooks a disproportionate amount of facetime.
While the value of these superstars is usually pretty obvious, the teams that enjoy prolonged success are those who can correctly value everyone else—who can correctly build around a Tim Duncan or a Steve Nash.
Competition Leads to Innovation and Change
At its core, building a good basketball team is what an economist would call a resource allocation problem. How does one make the most effective use of the salary cap, team situation and other extraneous factors to put the best product on the floor? This is especially hard for small market teams in the NBA, because every team has the same salary cap, yet cities like Houston and Milwaukee don’t have nearly as much appeal as New York and Los Angeles.
This market “inefficiency” set the stage for innovation and meant that small market teams needed to maximize what they had. In 2004, the Seattle SuperSonics were coming off a few consecutive losing seasons, so they hired APBRmetrician Dean Oliver (now called the father of basketball analytics) as a consultant, marking the beginning of the big data movement in basketball. Two years later, another small market team, the Houston Rockets, turned the page on recent failures and reclaimed past success by hiring Daryl Morey to be their GM. By all accounts, Morey was a basketball nerd: someone who held a degree in statistics but had absolutely no experience as a player or scout.
New Perspectives on Old Wisdom
The work of Daryl Morey and others around the league caused a fundamental shift in the way players were valued. When Morey took over the Rockets, most of the team’s salary was dedicated to two great but aging superstars, Yao Ming and Tracy McGrady. At the time, a player’s market value was largely a function of his impact on the box score (a statistical summary of the game), with some adjustment for other factors. Points, assists and rebounds all correlated directly with salary and, for the most part, with success. Morey had to find players who were undervalued by conventional box score methods but still had a lot to contribute. He wanted to build a successful team based on more than superstars.
No player embodied Morey's ideal more than Shane Battier, who was dubbed a No Stats All Star by the New York Times. Despite averaging around 10 career points per game, Battier quickly became the poster child for the analytics movement as a player who both embraced and was embraced by analytics. Morey pointed out how Battier did things that went completely unnoticed by most. Instead of grabbing for a rebound he might not get, Battier would tip the ball to a teammate. If he was guarding a smaller defender, Battier would leave his own man and block out the other team’s best rebounder. Nothing exemplified this more than his 2009 playoff series against Kobe Bryant. In Battier’s own words:
Take the average possession of the Lakers. They were going to score .98 points every time they had a possession. Yet Kobe Bryant only shot the left handed pull up jumper at a 44 percent clip. So every time that he went left and shot that pull up jumper he was generating .88 points per possession. Well that’s a tenth of a point less than the average Laker possession. And so if I could make him do that time and time again which is a lot tougher to do than to say, I’m shaving off a tenth of a point every single time. I’m actually making him detrimental to his team.
Not only was Battier contributing to the game, but he was minimizing the effect of the opposition's superstar.
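Battier's arithmetic is easy to check. Here is a minimal sketch, assuming the pull-up jumper is a two-point attempt and ignoring fouls, offensive rebounds and turnovers (simplifications for illustration, not part of his quote). The function and variable names are made up for this example.

```python
def points_per_attempt(make_rate, shot_value):
    """Expected points from one shot attempt: make probability times shot value."""
    return make_rate * shot_value

# Figures taken from Battier's quote
laker_avg_possession = 0.98       # average Lakers points per possession
left_pullup_make_rate = 0.44      # Kobe's left-handed pull-up jumper percentage

# Assumption: the pull-up is a two-point shot
kobe_left_pullup = points_per_attempt(left_pullup_make_rate, 2)
deficit = laker_avg_possession - kobe_left_pullup

print(f"Left-handed pull-up: {kobe_left_pullup:.2f} points per attempt")
print(f"Below the average possession by: {deficit:.2f} points")
```

The gap works out to about a tenth of a point per possession, which matches the number Battier cites.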
While these new methodologies painted some players in a positive light, they exposed glaring weaknesses in others. In 2012, Rudy Gay, a college superstar and the Grizzlies’ supposed best player, was traded for Ed Davis and Jose Calderon, two relatively “average” players. At first, this was seen as a money-motivated move and an unofficial white flag for the 2012 Grizzlies season. Gay was the team’s leading scorer, averaged the most minutes and was revered as a superstar talent by his coach and peers.
But the truth was, although Gay was 27th in the league in usage rating (the percentage of a team’s possessions that a player uses), he had a player efficiency rating (PER, a metric that tries to standardize overall contribution with a league average of 15) of 15.6, barely above average. When Gay was off the court, the Grizzlies’ assist rate went up almost 6%. His OBPM (offensive box score plus/minus) was -1.2, meaning the team was scoring 1.2 points fewer than it would have if he were replaced with a league average player (in contrast, LeBron finished the season with an astounding 9.2, which, if anything, understates his value). This paints Gay as a ball hog who was taking away the chance for others to contribute. After the trade, the Grizzlies turned their season around and went on to make the conference finals for the first time in team history.
Expanding the Scope of Data
Metrics like PER and true shooting percentage (a shooting percentage that accounts for the extra value of 3-point shots and for free throws) can all be very powerful tools in evaluating player value, especially when compared to counting stats like points or blocks. In some cases, like the Rudy Gay trade in 2012, they hit the nail on the head. However, they’re often sensitive to outliers and small sample sizes, which can be dangerous when they are used out of context. Furthermore, they essentially use the same information that’s been out there since the 1970s, just in different ways. For example, effective field goal percentage (eFG%), a metric to measure shooting percentage that accounts for the increased value of 3 point shots, is calculated as: (FG + .5 * Number of 3 pointers made) / FGA.
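To make these shooting metrics concrete, here is a small sketch using their standard definitions: eFG% = (FGM + 0.5 * 3PM) / FGA, and the commonly cited form of true shooting, TS% = PTS / (2 * (FGA + 0.44 * FTA)). The function names and the stat line are hypothetical, chosen only to illustrate the arithmetic.

```python
def effective_fg_pct(fgm, fg3m, fga):
    """eFG%: field goal percentage with made 3-pointers counted as 1.5 makes."""
    if fga == 0:
        return 0.0
    return (fgm + 0.5 * fg3m) / fga

def true_shooting_pct(pts, fga, fta):
    """TS%: points per scoring attempt, with free throws weighted by 0.44."""
    scoring_attempts = 2 * (fga + 0.44 * fta)
    if scoring_attempts == 0:
        return 0.0
    return pts / scoring_attempts

# Hypothetical stat line: 8-for-16 from the field, 4 made threes, 5-for-6 at the line
fgm, fg3m, fga, ftm, fta = 8, 4, 16, 5, 6
pts = 2 * (fgm - fg3m) + 3 * fg3m + ftm   # 8 + 12 + 5 = 25 points

print(f"FG%:  {fgm / fga:.3f}")
print(f"eFG%: {effective_fg_pct(fgm, fg3m, fga):.3f}")
print(f"TS%:  {true_shooting_pct(pts, fga, fta):.3f}")
```

The raw field goal percentage here is 50%, but eFG% comes out higher because half of the makes were worth three points. That is the kind of difference a box score alone hides.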
In terms of painting a full picture of the game, though, a lot of “advanced” metrics require an even deeper exploration. In general, metrics treat each event, be it a shot, block, assist or turnover, as discrete and say nothing about the events leading up to it. Just like a checkmate on a chessboard is the result of the moves before it, each recorded event in a basketball game is the result of everything before it. In the words of Kirk Goldsberry, analytics expert for the San Antonio Spurs and former Grantland writer:
Basketball is a game of sequences. Unlike baseball or football, it is a relatively continuous free-flowing sport. The actions within a game are hard to separate because they are chronologically intertwined, and every event in every game is influenced in part by preceding sequences of actions … Most basketball statistics refer to discrete events such as shots, steals, and rebounds that occur within the continuous context of a flowing game … we kid ourselves and say a rebound or a corner 3 is akin to a strikeout or a home run, a singular accomplishment achieved by a player that’s fit for tallying and displaying in a cell on some spreadsheet on some website.
And Goldsberry would know. No team in recent years has demonstrated this better than the San Antonio Spurs. Just watch this clip.
How do you possibly “credit” that to anyone? How do you credit any of the rest of that video? Boris Diaw (33) made the shot, but he was wide open because Tim Duncan (21) drew the double team near the basket. However, Tim Duncan was only in that position because of Patty Mills’ (8) pass, which was courtesy of the no-look pass from Ginobili (20). Of course, Ginobili was only able to throw that pass because of the screen Duncan set at the beginning, which the defense played the way it did because of Danny Green’s (4) reputation as a great 3-point shooter in the right corner. In a box score, that reads as an assist from Duncan to Diaw, the same as it would if the two were playing 2 v. 5. But that’s far too simplistic. That play demonstrates everything that’s beautiful, and challenging, about basketball, and everything that's essential about basketball analytics.
DataBall: The Future of Analytics
Basketball’s first wave of the Data Revolution came from looking at the same numbers and measurements differently. The next wave is coming from being able to collect and analyze data that was never available before. Tracking cameras, based on technology originally used for tracking missiles, are now found in every NBA arena. They track things like the total distance each player covers, the number of dribbles before a shot and average defender distance. This raw data is being fed into machine learning algorithms trained to analyze the game through the eyes of a coach. We’re moving towards a world where GMs and coaches will be able to quantify how good a player’s screens were, how fast his average passes are and other such things that typically constitute the “eye test.”
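As a rough illustration of how one tracked metric could be derived, the sketch below sums the straight-line distance between consecutive position samples for a single player. The coordinate format, sample values and function name are assumptions made for this example; the actual tracking systems' data formats are not described here.

```python
import math

def total_distance(positions):
    """Sum the straight-line distance between consecutive (x, y) samples."""
    return sum(
        math.hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(positions, positions[1:])
    )

# Hypothetical court positions in feet, sampled at regular intervals
samples = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]

print(f"Distance covered: {total_distance(samples):.1f} feet")
```

With real tracking data, many samples per second would make the path much smoother, but the idea is the same.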
Just like any other industry, knowing the data or fancy machine learning algorithms won’t make up for years of experience or subject matter knowledge. Properly using data in making decisions means using it as a tool, a suggestion to be fact-checked and scrutinized. Becoming data driven means using data to question assumptions, look for new ways to approach problems and measure results. It does not mean only evaluating a player off of their PER or qSI. Like almost all things data, it’s important to include a human component.
And of course, every season is full of surprises; that’s what makes the game fun. Let’s take a moment now to remember the craziness of the 2015-2016 NBA season: Steph Curry almost broke the game for a few weeks, Kobe and Tim Duncan retired in ways that almost perfectly mirrored their personalities, LeBron finally brought a championship to Cleveland, and the first fruits of generations of genetic testing in Latvia began to ripen. It’s nearly impossible to predict this year’s breakout star, rookie of the year or biggest letdown. (But it’s fun to try!)
In honor of speculating, let’s take a quick look at my 2015-2016 NBA MVP pick’s ridiculous efficiency from last year. Huge shoutout to Plotly for their incredible visualization library and great email tutorials:
*With the league taking more 3-pointers than ever before, the organization with the best analytics in the league, the Spurs, found that team defensive schemes were built towards defending 3s and shots in the paint, both of which are more "efficient" than mid-range 2-pointers (shooting 40% from 3 is roughly equivalent to shooting 60% from 2). The Spurs exploited this, realizing that opposing defensive schemes allowed for high percentage mid-range shots, which was a big part of Kawhi’s success.
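The 40%-from-3 versus 60%-from-2 comparison comes down to expected points per shot. Here is a minimal sketch, assuming every shot is worth exactly 2 or 3 points and ignoring fouls and rebounds; the function names are made up for this example.

```python
def expected_points(make_rate, shot_value):
    """Expected points per shot: make probability times points per make."""
    return make_rate * shot_value

def break_even_two_point_rate(three_point_rate):
    """2-point percentage needed to match a given 3-point percentage."""
    return three_point_rate * 3 / 2

three_pt = expected_points(0.40, 3)   # 40% shooter from 3
two_pt = expected_points(0.60, 2)     # 60% shooter from 2

print(f"40% from 3: {three_pt:.2f} points per shot")
print(f"60% from 2: {two_pt:.2f} points per shot")
print(f"2-point rate needed to match 40% from 3: {break_even_two_point_rate(0.40):.0%}")
```

Both shots are worth about 1.2 points per attempt, which is why a mid-range look only makes sense when a player can hit it at an unusually high rate.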
The 2016-2017 NBA season will likely remind us of how drastically the sport has changed over the last 20 years. Today’s league is shooting more 3s and playing at a faster pace than ever before. As always, there’s a lot of buzz about Kevin Durant, LeBron and the league’s other superstars and storylines. But as we watch the season unfold, let’s not underestimate the way data is making basketball better than ever—nor overlook the player (or team) who may surprise us.