Utah Jazz: Shot Analysis
In Spring 2018, I had a class where we needed to do a group project. My teammates and I chose to do some analysis of the Utah Jazz.
Here are the results in a 3-minute video:
The project won best project in a class of about 60 students. You can see our names in the class hall of fame at http://datasciencecourse.net/2020/fame/.
Below is our winning project report, including most of the code and details.
During this project we:
- Webscraped ESPN shotcharts using BeautifulSoup
- Used Regex to find the types of shots
- Utilized Pandas to wrangle the data
- Ran k-means clustering on the court
- Calculated average shot worth for each player in each cluster
- Evaluated how these differences compared using ANOVA
- Predicted the outcome of a game based on the averages
Take Your Shot
Final Submission
Jacob Brown, Avery Smith, and Kyle Salisbury
Video Presentation Link: https://www.youtube.com/watch?v=HDrmcKn1qhI
Members | Email | uid |
---|---|---|
Jacob Brown | u0729080@utah.edu | u0729080 |
Avery Smith | averyjs@gmail.com | u0838931 |
Kyle Salisbury | Kcsals@gmail.com | u0711328 |
Primary Questions:
What are the natural groupings/clusters of shots on a basketball court?
Which combinations of player and shooting location have the highest expected value (shooting pct * points)?
Are the differences in shooting percentage statistically significant?
How does shooting pct vary at Home vs. Away?
Given only the location and shooters for a game not in our dataset, can we predict the final score of the Jazz, the amount of points each player scored, and whether or not they won?
Accomplished:
- Web scraped all data from sources and created “final” csv
- Obtained key data points using Regex
- Cleaned data and created various dataframes
- Unsupervised clustering (k-means) to divide the court into 6 clusters (further divided into 2 pointers and 3 pointers)
- Calculated expected value for each player in each court position and reported them on shot charts
- Calculated significance for shooting percentages by player and location
- Explored expected value difference for home vs. away games
- Predicted the score of a Jazz game, along with individual player totals
Methods Used:
- Web scraping
- Regex
- Dataframes (including masking)
- Unsupervised clustering (k-means)
- Loops and logic
- Hypothesis testing
- Visualizations (scatter plots, heat maps)
- Predictions via pseudo-model
Programming and Methods:
# Import All Library Packages
from bs4 import BeautifulSoup
import requests
import urllib.request
import re
import time
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn import metrics
from sklearn.metrics import silhouette_samples, silhouette_score
import math
import scipy as sc
from scipy.stats import norm
# Develop some color maps
seven_colors = ListedColormap(["#e41a1c","#984ea3","#a65628","#377eb8","#ffff33","#4daf4a","#ff7f00"])
cmap_bold = ListedColormap(['#FF0000', '#00FF00'])
# Load in the court picture
img = plt.imread('JazzCortHalf.png')
Data Acquisition Process
(This section can take quite a while to run, and it saves local HTML files. If you are running the code, we suggest starting at the Exploratory Analysis section instead.)
We scraped shot charts for the Utah Jazz from http://www.espn.com . We used the hyperlinks on the Jazz schedule page to find all the Jazz games for the entire season. We saved each page as a .html file so we could interact with them without having to scrape them over and over again.
# Function to get soups for a given URL
def getWebsiteAsSoup(url):
    """
    Retrieve a website and return it as a BeautifulSoup object.
    """
    req = urllib.request.Request(url)
    with urllib.request.urlopen(req) as response:
        classlist_html = response.read()
    class_soup = BeautifulSoup(classlist_html, 'html.parser')
    # keep a local copy of the most recently fetched page
    with open('class_list.html', 'w') as new_file:
        new_file.write(str(class_soup))
    return class_soup
# url for the jazz schedule
schedule_url = "http://www.espn.com/nba/team/schedule/_/name/utah/utah-jazz"
schedule_soup = getWebsiteAsSoup(schedule_url)
base_url = "http://www.espn.com/nba/game?gameId=" # append url_endings to this
url_endings = []
regex = r'//www.espn.com/nba/recap/_/id/(\d+)"' # raw string so \d is not treated as a string escape
for a_element in schedule_soup.find_all('a'): # find all elements of type 'a'
    ending = re.findall(regex, str(a_element))
    if ending != []: # many of these elements won't contain the regular expression--skip them
        url_endings.append(ending[0])
print('Number of Jazz Games:')
print(len(url_endings)) # shows how many games the Jazz have played so far
# Function to save html for a given URL
def saveWebsiteToLocal(url, number):
    """
    Retrieve a website and save it locally as an html.
    """
    req = urllib.request.Request(url)
    with urllib.request.urlopen(req) as response:
        classlist_html = response.read()
    class_soup = BeautifulSoup(classlist_html, 'html.parser')
    # the html/ directory must already exist
    with open('html/game_' + str(number) + '.html', 'w') as new_file:
        new_file.write(str(class_soup))

# download all the games to a local copy
for i, ending in enumerate(url_endings, start=1):
    saveWebsiteToLocal(base_url + ending, i)
    time.sleep(10) # pause between requests
Data Processing
The following image was screenshotted from the url http://www.espn.com/nba/game?gameId=400975701 . Each of the dots is an element of type ‘li’ which can be scraped.
We used Beautiful Soup to identify the HTML element for each shot, and regular expressions to extract the interesting data from each one. This is what the HTML looks like. We were mostly interested in data-text, data-homeaway, data-shooter, and the left and top positions.
# regular expressions to obtain key data
utah_regex = 'utah.png'
made_missed_regex = r'class="(\w+)"'
period_regex = r'data-period="(\d)"'
shooter_regex = r'data-shooter="(\d+)"'
blocks_regex = r'blocks'
blocks_shooter_name_regex = r"blocks (\w+ \w+)"
shooter_name_regex = r'data-text="(\w+ \w+)'
distance_regex = r' (\d+-foot)'
type_regex = r'foot ([\w ]+)[ "]'
alt_type_regex = r'e*s ([\w ]+)["\(]'
assist_regex = r'\((\w+ \w+) assists\)'
left_regex = r'left:(\d+.\d+)%'
top_regex = r'top:(\d+.\d+)%'
three_regex = r'three'
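To see how these patterns behave, here is a minimal sketch that applies a few of them to a single shot element. The `sample_shot` markup is a hypothetical example assembled from the attribute names mentioned above (it is not copied from ESPN), so the real pages may differ in detail.

```python
import re

# Hypothetical <li> element: the attribute names follow the description above,
# but the exact markup on the real page is an assumption.
sample_shot = (
    '<li class="made" data-homeaway="away" data-period="1" '
    'data-shooter="4257" '
    'data-text="Derrick Favors makes 6-foot jumper (Joe Ingles assists)" '
    'id="shot0" style="left:8.67%;top:40.0%;"></li>'
)

# Same patterns as in the cell above, repeated so the sketch runs on its own
made_missed_regex = r'class="(\w+)"'
shooter_name_regex = r'data-text="(\w+ \w+)'
distance_regex = r' (\d+-foot)'
type_regex = r'foot ([\w ]+)[ "]'
assist_regex = r'\((\w+ \w+) assists\)'
left_regex = r'left:(\d+.\d+)%'
top_regex = r'top:(\d+.\d+)%'

parsed = {
    "made": re.findall(made_missed_regex, sample_shot)[0] == "made",
    "shooter_name": re.findall(shooter_name_regex, sample_shot)[0],
    "distance": re.findall(distance_regex, sample_shot)[0],
    "shot_type": re.findall(type_regex, sample_shot)[0],
    "assist": re.findall(assist_regex, sample_shot)[0],
    "left": float(re.findall(left_regex, sample_shot)[0]),
    "top": float(re.findall(top_regex, sample_shot)[0]),
}
print(parsed)
```

Note that `shooter_name_regex` only captures two words made of word characters, which is why a name containing an apostrophe gets cut short.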
# Obtaining key words from scraping
timer_start = time.perf_counter() # time.clock() was removed in Python 3.8
array = []
tot_games = 80
for i in range(1, tot_games+1):
    with open("html/game_" + str(i) + ".html") as game_file:
        GameWebsite = BeautifulSoup(game_file, "html.parser")
    court_symbol = GameWebsite.select('.shot-chart > .team-logo')
    home_team = re.findall(utah_regex, str(court_symbol))
    if home_team:
        AllJazzShots = GameWebsite.find_all(class_="shots home-team")[0]
        homeaway = 1
    else:
        AllJazzShots = GameWebsite.find_all(class_="shots away-team")[0]
        homeaway = 0
    for j in range(0, 300):
        Shot = AllJazzShots.find(id="shot" + str(j))
        if Shot is None:
            continue
        game = i
        shot = j
        made_missed = re.findall(made_missed_regex, str(Shot))[0]
        if made_missed == "made":
            made_missed = 1
        else:
            made_missed = 0
        period = re.findall(period_regex, str(Shot))[0]
        shooter = re.findall(shooter_regex, str(Shot))[0]
        block = re.findall(blocks_regex, str(Shot))
        shooter_name = re.findall(shooter_name_regex, str(Shot))
        if block:
            shooter_name = re.findall(blocks_shooter_name_regex, str(Shot))[0]
        elif shooter_name == []:
            shooter_name = None
        else:
            shooter_name = shooter_name[0]
        distance = re.findall(distance_regex, str(Shot))
        if distance == []:
            distance = None
        else:
            distance = distance[0]
        shot_type = re.findall(type_regex, str(Shot))
        if shot_type == []:
            shot_type = re.findall(alt_type_regex, str(Shot))
            #if shot_type == []:
            #    shot_type = "deviant"
        shot_type = shot_type[0]
        # clears out some problems associated with greedy regex
        # (uses its own variable so the timer value above is not overwritten)
        cut = shot_type.find("makes ") + len("makes ")
        if cut >= len("makes "):
            shot_type = shot_type[cut:]
        cut = shot_type.find("misses ") + len("misses ")
        if cut >= len("misses "):
            shot_type = shot_type[cut:]
        assist = re.findall(assist_regex, str(Shot))
        if assist == []:
            assist = None
        else:
            assist = assist[0]
        left = float(re.findall(left_regex, str(Shot))[0])
        # one axis needs to be flipped depending on if it is home or away
        if (homeaway == 0):
            left = 100-left
            #print('away')
        top = float(re.findall(top_regex, str(Shot))[0])
        if homeaway:
            top = 100-top
        three = re.findall(three_regex, shot_type)
        if three == []:
            three = 0
        else:
            three = 1
        game_array = [game, shot, homeaway, made_missed, period, shooter, shooter_name, distance, shot_type,
                      assist, left, top, three]
        array.append(game_array)
end = time.perf_counter()
print("This took " + str(end-timer_start) + " seconds to run")
This took 44.585532 seconds to run
columns = ["game", "shot", "home/away", "made/missed", "period", "shooter", "shooter_name",
"distance", "shot_type", "assist", "left", "top", "ThreePt"]
print('Total Number of Shots: ' + str(len(array))) # total number of shots
print('Average Shots Per Game: ' + str(len(array)/tot_games)) # avg shots per game
Total Number of Shots: 6614
Average Shots Per Game: 82.675
panda_dataframe = pd.DataFrame(array, columns=columns)
panda_dataframe.head()
| | game | shot | home/away | made/missed | period | shooter | shooter_name | distance | shot_type | assist | left | top | ThreePt |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 1 | 1 | 1 | 4257 | Derrick Favors | 6-foot | jumper | Joe Ingles | 91.333333 | 60.0 | 0 |
1 | 1 | 1 | 1 | 0 | 1 | 4257 | Derrick Favors | 12-foot | jumper | None | 86.888889 | 70.0 | 0 |
2 | 1 | 2 | 1 | 1 | 1 | 3032976 | Rudy Gobert | 3-foot | dunk | Joe Ingles | 90.222222 | 48.0 | 0 |
3 | 1 | 5 | 1 | 0 | 1 | 3908809 | Donovan Mitchell | 8-foot | pullup jump shot | None | 84.666667 | 58.0 | 0 |
4 | 1 | 6 | 1 | 0 | 1 | 4011 | Ricky Rubio | 18-foot | pullup jump shot | None | 74.666667 | 62.0 | 0 |
panda_dataframe.to_csv("shots_dataframe_final.csv")
Exploratory Analysis
The data can be read from here without running the top half of the notebook (which could take a while).
# data can be read from here without having to run the top half of the notebook
# (which could take a while)
ShotsPD = pd.read_csv("shots_dataframe.csv")
# Describe Data Set
ShotsPD.describe()
| | Unnamed: 0 | game | shot | home/away | made/missed | period | shooter | left | top | ThreePt |
---|---|---|---|---|---|---|---|---|---|---|
count | 6793.000000 | 6793.000000 | 6793.000000 | 6793.000000 | 6793.000000 | 6793.000000 | 6.793000e+03 | 6793.000000 | 6793.000000 | 6793.000000 |
mean | 3396.000000 | 41.636096 | 49.889445 | 0.496688 | 0.462093 | 2.474901 | 1.736174e+06 | 83.788835 | 50.669071 | 0.342264 |
std | 1961.114522 | 23.667811 | 30.681017 | 0.500026 | 0.498598 | 1.129431 | 1.664691e+06 | 10.165391 | 21.980889 | 0.474502 |
min | 0.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 1.007000e+03 | 48.000000 | 2.000000 | 0.000000 |
25% | 1698.000000 | 21.000000 | 23.000000 | 0.000000 | 0.000000 | 1.000000 | 4.257000e+03 | 74.666667 | 44.000000 | 0.000000 |
50% | 3396.000000 | 42.000000 | 49.000000 | 0.000000 | 0.000000 | 2.000000 | 2.581177e+06 | 87.777778 | 50.000000 | 0.000000 |
75% | 5094.000000 | 62.000000 | 75.000000 | 1.000000 | 1.000000 | 3.000000 | 3.032976e+06 | 92.444444 | 60.000000 | 1.000000 |
max | 6792.000000 | 82.000000 | 133.000000 | 1.000000 | 1.000000 | 5.000000 | 4.065673e+06 | 98.888889 | 98.000000 | 1.000000 |
print('Number of Shots Taken by Each Player:')
print('----------------------------------------')
print(ShotsPD['shooter_name'].value_counts(), '\n')
Number of Shots Taken by Each Player:
----------------------------------------
Donovan Mitchell 1361
Ricky Rubio 827
Joe Ingles 718
Derrick Favors 702
Rodney Hood 552
Rudy Gobert 442
Alec Burks 413
Jonas Jerebko 341
Jae Crowder 295
Royce O 285
Thabo Sefolosha 240
Joe Johnson 226
Raul Neto 151
Ekpe Udoh 119
Dante Exum 87
Tony Bradley 11
Georges Niang 11
Nate Wolters 6
David Stockton 3
Erik McCree 2
Naz Mitrou 1
Name: shooter_name, dtype: int64
ShotsPD = ShotsPD.replace("Royce O", "Royce O'Neale")
# filter out any shooter with fewer than 100 shots for the season
shot_counts = ShotsPD["shooter_name"].value_counts()
low_volume_shooters = shot_counts[shot_counts < 100].index
ShotsPD = ShotsPD[~ShotsPD["shooter_name"].isin(low_volume_shooters)]
print('Types of Shots Taken:')
print('----------------------------------------')
print(ShotsPD['shot_type'].value_counts(), '\n')
Types of Shots Taken:
----------------------------------------
three point jumper 2004
two point shot 901
driving layup 590
jumper 533
pullup jump shot 490
layup 371
dunk 235
step back jumpshot 190
layup 185
driving floating jump shot 141
three point pullup jump shot 135
dunk 131
tip shot 118
driving layup 103
two point shot 90
three pointer 84
hook shot 70
three point jumper 55
jump bank shot 36
driving dunk 34
alley oop dunk shot 28
alley oop layup 26
alley oop dunk shot 26
jumper 21
three point shot 20
alley oop layup 10
finger roll layup 10
driving dunk 9
running pullup jump shot 9
finger roll layup 4
pullup jump shot 4
shot 3
hook shot 3
jump bank shot 1
step back jumpshot 1
driving floating jump shot 1
Name: shot_type, dtype: int64
Unsupervised clustering via kmeans to find natural clusters of the shots
We ultimately wanted to group the shots into clusters so we could analyze them as groups. We tried an unsupervised clustering algorithm to get some insight into how the computer might see the court. We used k-means because it was easy to implement and because our only variables were the x and y locations, so it seemed a natural fit for our analysis.
We used k-means to cluster the basketball shots based on their X and Y locations on the court. We chose 6 clusters because that led to results that were the most easily identifiable by humans. We were quite happy with our results. One group was right by the rim in the area called the key/paint/post. The 5 other regions spanned the court and included both 2 pointers and 3 pointers. These zones correlated quite nicely with what we would naturally identify as the left and right corners, the wings, and the middle of the court. Further dividing the groups into two-pointers and three-pointers gives us 11 separate clusters for further analysis (the cluster at the rim contains no three-pointers).
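For intuition about what the clustering step does, here is a dependency-free sketch of Lloyd's algorithm, the iteration usually meant by "k-means": assign every point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. The function name `kmeans_2d`, the toy points, and the fixed seed are illustrative assumptions; the analysis itself uses scikit-learn's `KMeans`.

```python
import random

def kmeans_2d(points, k, n_iter=100, seed=0):
    """Cluster (x, y) tuples into k groups; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k of the input points
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance
        new_labels = [
            min(range(k),
                key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2)
            for x, y in points
        ]
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append((
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                ))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster in place
        converged = new_labels == labels and new_centroids == centroids
        labels, centroids = new_labels, new_centroids
        if converged:
            break
    return labels, centroids

# Toy (left, top) positions: three shots near the rim, three in a corner
toy_shots = [(92.0, 50.0), (93.0, 48.0), (91.0, 52.0),
             (95.0, 4.0), (96.0, 6.0), (94.0, 5.0)]
labels, centroids = kmeans_2d(toy_shots, k=2)
print(labels)
print(centroids)
```

Which group ends up labeled 0 and which 1 depends on the random starting centroids, which is the same reason the cluster labels can change between runs on the real data.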
# Show the Natural Clusterings on the court with colors
X = np.zeros( (len(ShotsPD), 2) )
X[:, 0] = ShotsPD['left']
X[:, 1] = ShotsPD['top']
y_pred = KMeans(n_clusters=6, n_init=10, init='random', max_iter=300).fit_predict(X)
# Saves these locations to the dataframe
ShotsPD['LocationCluster'] = y_pred
ShotsPD.to_csv("location_dataframe_final.csv")
# Redistribute the left data to be on scale of 0-100 (to plot on court pic)
xNorm = 100*(ShotsPD['left'] - min(ShotsPD['left'])) / (max(ShotsPD['left']) - min(ShotsPD['left']))
plt.scatter(xNorm[:], X[:, 1], c=ShotsPD['LocationCluster'], marker="o", cmap=seven_colors);
plt.imshow(img, zorder=0, extent=[0, 100, 0, 100.0])
plt.grid(False)
plt.show()
These clusters are interesting: they naturally appear to match what we would identify as the key, the left and right corners, the wings, and the middle of the court. We added these groupings to our dataset.
We created shot charts for the team as a whole and for each individual player.
# unsupervised clustering can give different labels
# so read in already clustered data for consistency
ShotsPD = pd.read_csv("location_dataframe.csv")
# Redistribute the left data to be on scale of 0-100
xNorm = 100*(ShotsPD['left'] - min(ShotsPD['left'])) / (max(ShotsPD['left']) - min(ShotsPD['left']))
ShotsPD['left'] = xNorm
# Entire shots by Jazz by location, makes and misses
# greens are makes, reds are misses
cmap_bold = ListedColormap(['#FF0000', '#00FF00'])
plt.scatter(ShotsPD['left'],ShotsPD['top'],c=ShotsPD['made/missed'],cmap=cmap_bold)
plt.colorbar()
plt.imshow(img, zorder=0, extent=[0, 100, 0, 100.0])
plt.title('2017-18 Season Shot Chart for Entire Jazz Team')
plt.grid(False)
plt.show()
Individual players
# shot chart for every player on the Jazz
# greens are makes, reds are misses
for shooter_name in ShotsPD['shooter_name'].unique():
    shooter_shots = ShotsPD[ShotsPD['shooter_name'] == shooter_name]
    plt.scatter(shooter_shots['left'], shooter_shots['top'], c=shooter_shots['made/missed'], cmap=cmap_bold)
    plt.title(str(shooter_name))
    plt.imshow(img, zorder=0, extent=[0, 100, 0, 100.0])
    plt.grid(False)
    plt.show()
We can use simple DataFrame masking and arithmetic to compute basic statistics for individual players.
mitchell_shots = ShotsPD[ShotsPD["shooter_name"]=="Donovan Mitchell"]
print('Number of Shots Mitchell has shot: ' + str(len(mitchell_shots)))
mitchell_makes = mitchell_shots[mitchell_shots["made/missed"]==1]
mitchell_misses = mitchell_shots[mitchell_shots["made/missed"]==0]
print('Number of Shots Mitchell has made: ' + str(len(mitchell_makes)))
print('Number of Shots Mitchell has missed: ' + str(len(mitchell_misses)))
print('Mitchell Field Goal Percentage: ' + str(len(mitchell_makes)/len(mitchell_shots))) # Mitchell field goal shooting pct
Number of Shots Mitchell has shot: 1361
Number of Shots Mitchell has made: 595
Number of Shots Mitchell has missed: 766
Mitchell Field Goal Percentage: 0.4371785451873622
mitchell_threes = mitchell_shots[mitchell_shots["ThreePt"]==1]
mitchell_twos = mitchell_shots[mitchell_shots["ThreePt"]==0]
two_pt_pct = len(mitchell_twos[mitchell_twos["made/missed"]==1])/len(mitchell_twos)
three_pt_pct = len(mitchell_threes[mitchell_threes["made/missed"]==1])/len(mitchell_threes)
print("Mitchell's two point percentage is: " + str(round(two_pt_pct*100, 2)) + " %")
print("Mitchell's three point percentage is: " + str(round(three_pt_pct*100, 2)) + " %")
Mitchell's two point percentage is: 49.71 %
Mitchell's three point percentage is: 33.79 %
Next, we try to get an idea of which regions have the highest expected value for the team as a whole. We’ll plot them later as well. The following cells compute the expected values for 3 pointers first and then for 2 pointers.
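As a quick worked example of the expected-value idea (shooting percentage times the point value of the shot), the sketch below plugs in Mitchell's two-point and three-point percentages printed above. The helper name `expected_value` is ours, for illustration only.

```python
def expected_value(shooting_pct, point_value):
    """Average points per attempt: make probability times points per make."""
    return shooting_pct * point_value

# Mitchell's percentages from the output above (49.71 % and 33.79 %)
two_pt_ev = expected_value(0.4971, 2)
three_pt_ev = expected_value(0.3379, 3)
print(round(two_pt_ev, 3), round(three_pt_ev, 3))
```

Even though he makes threes far less often than twos, the extra point means his three-point attempts come out slightly ahead, at about 1.01 versus 0.99 points per attempt.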
## Team Stats -- 3 Pointers
# 3 pointers
PercMadeDif3 = []
NumShots = []
NumMade = []
ExpectedValue3 =[]
AvLeft3 = []
AvTop3 =[]
PtVal = 3
for i in range(0,6):
    Location = ShotsPD[(ShotsPD['LocationCluster']==i) & (ShotsPD['ThreePt']==1)]
    NumShots.append(len(Location['made/missed']))
    NumMade.append(len(Location[Location['made/missed']==1]))
    if NumShots[i] > 1:
        PercMade = NumMade[i] / NumShots[i]
        PercMadeDif3.append(PercMade)
    else:
        PercMadeDif3.append(0) # too few shots in this cluster to compute a percentage
    ExpectedValue3.append(PercMadeDif3[i]*PtVal)
    AvLeft3.append(np.mean(Location['left']))
    AvTop3.append(np.mean(Location['top']))
print('---- 3 Pointers ----')
print('Percentages')
print(PercMadeDif3)
print('---------------------')
print('Expected Values')
print(ExpectedValue3)
---- 3 Pointers ----
Percentages
[0, 0.3986636971046771, 0.4009111617312073, 0.35555555555555557, 0.31875, 0.3536231884057971]
---------------------
Expected Values
[0, 1.1959910913140313, 1.2027334851936218, 1.0666666666666667, 0.9562499999999999, 1.0608695652173914]
## Team Stats -- 2 Pointers
# 2 pointers
PercMadeDif2 = []
NumShots = []
NumMade = []
ExpectedValue2 =[]
PtVal = 2
AvLeft2 = []
AvTop2 =[]
for i in range(0,6):
    Location = ShotsPD[(ShotsPD['LocationCluster']==i) & (ShotsPD['ThreePt']==0)]
    NumShots.append(len(Location['made/missed']))
    NumMade.append(len(Location[Location['made/missed']==1]))
    PercMade = NumMade[i] / NumShots[i]
    PercMadeDif2.append(PercMade)
    ExpectedValue2.append(PercMadeDif2[i]*PtVal)
    AvLeft2.append(np.mean(Location['left']))
    AvTop2.append(np.mean(Location['top']))
print('---- 2 Pointers ----')
print('Percentages')
print(PercMadeDif2)
print('---------------------')
print('Expected Values')
print(ExpectedValue2)
---- 2 Pointers ----
Percentages
[0.5643629217163446, 0.43023255813953487, 0.3561643835616438, 0.4018264840182648, 0.3465909090909091, 0.44086021505376344]
---------------------
Expected Values
[1.1287258434326892, 0.8604651162790697, 0.7123287671232876, 0.8036529680365296, 0.6931818181818182, 0.8817204301075269]
## Drop the empty 3-pointer entry: the paint (key) cluster won't have a 3, only a 2,
## so its average 3-pointer location is NaN.
xx = np.isnan(AvLeft3)
for i in range(0, len(AvLeft3)):
    if xx[i]:
        DeleteVar = i
DeleteVar
del AvLeft3[DeleteVar]
del AvTop3[DeleteVar]
# Show where each cluster is located on the court (the means!)
import seaborn as sns
df = pd.DataFrame({
'x': AvLeft3 + AvLeft2,
'y': AvTop3 + AvTop2,
'group': ['0','1', '2','3','4','5','6','7','8','9','10']
})
p1=sns.regplot(data=df, x="x", y="y", fit_reg=False, marker="o", color="skyblue", scatter_kws={'s':400})
for line in range(0,df.shape[0]):
    p1.text(df.x[line]+0.2, df.y[line], df.group[line], horizontalalignment='left', size='medium',
            color='black', weight='semibold')
plt.imshow(img, zorder=0, extent=[0, 100, 0, 100.0])
plt.grid(False)
plt.show()
## Delete the placeholder 3-pointer expected value (0) for the cluster that has no 3's
for i in range(0,len(ExpectedValue3)):
    if ExpectedValue3[i] == 0:
        Extra = i
del ExpectedValue3[Extra]
# Create expected values and round it for simplicity
ExpectedValue = ExpectedValue3 + ExpectedValue2
ExpectedValueRound = np.round(ExpectedValue, decimals=2) # np.round_ is no longer available in NumPy 2.0
Analysis
Plot the expected values for the team
## Team chart with expected values
ExpectedValue = ExpectedValue3 + ExpectedValue2
df = pd.DataFrame({
    'x': AvLeft3 + AvLeft2,
    'y': AvTop3 + AvTop2,
    'group': [str(value) for value in ExpectedValueRound] # one label per cluster
})
p1=sns.regplot(data=df, x="x", y="y", fit_reg=False, marker="o", color="skyblue", scatter_kws={'s':400})
for line in range(0,df.shape[0]):
    p1.text(df.x[line]+0.2, df.y[line], df.group[line], horizontalalignment='left',
            size='medium', color='black', weight='semibold')
plt.imshow(img, zorder=0, extent=[0, 100, 0, 100.0])
plt.grid(False)
plt.show() # sns.plt is not available in current seaborn releases
Heat map for the team with color scale
x = AvLeft3 + AvLeft2
y = AvTop3 + AvTop2
B = ExpectedValueRound
low = np.min(B)
high = np.max(B)
cs = plt.scatter(x,y,c=B,cmap=plt.cm.bwr,vmin=low,vmax=high)
plt.colorbar(cs)
plt.imshow(img, zorder=0, extent=[0, 100, 0, 100.0])
plt.grid(False)
plt.show()
# red is hot--high expected value. blue is cool--low expected value
As we can see, corner threes and twos in the key are the most efficient shots for the Jazz overall. Long two-point jumpers are the least efficient shots the team takes. From this plot, we conclude that a two pointer isn’t really worth taking unless it is inside the paint.
Using masking and loops with added logic, we can look at the expected value of every region of the court for each player. We also print out the values and the player's name.
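The bookkeeping behind these per-player numbers is just counting makes and attempts for every combination of shooter, cluster, and shot value. Here is a compact, dependency-free sketch of that idea on made-up records. The player names and shots are invented for illustration, and combinations with at most one attempt are skipped here, whereas the notebook code reports them as 0.

```python
from collections import defaultdict

# Made-up records: (shooter_name, LocationCluster, ThreePt, made/missed)
toy_shots = [
    ("Player A", 0, 0, 1), ("Player A", 0, 0, 0),
    ("Player A", 0, 0, 1), ("Player A", 0, 0, 1),
    ("Player A", 2, 1, 1), ("Player A", 2, 1, 0),
    ("Player B", 2, 1, 0),
]

attempts = defaultdict(int)
makes = defaultdict(int)
for shooter, cluster, three, made in toy_shots:
    key = (shooter, cluster, three)
    attempts[key] += 1
    makes[key] += made

expected_values = {}
for key, n in attempts.items():
    if n > 1:  # same threshold as the notebook code: ignore single-shot samples
        points = 3 if key[2] else 2
        expected_values[key] = points * makes[key] / n

for key, ev in sorted(expected_values.items()):
    print(key, round(ev, 2))
```

In this toy data, 3 makes in 4 two-point attempts and 1 make in 2 three-point attempts both work out to 1.5 points per attempt, which shows why the tiny samples need the significance check discussed later.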
# Expected values for each player: 3 pointers and 2 pointers in every cluster
PlayerIDs = np.unique(ShotsPD['shooter'])
PlayerNames = np.unique(ShotsPD['shooter_name'])
NumOPlayers = len(PlayerNames)
for Name in range(0,NumOPlayers):
    PercMadeDif3 = []
    ExpectedValue3 =[]
    AvLeft3 = []
    AvTop3 =[]
    PtVal3 = 3
    PtVal2 = 2
    PercMadeDif2 = []
    NumShots3 = []
    NumShots2 = []
    NumMade3 = []
    NumMade2 = []
    ExpectedValue2 =[]
    AvLeft2 = []
    AvTop2 =[]
    for i in range(0,6):
        # 3 pointers by this player in cluster i
        Location = ShotsPD[(ShotsPD['LocationCluster']==i) & (ShotsPD['ThreePt']==1)
                           & ( ShotsPD['shooter_name'] == PlayerNames[Name])]
        NumShots3.append(len(Location['made/missed']))
        NumMade3.append(len(Location[Location['made/missed']==1]))
        if NumShots3[i] > 1:
            PercMade3 = NumMade3[i] / NumShots3[i]
            PercMadeDif3.append(PercMade3)
        else:
            PercMadeDif3.append(0)
        ExpectedValue3.append(PercMadeDif3[i]*PtVal3)
        AvLeft3.append(np.mean(Location['left']))
        AvTop3.append(np.mean(Location['top']))
        # 2 pointers by this player in cluster i
        Location = ShotsPD[(ShotsPD['LocationCluster']==i) & (ShotsPD['ThreePt']==0)
                           & ( ShotsPD['shooter_name'] == PlayerNames[Name])]
        NumShots2.append(len(Location['made/missed']))
        NumMade2.append(len(Location[Location['made/missed']==1]))
        if NumShots2[i] > 1:
            PercMade2 = NumMade2[i] / NumShots2[i]
            PercMadeDif2.append(PercMade2)
        else:
            PercMadeDif2.append(0)
        ExpectedValue2.append(PercMadeDif2[i]*PtVal2)
        AvLeft2.append(np.mean(Location['left']))
        AvTop2.append(np.mean(Location['top']))
    print('---------------------------------------------------------------')
    print(PlayerNames[Name])
    print('---------------------')
    print('---- 3 Pointers ----')
    print('Percentages')
    print(PercMadeDif3)
    print('---------------------')
    print('Expected Values')
    print(ExpectedValue3)
    print('---- 2 Pointers ----')
    print('Percentages')
    print(PercMadeDif2)
    print('---------------------')
    print('Expected Values')
    print(ExpectedValue2)
    # Drop one 3-pointer entry so that 5 three-point values + 6 two-point values = 11 clusters
    # (this removes the last zero / NaN entry found, as in the team-level code)
    for mm in range(0,len(ExpectedValue3)):
        if ExpectedValue3[mm] == 0:
            Extra = mm
    del ExpectedValue3[Extra]
    xx = np.isnan(AvLeft3)
    for mm in range(0, len(AvLeft3)):
        if xx[mm]:
            DeleteVar = mm
    del AvLeft3[DeleteVar]
    del AvTop3[DeleteVar]
    ExpectedValue = ExpectedValue3 + ExpectedValue2
    ExpectedValueRound = np.round(ExpectedValue, decimals=2)
    AvLefts = AvLeft3 + AvLeft2
    AvTops = AvTop3 + AvTop2
    # clusters where the player took no shots have no mean location; park them at a fixed spot
    for u in range(0,len(AvLefts)):
        if math.isnan(AvLefts[u]):
            AvLefts[u]=90
        if math.isnan(AvTops[u]):
            AvTops[u]= 0
    x = np.round(AvLefts,decimals=0)
    y = np.round(AvTops,decimals=0)
    valz = [str(value) for value in ExpectedValueRound]
    df = pd.DataFrame({
        'x': x,
        'y': y,
        'group': valz
    })
    p1=sns.regplot(data=df, x="x", y="y", fit_reg=False, marker="o", color="skyblue", scatter_kws={'s':400})
    for line in range(0,df.shape[0]):
        p1.text(df.x[line]+0.2, df.y[line], df.group[line], horizontalalignment='left',
                size='medium', color='black', weight='semibold')
    plt.title(PlayerNames[Name])
    plt.imshow(img, zorder=0, extent=[0, 100, 0, 100.0])
    plt.grid(False)
    plt.show()
---------------------------------------------------------------
Alec Burks
---------------------
---- 3 Pointers ----
Percentages
[0, 0.3888888888888889, 0.4, 0.37142857142857144, 0.2558139534883721, 0.30434782608695654]
---------------------
Expected Values
[0, 1.1666666666666667, 1.2000000000000002, 1.1142857142857143, 0.7674418604651163, 0.9130434782608696]
---- 2 Pointers ----
Percentages
[0.4816753926701571, 0, 0.0, 0.47619047619047616, 0.375, 0.36]
---------------------
Expected Values
[0.9633507853403142, 0, 0.0, 0.9523809523809523, 0.75, 0.72]
---------------------------------------------------------------
Derrick Favors
---------------------
---- 3 Pointers ----
Percentages
[0, 0.26666666666666666, 0.19230769230769232, 0.0, 0, 0.3333333333333333]
---------------------
Expected Values
[0, 0.8, 0.576923076923077, 0.0, 0, 1.0]
---- 2 Pointers ----
Percentages
[0.6680584551148225, 0.3333333333333333, 0.3333333333333333, 0.2413793103448276, 0.3793103448275862, 0.44871794871794873]
---------------------
Expected Values
[1.336116910229645, 0.6666666666666666, 0.6666666666666666, 0.4827586206896552, 0.7586206896551724, 0.8974358974358975]
---------------------------------------------------------------
Donovan Mitchell
---------------------
---- 3 Pointers ----
Percentages
[0, 0.36764705882352944, 0.5555555555555556, 0.271523178807947, 0.3287671232876712, 0.3333333333333333]
---------------------
Expected Values
[0, 1.1029411764705883, 1.6666666666666667, 0.814569536423841, 0.9863013698630136, 1.0]
---- 2 Pointers ----
Percentages
[0.5358931552587646, 0.7, 0.42857142857142855, 0.45454545454545453, 0.32558139534883723, 0.4]
---------------------
Expected Values
[1.0717863105175292, 1.4, 0.8571428571428571, 0.9090909090909091, 0.6511627906976745, 0.8]
---------------------------------------------------------------
Ekpe Udoh
---------------------
---- 3 Pointers ----
Percentages
[0, 0, 0, 0, 0, 0]
---------------------
Expected Values
[0, 0, 0, 0, 0, 0]
---- 2 Pointers ----
Percentages
[0.5357142857142857, 0, 0.0, 0, 0.0, 0]
---------------------
Expected Values
[1.0714285714285714, 0, 0.0, 0, 0.0, 0]
---------------------------------------------------------------
Jae Crowder
---------------------
---- 3 Pointers ----
Percentages
[0, 0.5833333333333334, 0.22580645161290322, 0.2702702702702703, 0.2571428571428571, 0.3333333333333333]
---------------------
Expected Values
[0, 1.75, 0.6774193548387096, 0.8108108108108109, 0.7714285714285714, 1.0]
---- 2 Pointers ----
Percentages
[0.5617977528089888, 0.5, 0.2222222222222222, 0.3125, 0.2, 0.35294117647058826]
---------------------
Expected Values
[1.1235955056179776, 1.0, 0.4444444444444444, 0.625, 0.4, 0.7058823529411765]
---------------------------------------------------------------
Joe Ingles
---------------------
---- 3 Pointers ----
Percentages
[0, 0.45871559633027525, 0.5, 0.43434343434343436, 0.3655913978494624, 0.4642857142857143]
---------------------
Expected Values
[0, 1.3761467889908259, 1.5, 1.3030303030303032, 1.096774193548387, 1.3928571428571428]
---- 2 Pointers ----
Percentages
[0.5621890547263682, 0.5, 0.25, 0.3333333333333333, 0.3, 0.45454545454545453]
---------------------
Expected Values
[1.1243781094527363, 1.0, 0.5, 0.6666666666666666, 0.6, 0.9090909090909091]
---------------------------------------------------------------
Joe Johnson
---------------------
---- 3 Pointers ----
Percentages
[0, 0.3076923076923077, 0.21739130434782608, 0.14285714285714285, 0.35714285714285715, 0.4]
---------------------
Expected Values
[0, 0.9230769230769231, 0.6521739130434783, 0.42857142857142855, 1.0714285714285714, 1.2000000000000002]
---- 2 Pointers ----
Percentages
[0.56, 0.5454545454545454, 0.0, 0.4, 0.43478260869565216, 0.5833333333333334]
---------------------
Expected Values
[1.12, 1.0909090909090908, 0.0, 0.8, 0.8695652173913043, 1.1666666666666667]
---------------------------------------------------------------
Jonas Jerebko
---------------------
---- 3 Pointers ----
Percentages
[0, 0.4883720930232558, 0.4772727272727273, 0.3548387096774194, 0.3333333333333333, 0.2857142857142857]
---------------------
Expected Values
[0, 1.4651162790697674, 1.4318181818181819, 1.064516129032258, 1.0, 0.8571428571428571]
---- 2 Pointers ----
Percentages
[0.5424836601307189, 0.14285714285714285, 0.36363636363636365, 0.2, 0.75, 0.4]
---------------------
Expected Values
[1.0849673202614378, 0.2857142857142857, 0.7272727272727273, 0.4, 1.5, 0.8]
---------------------------------------------------------------
Raul Neto
---------------------
---- 3 Pointers ----
Percentages
[0, 0.5, 0.46153846153846156, 0.375, 0.38461538461538464, 0.2]
---------------------
Expected Values
[0, 1.5, 1.3846153846153846, 1.125, 1.153846153846154, 0.6000000000000001]
---- 2 Pointers ----
Percentages
[0.5232558139534884, 0.25, 0, 0.2857142857142857, 0.14285714285714285, 1.0]
---------------------
Expected Values
[1.0465116279069768, 0.5, 0, 0.5714285714285714, 0.2857142857142857, 2.0]
---------------------------------------------------------------
Ricky Rubio
---------------------
---- 3 Pointers ----
Percentages
[0, 0.2558139534883721, 0.40625, 0.39080459770114945, 0.42857142857142855, 0.23255813953488372]
---------------------
Expected Values
[0, 0.7674418604651163, 1.21875, 1.1724137931034484, 1.2857142857142856, 0.6976744186046512]
---- 2 Pointers ----
Percentages
[0.47346938775510206, 0.45454545454545453, 0.5, 0.40425531914893614, 0.41935483870967744, 0.4772727272727273]
---------------------
Expected Values
[0.9469387755102041, 0.9090909090909091, 1.0, 0.8085106382978723, 0.8387096774193549, 0.9545454545454546]
---------------------------------------------------------------
Rodney Hood
---------------------
---- 3 Pointers ----
Percentages
[0, 0.4074074074074074, 0.26666666666666666, 0.44086021505376344, 0.26666666666666666, 0.423728813559322]
---------------------
Expected Values
[0, 1.222222222222222, 0.8, 1.3225806451612903, 0.8, 1.271186440677966]
---- 2 Pointers ----
Percentages
[0.5220588235294118, 0.375, 0.5, 0.4897959183673469, 0.2653061224489796, 0.4583333333333333]
---------------------
Expected Values
[1.0441176470588236, 0.75, 1.0, 0.9795918367346939, 0.5306122448979592, 0.9166666666666666]
---------------------------------------------------------------
Royce O
---------------------
---- 3 Pointers ----
Percentages
[0, 0.34285714285714286, 0.3684210526315789, 0.4166666666666667, 0.15789473684210525, 0.4]
---------------------
Expected Values
[0, 1.0285714285714285, 1.1052631578947367, 1.25, 0.47368421052631576, 1.2000000000000002]
---- 2 Pointers ----
Percentages
[0.4689655172413793, 0, 0.0, 0.5833333333333334, 0.5, 0.6666666666666666]
---------------------
Expected Values
[0.9379310344827586, 0, 0.0, 1.1666666666666667, 1.0, 1.3333333333333333]
---------------------------------------------------------------
Rudy Gobert
---------------------
---- 3 Pointers ----
Percentages
[0, 0, 0, 0, 0, 0]
---------------------
Expected Values
[0, 0, 0, 0, 0, 0]
---- 2 Pointers ----
Percentages
[0.636150234741784, 0, 0, 0.25, 0, 0.3333333333333333]
---------------------
Expected Values
[1.272300469483568, 0, 0, 0.5, 0, 0.6666666666666666]
---------------------------------------------------------------
Thabo Sefolosha
---------------------
---- 3 Pointers ----
Percentages
[0, 0.45, 0.42857142857142855, 0.35714285714285715, 0.23076923076923078, 0.3333333333333333]
---------------------
Expected Values
[0, 1.35, 1.2857142857142856, 1.0714285714285714, 0.6923076923076923, 1.0]
---- 2 Pointers ----
Percentages
[0.6120689655172413, 0.25, 0.625, 0.3684210526315789, 0.16666666666666666, 0.3333333333333333]
---------------------
Expected Values
[1.2241379310344827, 0.5, 1.25, 0.7368421052631579, 0.3333333333333333, 0.6666666666666666]
Statistical Significance
With small samples, a given shot could have a high expected value purely by chance. For example, 25% of the time, a 50% shooter will make two shots in a row. If those two shots are the only sample we have, we might conclude that the shooter is a 100% shooter. For this reason, it is important to determine whether results are statistically significant or most likely occurred by chance. Hypothesis testing is a good way to measure statistical significance. It involves formulating a null hypothesis that you would like to disprove, calculating the probability of a given result occurring if you assume that the null hypothesis is true, and rejecting the null hypothesis if that probability is sufficiently low.
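The "two in a row" example above can be checked with an exact binomial calculation. The sketch below is only an illustration using the standard library; the helper name `prob_at_least` is an assumption and is not part of the project code.

```python
from math import comb

def prob_at_least(makes: int, shots: int, true_pct: float) -> float:
    """Exact binomial probability of making at least `makes` out of `shots`."""
    return sum(
        comb(shots, k) * true_pct**k * (1 - true_pct) ** (shots - k)
        for k in range(makes, shots + 1)
    )

# A 50% shooter going 2-for-2 happens a quarter of the time (0.5 * 0.5).
two_for_two = prob_at_least(2, 2, 0.5)

# The same shooter going 20-for-20 by luck alone is far less likely.
twenty_for_twenty = prob_at_least(20, 20, 0.5)

print(two_for_two, twenty_for_twenty)
```

The larger the sample, the harder it is for luck alone to produce an extreme percentage, which is why the tests below take the number of shots into account.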
Hypothesis Testing:
Player p-test:
- Take as the null hypothesis that the shooting percentage for a given shot is less than or equal to the average percentage for that player for threes or twos.
Team p-test:
- Take as the null hypothesis that the shooting percentage for a given shot is less than or equal to the average percentage for the whole team for threes or twos.
Location p-test:
- Take as the null hypothesis that the shooting percentage for a given shot is less than or equal to the average percentage for the whole team from that location.
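All three tests share one calculation and differ only in which baseline percentage is used. The following is a minimal standalone sketch of that calculation, assuming a normal approximation to the binomial distribution (the same idea as the `norm.cdf` call in the project code). The function names and the example numbers are hypothetical, and the guard for a zero standard deviation is an addition for illustration.

```python
from math import erf, sqrt

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def one_sided_p(num_makes: int, num_shots: int, baseline_pct: float) -> float:
    """Approximate probability of exceeding `num_makes` if the true
    percentage were only `baseline_pct` (normal approximation)."""
    mu = num_shots * baseline_pct            # expected makes under the null
    sigma = sqrt(mu * (1.0 - baseline_pct))  # sqrt(n * p * (1 - p))
    if sigma == 0.0:
        # A baseline of exactly 0% or 100% leaves no spread to compare against.
        return float("nan")
    return 1.0 - normal_cdf(num_makes, mu, sigma)

# Made-up example: 30 makes on 50 shots against a 45% baseline.
p_example = one_sided_p(30, 50, 0.45)
print(p_example)
```

With very few shots the normal approximation is rough, so p-values for clusters with only a handful of attempts should be read with caution.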
shots_array = np.array([["shooter", "three", "cluster", 'num_shots', "num_makes", "pct", "expectedval",
                         "Player_p", "Team_p", "Location_p"]])
threemask = ShotsPD["ThreePt"] == 1
twomask = ShotsPD["ThreePt"] == 0
threePD = ShotsPD[threemask]
twoPD = ShotsPD[twomask]

# three pointers: one row per (shooter, cluster)
for shooter in threePD["shooter_name"].unique():
    mask = threePD["shooter_name"] == shooter
    ShooterShots = threePD[mask]
    for cluster in ShooterShots["LocationCluster"].unique():
        mask2 = ShooterShots["LocationCluster"] == cluster
        ClusterShots = ShooterShots[mask2]
        num_shots = len(ClusterShots)
        mask_made = ClusterShots["made/missed"] == 1
        num_makes = len(ClusterShots[mask_made])
        pct = num_makes/num_shots
        expectedval = pct*3
        # player p-value
        # total 3 pt avg for this player
        avg_pct = len(ShooterShots[ShooterShots["made/missed"]==1])/len(ShooterShots)
        mu = num_shots*avg_pct  # mean number of makes for this cluster assuming average shooting
        sigma = np.sqrt(mu*(1-avg_pct))  # binomial standard deviation, sqrt(n*p*(1-p))
        player_p = 1-norm.cdf(num_makes, loc=mu, scale=sigma)
        # team p-value
        avg_pct = len(threePD[threePD["made/missed"]==1])/len(threePD)  # total 3 pt avg for the team
        mu = num_shots*avg_pct
        sigma = np.sqrt(mu*(1-avg_pct))
        team_p = 1-norm.cdf(num_makes, loc=mu, scale=sigma)
        # location p-value
        teamClusterPD = threePD[threePD["LocationCluster"] == cluster]
        # team 3 pt avg from this cluster
        avg_pct = len(teamClusterPD[teamClusterPD["made/missed"]==1])/len(teamClusterPD)
        mu = num_shots*avg_pct
        sigma = np.sqrt(mu*(1-avg_pct))
        location_p = 1-norm.cdf(num_makes, loc=mu, scale=sigma)
        shots_array = np.append(shots_array, [[shooter, 1, cluster, num_shots,
                                               num_makes, pct, expectedval, player_p,
                                               team_p, location_p]], axis=0)

# two pointers: same calculation with a point value of 2
for shooter in twoPD["shooter_name"].unique():
    mask = twoPD["shooter_name"] == shooter
    ShooterShots = twoPD[mask]
    for cluster in ShooterShots["LocationCluster"].unique():
        mask2 = ShooterShots["LocationCluster"] == cluster
        ClusterShots = ShooterShots[mask2]
        num_shots = len(ClusterShots)
        mask_made = ClusterShots["made/missed"] == 1
        num_makes = len(ClusterShots[mask_made])
        pct = num_makes/num_shots
        expectedval = pct*2
        # player p-value
        # total 2 pt avg for this player
        avg_pct = len(ShooterShots[ShooterShots["made/missed"]==1])/len(ShooterShots)
        mu = num_shots*avg_pct  # mean number of makes for this cluster assuming average shooting
        sigma = np.sqrt(mu*(1-avg_pct))  # binomial standard deviation, sqrt(n*p*(1-p))
        player_p = 1-norm.cdf(num_makes, loc=mu, scale=sigma)
        # team p-value
        avg_pct = len(twoPD[twoPD["made/missed"]==1])/len(twoPD)  # total 2 pt avg for the team
        mu = num_shots*avg_pct
        sigma = np.sqrt(mu*(1-avg_pct))
        team_p = 1-norm.cdf(num_makes, loc=mu, scale=sigma)
        # location p-value
        teamClusterPD = twoPD[twoPD["LocationCluster"] == cluster]
        # team 2 pt avg from this cluster
        avg_pct = len(teamClusterPD[teamClusterPD["made/missed"]==1])/len(teamClusterPD)
        mu = num_shots*avg_pct
        sigma = np.sqrt(mu*(1-avg_pct))
        location_p = 1-norm.cdf(num_makes, loc=mu, scale=sigma)
        shots_array = np.append(shots_array, [[shooter, 0, cluster, num_shots,
                                               num_makes, pct, expectedval, player_p,
                                               team_p, location_p]], axis=0)
/Users/averysmith/anaconda/lib/python3.6/site-packages/scipy/stats/_distn_infrastructure.py:1732: RuntimeWarning: invalid value encountered in double_scalars
x = np.asarray((x - loc)/scale, dtype=dtyp)
NewLocationPD = pd.DataFrame(data=shots_array[1:],
columns=shots_array[0])
NewLocationPD["three"] = NewLocationPD["three"].map(int)
NewLocationPD["cluster"] = NewLocationPD["cluster"].map(int)
NewLocationPD["num_shots"] = NewLocationPD["num_shots"].map(int)
NewLocationPD["num_makes"] = NewLocationPD["num_makes"].map(int)
NewLocationPD["pct"] = NewLocationPD["pct"].map(float)
NewLocationPD["expectedval"] = NewLocationPD["expectedval"].map(float)
NewLocationPD["Player_p"] = NewLocationPD["Player_p"].map(float)
NewLocationPD["Team_p"] = NewLocationPD["Team_p"].map(float)
NewLocationPD["Location_p"] = NewLocationPD["Location_p"].map(float)
print(NewLocationPD.dtypes, '\n')
shooter object
three int64
cluster int64
num_shots int64
num_makes int64
pct float64
expectedval float64
Player_p float64
Team_p float64
Location_p float64
dtype: object
# print the 40 best expected value shots()
NewLocationPD.sort_values(by=["expectedval"], ascending=False).head(40)
| | shooter | three | cluster | num_shots | num_makes | pct | expectedval | Player_p | Team_p | Location_p |
---|---|---|---|---|---|---|---|---|---|---|
68 | Rudy Gobert | 0 | 4 | 1 | 1 | 1.000000 | 2.000000 | 0.219013 | 1.654043e-01 | 0.084870 |
120 | Raul Neto | 0 | 5 | 2 | 2 | 1.000000 | 2.000000 | 0.070967 | 8.451872e-02 | 0.055618 |
127 | Royce O | 0 | 1 | 1 | 1 | 1.000000 | 2.000000 | 0.148750 | 1.654043e-01 | 0.124909 |
56 | Jae Crowder | 1 | 1 | 24 | 14 | 0.583333 | 1.750000 | 0.002547 | 1.302116e-02 | 0.032321 |
15 | Donovan Mitchell | 1 | 2 | 45 | 25 | 0.555556 | 1.666667 | 0.001011 | 3.902713e-03 | 0.017140 |
2 | Joe Ingles | 1 | 2 | 82 | 41 | 0.500000 | 1.500000 | 0.144763 | 5.447301e-03 | 0.033559 |
43 | Raul Neto | 1 | 1 | 6 | 3 | 0.500000 | 1.500000 | 0.308538 | 2.455023e-01 | 0.306089 |
133 | Jonas Jerebko | 0 | 4 | 4 | 3 | 0.750000 | 1.500000 | 0.166598 | 1.724358e-01 | 0.044999 |
53 | Jonas Jerebko | 1 | 1 | 43 | 21 | 0.488372 | 1.465116 | 0.170106 | 4.596406e-02 | 0.114789 |
51 | Jonas Jerebko | 1 | 2 | 44 | 21 | 0.477273 | 1.431818 | 0.207412 | 6.035049e-02 | 0.150673 |
76 | Donovan Mitchell | 0 | 1 | 10 | 7 | 0.700000 | 1.400000 | 0.099649 | 1.195645e-01 | 0.042443 |
4 | Joe Ingles | 1 | 5 | 56 | 26 | 0.464286 | 1.392857 | 0.368013 | 6.071480e-02 | 0.041625 |
41 | Raul Neto | 1 | 2 | 13 | 6 | 0.461538 | 1.384615 | 0.325306 | 2.340261e-01 | 0.327786 |
1 | Joe Ingles | 1 | 1 | 109 | 50 | 0.458716 | 1.376147 | 0.361958 | 2.067597e-02 | 0.100186 |
31 | Thabo Sefolosha | 1 | 1 | 20 | 9 | 0.450000 | 1.350000 | 0.267932 | 2.139309e-01 | 0.319572 |
60 | Derrick Favors | 0 | 0 | 479 | 320 | 0.668058 | 1.336117 | 0.000679 | 7.471357e-12 | 0.000002 |
126 | Royce O | 0 | 5 | 3 | 2 | 0.666667 | 1.333333 | 0.258235 | 2.983175e-01 | 0.215423 |
10 | Rodney Hood | 1 | 3 | 93 | 41 | 0.440860 | 1.322581 | 0.120899 | 6.343204e-02 | 0.042846 |
3 | Joe Ingles | 1 | 3 | 99 | 43 | 0.434343 | 1.303030 | 0.560276 | 7.488420e-02 | 0.050744 |
30 | Thabo Sefolosha | 1 | 2 | 28 | 12 | 0.428571 | 1.285714 | 0.308814 | 2.411689e-01 | 0.382603 |
27 | Ricky Rubio | 1 | 4 | 35 | 15 | 0.428571 | 1.285714 | 0.174564 | 2.160885e-01 | 0.081620 |
66 | Rudy Gobert | 0 | 0 | 426 | 271 | 0.636150 | 1.272300 | 0.308772 | 2.249937e-07 | 0.001403 |
13 | Rodney Hood | 1 | 5 | 59 | 25 | 0.423729 | 1.271186 | 0.254158 | 1.729582e-01 | 0.130013 |
48 | Royce O | 1 | 3 | 12 | 5 | 0.416667 | 1.250000 | 0.270146 | 3.541097e-01 | 0.329155 |
104 | Thabo Sefolosha | 0 | 2 | 8 | 5 | 0.625000 | 1.250000 | 0.329155 | 2.648511e-01 | 0.056156 |
100 | Thabo Sefolosha | 0 | 0 | 116 | 71 | 0.612069 | 1.224138 | 0.080125 | 1.723821e-02 | 0.150045 |
12 | Rodney Hood | 1 | 1 | 27 | 11 | 0.407407 | 1.222222 | 0.392461 | 3.222499e-01 | 0.463034 |
24 | Ricky Rubio | 1 | 2 | 64 | 26 | 0.406250 | 1.218750 | 0.186086 | 2.447323e-01 | 0.465276 |
6 | Joe Johnson | 1 | 5 | 5 | 2 | 0.400000 | 1.200000 | 0.253123 | 4.348063e-01 | 0.414141 |
38 | Alec Burks | 1 | 2 | 15 | 6 | 0.400000 | 1.200000 | 0.277314 | 3.880836e-01 | 0.502873 |
45 | Royce O | 1 | 5 | 10 | 4 | 0.400000 | 1.200000 | 0.327360 | 4.082131e-01 | 0.379516 |
26 | Ricky Rubio | 1 | 3 | 87 | 34 | 0.390805 | 1.172414 | 0.229947 | 3.062398e-01 | 0.246089 |
85 | Joe Johnson | 0 | 5 | 12 | 7 | 0.583333 | 1.166667 | 0.298303 | 3.152880e-01 | 0.160097 |
125 | Royce O | 0 | 3 | 12 | 7 | 0.583333 | 1.166667 | 0.235837 | 3.152880e-01 | 0.099837 |
37 | Alec Burks | 1 | 1 | 18 | 7 | 0.388889 | 1.166667 | 0.292241 | 4.154618e-01 | 0.533750 |
42 | Raul Neto | 1 | 4 | 13 | 5 | 0.384615 | 1.153846 | 0.545075 | 4.406020e-01 | 0.305157 |
40 | Raul Neto | 1 | 3 | 8 | 3 | 0.375000 | 1.125000 | 0.557383 | 4.757867e-01 | 0.454265 |
106 | Joe Ingles | 0 | 0 | 201 | 113 | 0.562189 | 1.124378 | 0.053589 | 8.558441e-02 | 0.524781 |
135 | Jae Crowder | 0 | 0 | 89 | 50 | 0.561798 | 1.123596 | 0.019917 | 1.832057e-01 | 0.519463 |
83 | Joe Johnson | 0 | 0 | 75 | 42 | 0.560000 | 1.120000 | 0.179038 | 2.124386e-01 | 0.530371 |
# identify the mean location for each of the six clusters identified through kMeans
# (ignoring two and three point differences)
left_coordinates = np.array([])
top_coordinates = np.array([])
clusters = np.arange(0, 6)
for cluster in clusters:
    cluster_mask = ShotsPD['LocationCluster'] == cluster
    cluster_df = ShotsPD[cluster_mask]
    left_coord = cluster_df["left"].mean()
    top_coord = cluster_df["top"].mean()
    left_coordinates = np.append(left_coordinates, left_coord)
    top_coordinates = np.append(top_coordinates, top_coord)

# plot the mean location of each cluster on the court for reference purposes
import seaborn as sns
df = pd.DataFrame({
    'x': left_coordinates,
    'y': top_coordinates,
    'group': clusters
})
p1 = sns.regplot(data=df, x="x", y="y", fit_reg=False, marker="o", color="skyblue", scatter_kws={'s':400})
# label each point with its cluster number
for line in range(0, df.shape[0]):
    p1.text(df.x[line]+0.2, df.y[line], df.group[line], horizontalalignment='left',
            size='medium', color='black', weight='semibold')
plt.title('Clusters')
plt.imshow(img, zorder=0, extent=[0, 100, 0, 100.0])
plt.grid(False)
plt.show()
We chose a significance threshold of p < 0.05 for rejecting the null hypothesis.
For each p-test, we filter the results to keep only the statistically significant shots.
The way to interpret these p-values is as follows: if a player's true percentage from a location were no better than the relevant average (the player's own, the team's, or the team's from that location), the p-value is the probability that the player would still, by chance, shoot at least as well from that location as they actually did.
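That interpretation can be made concrete with a small simulation: replay the same number of shots many times for an imaginary shooter whose true percentage equals the baseline, and count how often the result is at least as good as the observed one. This is a hedged sketch with made-up numbers; `simulated_p` is a hypothetical helper, not something from the project.

```python
import random

def simulated_p(num_makes: int, num_shots: int, baseline_pct: float,
                trials: int = 20000, seed: int = 0) -> float:
    """Fraction of simulated samples in which a baseline-level shooter
    makes at least `num_makes` of `num_shots`."""
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    hits = 0
    for _ in range(trials):
        makes = sum(rng.random() < baseline_pct for _ in range(num_shots))
        if makes >= num_makes:
            hits += 1
    return hits / trials

# Made-up example: 30 makes on 50 shots against a 45% baseline.
estimate = simulated_p(30, 50, 0.45)
print(estimate)
```

A small estimate means an average shooter would rarely look this good by luck, which is the case in which the null hypothesis is rejected.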
Player p-test
statistically_significant_mask = NewLocationPD["Player_p"] < .05
SignificantPD = NewLocationPD[statistically_significant_mask]
SignificantPD.sort_values(by=["expectedval"], ascending=False)
| | shooter | three | cluster | num_shots | num_makes | pct | expectedval | Player_p | Team_p | Location_p |
---|---|---|---|---|---|---|---|---|---|---|
56 | Jae Crowder | 1 | 1 | 24 | 14 | 0.583333 | 1.750000 | 0.002547 | 1.302116e-02 | 0.032321 |
15 | Donovan Mitchell | 1 | 2 | 45 | 25 | 0.555556 | 1.666667 | 0.001011 | 3.902713e-03 | 0.017140 |
60 | Derrick Favors | 0 | 0 | 479 | 320 | 0.668058 | 1.336117 | 0.000679 | 7.471357e-12 | 0.000002 |
135 | Jae Crowder | 0 | 0 | 89 | 50 | 0.561798 | 1.123596 | 0.019917 | 1.832057e-01 | 0.519463 |
71 | Donovan Mitchell | 0 | 0 | 599 | 321 | 0.535893 | 1.071786 | 0.028644 | 1.412531e-01 | 0.920028 |
This indicates that Jae Crowder shoots significantly better from the right corner than his overall three-point average, and Donovan Mitchell shoots significantly better from the left corner than his overall three-point average. This information could help coaches modify the offense so that Crowder and Mitchell spend more time on the right and left side respectively.
We also found that Derrick Favors, Jae Crowder, and Donovan Mitchell all shoot significantly better from the post than their overall two-point averages. That is unsurprising, because the post is much closer to the basket than other locations and is expected to have a better shooting percentage.
statistically_significant_mask = NewLocationPD["Team_p"] < .05
TeamSignificantPD = NewLocationPD[statistically_significant_mask]
TeamSignificantPD.sort_values(by=["expectedval"], ascending=False)
| | shooter | three | cluster | num_shots | num_makes | pct | expectedval | Player_p | Team_p | Location_p |
---|---|---|---|---|---|---|---|---|---|---|
56 | Jae Crowder | 1 | 1 | 24 | 14 | 0.583333 | 1.750000 | 0.002547 | 1.302116e-02 | 0.032321 |
15 | Donovan Mitchell | 1 | 2 | 45 | 25 | 0.555556 | 1.666667 | 0.001011 | 3.902713e-03 | 0.017140 |
2 | Joe Ingles | 1 | 2 | 82 | 41 | 0.500000 | 1.500000 | 0.144763 | 5.447301e-03 | 0.033559 |
53 | Jonas Jerebko | 1 | 1 | 43 | 21 | 0.488372 | 1.465116 | 0.170106 | 4.596406e-02 | 0.114789 |
1 | Joe Ingles | 1 | 1 | 109 | 50 | 0.458716 | 1.376147 | 0.361958 | 2.067597e-02 | 0.100186 |
60 | Derrick Favors | 0 | 0 | 479 | 320 | 0.668058 | 1.336117 | 0.000679 | 7.471357e-12 | 0.000002 |
66 | Rudy Gobert | 0 | 0 | 426 | 271 | 0.636150 | 1.272300 | 0.308772 | 2.249937e-07 | 0.001403 |
100 | Thabo Sefolosha | 0 | 0 | 116 | 71 | 0.612069 | 1.224138 | 0.080125 | 1.723821e-02 | 0.150045 |
This indicates that the three point shots noted above for Jae Crowder and Donovan Mitchell are also significantly better than the team average for three point shots. Joe Ingles (an excellent three point shooter) also shoots significantly better from both corners than the team three point average (although he does not shoot significantly better from the corners than he does from the other three point spots because his overall three point shooting percentage is so high). Jonas Jerebko also shoots significantly better from the right corner than the team average for three pointers.
For two pointers, we now find that Derrick Favors, Rudy Gobert, and Thabo Sefolosha all shoot significantly better from the post than the team average for two pointers. The p-values for Derrick Favors and Rudy Gobert are extremely small, partially due to the large number of shots taken by both players from that location (>400). Rudy Gobert likely didn't show up on the previous list because such a high percentage of his shots are taken from the post that his overall two-point percentage is effectively the same as his shooting percentage from the post.
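The effect of shot volume on the p-value can be illustrated with the same normal approximation. In the sketch below the observed and baseline percentages are held fixed and only the number of shots changes. The percentages are invented for illustration, and `upper_tail_p` is a hypothetical helper.

```python
from math import erf, sqrt

def upper_tail_p(observed_pct: float, baseline_pct: float, num_shots: int) -> float:
    """One-sided p-value under a normal approximation to the binomial."""
    # z grows with sqrt(num_shots) when the percentage gap stays the same
    z = (observed_pct - baseline_pct) * sqrt(num_shots) / sqrt(
        baseline_pct * (1.0 - baseline_pct)
    )
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))

# Same 60% vs 50% gap, measured over increasingly large samples.
p_values = {n: upper_tail_p(0.60, 0.50, n) for n in (20, 100, 400)}
for n, p in p_values.items():
    print(n, p)
```

The identical ten-point gap is not significant at 20 shots but is highly significant at 400, which is consistent with the tiny p-values seen for high-volume post shooters.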
statistically_significant_mask = NewLocationPD["Location_p"] < .05
LocationSignificantPD = NewLocationPD[statistically_significant_mask]
LocationSignificantPD.sort_values(by=["expectedval"], ascending=False)
| | shooter | three | cluster | num_shots | num_makes | pct | expectedval | Player_p | Team_p | Location_p |
---|---|---|---|---|---|---|---|---|---|---|
56 | Jae Crowder | 1 | 1 | 24 | 14 | 0.583333 | 1.750000 | 0.002547 | 1.302116e-02 | 0.032321 |
15 | Donovan Mitchell | 1 | 2 | 45 | 25 | 0.555556 | 1.666667 | 0.001011 | 3.902713e-03 | 0.017140 |
2 | Joe Ingles | 1 | 2 | 82 | 41 | 0.500000 | 1.500000 | 0.144763 | 5.447301e-03 | 0.033559 |
133 | Jonas Jerebko | 0 | 4 | 4 | 3 | 0.750000 | 1.500000 | 0.166598 | 1.724358e-01 | 0.044999 |
76 | Donovan Mitchell | 0 | 1 | 10 | 7 | 0.700000 | 1.400000 | 0.099649 | 1.195645e-01 | 0.042443 |
4 | Joe Ingles | 1 | 5 | 56 | 26 | 0.464286 | 1.392857 | 0.368013 | 6.071480e-02 | 0.041625 |
60 | Derrick Favors | 0 | 0 | 479 | 320 | 0.668058 | 1.336117 | 0.000679 | 7.471357e-12 | 0.000002 |
10 | Rodney Hood | 1 | 3 | 93 | 41 | 0.440860 | 1.322581 | 0.120899 | 6.343204e-02 | 0.042846 |
66 | Rudy Gobert | 0 | 0 | 426 | 271 | 0.636150 | 1.272300 | 0.308772 | 2.249937e-07 | 0.001403 |
The last hypothesis tested was whether certain players shot much better than the team average for that specific location. Apart from identifying some of the same shots as above, this test would be expected to find some players who shoot exceptionally well from more difficult spots.
Jonas Jerebko shooting two pointers from the right wing, Donovan Mitchell shooting two pointers from the right corner, Joe Ingles shooting from the top of the three-point arc, and Rodney Hood shooting three pointers from the left wing would all appear to fit in this category, although each falls just at the limits of statistical significance (.04 < p < .05).
Conclusions
This information can help coaches and decision-makers design offensive sets and plays, and can help players with shot-selection.
For example, for the first p-test, we would advise Donovan Mitchell to take more three point shots from the left corner, and Jae Crowder to take more from the right corner. Coaches could design their offense so both players spend more time on those sides. We would also advise Donovan Mitchell to drive all the way to the basket when taking a two-point shot.
For the second p-test, we would advise the coaches to develop their offense to maximize corner threes by Crowder, Mitchell, Ingles, and Jerebko, and post shots by Favors and Gobert.
Some Additional Analysis
Home and Away Differences
- We wanted to explore how the expected values for each player differ in home games vs. away games. Once again, we used simple masking to compare how players shoot at home and away.
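As a compact illustration of the idea, the sketch below groups shots by player, cluster, shot type, and venue, then multiplies each make percentage by the point value. It uses plain Python on invented toy records rather than the pandas masking used in this project. The record layout, the `is_home` flag (1 = home in this toy data only), and the `expected_values` helper are all assumptions for the example.

```python
from collections import defaultdict

# Hypothetical toy records: (player, cluster, is_three, is_home, made)
shots = [
    ("Player A", 1, 1, 1, 1),
    ("Player A", 1, 1, 1, 1),
    ("Player A", 1, 1, 1, 0),
    ("Player A", 1, 1, 0, 0),
    ("Player A", 1, 1, 0, 1),
    ("Player B", 0, 0, 1, 1),
    ("Player B", 0, 0, 1, 0),
    ("Player B", 0, 0, 0, 1),
    ("Player B", 0, 0, 0, 1),
]

def expected_values(records):
    """Map (player, cluster, is_three, is_home) to make pct * point value."""
    totals = defaultdict(lambda: [0, 0])  # key -> [makes, attempts]
    for player, cluster, is_three, is_home, made in records:
        key = (player, cluster, is_three, is_home)
        totals[key][0] += made
        totals[key][1] += 1
    return {
        key: (makes / attempts) * (3 if key[2] else 2)
        for key, (makes, attempts) in totals.items()
    }

ev = expected_values(shots)
for key in sorted(ev):
    print(key, round(ev[key], 2))
```

Comparing the home and away entries for the same player and cluster gives the difference of interest. With so few shots per group, such differences are subject to the same small-sample caveats discussed earlier.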
## HOME
# 3 pointers
PlayerIDs = np.unique(ShotsPD['shooter'])
PlayerNames = np.unique(ShotsPD['shooter_name'])
NumOPlayers = len(PlayerNames)
for Name in range(0,NumOPlayers):
    # reset the per-player accumulators
    PercMadeDif3 = []
    NumShots = []
    NumMade = []
    ExpectedValue3 =[]
    AvLeft3 = []
    AvTop3 =[]
    PtVal3 = 3
    PtVal2 = 2
    PercMadeDif2 = []
    NumShots3 = []
    NumShots2 = []
    NumMade3 = []
    NumMade2 = []
    ExpectedValue2 =[]
    ExpectedValueRound = []
    AvLeft2 = []
    AvTop2 =[]
    for i in range(0,6):
        # three pointers taken by this player from cluster i
        Location = ShotsPD[(ShotsPD['LocationCluster']==i) & (ShotsPD['ThreePt']==1)
                           & ( ShotsPD['shooter_name'] == PlayerNames[Name])& ( ShotsPD['home/away'] == 0) ]
        NumShots3.append(len(Location['made/missed']))
        NumMade3.append(len(Location[Location['made/missed']==1]))
        PercMade3 = 0
        if NumShots3[i] > 1:
            PercMade3 = NumMade3[i] / NumShots3[i]
            PercMadeDif3.append(PercMade3)
        else:
            PercMadeDif3.append(0)
        ExpectedValue3.append(PercMadeDif3[i]*PtVal3)
        AvLeft3.append(np.mean(Location['left']))
        AvTop3.append(np.mean(Location['top']))
        # two pointers taken by this player from cluster i
        Location = ShotsPD[(ShotsPD['LocationCluster']==i) & (ShotsPD['ThreePt']==0)
                           & ( ShotsPD['shooter_name'] == PlayerNames[Name]) & ( ShotsPD['home/away'] == 0) ]
        NumShots2.append(len(Location['made/missed']))
        NumMade2.append(len(Location[Location['made/missed']==1]))
        PercMade2 = 0
        if NumShots2[i] > 1:
            PercMade2 = NumMade2[i] / NumShots2[i]
            PercMadeDif2.append(PercMade2)
        else:
            PercMadeDif2.append(0)
        ExpectedValue2.append(PercMadeDif2[i]*PtVal2)
        AvLeft2.append(np.mean(Location['left']))
        AvTop2.append(np.mean(Location['top']))
    print('---------------------------------------------------------------')
    print(PlayerNames[Name])
    print('---------------------')
    print('---- 3 Pointers ----')
    print('Percentages')
    print(PercMadeDif3)
    print('---------------------')
    print('Expected Values')
    print(ExpectedValue3)
    print('---- 2 Pointers ----')
    print('Percentages')
    print(PercMadeDif2)
    print('---------------------')
    print('Expected Values')
    print(ExpectedValue2)
    # drop the last zero-valued three point entry
    for mm in range(0,len(ExpectedValue3)):
        if ExpectedValue3[mm] == 0:
            Extra = mm
    del ExpectedValue3[Extra]
    # drop the last three point cluster with no recorded location
    xx = np.isnan(AvLeft3)
    for mm in range(0, len(AvLeft3)):
        if xx[mm]:
            DeleteVar = mm
    del AvLeft3[DeleteVar]
    del AvTop3[DeleteVar]
    ExpectedValue = ExpectedValue3 + ExpectedValue2
    ExpectedValueRound = np.round(ExpectedValue, decimals=2)
    AvLefts = AvLeft3 + AvLeft2
    AvTops = AvTop3 + AvTop2
    # place clusters without shots at a fixed spot on the chart
    for u in range(0,len(AvLefts)):
        if math.isnan(AvLefts[u]):
            AvLefts[u]=90
        if math.isnan(AvTops[u]):
            AvTops[u]= 0
    x = np.round(AvLefts,decimals=0)
    y = np.round(AvTops,decimals=0)
    valz = [str(ExpectedValueRound[0]),str(ExpectedValueRound[1]),str(ExpectedValueRound[2]),
            str(ExpectedValueRound[3]),str(ExpectedValueRound[4]),str(ExpectedValueRound[5]),
            str(ExpectedValueRound[6]),str(ExpectedValueRound[7]),str(ExpectedValueRound[8]),
            str(ExpectedValueRound[9]),str(ExpectedValueRound[10])]
    df = pd.DataFrame({
        'x': x,
        'y': y,
        'group': valz
    })
    # one shot chart per player, labelled with expected values
    p1=sns.regplot(data=df, x="x", y="y", fit_reg=False, marker="o", color="skyblue", scatter_kws={'s':400})
    for line in range(0,df.shape[0]):
        p1.text(df.x[line]+0.2, df.y[line], df.group[line], horizontalalignment='left',
                size='medium', color='black', weight='semibold')
    plt.imshow(img, zorder=0, extent=[0, 100, 0, 100.0])
    plt.grid(False)
    plt.show()
---------------------------------------------------------------
Alec Burks
---------------------
---- 3 Pointers ----
Percentages
[0, 0.2222222222222222, 0.2857142857142857, 0.2631578947368421, 0.17647058823529413, 0.3333333333333333]
---------------------
Expected Values
[0, 0.6666666666666666, 0.8571428571428571, 0.7894736842105263, 0.5294117647058824, 1.0]
---- 2 Pointers ----
Percentages
[0.40540540540540543, 0, 0, 0.2, 0.47368421052631576, 0.38461538461538464]
---------------------
Expected Values
[0.8108108108108109, 0, 0, 0.4, 0.9473684210526315, 0.7692307692307693]
---------------------------------------------------------------
Derrick Favors
---------------------
---- 3 Pointers ----
Percentages
[0, 0.38461538461538464, 0.2, 0.0, 0, 0.5]
---------------------
Expected Values
[0, 1.153846153846154, 0.6000000000000001, 0.0, 0, 1.5]
---- 2 Pointers ----
Percentages
[0.6329113924050633, 0.2857142857142857, 0.14285714285714285, 0.2857142857142857, 0.46153846153846156, 0.42857142857142855]
---------------------
Expected Values
[1.2658227848101267, 0.5714285714285714, 0.2857142857142857, 0.5714285714285714, 0.9230769230769231, 0.8571428571428571]
---------------------------------------------------------------
Donovan Mitchell
---------------------
---- 3 Pointers ----
Percentages
[0, 0.3793103448275862, 0.47368421052631576, 0.2835820895522388, 0.3717948717948718, 0.3584905660377358]
---------------------
Expected Values
[0, 1.1379310344827585, 1.4210526315789473, 0.8507462686567164, 1.1153846153846154, 1.0754716981132075]
---- 2 Pointers ----
Percentages
[0.5144694533762058, 0.5, 0.5, 0.3888888888888889, 0.36585365853658536, 0.4444444444444444]
---------------------
Expected Values
[1.0289389067524115, 1.0, 1.0, 0.7777777777777778, 0.7317073170731707, 0.8888888888888888]
---------------------------------------------------------------
Ekpe Udoh
---------------------
---- 3 Pointers ----
Percentages
[0, 0, 0, 0, 0, 0]
---------------------
Expected Values
[0, 0, 0, 0, 0, 0]
---- 2 Pointers ----
Percentages
[0.515625, 0, 0, 0, 0, 0]
---------------------
Expected Values
[1.03125, 0, 0, 0, 0, 0]
---------------------------------------------------------------
Jae Crowder
---------------------
---- 3 Pointers ----
Percentages
[0, 0.8, 0.42857142857142855, 0.3, 0.23076923076923078, 0.2222222222222222]
---------------------
Expected Values
[0, 2.4000000000000004, 1.2857142857142856, 0.8999999999999999, 0.6923076923076923, 0.6666666666666666]
---- 2 Pointers ----
Percentages
[0.6129032258064516, 0.5, 0.4, 0.3333333333333333, 0.16666666666666666, 0.25]
---------------------
Expected Values
[1.2258064516129032, 1.0, 0.8, 0.6666666666666666, 0.3333333333333333, 0.5]
---------------------------------------------------------------
Joe Ingles
---------------------
---- 3 Pointers ----
Percentages
[0, 0.425531914893617, 0.5135135135135135, 0.5098039215686274, 0.3902439024390244, 0.5333333333333333]
---------------------
Expected Values
[0, 1.2765957446808511, 1.5405405405405403, 1.5294117647058822, 1.1707317073170733, 1.6]
---- 2 Pointers ----
Percentages
[0.5670103092783505, 0.6, 0, 0.38461538461538464, 0.3333333333333333, 0.5]
---------------------
Expected Values
[1.134020618556701, 1.2, 0, 0.7692307692307693, 0.6666666666666666, 1.0]
---------------------------------------------------------------
Joe Johnson
---------------------
---- 3 Pointers ----
Percentages
[0, 0.2, 0.15789473684210525, 0.0, 0.4444444444444444, 0.0]
---------------------
Expected Values
[0, 0.6000000000000001, 0.47368421052631576, 0.0, 1.3333333333333333, 0.0]
---- 2 Pointers ----
Percentages
[0.48936170212765956, 0.5555555555555556, 0, 0.4, 0.375, 0.5]
---------------------
Expected Values
[0.9787234042553191, 1.1111111111111112, 0, 0.8, 0.75, 1.0]
---------------------------------------------------------------
Jonas Jerebko
---------------------
---- 3 Pointers ----
Percentages
[0, 0.35294117647058826, 0.42857142857142855, 0.4, 0.25, 0.375]
---------------------
Expected Values
[0, 1.0588235294117647, 1.2857142857142856, 1.2000000000000002, 0.75, 1.125]
---- 2 Pointers ----
Percentages
[0.5487804878048781, 0.3333333333333333, 0.5, 0.0, 0, 0.25]
---------------------
Expected Values
[1.0975609756097562, 0.6666666666666666, 1.0, 0.0, 0, 0.5]
---------------------------------------------------------------
Raul Neto
---------------------
---- 3 Pointers ----
Percentages
[0, 0.3333333333333333, 0.42857142857142855, 0.375, 0.5, 0.0]
---------------------
Expected Values
[0, 1.0, 1.2857142857142856, 1.125, 1.5, 0.0]
---- 2 Pointers ----
Percentages
[0.5192307692307693, 0.0, 0, 0.6666666666666666, 0.25, 0]
---------------------
Expected Values
[1.0384615384615385, 0.0, 0, 1.3333333333333333, 0.5, 0]
---------------------------------------------------------------
Ricky Rubio
---------------------
---- 3 Pointers ----
Percentages
[0, 0.23529411764705882, 0.45454545454545453, 0.4523809523809524, 0.5882352941176471, 0.25]
---------------------
Expected Values
[0, 0.7058823529411764, 1.3636363636363635, 1.3571428571428572, 1.7647058823529411, 0.75]
---- 2 Pointers ----
Percentages
[0.49295774647887325, 0.5, 0.25, 0.43209876543209874, 0.3125, 0.4782608695652174]
---------------------
Expected Values
[0.9859154929577465, 1.0, 0.5, 0.8641975308641975, 0.625, 0.9565217391304348]
---------------------------------------------------------------
Rodney Hood
---------------------
---- 3 Pointers ----
Percentages
[0, 0.35714285714285715, 0.1875, 0.4642857142857143, 0.32, 0.5161290322580645]
---------------------
Expected Values
[0, 1.0714285714285714, 0.5625, 1.3928571428571428, 0.96, 1.5483870967741935]
---- 2 Pointers ----
Percentages
[0.5581395348837209, 0.5, 0.3333333333333333, 0.5454545454545454, 0.3076923076923077, 0.5714285714285714]
---------------------
Expected Values
[1.1162790697674418, 1.0, 0.6666666666666666, 1.0909090909090908, 0.6153846153846154, 1.1428571428571428]
---------------------------------------------------------------
Royce O
---------------------
---- 3 Pointers ----
Percentages
[0, 0.45, 0.375, 0.16666666666666666, 0.13333333333333333, 0.42857142857142855]
---------------------
Expected Values
[0, 1.35, 1.125, 0.5, 0.4, 1.2857142857142856]
---- 2 Pointers ----
Percentages
[0.5131578947368421, 0, 0.0, 0.5714285714285714, 0.3333333333333333, 0]
---------------------
Expected Values
[1.0263157894736843, 0, 0.0, 1.1428571428571428, 0.6666666666666666, 0]
---------------------------------------------------------------
Rudy Gobert
---------------------
---- 3 Pointers ----
Percentages
[0, 0, 0, 0, 0, 0]
---------------------
Expected Values
[0, 0, 0, 0, 0, 0]
---- 2 Pointers ----
Percentages
[0.5555555555555556, 0, 0, 0.0, 0, 0.5]
---------------------
Expected Values
[1.1111111111111112, 0, 0, 0.0, 0, 1.0]
---------------------------------------------------------------
Thabo Sefolosha
---------------------
---- 3 Pointers ----
Percentages
[0, 0.45454545454545453, 0.5, 0.6, 0.0, 1.0]
---------------------
Expected Values
[0, 1.3636363636363635, 1.5, 1.7999999999999998, 0.0, 3.0]
---- 2 Pointers ----
Percentages
[0.6226415094339622, 0.5, 0.75, 0.38461538461538464, 0.0, 0.0]
---------------------
Expected Values
[1.2452830188679245, 1.0, 1.5, 0.7692307692307693, 0.0, 0.0]
## Away
# 3 pointers
PlayerIDs = np.unique(ShotsPD['shooter'])
PlayerNames = np.unique(ShotsPD['shooter_name'])
NumOPlayers = len(PlayerNames)
for Name in range(0,NumOPlayers):
    # reset the per-player accumulators
    PercMadeDif3 = []
    NumShots = []
    NumMade = []
    ExpectedValue3 =[]
    AvLeft3 = []
    AvTop3 =[]
    PtVal3 = 3
    PtVal2 = 2
    PercMadeDif2 = []
    NumShots3 = []
    NumShots2 = []
    NumMade3 = []
    NumMade2 = []
    ExpectedValue2 =[]
    ExpectedValueRound = []
    AvLeft2 = []
    AvTop2 =[]
    for i in range(0,6):
        # three pointers taken by this player from cluster i
        Location = ShotsPD[(ShotsPD['LocationCluster']==i) & (ShotsPD['ThreePt']==1)
                           & ( ShotsPD['shooter_name'] == PlayerNames[Name])& ( ShotsPD['home/away'] == 1) ]
        NumShots3.append(len(Location['made/missed']))
        NumMade3.append(len(Location[Location['made/missed']==1]))
        PercMade3 = 0
        if NumShots3[i] > 1:
            PercMade3 = NumMade3[i] / NumShots3[i]
            PercMadeDif3.append(PercMade3)
        else:
            PercMadeDif3.append(0)
        ExpectedValue3.append(PercMadeDif3[i]*PtVal3)
        AvLeft3.append(np.mean(Location['left']))
        AvTop3.append(np.mean(Location['top']))
        # two pointers taken by this player from cluster i
        Location = ShotsPD[(ShotsPD['LocationCluster']==i) & (ShotsPD['ThreePt']==0)
                           & ( ShotsPD['shooter_name'] == PlayerNames[Name]) & ( ShotsPD['home/away'] == 1) ]
        NumShots2.append(len(Location['made/missed']))
        NumMade2.append(len(Location[Location['made/missed']==1]))
        PercMade2 = 0
        if NumShots2[i] > 1:
            PercMade2 = NumMade2[i] / NumShots2[i]
            PercMadeDif2.append(PercMade2)
        else:
            PercMadeDif2.append(0)
        ExpectedValue2.append(PercMadeDif2[i]*PtVal2)
        AvLeft2.append(np.mean(Location['left']))
        AvTop2.append(np.mean(Location['top']))
    print('---------------------------------------------------------------')
    print(PlayerNames[Name])
    print('---------------------')
    print('---- 3 Pointers ----')
    print('Percentages')
    print(PercMadeDif3)
    print('---------------------')
    print('Expected Values')
    print(ExpectedValue3)
    print('---- 2 Pointers ----')
    print('Percentages')
    print(PercMadeDif2)
    print('---------------------')
    print('Expected Values')
    print(ExpectedValue2)
    # drop the last zero-valued three point entry
    for mm in range(0,len(ExpectedValue3)):
        if ExpectedValue3[mm] == 0:
            Extra = mm
    del ExpectedValue3[Extra]
    # drop the last three point cluster with no recorded location
    xx = np.isnan(AvLeft3)
    for mm in range(0, len(AvLeft3)):
        if xx[mm]:
            DeleteVar = mm
    del AvLeft3[DeleteVar]
    del AvTop3[DeleteVar]
    ExpectedValue = ExpectedValue3 + ExpectedValue2
    ExpectedValueRound = np.round(ExpectedValue, decimals=2)
    AvLefts = AvLeft3 + AvLeft2
    AvTops = AvTop3 + AvTop2
    # place clusters without shots at a fixed spot on the chart
    for u in range(0,len(AvLefts)):
        if math.isnan(AvLefts[u]):
            AvLefts[u]=90
        if math.isnan(AvTops[u]):
            AvTops[u]= 0
    x = np.round(AvLefts,decimals=0)
    y = np.round(AvTops,decimals=0)
    valz = [str(ExpectedValueRound[0]),str(ExpectedValueRound[1]),str(ExpectedValueRound[2]),
            str(ExpectedValueRound[3]),str(ExpectedValueRound[4]),str(ExpectedValueRound[5]),
            str(ExpectedValueRound[6]),str(ExpectedValueRound[7]),str(ExpectedValueRound[8]),
            str(ExpectedValueRound[9]),str(ExpectedValueRound[10])]
    df = pd.DataFrame({
        'x': x,
        'y': y,
        'group': valz
    })
    # one shot chart per player, labelled with expected values
    p1=sns.regplot(data=df, x="x", y="y", fit_reg=False, marker="o", color="skyblue", scatter_kws={'s':400})
    for line in range(0,df.shape[0]):
        p1.text(df.x[line]+0.2, df.y[line], df.group[line], horizontalalignment='left',
                size='medium', color='black', weight='semibold')
    plt.imshow(img, zorder=0, extent=[0, 100, 0, 100.0])
    plt.grid(False)
    plt.show()
---------------------------------------------------------------
Alec Burks
---------------------
---- 3 Pointers ----
Percentages
[0, 0.5555555555555556, 0.5, 0.5, 0.3076923076923077, 0.2857142857142857]
---------------------
Expected Values
[0, 1.6666666666666667, 1.5, 1.5, 0.9230769230769231, 0.8571428571428571]
---- 2 Pointers ----
Percentages
[0.5875, 0, 0, 0.5625, 0.2857142857142857, 0.3333333333333333]
---------------------
Expected Values
[1.175, 0, 0, 1.125, 0.5714285714285714, 0.6666666666666666]
---------------------------------------------------------------
Derrick Favors
---------------------
---- 3 Pointers ----
Percentages
[0, 0.17647058823529413, 0.18181818181818182, 0, 0, 0]
---------------------
Expected Values
[0, 0.5294117647058824, 0.5454545454545454, 0, 0, 0]
---- 2 Pointers ----
Percentages
[0.7024793388429752, 0.375, 1.0, 0.2, 0.3125, 0.46511627906976744]
---------------------
Expected Values
[1.4049586776859504, 0.75, 2.0, 0.4, 0.625, 0.9302325581395349]
---------------------------------------------------------------
Donovan Mitchell
---------------------
---- 3 Pointers ----
Percentages
[0, 0.358974358974359, 0.6153846153846154, 0.2619047619047619, 0.27941176470588236, 0.30612244897959184]
---------------------
Expected Values
[0, 1.0769230769230769, 1.8461538461538463, 0.7857142857142858, 0.8382352941176471, 0.9183673469387755]
---- 2 Pointers ----
Percentages
[0.5590277777777778, 0.75, 0.3333333333333333, 0.5121951219512195, 0.28888888888888886, 0.37209302325581395]
---------------------
Expected Values
[1.1180555555555556, 1.5, 0.6666666666666666, 1.024390243902439, 0.5777777777777777, 0.7441860465116279]
---------------------------------------------------------------
Ekpe Udoh
---------------------
---- 3 Pointers ----
Percentages
[0, 0, 0, 0, 0, 0]
---------------------
Expected Values
[0, 0, 0, 0, 0, 0]
---- 2 Pointers ----
Percentages
[0.5625, 0, 0, 0, 0.0, 0]
---------------------
Expected Values
[1.125, 0, 0, 0, 0.0, 0]
---------------------------------------------------------------
Jae Crowder
---------------------
---- 3 Pointers ----
Percentages
[0, 0.42857142857142855, 0.058823529411764705, 0.23529411764705882, 0.2727272727272727, 0.4444444444444444]
---------------------
Expected Values
[0, 1.2857142857142856, 0.1764705882352941, 0.7058823529411764, 0.8181818181818181, 1.3333333333333333]
---- 2 Pointers ----
Percentages
[0.5344827586206896, 0.5, 0.0, 0.2857142857142857, 0.2222222222222222, 0.4444444444444444]
---------------------
Expected Values
[1.0689655172413792, 1.0, 0.0, 0.5714285714285714, 0.4444444444444444, 0.8888888888888888]
---------------------------------------------------------------
Joe Ingles
---------------------
---- 3 Pointers ----
Percentages
[0, 0.4838709677419355, 0.4888888888888889, 0.3541666666666667, 0.34615384615384615, 0.38461538461538464]
---------------------
Expected Values
[0, 1.4516129032258065, 1.4666666666666666, 1.0625, 1.0384615384615383, 1.153846153846154]
---- 2 Pointers ----
Percentages
[0.5576923076923077, 0.4, 0.0, 0.3, 0.2727272727272727, 0.42857142857142855]
---------------------
Expected Values
[1.1153846153846154, 0.8, 0.0, 0.6, 0.5454545454545454, 0.8571428571428571]
---------------------------------------------------------------
Joe Johnson
---------------------
---- 3 Pointers ----
Percentages
[0, 0.45454545454545453, 0.5, 0.2222222222222222, 0.2, 0.6666666666666666]
---------------------
Expected Values
[0, 1.3636363636363635, 1.5, 0.6666666666666666, 0.6000000000000001, 2.0]
---- 2 Pointers ----
Percentages
[0.6785714285714286, 0.5, 0.0, 0.4, 0.5714285714285714, 0.6666666666666666]
---------------------
Expected Values
[1.3571428571428572, 1.0, 0.0, 0.8, 1.1428571428571428, 1.3333333333333333]
---------------------------------------------------------------
Jonas Jerebko
---------------------
---- 3 Pointers ----
Percentages
[0, 0.5769230769230769, 0.5217391304347826, 0.3125, 0.4166666666666667, 0.16666666666666666]
---------------------
Expected Values
[0, 1.7307692307692306, 1.5652173913043477, 0.9375, 1.25, 0.5]
---- 2 Pointers ----
Percentages
[0.5352112676056338, 0.0, 0.0, 0.3333333333333333, 1.0, 0]
---------------------
Expected Values
[1.0704225352112675, 0.0, 0.0, 0.6666666666666666, 2.0, 0]
---------------------------------------------------------------
Raul Neto
---------------------
---- 3 Pointers ----
Percentages
[0, 0.6666666666666666, 0.5, 0, 0.3333333333333333, 0.3333333333333333]
---------------------
Expected Values
[0, 2.0, 1.5, 0, 1.0, 1.0]
---- 2 Pointers ----
Percentages
[0.5294117647058824, 0.5, 0, 0.0, 0.0, 0]
---------------------
Expected Values
[1.0588235294117647, 1.0, 0, 0.0, 0.0, 0]
---------------------------------------------------------------
Ricky Rubio
---------------------
---- 3 Pointers ----
Percentages
[0, 0.2692307692307692, 0.3548387096774194, 0.3333333333333333, 0.2777777777777778, 0.21739130434782608]
---------------------
Expected Values
[0, 0.8076923076923077, 1.064516129032258, 1.0, 0.8333333333333334, 0.6521739130434783]
---- 2 Pointers ----
Percentages
[0.44660194174757284, 0.42857142857142855, 0.75, 0.36666666666666664, 0.5333333333333333, 0.47619047619047616]
---------------------
Expected Values
[0.8932038834951457, 0.8571428571428571, 1.5, 0.7333333333333333, 1.0666666666666667, 0.9523809523809523]
---------------------------------------------------------------
Rodney Hood
---------------------
---- 3 Pointers ----
Percentages
[0, 0.46153846153846156, 0.35714285714285715, 0.40540540540540543, 0.2, 0.32142857142857145]
---------------------
Expected Values
[0, 1.3846153846153846, 1.0714285714285714, 1.2162162162162162, 0.6000000000000001, 0.9642857142857144]
---- 2 Pointers ----
Percentages
[0.46, 0.25, 1.0, 0.4444444444444444, 0.21739130434782608, 0.37037037037037035]
---------------------
Expected Values
[0.92, 0.5, 2.0, 0.8888888888888888, 0.43478260869565216, 0.7407407407407407]
---------------------------------------------------------------
Royce O
---------------------
---- 3 Pointers ----
Percentages
[0, 0.2, 0.36363636363636365, 0.6666666666666666, 0.25, 0.3333333333333333]
---------------------
Expected Values
[0, 0.6000000000000001, 1.0909090909090908, 2.0, 0.75, 1.0]
---- 2 Pointers ----
Percentages
[0.42028985507246375, 0, 0, 0.6, 0.6, 0.5]
---------------------
Expected Values
[0.8405797101449275, 0, 0, 1.2, 1.2, 1.0]
---------------------------------------------------------------
Rudy Gobert
---------------------
---- 3 Pointers ----
Percentages
[0, 0, 0, 0, 0, 0]
---------------------
Expected Values
[0, 0, 0, 0, 0, 0]
---- 2 Pointers ----
Percentages
[0.706140350877193, 0, 0, 0.5, 0, 0.25]
---------------------
Expected Values
[1.412280701754386, 0, 0, 1.0, 0, 0.5]
---------------------------------------------------------------
Thabo Sefolosha
---------------------
---- 3 Pointers ----
Percentages
[0, 0.4444444444444444, 0.35714285714285715, 0.2222222222222222, 0.2727272727272727, 0.0]
---------------------
Expected Values
[0, 1.3333333333333333, 1.0714285714285714, 0.6666666666666666, 0.8181818181818181, 0.0]
---- 2 Pointers ----
Percentages
[0.6031746031746031, 0.0, 0.5, 0.3333333333333333, 0.25, 0.6666666666666666]
---------------------
Expected Values
[1.2063492063492063, 0.0, 1.0, 0.6666666666666666, 0.5, 1.3333333333333333]
**Conclusions**
From this, we can see that for some players the expected value of a given shot differs considerably between home and away games. Players may be more nervous at away games and shoot worse overall, or they may simply be more comfortable on certain courts and in certain arenas. In the comparison below, we can see that Ricky Rubio is a much better 3-point shooter at home. Interestingly, though, he shoots the baseline jumper 3x better away than at home. It is interesting that a player can shoot better or worse depending only on whether it is a home or an away game.
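The home versus away comparison can also be tabulated in a single step. The following is a minimal sketch, not the code used for the report: it runs on a small made-up DataFrame that borrows the `ShotsPD` column names, and it assumes that `home/away == 1` marks a home game (the report does not state the encoding). Unlike the loop above, it does not zero out clusters with only one attempt.

```python
import pandas as pd

# Hypothetical toy data that reuses the ShotsPD column names.
shots = pd.DataFrame({
    "shooter_name":    ["Ricky Rubio"] * 8,
    "LocationCluster": [1, 1, 1, 1, 1, 1, 1, 1],
    "ThreePt":         [1, 1, 1, 1, 1, 1, 1, 1],
    "home/away":       [1, 1, 1, 1, 0, 0, 0, 0],  # assumption: 1 = home, 0 = away
    "made/missed":     [1, 1, 0, 0, 1, 0, 0, 0],
})

def expected_value_by_venue(df):
    """Expected points per attempt, split into home and away columns."""
    # Points earned on each attempt: 3 or 2 for a make, 0 for a miss
    point_value = df["ThreePt"].map({1: 3, 0: 2})
    scored = df.assign(points=point_value * df["made/missed"])
    keys = ["shooter_name", "LocationCluster", "ThreePt", "home/away"]
    # Mean points per attempt is shooting pct * point value
    ev = scored.groupby(keys)["points"].mean().unstack("home/away")
    ev = ev.rename(columns={0: "away", 1: "home"})
    ev["home_minus_away"] = ev["home"] - ev["away"]
    return ev

ev = expected_value_by_venue(shots)
print(ev)
```

Sorting the result by `home_minus_away` would surface the player and location pairs with the largest home or away advantage.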
Score Prediction
Finally, we tried predicting a game: knowing only the shot attempts (who shot and from where), we estimated the team score and each player's score. Judging by the box score listed below, we did quite well. This was the first game of the season. We overpredicted Donovan Mitchell's score, probably because this was his first game in the NBA, and he may have shot worse due to nerves and getting used to the flow. Alec Burks was playing better at the time, so he did better than we predicted. Overall, the expected values we found gave a decent prediction of the score. However, our method doesn't take free throws into account, so our final scores will be a little off depending on free throws.
We were within 7 points for every player. The model predicted Joe Johnson's score exactly, while we were 7 off Alec Burks's actual total. We were 8 short of the team total.
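The arithmetic behind the prediction is simple: for each player, multiply the number of attempts from each cluster and shot type by the expected value for that combination, then add everything up. The sketch below illustrates this with made-up numbers; the dictionaries and the `predict_points` helper are hypothetical and are not part of the report's code.

```python
# Hypothetical expected values (points per attempt), keyed by
# (LocationCluster, ThreePt). Illustrative numbers only.
expected_value = {
    (0, 0): 1.2,  # two-pointers from cluster 0
    (1, 1): 1.5,  # three-pointers from cluster 1
    (3, 0): 0.8,  # two-pointers from cluster 3
}

# Hypothetical attempts by one player in the game being predicted.
attempts = {(0, 0): 5, (1, 1): 2, (3, 0): 1}

def predict_points(attempts, expected_value):
    """Sum attempts * expected value over every (cluster, shot type) pair."""
    # Pairs with no recorded expected value contribute 0 points
    total = sum(n * expected_value.get(zone, 0.0) for zone, n in attempts.items())
    # Truncate to a whole number, as int(PlayerToalPts) does in the report
    return int(total)

predicted = predict_points(attempts, expected_value)
print(predicted)
```

Because the total is truncated rather than rounded, each player's prediction can sit up to a point below the raw expected total.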
GameOfInterest = 1
GameX = ShotsPD[ShotsPD['game'] == GameOfInterest]
# Predicting a game
PlayerIDs = np.unique(GameX['shooter'])
PlayerNames = np.unique(GameX['shooter_name'])
NumOPlayers = len(PlayerNames)
TotalPts = []
PlayerToalPts = []
PlayerToalPtsActual = []
TotalPtsActual = []
TotalTotalPts = []
for Name in range(0,NumOPlayers):
    PercMadeDif3 = []
    NumShots = []
    NumMade = []
    ExpectedValue3 =[]
    PtVal3 = 3
    PtVal2 = 2
    PercMadeDif2 = []
    NumShots3 = []
    NumShots2 = []
    NumMade3 = []
    NumMade2 = []
    ExpectedValue2 =[]
    ExpectedValueRound = []
    NumShotsRecreate3 = []
    NumShotsRecreate2 = []
    LocationRecreate2 = []
    LocationRecreate3 = []
    ExpectedPoints3Recreate = []
    ExpectedPoints2Recreate= []
    NumShotsRecreate3Actual = []
    NumShotsRecreate2Actual = []
    LocationRecreate2Actual = []
    LocationRecreate3Actual = []
    ExpectedPoints3RecreateActual = []
    ExpectedPoints2RecreateActual= []
    PlayerToalPts = []
    # Expected value per cluster from all of this player's shots in ShotsPD
    for i in range(0,6):
        # 3 pointers
        Location = ShotsPD[(ShotsPD['LocationCluster']==i) & (ShotsPD['ThreePt']==1)
                           & ( ShotsPD['shooter_name'] == PlayerNames[Name])]
        NumShots3.append(len(Location['made/missed']))
        NumMade3.append(len(Location[Location['made/missed']==1]))
        PercMade3 = 0
        if NumShots3[i] > 1:
            PercMade3 = NumMade3[i] / NumShots3[i]
            PercMadeDif3.append(PercMade3)
        else:
            PercMadeDif3.append(0)
        ExpectedValue3.append(PercMadeDif3[i]*PtVal3)
        # 2 pointers
        Location = ShotsPD[(ShotsPD['LocationCluster']==i) & (ShotsPD['ThreePt']==0)
                           & ( ShotsPD['shooter_name'] == PlayerNames[Name])]
        NumShots2.append(len(Location['made/missed']))
        NumMade2.append(len(Location[Location['made/missed']==1]))
        PercMade2 = 0
        if NumShots2[i] > 1:
            PercMade2 = NumMade2[i] / NumShots2[i]
            PercMadeDif2.append(PercMade2)
        else:
            PercMadeDif2.append(0)
        ExpectedValue2.append(PercMadeDif2[i]*PtVal2)
    # Predicted points: attempts in this game * expected value per attempt
    for i in range(0,6):
        LocationRecreate3 = GameX[(GameX['LocationCluster']==i)& (GameX['ThreePt']==1)
                                  & ( GameX['shooter_name'] == PlayerNames[Name])]
        NumShotsRecreate3.append(len(LocationRecreate3['made/missed']))
        ExpectedPoints3Recreate.append(NumShotsRecreate3[i]*ExpectedValue3[i])
    for i in range(0,6):
        LocationRecreate2 = GameX[(GameX['LocationCluster']==i)& (GameX['ThreePt']==0)
                                  & ( GameX['shooter_name'] == PlayerNames[Name])]
        NumShotsRecreate2.append(len(LocationRecreate2['made/missed']))
        ExpectedPoints2Recreate.append(NumShotsRecreate2[i]*ExpectedValue2[i])
    Player3pts = np.sum(ExpectedPoints3Recreate)
    Player2pts = np.sum(ExpectedPoints2Recreate)
    PlayerToalPts = Player3pts + Player2pts
    TotalPts.append(int(PlayerToalPts))
    ## Actual Results of the Game (made field goals only, no free throws)
    for i in range(0,6):
        LocationRecreate3Actual = GameX[(GameX['LocationCluster']==i)& (GameX['ThreePt']==1)
                                        & ( GameX['shooter_name'] == PlayerNames[Name])
                                        & ( GameX['made/missed'] == 1)]
        NumShotsRecreate3Actual.append(len(LocationRecreate3Actual['made/missed']))
        ExpectedPoints3RecreateActual.append(NumShotsRecreate3Actual[i]*3)
    for i in range(0,6):
        LocationRecreate2Actual = GameX[(GameX['LocationCluster']==i)& (GameX['ThreePt']==0)
                                        & ( GameX['shooter_name'] == PlayerNames[Name])
                                        & ( GameX['made/missed'] == 1)]
        NumShotsRecreate2Actual.append(len(LocationRecreate2Actual['made/missed']))
        ExpectedPoints2RecreateActual.append(NumShotsRecreate2Actual[i]*2)
    Player3ptsActual = np.sum(ExpectedPoints3RecreateActual)
    Player2ptsActual = np.sum(ExpectedPoints2RecreateActual)
    PlayerToalPtsActual = Player3ptsActual + Player2ptsActual
    TotalPtsActual.append(int(PlayerToalPtsActual))
    print('---------------------------------------------------------------')
    print(PlayerNames[Name])
    print('Predicted: ' + str(int(PlayerToalPts)))
    print('Actual: ' + str(PlayerToalPtsActual))
# Team totals across all players who took a shot in the game
TotalTotalPts = np.sum(TotalPts)
TotalTotalPtsActual = np.sum(TotalPtsActual)
print('---------------------------------------------------------------')
print('---------------------------------------------------------------')
print('Final Jazz Score:')
print('Predicted: ' + str(TotalTotalPts))
print('Actual: ' + str(TotalTotalPtsActual))
---------------------------------------------------------------
Alec Burks
Predicted: 9
Actual: 16
---------------------------------------------------------------
Derrick Favors
Predicted: 14
Actual: 14
---------------------------------------------------------------
Donovan Mitchell
Predicted: 12
Actual: 6
---------------------------------------------------------------
Ekpe Udoh
Predicted: 0
Actual: 0
---------------------------------------------------------------
Joe Ingles
Predicted: 7
Actual: 11
---------------------------------------------------------------
Joe Johnson
Predicted: 10
Actual: 10
---------------------------------------------------------------
Ricky Rubio
Predicted: 8
Actual: 7
---------------------------------------------------------------
Rodney Hood
Predicted: 4
Actual: 6
---------------------------------------------------------------
Rudy Gobert
Predicted: 11
Actual: 14
---------------------------------------------------------------
Thabo Sefolosha
Predicted: 7
Actual: 6
---------------------------------------------------------------
---------------------------------------------------------------
Final Jazz Score:
Predicted: 82
Actual: 90
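To put a number on how close the prediction was, the printed results above can be summarized with a few error measures. This is a small sketch added for illustration; the predicted and actual values are copied from the output above, while the variable names are our own.

```python
# Predicted and actual points copied from the printed output above.
names = ["Alec Burks", "Derrick Favors", "Donovan Mitchell", "Ekpe Udoh",
         "Joe Ingles", "Joe Johnson", "Ricky Rubio", "Rodney Hood",
         "Rudy Gobert", "Thabo Sefolosha"]
predicted = [9, 14, 12, 0, 7, 10, 8, 4, 11, 7]
actual = [16, 14, 6, 0, 11, 10, 7, 6, 14, 6]

# Signed error per player: positive means we overpredicted
errors = [p - a for p, a in zip(predicted, actual)]
abs_errors = [abs(e) for e in errors]

mean_abs_error = sum(abs_errors) / len(abs_errors)
worst_player = names[abs_errors.index(max(abs_errors))]
team_error = sum(predicted) - sum(actual)

print("Mean absolute error per player:", mean_abs_error)
print("Largest miss:", worst_player, max(abs_errors))
print("Team total error:", team_error)
```

The largest individual miss is the 7 points on Alec Burks, and the team total comes out 8 points low, matching the figures quoted in the text.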
Conclusion:
We were able to:
- Acquire the data needed from ESPN
- Clean the data to extract the needed values for analysis
- Cluster the shots into natural groupings
- Look at expected values for both the team and individuals
- Run significance testing on the data
- Compare home and away games
- Predict a Jazz game outcome using the model we created
Ideas for future study:
- Effects on fatigue and overshooting in locations
- Correlation between shot selection and winning
- Compare losing streak with winning streak
- Predict fouls and foul shots
- Evolution of Donovan Mitchell over the season