Premier League Performance Metrics and Results: A Dynamic Analysis
INFO 523 - Spring 2023 - Project 1
Tejas Bhawari, Gabriel Geffen, Ayesha Khatun, Alyssa Nether, Akash Srinivasan
Ever wondered how soccer game events, from shots to fouls, influence the score?
Question 1
What is the connection between in-game metrics such as shots on goal, fouls committed, and cards received, and the outcomes of soccer matches?
Approach
We select relevant attributes for both the home and away teams: shots, shots on goal, fouls committed, and cards received. The dataset is divided into training and testing sets so that the model is evaluated on matches it has not seen. A logistic regression model is then trained on the training set, and its effectiveness is assessed on the test set using an accuracy score and a confusion matrix.
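The approach above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the feature column names (HS, AS, HST, AST, HF, AF, HY, AY, HR, AR) are assumed from the common football-data.co.uk layout, fit_outcome_model is a hypothetical helper, and the data is randomly generated so that the example runs without the CSV file.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Assumed column names: shots, shots on goal, fouls, yellow and red cards
# for the home (H*) and away (A*) teams. Adjust to the actual file.
FEATURES = ["HS", "AS", "HST", "AST", "HF", "AF", "HY", "AY", "HR", "AR"]


def fit_outcome_model(df, seed=0):
    """Fit a logistic regression on in-game metrics and score it on a held-out split."""
    X = df[FEATURES]
    y = df["FTR"]  # full-time result: 'H', 'D' or 'A'
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    acc = accuracy_score(y_test, pred)
    # Rows are actual results, columns are predicted results, both in H, D, A order.
    cm = confusion_matrix(y_test, pred, labels=["H", "D", "A"])
    return model, acc, cm


# Synthetic stand-in data (not real matches): the result is driven by the
# difference in shots on goal so that the model has something to learn.
rng = np.random.default_rng(0)
n = 400
demo = pd.DataFrame(rng.integers(0, 20, size=(n, len(FEATURES))), columns=FEATURES)
diff = demo["HST"] - demo["AST"]
demo["FTR"] = np.where(diff > 2, "H", np.where(diff < -2, "A", "D"))

model, acc, cm = fit_outcome_model(demo)
print(f"Accuracy: {acc:.2f}")
print(cm)
```

With the real dataset, demo would be replaced by the loaded DataFrame. Stratifying the split on FTR keeps the share of home wins, draws, and away wins similar in both sets, which is a design choice of this sketch rather than something stated in the project.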
Exploring the complex connections between in-game metrics
FTHG and FTAG (full-time home and away goals) are needed to determine team placements
FTR denotes the actual full-time result (H, D, or A)
HTHG, HTAG, and HTR (the halftime equivalents) are important as well
HomeTeam and AwayTeam combined with the rest help determine the winner
Performing thorough checks and creating visualizations.
Making sure data has no missing values
Creating a function to determine results of matches at halftime and using this re-calibration as a new baseline of analysis.
Chronologically organized data
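The checks listed above might look something like the sketch below. It is only an illustration: prepare_matches and halftime_result are hypothetical helpers, the Date column and its day-first format are assumptions about the file, and the three rows are made up rather than taken from the dataset.

```python
import pandas as pd


def halftime_result(row):
    """Return 'H', 'A' or 'D' from the halftime goal columns."""
    if row["HTHG"] > row["HTAG"]:
        return "H"
    if row["HTHG"] < row["HTAG"]:
        return "A"
    return "D"


def prepare_matches(df):
    """Check for missing values, sort chronologically, and derive the halftime result."""
    missing = df.isna().sum()
    if missing.any():
        raise ValueError(f"Missing values found:\n{missing[missing > 0]}")
    out = df.copy()
    # Assumption: dates are day-first strings such as '14/08/2021'.
    out["Date"] = pd.to_datetime(out["Date"], format="%d/%m/%Y")
    out = out.sort_values("Date").reset_index(drop=True)
    out["HalftimeResult"] = out.apply(halftime_result, axis=1)
    return out


# Tiny made-up sample, not real match data.
sample = pd.DataFrame({
    "Date": ["21/08/2021", "14/08/2021", "15/08/2021"],
    "HomeTeam": ["Team A", "Team B", "Team C"],
    "AwayTeam": ["Team B", "Team C", "Team A"],
    "HTHG": [1, 0, 2],
    "HTAG": [0, 0, 3],
})
clean = prepare_matches(sample)
print(clean[["Date", "HomeTeam", "AwayTeam", "HalftimeResult"]])
```

If the file already provides HTR, comparing it against the derived HalftimeResult column is a cheap consistency check.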
Code
# Import libraries
import pandas as pd

# Load the dataset
df = pd.read_csv('data/soccer21-22.csv')

# Points earned by the home team, based on the full-time result (FTR)
def calculate_points(row):
    if row['FTR'] == 'H':
        return 3
    elif row['FTR'] == 'D':
        return 1
    else:
        return 0

# Apply the function to calculate points for each match
df['HomePoints'] = df.apply(calculate_points, axis=1)
# The away team gets the complement of the home points, except in a draw (1 point each)
df['AwayPoints'] = df.apply(lambda row: 3 - calculate_points(row) if row['FTR'] != 'D' else 1, axis=1)

# Aggregate points for each team
home_points = df.groupby('HomeTeam')['HomePoints'].sum().reset_index()
away_points = df.groupby('AwayTeam')['AwayPoints'].sum().reset_index()

# Combine home and away points
team_points = pd.merge(home_points, away_points, how='outer', left_on='HomeTeam', right_on='AwayTeam')
team_points['TotalPoints'] = team_points['HomePoints'] + team_points['AwayPoints']

# Sort team_points DataFrame based on TotalPoints
team_points = team_points.sort_values(by='TotalPoints', ascending=False)

# Create ranking DataFrame
ft_ranking = pd.DataFrame({
    'Team': team_points['HomeTeam'],  # You can choose 'HomeTeam' or 'AwayTeam' because they are the same after merging
    'Points': team_points['TotalPoints'],
    'Ranking': range(1, len(team_points) + 1)
})
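The same points logic can be applied to the halftime result to see how far each team moves between halftime and full time. The sketch below is an illustration under stated assumptions, not the project's code: ranking_from is a hypothetical helper, the four matches are made up, and ties are left in sort order rather than resolved by goal difference as a real league table would do.

```python
import pandas as pd


def ranking_from(df, result_col):
    """Rank teams by points earned under the result in result_col ('H', 'D' or 'A')."""
    res = df[result_col]
    home_pts = res.map({"H": 3, "D": 1, "A": 0})
    away_pts = res.map({"H": 0, "D": 1, "A": 3})
    total = (
        home_pts.groupby(df["HomeTeam"]).sum()
        .add(away_pts.groupby(df["AwayTeam"]).sum(), fill_value=0)
    )
    total = total.sort_values(ascending=False, kind="stable")
    return pd.Series(range(1, len(total) + 1), index=total.index, name="Ranking")


# Made-up matches, not real results.
matches = pd.DataFrame({
    "HomeTeam": ["Team A", "Team B", "Team C", "Team A"],
    "AwayTeam": ["Team B", "Team C", "Team A", "Team C"],
    "HTR": ["H", "A", "A", "H"],
    "FTR": ["A", "H", "D", "H"],
})

ht_rank = ranking_from(matches, "HTR")
ft_rank = ranking_from(matches, "FTR")

# Positive shift: the team finished higher than its halftime position.
shift = ht_rank - ft_rank
print(shift.sort_values(ascending=False))
```

Using fill_value=0 in the addition keeps a team in the table even if it appears only as a home side or only as an away side in the data being ranked.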
Discussion
Overall, the insights derived from this analysis can offer valuable guidance for teams undergoing significant shifts in rankings. By identifying potential areas for improvement, such as halftime strategies, conditioning, or tactical adjustments, teams can make informed decisions to enhance their performance and competitiveness in professional football leagues.