Premier League Performance Metrics and Results: A Dynamic Analysis

INFO 523 - Spring 2023 - Project 1

Tejas Bhawari, Gabriel Geffen, Ayesha Khatun, Alyssa Nether, Akash Srinivasan

Ever wondered how soccer game events, from shots to fouls, influence the score?

Question 1

What is the connection between in-game metrics such as shots on goal, fouls committed, and cards received, and the outcomes of soccer matches?

Approach

We select relevant attributes: shots, shots on goal, fouls committed, and cards received by both the home and away teams. To check how well the model generalizes to unseen matches, the dataset is divided into training and testing sets. A logistic regression model is then trained on the training set, and its effectiveness is assessed on the testing set using an accuracy score and a confusion matrix.

Exploring the complex connections between in-game metrics

  • FTHG and FTAG (full-time home and away goals) are crucial indicators
  • FTR (full-time result) serves as the target variable
  • HS, AS, HST, and AST reflect attacking performance
  • HF, AF, HY, AY, HR, and AR all indicate team discipline and aggression
  • Combining match information with team-level season summaries
  • The consistency of the dataset ensures reliability of our analysis
# Import libraries
import pandas as pd

# Load the dataset
df = pd.read_csv('data/soccer21-22.csv')

# Creating a binary target variable 'Result' 
# 1 represents a home win ('H') and 0 otherwise
df['Result'] = (df['FTR'] == 'H').astype(int)

# Feature selection
features = ['HS', 'AS', 'HST', 'AST', 'HF', 'AF', 'HY', 'AY', 'HR', 'AR']
X = df[features]
y = df['Result']
  • Splitting data into appropriate sets then applying regression techniques
  • Regression techniques, confusion matrix, and feature correlation heatmap
  • Discovering patterns and relationships critical for building predictive models
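
The split, training, and evaluation steps described above could look roughly like the sketch below. It is not the project's exact code: the 80/20 split, the `random_state` values, `max_iter`, and the randomly generated stand-in data are all assumptions made so the example runs on its own. With the real dataset, `X` and `y` would come from the feature selection code above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

features = ['HS', 'AS', 'HST', 'AST', 'HF', 'AF', 'HY', 'AY', 'HR', 'AR']

# Stand-in data (NOT real match results): random counts per feature
rng = np.random.default_rng(0)
n_matches = 200
X = pd.DataFrame(rng.integers(0, 20, size=(n_matches, len(features))),
                 columns=features)
# Synthetic target: a home win is more likely when HST exceeds AST
y = ((X['HST'] - X['AST'] + rng.normal(0, 3, n_matches)) > 0).astype(int)

# Hold out 20% of matches for testing (split ratio is an assumption)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit the logistic regression model on the training set
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluate on the held-out test set
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])

# Coefficients ordered by absolute size, as a rough feature-importance view
coefs = pd.Series(model.coef_[0], index=features)
coefs = coefs.reindex(coefs.abs().sort_values(ascending=False).index)

print(f"Accuracy: {accuracy:.2f}")
print(cm)
print(coefs)
```

Rows of `cm` are the actual classes (0 = not a home win, 1 = home win) and columns are the predicted classes. Because the features are on different scales, the raw coefficients give only an approximate ranking of importance.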

Feature Importance Visualization

Confusion Matrix

Feature Correlation Heatmap

Question 2

What if the matches ended at halftime?

  • FTHG and FTAG are needed to determine team placements
  • FTR denotes the final actual outcome
  • HTHG, HTAG, and HTR (halftime goals and result) are important as well
  • HomeTeam and AwayTeam combined with the rest help determine the winner
  • Performing thorough checks and creating visualizations.
  • Making sure data has no missing values
  • Creating a function to determine results of matches at halftime and using this re-calibration as a new baseline of analysis.
  • Chronologically organized data

Code

# Import libraries
import pandas as pd

# Load the dataset
df = pd.read_csv('data/soccer21-22.csv')

# Function to calculate the home team's points from the full-time result
def calculate_points(row):
    if row['FTR'] == 'H':
        return 3
    elif row['FTR'] == 'D':
        return 1
    else:
        return 0

# Apply the function to calculate points for each match
df['HomePoints'] = df.apply(calculate_points, axis = 1)
# Away points mirror home points: 3 for an away win, 1 for a draw, 0 for a home win
df['AwayPoints'] = df['FTR'].map({'A': 3, 'D': 1, 'H': 0})

# Aggregate points for each team
home_points = df.groupby('HomeTeam')['HomePoints'].sum().reset_index()
away_points = df.groupby('AwayTeam')['AwayPoints'].sum().reset_index()

# Combine home and away points
team_points = pd.merge(home_points, away_points, how = 'outer', left_on = 'HomeTeam', right_on = 'AwayTeam')
team_points['TotalPoints'] = team_points['HomePoints'] + team_points['AwayPoints']

# Sort team_points DataFrame based on TotalPoints
team_points = team_points.sort_values(by = 'TotalPoints', ascending = False)

# Create ranking DataFrame
ft_ranking = pd.DataFrame({
    'Team': team_points['HomeTeam'],  # You can choose 'HomeTeam' or 'AwayTeam' because they are the same after merging
    'Points': team_points['TotalPoints'],
    'Ranking': range(1, len(team_points) + 1)
})
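
The same aggregation can be reused for the halftime scenario by swapping `FTR` for `HTR`. The sketch below is one possible way to do that, not the project's exact code. The helper `points_table`, the tiny made-up sample of matches, and the alphabetical tie-break are all assumptions for illustration; with the real dataset, `df` would be passed in place of `matches`.

```python
import pandas as pd

# Made-up sample using the dataset's column names (NOT real results)
matches = pd.DataFrame({
    'HomeTeam': ['A', 'B', 'C', 'A'],
    'AwayTeam': ['B', 'C', 'A', 'C'],
    'HTR':      ['H', 'H', 'A', 'D'],
    'FTR':      ['H', 'A', 'A', 'H'],
})

def points_table(matches, result_col):
    """Build a ranked points table from a result column ('HTR' or 'FTR')."""
    home_map = {'H': 3, 'D': 1, 'A': 0}
    away_map = {'H': 0, 'D': 1, 'A': 3}
    # Sum the points each team earned at home and away
    home = matches[result_col].map(home_map).groupby(matches['HomeTeam']).sum()
    away = matches[result_col].map(away_map).groupby(matches['AwayTeam']).sum()
    total = home.add(away, fill_value=0).astype(int)
    table = total.rename('Points').rename_axis('Team').reset_index()
    # Ties are broken alphabetically here (an assumption, not a league rule)
    table = table.sort_values(['Points', 'Team'], ascending=[False, True])
    table = table.reset_index(drop=True)
    table['Ranking'] = range(1, len(table) + 1)
    return table

ht_ranking = points_table(matches, 'HTR')  # table if matches ended at halftime
ft_ranking = points_table(matches, 'FTR')  # actual full-time table

# Positive RankChange: the team ranks higher at full time than at halftime
shift = ht_ranking.merge(ft_ranking, on='Team', suffixes=('_HT', '_FT'))
shift['RankChange'] = shift['Ranking_HT'] - shift['Ranking_FT']
print(shift)
```

In this sample, team B drops one place and team C gains one place between the halftime and full-time tables, which is the kind of shift the plots compare.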

Plots

References

[1] Title: English Premier League Dataset, Source: tidytuesday, Link: https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-04-04/soccer21-22.csv

[2] Title: Quarto, Use: documentation and presentation

Discussion

Overall, the insights derived from this analysis can offer useful guidance for teams whose rankings shift significantly between the halftime and full-time tables. By identifying potential areas for improvement, such as halftime strategies, conditioning, or tactical adjustments, teams can make informed decisions to improve their performance and competitiveness in professional soccer leagues.