Overview
In this project I set out to build a classification model that identifies the key characteristics of soccer players who are expected to show significant market value growth in the short to mid term. I relied on data from Transfermarkt.com, a European soccer player market valuation database and website used by soccer clubs around the world.
Data Collection and Cleaning
To collect the relevant data, I developed a scraping program using Beautiful Soup in Python. After scraping the necessary data, I used Python to remove null values and standardize all data types. With the data collected and cleaned, I was ready to perform preliminary data exploration to find any key trends or themes in the dataset.
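The cleaning step can be sketched roughly as follows. This is a minimal illustration, not the original script: it assumes the scraper yields one dictionary per player with string fields, and that market values arrive as strings such as "€12.50m" or "€800k". The field names (`name`, `age`, `market_value`) and the sample rows are made up for the example.

```python
from typing import Optional


def parse_market_value(raw: Optional[str]) -> Optional[float]:
    """Convert a string such as '€12.50m' or '€800k' to euros.

    Returns None when the value is missing or cannot be parsed.
    The string format is an assumption about the scraped output.
    """
    if raw is None:
        return None
    text = raw.strip().lower().replace("€", "").replace(",", "")
    multiplier = 1.0
    if text.endswith("m"):
        multiplier, text = 1_000_000.0, text[:-1]
    elif text.endswith("k"):
        multiplier, text = 1_000.0, text[:-1]
    try:
        return float(text) * multiplier
    except ValueError:
        return None


def clean_rows(rows):
    """Drop rows with missing fields and cast the rest to numeric types."""
    cleaned = []
    for row in rows:
        value = parse_market_value(row.get("market_value"))
        age_raw = row.get("age")
        if value is None or age_raw in (None, "", "-"):
            continue  # treat unparseable or empty fields as nulls
        try:
            age = int(age_raw)
        except (TypeError, ValueError):
            continue
        cleaned.append({"name": row["name"], "age": age, "market_value": value})
    return cleaned


# Made-up rows standing in for the scraper's output.
raw_rows = [
    {"name": "Player A", "age": "19", "market_value": "€12.50m"},
    {"name": "Player B", "age": "27", "market_value": "€800k"},
    {"name": "Player C", "age": "-", "market_value": "€3.00m"},
    {"name": "Player D", "age": "24", "market_value": None},
]
players = clean_rows(raw_rows)
```

Here the two rows with a missing age or market value are dropped, and the remaining rows carry an integer age and a market value in euros as a float.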
Exploratory Data Analysis
Figure: Market value (Y) by age (X); marker size shows the change in market value.
From an initial review of the dataset, I found that there was most likely a strong relationship between a player's age and expected market value growth, with players around 18 years of age experiencing the most significant growth in market value on a nominal basis. In addition to age, prior market value appears to have a significant impact on expected market value growth: the higher a player is valued, the more likely that player's market value is to increase substantially, as long as the player is under 28 years of age.
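One simple way to check the age pattern numerically is to average the change in market value within age bands. The sketch below assumes each cleaned record has an `age` and a `value_change` field (in millions of euros); both field names and all of the numbers are invented for illustration and are not taken from the actual dataset.

```python
from collections import defaultdict
from statistics import mean


def mean_change_by_age_band(players, band_width=3):
    """Average change in market value for each age band of `band_width` years."""
    buckets = defaultdict(list)
    for p in players:
        lower = (p["age"] // band_width) * band_width  # lower edge of the band
        buckets[lower].append(p["value_change"])
    return {
        f"{lower}-{lower + band_width - 1}": mean(changes)
        for lower, changes in sorted(buckets.items())
    }


# Invented sample, value_change in millions of euros.
sample = [
    {"age": 18, "value_change": 12.0},
    {"age": 19, "value_change": 8.0},
    {"age": 20, "value_change": 10.0},
    {"age": 23, "value_change": 4.0},
    {"age": 27, "value_change": 1.0},
    {"age": 30, "value_change": 0.5},
]
summary = mean_change_by_age_band(sample)
for band, avg in summary.items():
    print(f"ages {band}: average change {avg:.1f}m")
```

A table like this makes it easier to see whether growth really concentrates in the youngest bands before committing to age as a model feature.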
Figure: Side-by-side plots with linear regression.
I also noticed that there may be a significant relationship between age, position, and change in market value: the data exploration suggests that market value peaks at different points in a player's career depending on position. For instance, the figures below show that central midfielders tend to experience a rapid increase in market value around age 18, whereas center-backs experience a rapid increase in their late 20s. Although a position feature would most likely strengthen the findings of the classification model, the positions are numerically imbalanced and the dataset is small, so I decided to omit position from the model.
Figure: Change in market value by age for central midfielders.
Figure: Change in market value by age for center-backs.
Generating and Selecting a Classification Model
With a good idea of how the data may be correlated, I was ready to start on the classification model. I used an increase in market value of 10 million euros since January 1, 2021 as the cutoff for deeming a player "profitable". After splitting my data into training and testing sets, I tested a few models to see which would yield the most accurate predictions.
For this dataset, I decided to dig deeper into the k-nearest neighbors (KNN) model and ended up using 10 nearest neighbors (K = 10).
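The labeling and splitting steps might look something like the sketch below. The 10 million euro cutoff comes from the text; everything else is an assumption for illustration, including the `value_change` field name, the 25% test fraction, the fixed seed, and whether a change of exactly 10 million counts as profitable.

```python
import random

PROFIT_THRESHOLD_EUR = 10_000_000  # cutoff described in the text


def label_profitable(players, threshold=PROFIT_THRESHOLD_EUR):
    """Return 1 for players whose value rose by at least `threshold`, else 0.

    Using >= rather than > at the boundary is an assumption.
    """
    return [1 if p["value_change"] >= threshold else 0 for p in players]


def train_test_split(features, labels, test_fraction=0.25, seed=42):
    """Shuffle indices reproducibly and hold out a fraction for testing."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return (
        pick(features, train_idx),
        pick(features, test_idx),
        pick(labels, train_idx),
        pick(labels, test_idx),
    )


# Invented records: age, prior market value, and change in value (euros).
records = [
    {"age": 18, "prior_value": 20_000_000, "value_change": 15_000_000},
    {"age": 19, "prior_value": 25_000_000, "value_change": 12_000_000},
    {"age": 21, "prior_value": 15_000_000, "value_change": 10_000_000},
    {"age": 24, "prior_value": 2_000_000, "value_change": 1_000_000},
    {"age": 26, "prior_value": 1_000_000, "value_change": 500_000},
    {"age": 27, "prior_value": 5_000_000, "value_change": 2_000_000},
    {"age": 29, "prior_value": 8_000_000, "value_change": 0},
    {"age": 31, "prior_value": 3_000_000, "value_change": 250_000},
]
features = [(r["age"], r["prior_value"]) for r in records]
labels = label_profitable(records)
X_train, X_test, y_train, y_test = train_test_split(features, labels)
```

With a small, imbalanced dataset, a stratified split (keeping the share of profitable players similar in both sets) may be worth considering instead of a purely random one.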
Figure: KNN model with K = 10, X = age, Y = prior market value. Red marks profitable players.
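To make the mechanics of KNN concrete, here is a small from-scratch sketch using only the standard library; it is not the code behind the figure above. It assumes two features (age and prior market value) and rescales both to the 0-1 range first, since otherwise the market value axis would dominate the distance. The scaling step, the function names, and the toy data are all assumptions added for illustration.

```python
from collections import Counter
from math import dist


def fit_min_max(rows):
    """Return the (min, max) of each feature column."""
    return [(min(col), max(col)) for col in zip(*rows)]


def scale(row, bounds):
    """Rescale each feature to the 0-1 range using the training bounds."""
    return [
        (x - lo) / (hi - lo) if hi > lo else 0.0
        for x, (lo, hi) in zip(row, bounds)
    ]


def knn_predict(train_X, train_y, query, k=10):
    """Majority vote among the k training points closest to `query`."""
    bounds = fit_min_max(train_X)
    scaled_train = [scale(row, bounds) for row in train_X]
    scaled_query = scale(query, bounds)
    nearest = sorted(
        range(len(train_X)),
        key=lambda i: dist(scaled_train[i], scaled_query),
    )[:k]
    votes = Counter(train_y[i] for i in nearest)
    # On a tied vote, the label seen first (from the closest neighbor) wins.
    return votes.most_common(1)[0][0]


# Toy data: (age, prior market value in millions of euros); 1 = profitable.
train_X = [(18, 20), (19, 25), (20, 30), (21, 15),
           (24, 2), (26, 1), (27, 5), (29, 8), (31, 3), (33, 10)]
train_y = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

young_high_value = knn_predict(train_X, train_y, (19, 22), k=3)
older_low_value = knn_predict(train_X, train_y, (30, 4), k=3)
print(young_high_value, older_low_value)
```

The toy example uses k=3 only because the sample is tiny; the default of 10 mirrors the K chosen in the text. In practice a library implementation would usually be preferable to hand-rolled code like this.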
Results
From a quick data exploration and validation with the KNN model, we can see that there may be a relationship between age, prior transfer value, and short- to mid-term market value returns. As the KNN model above shows, younger players with higher initial market values are more likely to be categorized as "profitable", meaning a market value increase of over 10 million euros.
As this is just a preliminary analysis, I plan on improving the model by expanding my dataset and increasing the number of factors. To expand the dataset, I would need to find a new source of data, as the source I used does not include players with significant market value decreases. Including them would most likely make the model more accurate, as it would then take into account players who are overvalued and may experience a market value decrease.
In addition to expanding the observations, there are clearly other factors that may impact short- to mid-term market value changes for soccer players. In the near future I plan on combining this dataset with the data available from KPMG's Football Benchmark to include game statistics such as expected goals and assists per 90 minutes.