Okay, so I was messing around with some data yesterday, trying to see if I could predict the outcome of the Leeds United vs Burnley game. Thought I’d share what I did. It’s not fancy, just my way of scratching an itch.

First thing I did was grab historical data. Found a decent dataset online with past match results, including goals scored, shots on target, fouls, all that jazz. I used Python with Pandas to load it up. Nothing groundbreaking, just pd.read_csv().
Then came the data cleaning. You know how it is – messy data everywhere. Missing values, inconsistent formatting. I filled the missing bits with the mean for numerical columns, and “Unknown” for the others. Also made sure the team names were consistent – Leeds United instead of just Leeds, that kind of stuff. Super important, or the whole thing falls apart.
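
Here's a rough sketch of what that cleaning step could look like. The column names and sample rows are made up for illustration, not taken from the actual dataset:

```python
import numpy as np
import pandas as pd

# Tiny made-up sample standing in for the real dataset (column names are assumptions).
matches = pd.DataFrame({
    "home_team": ["Leeds", "Burnley", "Leeds United", None],
    "away_team": ["Burnley", "Leeds United", "Burnley", "Leeds"],
    "home_goals": [2.0, 1.0, np.nan, 0.0],
    "referee": ["A. Smith", None, "B. Jones", "A. Smith"],
})

# Fill missing numeric values with each column's mean.
numeric_cols = matches.select_dtypes(include="number").columns
matches[numeric_cols] = matches[numeric_cols].fillna(matches[numeric_cols].mean())

# Fill missing values in the remaining (text) columns with "Unknown".
other_cols = matches.columns.difference(numeric_cols)
matches[other_cols] = matches[other_cols].fillna("Unknown")

# Map short or variant team names onto one canonical spelling (exact matches only).
name_map = {"Leeds": "Leeds United"}
for col in ["home_team", "away_team"]:
    matches[col] = matches[col].replace(name_map)

print(matches)
```

Mean-filling is quick but blunt: it hides the fact that a value was missing, so it's worth checking how many rows it actually touches.
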
Next up, feature engineering. This is where it got a bit more interesting. I calculated things like “average goals scored per game” for each team, “win percentage at home/away,” and “head-to-head record.” Just adding these columns to my Pandas DataFrame. I figured these stats might give me some insight into the teams’ strengths and weaknesses.
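
Something along these lines would produce those stats. It's a minimal sketch on four invented results, and it assumes every row is a Leeds United vs Burnley fixture with hypothetical columns home_team, away_team, home_goals and away_goals:

```python
import pandas as pd

# Made-up results for illustration; column names are assumptions.
matches = pd.DataFrame({
    "home_team": ["Leeds United", "Burnley", "Leeds United", "Burnley"],
    "away_team": ["Burnley", "Leeds United", "Burnley", "Leeds United"],
    "home_goals": [2, 1, 0, 3],
    "away_goals": [1, 1, 2, 0],
})

# Average goals scored per home game and home win percentage, per team.
matches["home_win"] = matches["home_goals"] > matches["away_goals"]
home_stats = matches.groupby("home_team").agg(
    avg_home_goals=("home_goals", "mean"),
    home_win_pct=("home_win", "mean"),
)

# Head-to-head record from Leeds United's point of view:
# take Leeds' goals from the home column when they host, else from the away column.
leeds_home = matches["home_team"] == "Leeds United"
leeds_goals = matches["home_goals"].where(leeds_home, matches["away_goals"])
burnley_goals = matches["away_goals"].where(leeds_home, matches["home_goals"])
head_to_head = {
    "leeds_wins": int((leeds_goals > burnley_goals).sum()),
    "draws": int((leeds_goals == burnley_goals).sum()),
    "burnley_wins": int((leeds_goals < burnley_goals).sum()),
}

print(home_stats)
print(head_to_head)
```

One thing to watch with stats like these: if they're computed over the whole dataset, each match's features include information from later matches, which can make the model look better than it is.
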
After that, I built a simple model. I used scikit-learn and went with a Logistic Regression. I know, super basic, but I wanted to keep it simple. I split the data into training and testing sets, trained the model on the training data, and then tested its accuracy on the testing data. My features were the ones I engineered earlier, and the target variable was just whether Leeds won, lost, or drew.
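
For reference, the split/train/test loop looks roughly like this in scikit-learn. The data here is randomly generated and the feature names are placeholders, so the numbers it prints say nothing about real football. The most-frequent-class baseline is an extra I've added for comparison, not something from the original run:

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the engineered features; values are random, not real results.
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "avg_goals_home": rng.normal(1.5, 0.5, n),
    "home_win_pct": rng.uniform(0.2, 0.8, n),
    "h2h_win_rate": rng.uniform(0.0, 1.0, n),
})

# Three-class target from Leeds' point of view: "loss", "draw" or "win".
score = (X["avg_goals_home"] + 2 * X["h2h_win_rate"] + rng.normal(0, 0.7, n)).to_numpy()
y = np.select([score < 2.0, score < 3.0], ["loss", "draw"], default="win")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))

# Baseline: always predict the most common outcome in the training data.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
baseline_accuracy = accuracy_score(y_test, baseline.predict(X_test))
print(f"model: {accuracy:.2f}, baseline: {baseline_accuracy:.2f}")

# Class probabilities for one upcoming fixture (feature values are made up).
upcoming = pd.DataFrame(
    {"avg_goals_home": [1.8], "home_win_pct": [0.55], "h2h_win_rate": [0.6]}
)
probs = dict(zip(model.classes_, model.predict_proba(upcoming)[0]))
print(probs)
```

Looking at predict_proba rather than just the predicted label is one way to tell a confident call from a close one.
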
So, after training, the model spat out its prediction for the Leeds United vs Burnley game: a narrow win for Leeds. I gotta say, the accuracy on the test set wasn’t amazing – around 65%. But hey, with three possible outcomes that’s well ahead of guessing at random (about 33%), though a fairer comparison would be always picking the most common result.

Finally, just for fun, I looked at feature importance. With Logistic Regression you can get a rough read on this from the size of the model’s coefficients, as long as the features are on comparable scales. Turns out “head-to-head record” and “average goals scored at home” were pretty influential. Makes sense, I guess.
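
One common way to do this (an assumption on my part about the approach, sketched on synthetic data with placeholder feature names) is to standardize the features and then rank them by the absolute size of their coefficients:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data: the target depends on the first two features but not on avg_fouls.
rng = np.random.default_rng(1)
n = 400
X = pd.DataFrame({
    "h2h_win_rate": rng.uniform(0, 1, n),
    "avg_goals_home": rng.normal(1.5, 0.5, n),
    "avg_fouls": rng.normal(11, 2, n),  # deliberately unrelated to the target
})
score = (3 * X["h2h_win_rate"] + X["avg_goals_home"] + rng.normal(0, 0.5, n)).to_numpy()
y = np.select([score < 2.5, score < 3.5], ["loss", "draw"], default="win")

# Standardize so coefficient sizes are comparable across features.
X_scaled = StandardScaler().fit_transform(X)
model = LogisticRegression(max_iter=1000).fit(X_scaled, y)

# coef_ has one row per class; average the absolute values across classes.
importance = pd.Series(
    np.abs(model.coef_).mean(axis=0), index=X.columns
).sort_values(ascending=False)
print(importance)
```

This is only a rough ranking: correlated features can share or swap weight between them, so it's a hint about what the model leans on rather than proof of what drives results.
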
In conclusion, it was a fun little project. Did my model accurately predict the game? I’ll find out soon enough. But even if it’s totally wrong, I learned a bit, and that’s what matters, right? This whole thing was just a way to practice using Pandas and scikit-learn, and I think that’s the key takeaway.