Okay, so, yesterday I was messing around with some rugby data – specifically, trying to see if I could pull out some insights about the Wales and Scotland teams. Thought it’d be a fun little project, right?

First off, I started by scraping data. I know, I know, everyone says scraping is a pain, but I found a decent site with game stats from recent matches. Used Python with Beautiful Soup – pretty standard stuff. I spent a good hour just cleaning the data after pulling it. Dates were formatted all weird, team names had typos, and the scoring data was a mess of abbreviations. Honestly, cleaning the data was half the battle. Wish I had a cleaner dataset to begin with!
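For anyone curious, a cleanup pass along these lines is one way to handle that kind of mess. It's a minimal sketch using only the standard library, and the date formats, the team list, and the "T2 C1 P3" scoring shorthand are all made-up assumptions, since every site has its own quirks.

```python
from datetime import datetime
from difflib import get_close_matches

# Hypothetical values: swap in whatever the scraped site actually uses.
KNOWN_TEAMS = ["Wales", "Scotland"]
DATE_FORMATS = ["%d/%m/%Y", "%Y-%m-%d", "%d %b %Y"]
SCORE_ABBREVIATIONS = {"T": "tries", "C": "conversions", "P": "penalties"}


def parse_date(raw):
    """Try each known format in turn; return None if nothing matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    return None


def fix_team_name(raw):
    """Map a possibly misspelled name to the closest known team, else None."""
    matches = get_close_matches(raw.strip().title(), KNOWN_TEAMS, n=1, cutoff=0.7)
    return matches[0] if matches else None


def expand_scoring(raw):
    """Turn shorthand like 'T2 C1 P3' into a dict of named counts."""
    result = {}
    for token in raw.split():
        key = SCORE_ABBREVIATIONS.get(token[0].upper())
        if key and token[1:].isdigit():
            result[key] = int(token[1:])
    return result


print(parse_date("03/02/2024"), fix_team_name("Scotlnad"), expand_scoring("T2 C1 P3"))
```

Returning None for anything that doesn't match makes the bad rows easy to find and fix by hand instead of silently guessing.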
Next up, I wanted to get a feel for how these teams usually perform. I figured looking at the average points scored per game, tries, conversions, penalties, etc., would be a good start. Used Pandas to crunch those numbers. Turns out Wales tends to score slightly more penalties than Scotland, but Scotland usually gets more tries. Interesting!
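The Pandas side of this is basically a one-liner once the data is tidy. Here's a rough sketch, assuming one row per team per game. The column names and the numbers are invented for illustration and are not real match results.

```python
import pandas as pd

# Made-up rows purely for illustration; not real match results.
games = pd.DataFrame(
    {
        "team": ["Wales", "Wales", "Scotland", "Scotland"],
        "points": [20, 26, 27, 17],
        "tries": [2, 2, 3, 2],
        "conversions": [2, 2, 3, 2],
        "penalties": [2, 4, 2, 1],
    }
)

# One row per team, one column per stat, each cell the per-game mean.
stat_columns = ["points", "tries", "conversions", "penalties"]
averages = games.groupby("team")[stat_columns].mean()
print(averages.round(2))
```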
Then, I dug a little deeper. I thought, “Okay, what about home vs. away games?” Turns out both teams perform better at home (duh, right?), but the gap between home and away was bigger for Scotland than for Wales. Maybe Murrayfield just has some kind of magic or something?
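One way to get at the home vs. away gap is a pivot table with a difference column. This is only a sketch: the "venue" column and all the numbers are assumptions for illustration, not the actual data.

```python
import pandas as pd

# Made-up points totals; "venue" is a hypothetical column marking home or away.
games = pd.DataFrame(
    {
        "team": ["Wales"] * 4 + ["Scotland"] * 4,
        "venue": ["home", "home", "away", "away"] * 2,
        "points": [24, 20, 19, 21, 30, 26, 16, 18],
    }
)

# Average points per team at each venue, then the home minus away gap.
by_venue = games.pivot_table(index="team", columns="venue", values="points", aggfunc="mean")
by_venue["home_advantage"] = by_venue["home"] - by_venue["away"]
print(by_venue)
```

With only a handful of games per team the gap is noisy, so it's worth treating it as a hint and not a proven effect.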
After that, I wanted to look at win percentages. I calculated those over roughly the last 20 matches between the two teams. It’s pretty close, but Wales has a slight edge overall.
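A head-to-head win percentage can be sketched like this. The function name, the "date" and "winner" columns, and the sample rows are all hypothetical, and a drawn match would simply show up as its own "Draw" entry if it were recorded that way.

```python
import pandas as pd


def head_to_head_win_pct(matches, last_n=20):
    """Percentage of the most recent `last_n` matches won by each side."""
    recent = matches.sort_values("date").tail(last_n)
    return (recent["winner"].value_counts(normalize=True) * 100).round(1)


# Made-up results for illustration only.
matches = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2020-01-01", "2021-01-01", "2022-01-01", "2023-01-01", "2024-01-01"]
        ),
        "winner": ["Scotland", "Wales", "Wales", "Scotland", "Wales"],
    }
)

pct = head_to_head_win_pct(matches, last_n=4)
print(pct)
```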
I even tried messing around with some simple visualizations. Made a few bar charts using Matplotlib to compare the different stats I’d calculated. Nothing fancy, but it helped me see the data more clearly.
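For the charts, a grouped bar chart is probably the simplest way to put the two teams side by side. Here's a minimal sketch with placeholder numbers (not real averages); the output filename is just an example.

```python
import matplotlib

matplotlib.use("Agg")  # render to a file so this also works without a display
import matplotlib.pyplot as plt

# Placeholder per-game averages, invented for illustration.
stats = ["Tries", "Conversions", "Penalties"]
wales = [2.0, 1.5, 3.0]
scotland = [2.5, 2.0, 1.5]

positions = list(range(len(stats)))
width = 0.4

fig, ax = plt.subplots()
# Shift each team's bars half a width left or right so they sit next to each other.
ax.bar([p - width / 2 for p in positions], wales, width, label="Wales")
ax.bar([p + width / 2 for p in positions], scotland, width, label="Scotland")
ax.set_xticks(positions)
ax.set_xticklabels(stats)
ax.set_ylabel("Average per game")
ax.legend()
fig.savefig("wales_vs_scotland.png")
```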

Challenges I ran into:
- Data consistency was a big one. Different sites format their data differently, so if I wanted to pull from multiple sources, I’d have to standardize everything.
- Dealing with missing data. Some games were missing certain stats, so I had to decide how to handle those – whether to impute them or just ignore them.
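
The impute-or-ignore choice from the second point looks something like this in Pandas. It's a sketch with invented numbers, and filling gaps with each team's own average is just one simple imputation strategy among many.

```python
import pandas as pd

# Made-up data with a couple of missing stats (None becomes NaN).
games = pd.DataFrame(
    {
        "team": ["Wales", "Wales", "Scotland", "Scotland"],
        "tries": [2, None, 3, 1],
        "penalties": [3, 2, None, 1],
    }
)
stat_columns = ["tries", "penalties"]

# Option 1: ignore any game that is missing one of the stats.
dropped = games.dropna(subset=stat_columns)

# Option 2: impute each gap with that team's own average for the stat.
imputed = games.copy()
imputed[stat_columns] = imputed.groupby("team")[stat_columns].transform(
    lambda column: column.fillna(column.mean())
)

print(dropped)
print(imputed)
```

Dropping keeps the numbers honest but shrinks an already small sample, while imputing keeps every game at the cost of pulling values toward the average.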

What I learned:
- Cleaning data is super important and takes way longer than you think.
- Even simple stats can give you some interesting insights.
- Rugby stats can be pretty complex! There’s a lot more to it than just points scored.

Overall, it was a fun little project. I definitely want to explore this more. Next time, I might try to incorporate some more advanced statistical analysis, or even build a simple model to predict match outcomes. Who knows? Maybe I’ll become a rugby stats guru one day!