Alright, let me walk you through how I tackled the men’s gymnastics qualifications data. It was a bit of a journey, lemme tell ya.

First off, I grabbed the data. It was scattered all over the official Olympics website, like they were playing hide-and-seek with it. I’m talking PDFs, tables on webpages… a real mess. So, I started by manually copying some of the simpler tables into a spreadsheet. Tedious? You bet. But sometimes, you gotta do what you gotta do.
Then I thought, “There’s gotta be a better way!” I looked into web scraping. Never really done it before, but figured, why not? I played around with Python and Beautiful Soup. Took a bit of trial and error, a lot of Googling, and a fair amount of swearing, but I finally managed to scrape the data from some of the more consistently formatted pages. Score!
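The scraping step looked roughly like this — a minimal sketch where a hardcoded HTML snippet stands in for a fetched results page (in practice you'd feed `requests.get(url).text` to Beautiful Soup; the table layout here is invented for illustration):

```python
from bs4 import BeautifulSoup

# Stand-in for a fetched results page; the real pages were fetched with
# requests and had this same kind of plain HTML table.
html = """
<table id="results">
  <tr><th>Name</th><th>Country</th><th>Score</th></tr>
  <tr><td>A. Athlete</td><td>JPN</td><td>14.833</td></tr>
  <tr><td>B. Athlete</td><td>USA</td><td>14.600</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = []
# Skip the header row, then pull the cell text out of each data row.
for tr in soup.find("table", id="results").find_all("tr")[1:]:
    name, country, score = (td.get_text(strip=True) for td in tr.find_all("td"))
    rows.append({"name": name, "country": country, "score": float(score)})

print(rows)
```

The nice part of this pattern is that once the rows are plain dicts, they drop straight into a Pandas DataFrame for the cleaning step later.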
Next up: dealing with the PDFs. Oh, the PDFs. Each one was formatted slightly differently, making it a proper pain. I tried a few different Python libraries for PDF parsing – PDFMiner, PyPDF2… the usual suspects. They all had their quirks. Some handled certain PDFs better than others. So, I ended up using a combination of them, depending on the specific PDF. It was a bit of a Frankenstein approach, but hey, it worked!
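That "Frankenstein approach" boils down to a fallback chain: try one parser, and if it throws, move on to the next. Here's the shape of it with stub functions standing in for the PDFMiner and PyPDF2 wrappers (the actual extraction calls are omitted; this only shows the fallback pattern):

```python
# Fallback-chain sketch: try parsers in order until one succeeds.
# These two functions are stand-ins; the real versions wrapped
# pdfminer and PyPDF2 text extraction.

def parse_with_pdfminer(path):
    # Stand-in: pretend PDFMiner chokes on this particular file.
    raise ValueError("pdfminer failed")

def parse_with_pypdf2(path):
    # Stand-in: pretend PyPDF2 handles it fine.
    return "extracted text"

PARSERS = [parse_with_pdfminer, parse_with_pypdf2]

def extract_text(path):
    """Return text from the first parser that succeeds, else raise."""
    errors = []
    for parser in PARSERS:
        try:
            return parser(path)
        except Exception as exc:
            errors.append(f"{parser.__name__}: {exc}")
    raise RuntimeError("all parsers failed: " + "; ".join(errors))

print(extract_text("qualifications.pdf"))  # falls through to the PyPDF2 stub
```

Collecting the error messages as you go is worth the extra two lines: when every parser fails on some cursed PDF, you get one exception telling you why each one gave up.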
Once I had all the data scraped and parsed, it was time to clean it up. This was probably the most time-consuming part. I’m talking inconsistent naming conventions, missing values, weird formatting issues… you name it. I used Pandas in Python to wrangle the data, clean it up, and get it into a usable format. Think of it like scrubbing a dirty floor – not glamorous, but essential.
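The Pandas scrubbing pass looked something like this — a toy version where the column names, placeholder athlete names, and specific fixes are illustrative, not the actual dataset:

```python
import pandas as pd

# Toy example of the kinds of problems in the scraped data:
# inconsistent name casing, stray whitespace, and missing scores.
df = pd.DataFrame({
    "name": ["  SMITH John ", "doe jane", "LEE SAM"],
    "country": ["USA", "GBR", "KOR"],
    "score": ["14.833", None, "14.600"],
})

df["name"] = df["name"].str.strip().str.title()            # normalize whitespace and casing
df["score"] = pd.to_numeric(df["score"], errors="coerce")  # strings -> floats, junk -> NaN
df = df.dropna(subset=["score"]).reset_index(drop=True)    # drop rows with no usable score

print(df)
```

`errors="coerce"` is the workhorse here: anything that won't parse as a number becomes NaN instead of raising, so one `dropna` sweeps out all the unusable rows at once.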
Then, I started looking for patterns. Who were the top performers on each apparatus? How did different countries compare? I made some visualizations using Matplotlib and Seaborn – nothing fancy, just some basic bar charts and scatter plots to get a sense of the data. It’s always cool when you start to see the story the data is trying to tell you.
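Finding the top performer on each apparatus is a one-liner groupby once the data is tidy. A toy sketch (the athletes and scores here are invented) — the resulting little table is exactly the kind of thing that fed the bar charts:

```python
import pandas as pd

# Invented toy data: one row per athlete per apparatus.
df = pd.DataFrame({
    "athlete":   ["A", "A", "B", "B", "C", "C"],
    "apparatus": ["floor", "rings", "floor", "rings", "floor", "rings"],
    "score":     [14.1, 13.2, 13.8, 14.5, 14.4, 13.9],
})

# For each apparatus, keep the row with the highest score.
top = df.loc[df.groupby("apparatus")["score"].idxmax()]
print(top[["apparatus", "athlete", "score"]])
```

From there, `top` plugs straight into a Matplotlib or Seaborn bar chart with apparatus on one axis and score on the other.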

After that, I combined everything into a single file so it was easy to share.
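The final merge was just concatenating the per-source frames and writing one file out. Roughly (the frames and the output filename here are placeholders):

```python
import pandas as pd

# Each source (scraped pages, parsed PDFs) ended up as its own frame;
# these two tiny frames stand in for them.
scraped = pd.DataFrame({"name": ["A"], "score": [14.1]})
parsed = pd.DataFrame({"name": ["B"], "score": [13.9]})

# Stack them and renumber the index so it runs 0..n-1.
combined = pd.concat([scraped, parsed], ignore_index=True)
combined.to_csv("qualifications_combined.csv", index=False)  # placeholder filename
print(len(combined))  # 2
```

`ignore_index=True` matters here: without it the combined frame keeps each source's original row numbers, and you end up with duplicate index values.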
Finally, I presented my findings. It wasn’t perfect, but it was a solid start. And more importantly, I learned a ton along the way. Web scraping, data cleaning, visualization… it was a crash course in data analysis. And now, I’m ready to tackle the next challenge!