Alright, let’s dive into my little adventure with “clasificacion general tour colombia 2024.” It all started with me just being curious about the race, you know? I’m not a hardcore cycling fan, but the Tour Colombia always pops up on my radar, and I figured, why not try and get my hands dirty with the data this year?

First off, I needed the data. I spent a good chunk of time scouring the web. I started by simply Googling “Tour Colombia 2024 general classification.” Plenty of sports news sites popped up, but I wanted something a bit more structured. I was hoping for a nice, clean table I could just copy and paste, but no such luck.
So, I landed on a couple of cycling-specific websites that usually have this kind of info. I found one with the full standings, but it was a pain to copy directly. The website’s formatting was all over the place, and pasting it into a spreadsheet turned into a total mess. Think random line breaks, weird characters, the whole shebang.
Okay, plan B. I decided to get a little more hands-on. I started manually copying the names, times, and positions into a Google Sheet. Tedious? Absolutely. But it gave me the cleanest data to start with. I made sure to double-check everything as I went, because one wrong number and the whole thing gets thrown off.
Once I had the basic data in my spreadsheet, it was time to make it a bit more useful. I added some extra columns to calculate the time differences between riders, and left room for averages in case I felt ambitious. I also made sure everything was sortable, so I could easily see who was leading and who had made the biggest gains.
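
If you would rather script the time-difference column than build it with spreadsheet formulas, here is a minimal sketch of the idea in plain Python. It assumes the times are cumulative and written as H:MM:SS; the rider names and times are placeholders for illustration, not the real 2024 standings, and the helper names `to_seconds` and `add_gaps` are just ones I made up for this example.

```python
from datetime import timedelta

# Placeholder rows for illustration only, NOT the actual 2024 standings.
standings = [
    ("Rider B", "20:10:45"),
    ("Rider A", "20:10:15"),
    ("Rider C", "20:12:03"),
]


def to_seconds(clock: str) -> int:
    """Convert an 'H:MM:SS' cumulative time into total seconds."""
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def add_gaps(rows):
    """Sort rows by total time and attach each rider's gap to the leader (seconds)."""
    ranked = sorted(rows, key=lambda row: to_seconds(row[1]))
    leader = to_seconds(ranked[0][1])
    return [
        (position, name, clock, to_seconds(clock) - leader)
        for position, (name, clock) in enumerate(ranked, start=1)
    ]


for position, name, clock, gap in add_gaps(standings):
    # timedelta turns the gap in seconds back into a readable H:MM:SS string.
    print(f"{position:>2}  {name:<10} {clock}  +{timedelta(seconds=gap)}")
```

Converting everything to seconds first keeps the sorting and subtraction simple, and the gap only gets turned back into a clock-style string at print time.
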
Now, here’s where things got a little more interesting. I wanted to visualise the data. Simple bar charts showing the top riders and their times were a good start. I played around with different chart types in Google Sheets, just to see what looked best. Nothing fancy, but it helped me get a better sense of the overall race dynamics.

Finally, I exported the data as a CSV and messed around in Python. I used pandas to load the CSV, then used matplotlib and seaborn to make fancier plots. I built a leaderboard chart and tried a few other ways of representing the data. It was a pretty cool feeling to go from a messy website to proper graphs.
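
For anyone who wants to try the same thing, here is a rough sketch of what the pandas step can look like. It is not my exact script: the column names (`position`, `rider`, `time`) and the output file name are assumptions, the rows are placeholders rather than real results, and it sticks to plain matplotlib instead of seaborn to keep things short. The CSV is inlined as a string so the example runs on its own.

```python
import io

import matplotlib
matplotlib.use("Agg")  # draw to a file; no display window needed
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for the exported CSV. Placeholder values, not real standings;
# the column names are an assumption about how the sheet was laid out.
csv_text = """position,rider,time
1,Rider A,20:10:15
2,Rider B,20:10:45
3,Rider C,20:12:03
"""

# With a real export this would be pd.read_csv("your_file.csv").
df = pd.read_csv(io.StringIO(csv_text))

# Parse 'H:MM:SS' strings into timedeltas, then compute the gap to the leader.
df["time"] = pd.to_timedelta(df["time"])
df["gap_s"] = (df["time"] - df["time"].min()).dt.total_seconds()

# Leaderboard: the ten fastest riders, best time first.
top = df.sort_values("time").head(10)

fig, ax = plt.subplots(figsize=(8, 4))
ax.barh(top["rider"], top["gap_s"])
ax.invert_yaxis()  # put the leader at the top of the chart
ax.set_xlabel("Gap to leader (seconds)")
ax.set_title("General classification, top riders")
fig.tight_layout()
fig.savefig("leaderboard.png")
```

Plotting the gap to the leader instead of the raw total time makes the bars readable, since the totals are all within a few minutes of each other.
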
Lessons learned? Data cleaning is always the most time-consuming part, no matter how simple the project seems. And sometimes, the manual approach is still the best way to get reliable data. Plus, even a little bit of data visualisation can make a huge difference in understanding what’s going on. It was a fun little project, and I might do it again next year!