Friday 29 February 2008

Currently, I share about a fifth of my music taste with the entire UK. Good to know.

There's something oddly pleasing about a huge amount of nicely-sorted data. This obviously doesn't hold if you have to sort through the data manually (I have very bad memories of the weeks I spent on Excel analysing my research project this time last year), but being able to pull up complex-looking summaries from a vast body of data is basically the geek equivalent of performing a perfect violin concerto.

The people behind Audioscrobbler - the database on which Last.fm runs - understand this principle very well, and have built in some remarkably easy ways of grabbing all sorts of statistics from their database. For starters, they have a huge page of automatically-updating XML files, which you can download and use pretty much however you like.

Just to see how easy it was to do this, I spent an hour or so this afternoon hacking together a little Python script that looks at any Last.fm user's most-listened-to artists, then compares that list to the most-listened-to artists in an entire country. The result is a rough-and-ready comparison of your music taste with that of, say, the entire United States. If you have a Last.fm account and want to have a go, I've hosted it on the (somewhat rickety but still pretty good) Utility Mill site. Enjoy yourselves - and if you're curious, my username's "zsige". (Utility Mill restricts scripts to just 2 seconds of run time, which often isn't enough - try again if it doesn't work for you the first time.)
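For the curious, the core of a comparison like this can be sketched in a few lines. This is not the script hosted on Utility Mill, just a minimal illustration of the idea. The XML layout (an <artist> element wrapping a <name>), the placeholder artist names and the function names artist_names and taste_overlap are all assumptions made up for the example, and the real feeds may well be structured differently. It also skips the download step and works on XML text you already have.

```python
import xml.etree.ElementTree as ET


def artist_names(xml_text):
    """Return the set of lower-cased artist names found in an XML document.

    Assumes (hypothetically) that each artist's name sits in a <name> element.
    """
    root = ET.fromstring(xml_text)
    return {el.text.strip().lower() for el in root.iter("name") if el.text}


def taste_overlap(user_xml, country_xml):
    """Fraction of the user's top artists that also appear in the country's list."""
    user = artist_names(user_xml)
    country = artist_names(country_xml)
    if not user:
        return 0.0
    return len(user & country) / len(user)


# Made-up sample data with placeholder names, standing in for downloaded feeds.
USER_XML = """<topartists>
  <artist><name>Artist A</name></artist>
  <artist><name>Artist B</name></artist>
  <artist><name>Artist C</name></artist>
  <artist><name>Artist D</name></artist>
  <artist><name>Artist E</name></artist>
</topartists>"""

COUNTRY_XML = """<topartists>
  <artist><name>Artist A</name></artist>
  <artist><name>Artist X</name></artist>
  <artist><name>Artist Y</name></artist>
</topartists>"""

if __name__ == "__main__":
    share = taste_overlap(USER_XML, COUNTRY_XML)
    print(f"Shared taste: {share:.0%}")
```

Comparing sets of names ignores play counts entirely, which keeps the sketch short but makes the result very rough: an artist you played once counts as much as your favourite.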

Given that I'm by no means an excellent programmer - and that someone who is could therefore do considerably more exciting things with this technology - Audioscrobbler is clearly onto a winner. Sadly, their generosity doesn't seem to be typical. A quick Google search only turns up a couple of similar services (notably the BBC's, which doesn't surprise me in the least), and most of the people who are putting their databases online are charging for access. That's understandable, but seeing the potential that Audioscrobbler has opened up - while only limiting users to, on average, one hit of the database every second - I wish more people were doing this.
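Staying inside a limit like that from your own script is not hard either. Here is a minimal sketch of a client-side throttle, assuming the limit is roughly one request per second. The Throttle class is a made-up helper, not anything Audioscrobbler provides, and it enforces a minimum gap between calls, which is a little stricter than an average of one per second.

```python
import time


class Throttle:
    """Make sure successive calls to wait() are at least min_interval seconds apart."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock   # injectable so the class can be tested without real waiting
        self._sleep = sleep
        self._last = None     # time of the previous call, or None before the first one

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

You would call wait() just before each download, so a loop over several feeds never hits the server more than once a second.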

Now, if you'll excuse me, I'm going to see what I can do with the BBC data...
