a developer's blog


Introduction to Clustering @ Machine Learning meetup

I gave an introduction to clustering at the Machine Learning meetup group here in Gothenburg, with 35 Machine Learning enthusiasts of widely varying ML experience participating.

We, Spotify, hosted the meetup at our Gothenburg office and I gave two presentations of roughly 30 minutes each. The first one was a theoretical introduction to clustering and feature engineering. The purpose of the presentation was to build an intuition of what clustering is (and what it isn't) by pedagogically explaining the steps from getting the data to the output of the algorithm.

And, as I'm really excited about the Machine Learning community in Python, I demoed some data mining in Python, showing off everyday libraries such as NumPy, Pandas and Scikit-learn and tying them together with the theoretical methods I had already explained. The clustering examples I did were on music lyrics data scraped from a lyrics wiki. I showed how easily we could pre-process (clean / vectorize) the data and then plug it into our clustering algorithms. Clustering is unsupervised and hence hard to evaluate, but I think I managed to show a typical workflow when doing Machine Learning on real-world data. There were tons of obvious preprocessing steps I skipped, but I did so to avoid adding too much to the presentation and to keep it easy to follow.
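For readers who were not there, the clean / vectorize / cluster workflow can be sketched roughly as below. This is not the notebook from the talk: the toy lyrics, the `clean` helper, the choice of TF-IDF features and k-means, and the number of clusters are all assumptions made for illustration.

```python
import re

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy "lyrics"; the scraped data from the talk is not reproduced here.
lyrics = [
    "Love me, love me tender, love me true!",
    "My heart is yours, my love, forever and ever.",
    "Baby, I love the way you hold my heart.",
    "Ride the highway, engine roaring through the night.",
    "Wheels on the road, driving fast all night long.",
    "Highway lights, we drive until the morning.",
]


def clean(text):
    """Lowercase the text and keep only letters and single spaces."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


# Step 1: clean the raw text.
cleaned = [clean(t) for t in lyrics]

# Step 2: vectorize. Each lyric becomes a sparse TF-IDF vector
# (assumed feature choice; plain word counts would also work).
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(cleaned)

# Step 3: cluster. n_clusters=2 is an assumption for this toy data.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

for label, text in zip(labels, lyrics):
    print(label, text)
```

Since there are no ground-truth labels, the practical check is to read the lyrics in each cluster and judge whether the grouping makes sense, which is exactly why evaluation is the hard part.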

And I must say that I enjoyed talking to all of you who attended, both during the break and afterwards. I noticed that most of you came to meet and discuss Machine Learning with others, and I'm glad we got the chance to do so.

Material I referred to in the talk:

Latest on our Rec Sys: http://www.a1k0n.net/spotify/ml-madison/

Deep Learning at Spotify: http://benanne.github.io/2014/08/05/spotify-cnns.html

My presentations:

Presentation 1: Theoretical introduction to Clustering and feature engineering

Presentation/demo 2: Introduction to data (mining) analysis in Python

Note that the last part, after mentioning IPython, was demoed live in the notebook. Get the notebook from the GitHub repo and try it yourself.