Machine Learning: A Love Story

The video from my keynote at Strange Loop 2010 is up!

You can watch the video here: Machine Learning: A Love Story

The original abstract:

Machine learning has come a long way in recent years — from a long-marginalized field so old it still has the word “machine” in the name, to the last, best hope for making sense of our massive flows of data.

The art of ‘data science’ is asking the right questions; the answers are generally trivial or impossible. This talk will focus more on questions than on answers. I’ll give a brief history of the field with a focus on the fundamental math and algorithmic tools that we use to address these kinds of problems, then walk through several descriptive and predictive scenarios.

Finally, I’ll show one example system using bit.ly data in depth, from the backend infrastructure through the algorithms and data processing layer, to show a functioning product.

Attendees should expect to hear some good stories of data gone right and data gone awry, and walk away with a few new clever tricks.

The presentation was calibrated for the audience in the room, but I’ll be happy to answer any questions in the comments below!


  • dfd

    Please explain how “data science” differs from plain science.

  • http://hexhead.is-a-geek.net/ Bill White

    The same way computer science does.

  • http://cloudera.com Jeff

    Hey Hilary,

    Sounds like a great presentation! Any chance the slides will find their way to Slideshare?

    Thanks,
    Jeff

  • http://research.microsoft.com/~rherb Ralf Herbrich

    Very, very nice talk! Super interesting details about Twitter classification and “streamlining”; we are working on similar problems/set of (Bayesian) algorithms in FUSE Labs (http://fuse.microsoft.com). Would love to talk more with you about your ideas – please email me if you are interested!

  • Amit C

    Slides are available on the same site, see the link on top, use
    http://www.bugmenot.com/view/infoq.com

  • Amit C

    A note on the slides for others: they are pretty sparse, so the video might be the way to go.

  • Mike

    This is so false: “The art of ‘data science’ is asking the right questions; the answers are generally trivial or impossible”.

    Unfortunate. Aggregations have blurred your innovation.

  • Jelena

    I like your presentation. Do you use the system of elimination?
    In the medical world, that is a really good way to reach a diagnosis.

  • David

    I caught only a piece of your NPR discussion. You said you collected information/data about people’s thinking on clothing styles. Can I get access to this information? Would this be something like focus groups?

  • http://anyall.org brendano

    When was machine learning a marginalized field?

    • http://shrewdbravado.com Joseph Abrahamson

      Whenever and wherever they still call it statistics!

      Actually, more seriously, it took some big historical hits for being rather too optimistic and then taking 20 years to come through on a fraction of what was promised.

      • http://anyall.org brendano

        I feel like machine learning is relevant these days in part *because* it embraced statistics.

        • http://shrewdbravado.com Joseph Abrahamson

          I was joking a bit, since there’s a tendency for statistical techniques to become far more interesting and monetizable the moment you call them ML. In all honesty you’re absolutely right; the sorts of probability calculi that were developed in statistical fields are an enormous boon to ML, generating tons of models, providing a flexible framework to create new ones within, and even giving powerful interpretations of models that were generated without considering the probabilistic interpretation.

          Any discriminator between the two fields is liable to misclassify heavily at the best of times and still be very non-robust.

  • http://technotales.wordpress.com/ Jonathan Palardy

    Could you put links to the books you mentioned?

    Is this the purple book you mentioned? http://www.amazon.com/dp/0321321367

  • Bob

    The camera followed you the whole time and didn’t show your slides. Where is the other half of your presentation?

    • http://www.hilarymason.com Hilary Mason

      You can download the PDF of the slides here: http://bit.ly/k37Zpc

      I’m happy to send the PPT if you prefer, just send me your e-mail address (the contact form on this site goes straight to my e-mail).

  • http://twitter.com/damned_liesss Suet Yi Lee

    Awesome #machinelearning stuff from @hmason that held my attention from beginning to end.

  • gawbul

    I loved that you said that “Yes, probability and statistics should be taught. It should be taught early, it should be taught well and I would even argue that it should be taught instead of pre-calculus.” – I completely agree and in fact we were discussing this from the point of view of teaching biology and how probability, statistics and data analysis/bioinformatics should be integral to biology degrees from the start!

    • what_i_am_thinking_rightnow

      Totally agree with you on that. I took bio 101 and bio 102, then business stats. Needless to say, taking stats along with, or before, biology would have made research papers much more fun.