Machine Learning: A Love Story

The video from my keynote at Strange Loop 2010 is up!

You can watch the video here: Machine Learning: A Love Story

The original abstract:

Machine learning has come a long way in recent years — from a long-marginalized field so old it still has the word “machine” in the name, to the last, best hope for making sense of our massive flows of data.

The art of ‘data science’ is asking the right questions; the answers are generally trivial or impossible. This talk will focus more on questions than on answers. I’ll give a brief history of the field with a focus on the fundamental math and algorithmic tools that we use to address these kinds of problems, then walk through several descriptive and predictive scenarios.

Finally, I’ll walk through one data-driven system in depth, from the backend infrastructure through the algorithms and data processing layer to a functioning product.

Attendees should expect to hear some good stories of data gone right and data gone awry, and walk away with a few new clever tricks.

The presentation was calibrated for the audience in the room, but I’ll be happy to answer any questions in the comments below!

27 Comments on “Machine Learning: A Love Story”

  1. dfd says:

    Please explain how “data science” differs from plain science.

  2. Bill White says:

    The same way computer science does.

  3. Jeff says:

    Hey Hilary,

    Sounds like a great presentation! Any chance the slides will find their way to Slideshare?


  4. Very, very nice talk! Super interesting details about Twitter classification and “streamlining”; we are working on a similar set of problems and (Bayesian) algorithms in FUSE Labs. Would love to talk more with you about your ideas, so please email me if you are interested!

  5. Amit C says:

    Slides are available on the same site, see the link on top, use

  6. Amit C says:

    About the slides, for others: they are pretty sparse, so the video might be the way to go.

  7. Mike says:

    This is so false: “The art of ‘data science’ is asking the right questions; the answers are generally trivial or impossible”.

    Unfortunate. Aggregations have blurred your innovation.

  8. Jelena says:

    I like your presentation. Do you use the system of elimination?
    In the medical world, that is a really good way to reach a diagnosis.

  9. David says:

    I caught only a piece of your NPR discussion. You said you collected information/data about people’s thinking on clothing styles. Can I get access to this information? Would this be something like focus groups?

  10. brendano says:

    When was machine learning a marginalized field?

    • Whenever and wherever they still call it statistics!

      Actually, more seriously, it took some big historical hits for being rather too optimistic and then taking 20 years to come through on a fraction of what was promised.

      • brendano says:

        I feel like machine learning is relevant these days in part *because* it embraced statistics.

        • I was joking a bit, since there’s a tendency for statistical techniques to become far more interesting and monetizable the moment you call them ML. In all honesty, you’re absolutely right; the sorts of probability calculi that were developed in statistical fields are an enormous boon to ML: they generate tons of models, provide a flexible framework within which to create new ones, and even give powerful interpretations of models that were generated without considering the probabilistic interpretation.

          Any discriminator between the two fields is liable to misclassify heavily at the best of times and still be very non-robust.

  11. Could you put links to the books you mentioned?

    Is this the purple book you mentioned?

  12. Bob says:

    The camera followed you the whole time and didn’t show your slides. Where is the other half of your presentation?

  13. Suet Yi Lee says:

     Awesome #machinelearning stuff from @hmason that held my attention from beginning to end.

  14. […] video is an instructional take and builds on the material I covered in my Strange Loop 2010 keynote Machine Learning: A Love Story and the Data Bootcamp I did with Joe Adler, Drew Conway, and Jake Hofman at the Strata Conference […]

  15. […] in watching an entertaining and insightful presentation on Machine Learning, take a look at Machine Learning: A Love Story by Hilary […]

  16. gawbul says:

    I loved that you said that “Yes, probability and statistics should be taught. It should be taught early, it should be taught well and I would even argue that it should be taught instead of pre-calculus.” – I completely agree and in fact we were discussing this from the point of view of teaching biology and how probability, statistics and data analysis/bioinformatics should be integral to biology degrees from the start!

    • what_i_am_thinking_rightnow says:

      Totally agree with you on that. I took Bio 101 and Bio 102, then business stats. Needless to say, taking stats along with, or before, biology would have made research papers much more fun.