Going to Strata in Feb?

Are you planning to attend Strata in Santa Clara at the end of February? Reach out for a discount registration code.


Conference: PyCon 2011 Keynote!

I gave the opening keynote this morning at PyCon.

The one thing that everyone in the room at PyCon has in common is that we all love to code. I used that as the central theme of the talk, spoke about the constructs that give us joy, the history of some of our favorite patterns (they date as far back as the 60s!) and proposed that we think about the way we’ll compute fifty years into the future. There’s also a bit of fun data hacking, of course.

Enjoy the slides. The video is up!

Please let me know here or on Twitter if you have any questions or comments.


Machine Learning: A Love Story

The video from my keynote at Strange Loop 2010 is up!

You can watch the video here: Machine Learning: A Love Story

The original abstract:

Machine learning has come a long way in recent years — from a long-marginalized field so old it still has the word “machine” in the name, to the last, best hope for making sense of our massive flows of data.

The art of ‘data science’ is asking the right questions; the answers are generally trivial or impossible. This talk will focus more on questions than on answers. I’ll give a brief history of the field with a focus on the fundamental math and algorithmic tools that we use to address these kinds of problems, then walk through several descriptive and predictive scenarios.

Finally, I’ll show one example system using bit.ly data in-depth, from the backend infrastructure through the algorithms and data processing layer to show a functioning product.

Attendees should expect to hear some good stories of data gone right and data gone awry, and walk away with a few new clever tricks.

The presentation was calibrated for the audience in the room, but I’ll be happy to answer any questions in the comments below!


Should you attend Hadoop World? Yes.

I received this e-mail via my contact form:

I just discovered you via a Google search because I’m highly considering attending this year’s upcoming Hadoop World in NYC. I appreciate your page that you wrote up after attending last year’s event. I’m wondering if you feel that Hadoop has enough momentum and support to be a “here to stay” technology worth investing one’s time and education into, or is it possible it might fade and be deprecated by something else as the need for big data analysis continues to grow? …

I’ve had a few similar conversations with people lately, and I thought posting my response might help others making similar decisions. The e-mail references my post from last year’s Hadoop World NYC.

Thanks for reaching out. There are several questions in your message
and I hope that this will address them all.

This IS an extremely exciting time to be alive and working with data.
We now have the capacity to learn things about our systems, people in
general, and the world that we simply couldn’t know before — the
field is only going to keep growing.

Hadoop is currently the primary framework for this kind of data
analysis. It’s certainly worth learning now, especially since Amazon’s
Elastic MapReduce makes it very easy for individuals and small teams to
get started without a large investment.

I’m also not a huge fan of Java and I wish more resources were going
into non-Java alternatives. Fortunately, you can use Hadoop via the
streaming API in most any language you choose.

I do think it’s important to separate the discussion of tools from the
larger philosophical discussion of open problems, algorithms, and
techniques. You can learn the tools from books and blogs. The real
reason to go to a conference like Hadoop World is to meet the people
who go to conferences like Hadoop World and get into those deeper
conversations.

I do hope this year’s conference will highlight the difference between tools and methods, and will also provide plenty of space for those casual hallway conversations.
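
For anyone curious what "Hadoop via the streaming API" looks like outside of Java, here is a minimal word-count sketch in Python. It is only an illustration, not code from the e-mail exchange above. It relies on the default streaming conventions: the mapper and reducer read lines from standard input, write tab-separated key/value lines to standard output, and the framework sorts mapper output by key before the reducer sees it. The function names and the `map` / `reduce` command-line argument are my own assumptions for this example.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record for every word seen."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum the counts per word. Input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Hypothetical convention for this sketch: run the script with "map" or
# "reduce" as its only argument so one file can serve as both steps.
STEPS = {"map": map_lines, "reduce": reduce_lines}

if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in STEPS:
    for record in STEPS[sys.argv[1]](sys.stdin):
        print(record)
```

You can try the same logic locally without a cluster by chaining the two functions with a sort in between, for example `list(reduce_lines(sorted(map_lines(["some text", "more text"]))))`. The `sorted` call stands in for the shuffle-and-sort step that Hadoop performs between the map and reduce phases.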