DataGotham 2013 is coming!

Registration is open for DataGotham 2013, our second annual New York data community conference, September 12th and 13th. The core of the conference is a series of brilliant data practitioners telling the stories about what they work on. The content is technically-oriented but not all deeply technical, and we really welcome anyone curious about how New York companies and institutions are pushing the boundaries on data to attend.

We have two goals for the conference. The primary goal is to connect people in the greater New York data community who are working on interesting things. If our community is strong and supportive, we will all do better work.

Our second goal is to highlight the amazing work happening here, so that people near and far will realize that New York is the best place in the world to do data science.

Come join us to hear these stories firsthand and meet fellow data-minded practitioners! Register now:

Eventbrite - DataGotham 2013

(Readers of this blog can use discount code “IheartNYC” for 10% off, and I hope to see you there!)

Speaking: Spend at least 1/3 of the time practicing the talk

This week we welcome a guest contribution. Matthew Trentacoste is a recovering academic and a computer scientist at Adobe, where he writes software to make pretty pictures. He’s constantly curious, often about data, and cooks a lot. You can follow his exploits at @mattttrent.

In Hilary’s last post, she made the point that your slides != your talk. In a well-crafted talk, your message — in the form of the words you say — needs to dominate while the slides need to play a supporting role. Speak the important parts, and use your slides as a backdrop for what you’re saying.

Hilary has provided a valuable strategy in her post, but how should someone approach crafting such a clearly-organized presentation? If you’re just getting started speaking, it can be a real challenge to make a coherent talk along with slides that add to it rather than distract. This post provides tactics to help you make the most of that advice.

And that advice is… well, more than just advice. Separating the message of your words and your slides is nearly requisite for an engaging talk, and more importantly, absolutely necessary for effective communication. Your listeners are unable to read words and hear words simultaneously. If you put loads of writing on your slides and speak the whole time, you can guarantee that they’ll miss one or the other. To get an idea of what I mean, try listening to a TV show and reading a book at the same time and observe just how much you remember of either.

I’m not saying that you should never put words on your slides (far from it). You do, however, need to think about how your slides and spoken words relate to each other. Your audience can either listen to you and ignore your slides, or they can read your slides and ignore you. Worse yet, when presented with two simultaneous streams of information, your audience will probably do a half-hearted job of paying attention to both talk and slides, missing much of your point.

Novice speakers often fail to recognize the difference between the talk and their slides. Hopefully this post and the last will have convinced you that they are complementary aspects, not one and the same. Even when a speaker recognizes the difference, they often spend too much time on their slides and not enough time on their talk. Behavioral psychology tells us that when faced with an uncomfortable situation or task, we usually default to the most familiar aspect of it. If you’re a nerd, chances are your most familiar thing is sitting at your computer, working on the slides.

In reality, “working on the slides” usually translates to messing around with better fonts, better equations, better plots, better cat photos, etc… instead of figuring out whether or not you’re clearly and effectively communicating your point. Also, if you’re new to giving talks, you’ll tend to cram more words onto the slides to assure yourself you won’t forget them while speaking. (Tip: put all your extra reminders in the slide notes, and give the talk with your own laptop to make sure you have presenter view. It’s a life-saver.)

To combat this common tendency towards slide-wank, I have one simple rule:

If your talk is N minutes long, you are allowed to work on it for at most 2N minutes before you must practice giving the entire talk again.

In other words, fully one third of the time you spend working on your talk must be spent practicing it. Out loud. From start to finish. For reals.

Let’s say you’re giving a 20-minute talk. This rule means you have 40 minutes, max, to work on your slides before you have to stand and present the talk. This counter starts the minute you open PowerPoint/Keynote/deck.js. You’ve got 40 minutes to frantically type titles and copy/paste cat photos into your blank slide deck, then you have to stand up and give it.
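If you like seeing the arithmetic spelled out, here is a small sketch of the rule in Python. The function name `prep_schedule` and the idea of a fixed total prep budget are my own assumptions for illustration; the rule itself only says 2N minutes of work per N minutes of rehearsal.

```python
def prep_schedule(talk_minutes, total_prep_minutes):
    """Split a prep budget into work/rehearse cycles under the 2N rule.

    Hypothetical helper: each cycle is at most 2N minutes of slide work
    followed by one full N-minute run-through of the talk.
    """
    work_block = 2 * talk_minutes          # max slide time before rehearsing
    cycle = work_block + talk_minutes      # one work block plus one rehearsal
    return {
        "work_block_minutes": work_block,
        "cycle_minutes": cycle,
        "full_rehearsals": total_prep_minutes // cycle,
        "rehearsal_share": talk_minutes / cycle,  # always one third
    }


# A 20-minute talk with six hours of total prep time.
plan = prep_schedule(talk_minutes=20, total_prep_minutes=360)
print(plan["work_block_minutes"])  # 40
print(plan["full_rehearsals"])     # 6
```

With six hours to prepare a 20-minute talk, that works out to six complete run-throughs, which is probably five more than most of us would do otherwise.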

But, but…

Yes, that’s 40 minutes to go from blank slides to giving a 20 minute talk to your dog.
Yes, you will be completely unprepared.
Yes, you will forget why at least one slide is there (despite having made it less than an hour before) — then remember 5 slides later, then ruin your train of thought by going back to explain it.
Yes, it is going to be brutally uncomfortable.
Yes, that’s the point.

The point, however, isn’t to be sadistic just for the sake of it. In the end, you won’t be judged on how flashy your slides are, but by how effectively you’ve told your story. If you are unused to speaking, getting up and giving a talk can be really uncomfortable. Even if you are used to it, trying to iron out talking points can still be really frustrating. It’s so much more comfortable to fiddle with fonts and images, so we need hard rules to keep us on the level.

There are two big benefits to this approach. First, it forces you to focus on the speaking. Do your slides (and the ideas that they contain) flow coherently from one to the next? Does one part drag on while another feels rushed? Does that one section even make any sense? When rehearsing your talk, you’ll rapidly identify troublesome portions that you would have never noticed if focused purely on how the slides look. On more than one occasion, I’ve had to delete super-fancy-looking slides because I just couldn’t weave them into my narrative.

Second, this approach forces you to get really good at messing up. It goes without saying that the more you practice anything, the better you’ll be at it. Practicing your talk will expose places where you’re likely to mess up, so that you can focus more effort on making it through those sections. Most importantly, all of your practice giving the talk in a highly-unpolished state will teach you to push through any accidental misspeakings. By repeatedly practicing your speaking parts, you’ll have said every part of the talk in a dozen slightly different ways. You will become comfortable with flowing into and out of every point using slightly different words. When you accidentally say something unintended on stage, it won’t be nearly as stressful. You might not be able to smoothly deliver the entire talk, but you’ll be far more confident and able to recover from messups much more gracefully than you otherwise would.

The end goal is to stand in front of a room of people and say something that they will find interesting. Your words, your slides, and time spent rehearsing work in service of this goal. We all like to fiddle with stuff on our computers, and it can be hard to keep the narrative in the spotlight when deciding between 16pt and 17pt fonts. Following hard rules when preparing a talk ensures you focus your energy on the scariest part: the speaking.

This article is part of my series of speaking hacks for introverts and nerds. Read about the motivation here. And if you have a hack you want to share, let me know!

Speaking: Your Slides != Your Talk

Slides are the supporting structure for your talk, not the main event. Speak the meaty and informative portion of the presentation out loud and use slides as a backdrop to set either the emotional tone or reinforce the message that you are trying to convey.

Obama and Social People

For example, I love using this image of Obama in Berlin as a backdrop when I talk about the growth of social data over the last several years. In this image every single person has a device and is generating their own data about their shared social experience. The content of the image supports what is otherwise a fairly abstract statement, and you can feel the excitement of the crowd, boosting the excitement that I want to share about the possibilities of social data.

This particular style of slide design will fail in situations where “the Powerpoint” will be shared independently of the talk, and it’s not appropriate for all content, but it is a ton of fun when you can get away with it and uses people’s expectations about what they are going to see (a speaker and some slides) to create a more compelling experience.

This article is part of my series of speaking hacks for introverts and nerds. Read about the motivation here.

Lucene Revolution Keynote: Search is Not a Solved Problem

The wonderful folks at LucidWorks have posted the video of my recent Lucene Revolution keynote.

The brief idea behind this talk is that search is not a solved problem: there is still a big opportunity for building search (and finding?) capabilities for the kinds of questions that current products fail to solve. For example, why do search engines just return a list of sorted URLs, but give me no information about the themes that are consistent across them?

The audience was technical, specifically Lucene and Solr devs, so I spent some time talking about how we use those technologies at bitly.

Speaking: Explaining Technical Information to a Mixed Audience

It’s a challenge to present deeply technical material to a room of people with varying expertise levels. If you leave it out, you’re abandoning the substance of your presentation. If you focus on it exclusively, you will lose most of the room.

Instead, include the material, but plan to repeat it two (or even three!) times.

The first time you explain it, explain it for the expert audience.

The second time you explain it, walk through an example of what the system enables.

If your audience is on Twitter, throw in a third version: the concise and tweetable one!

Let’s say we were giving a talk about a machine learning system to classify puppies.

Slide one would have a technical diagram of the architecture of the system, and you might explain it as: “We use a naive Bayes classifier over two hundred features to discriminate between puppies and non-puppies in our data set. As you see, our system is 85% accurate and each analysis takes 300 milliseconds. We implemented the classifier in Python, using the scikit-learn toolkit…” Don’t skimp on the details, but keep this part to one slide if possible, and keep the explanation to a few minutes.

Slide two would have images of puppies and non-puppies, and might be explained, “This means that we have an algorithm that can distinguish between the puppies you see on top and the other objects quite accurately and quickly using features like ear floppiness and nose size.”

Slide three would be the cutest puppy you can find, and you might say, “Yes, we’ve created the world’s fastest cuteness identifying machine!” Only include the third version if the audience is online anyway. They’re probably only paying half-attention to you as you speak and this gives them something concise to share and take away from your talk.
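For the curious, here is roughly what the slide-one classifier could look like in code. This is a dependency-free sketch of a Gaussian naive Bayes classifier, not the scikit-learn version the imaginary speaker mentions (in practice, scikit-learn's `GaussianNB` would be the sensible choice). The two features and all of the numbers are invented toy data, and `fit` and `predict` are names I made up for the example.

```python
import math
from collections import defaultdict


def fit(samples, labels):
    """Estimate a prior plus per-feature mean and variance for each class."""
    grouped = defaultdict(list)
    for features, label in zip(samples, labels):
        grouped[label].append(features)

    model = {}
    for label, rows in grouped.items():
        n = len(rows)
        columns = list(zip(*rows))  # one tuple of values per feature
        means = [sum(col) / n for col in columns]
        variances = [
            sum((x - m) ** 2 for x in col) / n + 1e-6  # floor avoids zero variance
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(samples), means, variances)
    return model


def predict(model, features):
    """Return the label with the highest log posterior."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for x, m, v in zip(features, means, variances):
            # Log of the Gaussian density, treating features as independent.
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return score

    return max(model, key=log_posterior)


# Made-up training data: (ear floppiness, nose size), both scaled to 0..1.
samples = [(0.9, 0.3), (0.8, 0.4), (0.7, 0.35),
           (0.1, 0.9), (0.2, 0.8), (0.15, 0.7)]
labels = ["puppy", "puppy", "puppy",
          "not puppy", "not puppy", "not puppy"]

model = fit(samples, labels)
print(predict(model, (0.85, 0.3)))  # puppy
print(predict(model, (0.1, 0.85)))  # not puppy
```

The “naive” part is the assumption that ear floppiness and nose size are independent given the class, which is what lets the per-feature scores simply be added together.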

The technique of repeating the information at varying levels of intensity has the side effect of walking people through to understanding. They may still be puzzling through the technical material when you explain it non-technically, and this seems to help the meaning snap into focus.

Break up your technical material with layered explanations and you’ll keep the audience entertained while maximizing the amount of information that each person takes away. Win.

Et tu, Google?

In 2008, cuil, a search engine startup, displayed my bio alongside a photo of deceased actress Hilary Mason. In January 2013, Bing confused us, this time putting my photo next to her bio (they fixed it after a suitable amount of mocking on Twitter).

Today, Google did the same thing. (live search link)

Today I win the internet?

[Screenshot of the Google search result, taken 2013-04-14]

If you zoom in on the bio section, you can clearly see that it’s her bio with a photo of me (originally from Crain’s New York 40 under Forty). Further, if you go into her filmography, you continue to see my photo.

I’m most proud of my starring role in the amazing film Robot Jox. (bottom right of the image below)


I know that entity disambiguation is a hard problem. I’ve worked on it, though never with the kind of resources that I imagine Google can bring to it. And yet, this is absurd!

Note: It’s also been pointed out to me that there’s a slim possibility that Google’s confusion stems from my own post about Bing’s error, in which case, this post will certainly make the confusion worse. To that I say — bring it on, technofuture irony!


Speaking: 1 Kitten per Equation


Use a ratio of one cute cat photo per equation in your talk.

This is a concise way of saying that a ratio of one part heavy, technical content to one part light-hearted explanation is ideal.

You may have to play with the ratio depending on the audience or the expectations, but people react best when they have the chance to learn something fundamentally hard and interesting while, at the same time, getting to smile.

And yes, DO use photos of cute things in your talks! The hack here is that people naturally smile when they look at adorableness. If they are smiling in your talk they credit you for the positive feelings. It’s an easy way to boost people’s perceived enjoyment of your talk and to get your audience into the kind of mood where it’s easier to walk them through more complex, technical material.

Data Engineering

Data engineering is when the architecture of your system is dependent on characteristics of the data flowing through that system.

It requires a different kind of engineering process than typical systems engineering, because you have to do some work upfront to understand the nature of the data before you can effectively begin to design the infrastructure. Most data engineering systems also transform the data as they process it.

Developing these types of systems requires an initial research phase, where you do the necessary work to understand the characteristics of the data before you design the system. It may even require an active experimental process in which you try multiple infrastructure options in the wild before making a final decision. I’ve seen numerous people run straight into walls when they ignore this research requirement.

Forget Table is one example of a data engineering project from our work at bitly. It’s a database for storing non-stationary categorical distributions. We often see streams of data and want to understand what the distributions in that data look like, knowing that they drift over time. Forget Table is designed precisely for this use, allowing you to configure the rate of change in your particular dataset (check it out on github).
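To make “non-stationary categorical distribution” a little more concrete, here is a toy sketch of the general idea in Python. To be clear, this is not Forget Table’s implementation or API. It is a minimal in-memory illustration that assumes simple exponential decay, and the class name `DecayingCounts` and the `half_life_seconds` parameter are hypothetical.

```python
import math


class DecayingCounts:
    """Category counts that fade over time, so old observations matter less.

    Illustrative only: assumes exponential decay and timestamps that
    never go backwards.
    """

    def __init__(self, half_life_seconds):
        # The half-life plays the role of the configurable rate of change.
        self.decay_rate = math.log(2) / half_life_seconds
        self.counts = {}
        self.last_update = None

    def _decay_to(self, now):
        """Shrink every count by the decay accumulated since the last update."""
        if self.last_update is not None:
            factor = math.exp(-self.decay_rate * (now - self.last_update))
            for key in self.counts:
                self.counts[key] *= factor
        self.last_update = now

    def observe(self, category, now):
        """Record one observation of `category` at time `now` (in seconds)."""
        self._decay_to(now)
        self.counts[category] = self.counts.get(category, 0.0) + 1.0

    def distribution(self, now):
        """Return the normalized distribution as of time `now`."""
        self._decay_to(now)
        total = sum(self.counts.values())
        if total == 0:
            return {}
        return {key: value / total for key, value in self.counts.items()}


table = DecayingCounts(half_life_seconds=10)
table.observe("cats", now=0)
table.observe("dogs", now=10)  # by now the "cats" count has halved
print(table.distribution(now=10))
```

In this sketch, a click on “cats” ten seconds ago counts half as much as a click on “dogs” right now, so the distribution comes out at roughly one third cats and two thirds dogs. A shorter half-life makes the table forget faster, which suits data that drifts quickly.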

Speaking: Use the Narrative Arc

If you took a college freshman literature class, you probably remember a diagram like this:


…with the x-axis representing time, and the y-axis (which, for some infuriating reason, is never labeled) representing intensity.

Last week’s speaking hack was to limit yourself to 15 minutes (or less!) per idea. The hack this week is to use this gradient of intensity within each segment you present.

If you wrote it out as a linear outline, each idea in your talk might have:

  1. an introduction to the idea
  2. a high-level overview of the idea
  3. the technical details
  4. an example that brings the technical details together (this is the most exciting part!)
  5. a conclusion that wraps up why this is exciting, how it works, and what people learned

You can also use the narrative arc to structure the intensity of the talk as a whole. By ordering the ideas you explore by intensity and having a strong introduction and strong conclusion, you can keep people engaged throughout the entire presentation.

This article is part of my series of speaking hacks for introverts and nerds. Read about the motivation here.

Why Google Now is Awesome

Google Now is an extension to Google’s Android search app that uses all of the data that Google has about you along with what it can guess about your current context to present the information it thinks you need when it thinks you need it.

It’ll tell you to leave a bit early to make your next calendar event because of heavy traffic, or that it’s a friend’s birthday, or that there’s a cool cafe nearby where you are.

I think it’s amazing.

It’s amazing because this is the first Google product that takes ALL OF THE DATA that they have about us and actually makes it useful for us. Not for advertisers.