Interview Questions for Data Scientists

Great data scientists come from such diverse backgrounds that it can be difficult to get a sense of whether someone is up to the job in just a short interview. In addition to the technical questions, I find it useful to have a few questions that draw out the more creative and less discrete elements of a candidate’s personality. Here are a few of my favorite questions.

  1. What was the last thing that you made for fun?

    This is my favorite question by far — I want to work with the kind of people who don’t turn their brains off when they go home. It’s also a great way to learn what gets people excited.

  2. What’s your favorite algorithm? Can you explain it to me?

    I don’t know any data scientists who haven’t fallen in love with an algorithm, and I want to see both that enthusiasm and that the candidate can explain it to a knowledgeable audience.

    Update: As Drew pointed out on Twitter, do be aware of hammer syndrome: when someone falls so in love with one algorithm that they try to apply it to everything, even when better choices are available.

  3. Tell me about a data project you’ve done that was successful. How did you add unique value?

    This is a chance for the candidate to walk us through a success and show off a bit. It’s also a great gateway into talking about their process and preferred tools and experience.

  4. Tell me about something that failed. What would you change if you had to do it over again?

    This is a tricky question, and sometimes it takes people a few tries to get to a complete answer. It’s worth asking, though, to see that people have the confidence to talk about something that went awry, and the wisdom to have recognized when something they did was not optimal.

  5. You clearly know a bit about our data and our work. When you look around, what’s the first thing that comes to mind as “why haven’t you done X”?!

    Technical competence is useless without the creativity to know where to focus it. I love when people come in with questions and ideas.

  6. What’s the best interview question anyone has ever asked you?

    I’d like to wish for more wishes, please.

I’m always looking for new and interesting things to add to my list, and I’d love to hear your suggestions.


Getting Started with Data Science

I get quite a few e-mail messages from very smart people who are looking to get started in data science. Here’s what I usually tell them:

The best way to get started in data science is to DO data science!

First, data scientists do three fundamentally different things: math, code (and engineer systems), and communicate. Figure out which one of these you’re weakest at, and do a project that enhances your capabilities. Then figure out which one of these you’re best at, and pick a project which shows off your abilities.

Second, get to know other data scientists! If you’re in New York, try the DataGotham events list to find some meetups, and make sure to stay for the beers. Look for groups, like DataKind, that need data skills put to work for good. No matter how much of a beginner you might be, your enthusiasm will be appreciated, you’ll learn things, and you’ll meet great people. And if you can’t find a physical meetup close to you, start one, or join the Twitter discussion.

Third, put your projects out in public. Share them on GitHub, your blog, and Twitter. Explain why you thought the question was interesting, where you got the data (and good data is everywhere), and how you came to a conclusion. It doesn’t have to be perfect. A couple of examples of data projects motivated by nothing more than the author’s curiosity are Yvo’s TechCrunch analysis and Drew and John’s Ranking the Popularity of Programming Languages.

Finally, you can start right here. What advice do you give? What great projects have you seen lately? Share them in the comments.


Help, I’m the first data scientist at my company!

I moderated a panel at DataGotham with Adam Laiacano from Tumblr, Fred Benenson from Kickstarter, and Roberto Medri from Etsy about being the first data scientist at a company. We covered everything from job responsibilities and tools to successes, failures, how they are integrated into an organization, and how they have hired other data scientists to join them. The panelists were concise, articulate, and intelligent. Watch it below!


How do you prioritize research?

One of the most fun and challenging parts of my job is setting bitly’s research agenda. We’re a startup, so this means prioritizing the set of questions we look into in the context of what will be most beneficial for the rest of the business, for the short and long-term, by creating opportunity and opening up potential futures. We work on a wide variety of projects, from pure research to press collaborations to infrastructure and experimental products.

We always have a list of research questions way longer than we have time and resources to pursue, so we developed a process for evaluating whether a given question is worth pursuing at a particular time.

This is the kind of process that I’ve discussed with several people over whisky (thanks!) but have not seen written up. I initially had a much longer list of questions but have decided to keep it as simple as possible, to frame a discussion but not dictate or burden it. I hope it’s helpful and I would love to hear about other approaches.

For each research question that we might look into, we ask the following:

  1. State the research question.
  2. How do we know when we’ve won?
  3. Assume we’ve solved this question perfectly. What are the first things that we’ll build with it?
  4. If everyone in the world uses this, how does it change human behavior?
  5. What’s the most evil thing that can be done with this?

State the research question.

It’s important to state the question in language that everyone can understand. The bitly team comes from a variety of scientific and business backgrounds, and we’ve developed some of our own common vocabulary, but it still takes a bit of effort to make sure that everyone understands the fundamental challenge and why it’s interesting.

How do we know when we’ve won?

Here we define the metrics that we’ll use to measure our success. For some questions, this is obvious, and for others it’s impossible to define — we can at least acknowledge that ahead of time.

Assume we’ve solved this question perfectly. What are the first things that we’ll build with it?

This question allows us to assess the potential business and product impact. What capabilities will we have with this that we don’t have now? It allows us to keep the long-term research vision in mind while still optimizing for shorter-term opportunities.

If everyone in the world uses this, how does it change human behavior?

What’s the maximum potential impact of this work? If it’s not inspiring, is it worth pursuing at all?

What’s the most evil thing that can be done with this?

I don’t ask this question to encourage evil (>:]) but as a creative tool for expanding how we think about validity, impact, and potential applications of the research. The label evil is so ridiculous that it permits people to share their craziest ideas. Plus, it’s always a fun conversation to have.

Finally

I’m always revising this list, and I would love to hear how you think about prioritizing your work.