Getting Started with Data Science

I get quite a few e-mail messages from very smart people who are looking to get started in data science. Here’s what I usually tell them:

The best way to get started in data science is to DO data science!

First, data scientists do three fundamentally different things: math, code (and engineer systems), and communicate. Figure out which one of these you’re weakest at, and do a project that enhances your capabilities. Then figure out which one of these you’re best at, and pick a project which shows off your abilities.

Second, get to know other data scientists! If you’re in New York, try the DataGotham events list to find some meetups, and make sure to stay for the beers. Look for groups, like DataKind, that need data skills put to work for good. No matter how much of a beginner you might be, your enthusiasm will be appreciated, you’ll learn things, and you’ll meet great people. And if you can’t find a physical meetup close to you, start one, or join the Twitter discussion.

Third, put your projects out in public. Share them on GitHub, your blog, and Twitter. Explain why you thought the question was interesting, where you got the data (and good data is everywhere), and how you came to a conclusion. It doesn’t have to be perfect. A couple of examples of data projects motivated by nothing more than the author’s curiosity are Yvo’s TechCrunch analysis and Drew and John’s Ranking the Popularity of Programming Languages.

Finally, you can start right here. What advice do you give? What great projects have you seen lately? Share them in the comments.


  • http://www.trejdify.com/ Trejdify

    I believe that Kaggle (http://www.kaggle.com) is a good way to start. The probability that you make money from Kaggle when you are new is small, but you get a little bit more motivated by the competition.

  • Bill Campbell

    http://www.coursera.org is a great resource for personal and professional development in a variety of areas including computing and data science.  The courses are high quality and free.

    Here is a link to a course targeted specifically for beginning data scientists:

    https://www.coursera.org/course/datasci

    I am an avid user of the site and I am not in any way affiliated with any of coursera’s business objectives (i.e., this is not spam).

    • David Katz

      Hilary – great post! I’ve been meaning to get back into data science. I’ve definitely fallen behind on the coding side. 
      Bill – Thanks for the links! Have you tried anything on iTunes U before? Any other suggestions?

      • Bill Campbell

        David, I was not aware of iTunes U.  I will check it out, thanks.  There are other web-based learning sites out there with varying content quality. A couple of notable ones include:

        http://www.udacity.com/
        https://www.khanacademy.org/

        I like coursera.org because the course quality is consistently very high. Although I have only sampled a small percentage of coursera’s offerings, I have confidence that the coursera.org founders set and maintain strict quality standards.  I believe the quality I’ve experienced thus far is representative of all the courses offered.

        My reply to this blog post was to provide a pointer to a beginning data science course, and not necessarily to promote online learning.  I hope Hilary forgives the topic divergence.   I get a lot of value following Hilary online (especially on twitter).  She is a great source for a lot of data science info, including pointers to data sets to play with.

  • Corey Chivers

    Cool! Thanks, Hilary. A suggestion I would give is to check out weekend hackathons in your area. There tends to be an abundance of app-dev types at these events, and a dearth of data/analysis/statistical hackers.  My approach has been to offer my skills as a data scientist when joining a team. You might be surprised just how popular you will be!

  • http://www.treasalynch.com/blog Treasa

    A couple of things I would say. There are a lot of datasets available to the world now via open data initiatives. Amazon AWS has some public datasets, as does Windows Azure. Both the US and European stats agencies put a lot of data out there. It’s worth spending time wandering around large amounts of data and getting a feel for what it looks like and whether you want to get more value out of that data. Given the massive amount of resources out there, I would also look at focussing on a few activities at first. In addition to any reasonable maths and programming skills, I recommend some decent stats training (pretty sure coursera and iTunes U have stats-related 101s worth looking at). Also, data science is a dynamic field, so it is worth recognising whether particular aspects interest you more than others. The more you look at the data, I think, the more you will recognise what skill sets you need to add.

  • petergul

    Have you seen this guide? http://www.rcasts.com/2012/12/software-engineers-guide-to-getting.html

    Helpful?

    I need a “Statistician’s guide to getting started with software engineering”.

  • Richard Dunks

    I really enjoyed this post and it spared me having to send you an email or corner you after a DataGotham event.  It inspired me to blog about the same topic from the learner’s point of view.  http://wp.me/p2PLpM-Q3

  • Sean Gonzalez

    I recommend attending “Recommendation Systems in The Real World”, hosted by Data Science DC (http://www.meetup.com/Data-Science-DC/events/94856042/).  Get to know 150 of your local DC Data Scientists, and enjoy Data Drinks with everyone afterwards!

  • Robin Carnow

    For people who need a math refresher, check out the following:
    Statistics course at http://www.openintro.org/.  They also have a free textbook: http://www.openintro.org/stat/textbook.php

    Linear Algebra course at http://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/
