Getting Started with Data Science

I get quite a few e-mail messages from very smart people who are looking to get started in data science. Here’s what I usually tell them:

The best way to get started in data science is to DO data science!

First, data scientists do three fundamentally different things: math, code (and engineer systems), and communicate. Figure out which one of these you’re weakest at, and do a project that enhances your capabilities. Then figure out which one of these you’re best at, and pick a project which shows off your abilities.

Second, get to know other data scientists! If you’re in New York, try the DataGotham events list to find some meetups, and make sure to stay for the beers. Look for groups, like DataKind, that need data skills put to work for good. No matter how much of a beginner you might be, your enthusiasm will be appreciated, you’ll learn things, and you’ll meet great people. And if you can’t find a physical meetup close to you, start one, or join the Twitter discussion.

Third, put your projects out in public. Share them on GitHub, your blog, and Twitter. Explain why you thought the question was interesting, where you got the data (and good data is everywhere), and how you came to a conclusion. It doesn’t have to be perfect. A couple of examples of data projects motivated by nothing more than the author’s curiosity are Yvo’s TechCrunch analysis and Drew and John’s Ranking the Popularity of Programming Languages.

Finally, you can start right here. What advice do you give? What great projects have you seen lately? Share them in the comments.


19 Comments on “Getting Started with Data Science”

  1. Trejdify says:

    I believe that Kaggle (http://www.kaggle.com) is a good way to start. The probability that you make money from Kaggle when you are new is small, but you get a little bit more motivated by the competition.

  2. Bill Campbell says:

    http://www.coursera.org is a great resource for personal and professional development in a variety of areas including computing and data science.  The courses are high quality and free.

    Here is a link to a course targeted specifically for beginning data scientists:

    https://www.coursera.org/course/datasci

    I am an avid user of the site and I am not in any way affiliated with any of Coursera’s business objectives (i.e., this is not spam).

    • David Katz says:

      Hilary – great post! I’ve been meaning to get back into data science. I’ve definitely fallen behind on the coding side. 
      Bill – Thanks for the links! Have you tried anything on iTunes U before? Any other suggestions?

      • Bill Campbell says:

        David, I was not aware of iTunes U. I will check it out, thanks. There are other web-based learning sites out there with varying content quality. A couple of notable ones include:

        http://www.udacity.com/
        https://www.khanacademy.org/

        I like coursera.org because the course quality is consistently very high. Although I have only sampled a small percentage of Coursera’s offerings, I have confidence that the coursera.org founders set and maintain strict quality standards. I believe the quality I’ve experienced thus far is representative of all the courses offered.

        My reply to this blog post was to provide a pointer to a beginning data science course, and not necessarily to promote online learning. I hope Hilary forgives the topic divergence. I get a lot of value from following Hilary online (especially on Twitter). She is a great source for a lot of data science info, including pointers to data sets to play with.

  3. Corey Chivers says:

    Cool! Thanks, Hilary. A suggestion I would give is to check out weekend hackathons in your area. There tends to be an abundance of app-dev types at these events, and a dearth of data/analysis/statistical hackers. My approach has been to offer my skills as a data scientist when joining a team. You might be surprised just how popular you will be!

  4. Treasa says:

    A couple of things I would say. 1) There are a lot of datasets available to the world now via open data initiatives. Amazon AWS has some public datasets, as does Windows Azure. Both the US and European stats agencies put a lot of data out there. It’s worth spending time wandering around large amounts of data and getting a feel for what it looks like and whether you want to get more value out of that data. Given the massive amount of resources out there, I would also look at focussing on a few activities at first. In addition to any reasonable maths and programming skills, I recommend some decent stats training (pretty sure Coursera and iTunes U have stats-related 101s worth looking at). Also, data science is a dynamic field, so it is worth recognising whether particular aspects interest you more than others. The more you look at the data, I think, the more you will recognise what skill sets you need to add.

  5. […] post by Hilary M. on “Getting Started with Data Science”. I really like the suggestion of just picking a project and doing something, getting it out there. […]

  6. […] Getting Started with Data Science by Hilary Mason. […]

  7. […] are many articles on this subject from renowned data scientists (Dataspora, Gigaom, Quora, Hilary Mason). This post captures my journey (a software engineer) on learning Statistics and Data […]

  8. petergul says:

    Have you seen this guide? http://www.rcasts.com/2012/12/software-engineers-guide-to-getting.html

    Helpful?

    I need a “Statistician’s guide to getting started with software engineering”.

  9. […] take a look at her site.  If you are new to data science, you might want to start with her post Getting Started with Data Science.  She has a curated collection of interesting datasets […]

  10. […] Getting started with data science – advice from Hilary Mason from Bitly for those interested in data science. In a nutshell, “DO IT”. […]

  11. Richard Dunks says:

    I really enjoyed this post and it spared me having to send you an email or corner you after a DataGotham event.  It inspired me to blog about the same topic from the learner’s point of view.  http://wp.me/p2PLpM-Q3

  12. […] a few practicing data scientists, including Pete Skomoroch (Principal Data Scientist at LinkedIn), Hilary Mason (Chief Scientist at bitly), and Michael Driscoll (Chairman of […]

  13. Sean Gonzalez says:

    I recommend attending “Recommendation Systems in The Real World”, hosted by Data Science DC (http://www.meetup.com/Data-Science-DC/events/94856042/). Get to know 150 of your local DC Data Scientists, and enjoy Data Drinks with everyone afterwards!

  14. Robin Carnow says:

    For people who need a math refresher, check out the following:
    Statistics course at http://www.openintro.org/. They also have a free textbook: http://www.openintro.org/stat/textbook.php

    Linear Algebra course at http://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/

  15. […] In an interview with Mashable, Mason talks about big data, curiosity and why colleges should rethink their computer science curricula. If reading her insights about data science creates a thirst for more data knowledge, check out some of Mason’s favorite data science blogs in this bitly bundle, and read her tips for getting into data science. […]

  16. […] Mason (of Bitly) shares her definition of a data scientist.  I suppose my definition differs from Hilary Mason’s data science definition.  Statisticians need to understand the science and structure of data, and data scientists need to […]

  17. Tristan says:

    The best way to get into data science is to practice. Find datasets on the internet to analyze. Check out places like Kaggle.com or TeamLeada.com for practice problems. Showcase your code on GitHub, etc. Just start “doing” data science. You won’t get better without trying it first!