Need Data? Start Here

Data scientists need data, and good data is hard to find. I put together this bitly bundle of research-quality data sets to collect as many useful data sets as possible in one place. The list includes such exciting and diverse things as spam, belly buttons, item pricing, social media, and face recognition, so there’s something in it to intrigue anyone.

Have one to add? Let me know!

(I’ve shared the bundle before, but this post can act as an unofficial homepage for it.)


  • http://twitter.com/dip4fish Jean-Patrick Pommier

    Thank you,
    Multispectral cytogenetic images (MFISH) are available here: https://github.com/jeanpat/MFISH

  • http://blog.grapesmoker.com/ Jerry Vinokurov

    Nice!

  • http://twitter.com/amy8492 Amy

    Yeah, thanks. I’ll probably go for the 2 GB of cats when I have time :-)

  • what_i_am_thinking_rightnow

    Hello Miss Hilary Mason, what book would you recommend for complete newbies to get started in data science and predictive analytics?

    Thank You!

  • http://pafnuty.wordpress.com/ Aman

    Thanks for this list, Hilary. 

    I started my own recently (http://eda.fenristech.com/PublicDataSets) and will have to add a couple from your list that are especially interesting to me. :)

  • Jonathan Cachat

    I am curious what you would do with a massive collection of heterogeneous scientific data – say http://www.neuinfo.org

  • http://partiallattice.wordpress.com/ Daniel Smith

    A source of health expenditure data is the MEPS survey:

    http://meps.ahrq.gov/mepsweb/

  • Arturo

    The main source of data on microfinance (financial and social indicators) is http://www.mixmarket.org and no password is required.