Need Data? Start Here

Data scientists need data, and good data is hard to find. I put together this bitly bundle of research-quality data sets to collect as many useful data sets as possible in one place. The list includes such exciting and diverse things as spam, belly buttons, item pricing, social media, and face recognition, so you know there’s something that will intrigue anyone.

Have one to add? Let me know!

(I’ve shared the bundle before, but this post can act as an unofficial homepage for it.)

13 Comments on “Need Data? Start Here”

  1. Thank you,
    Multispectral cytogenetic images (MFISH) are available here

  2. Amy says:

    Yeah, thanks. I’ll probably go for the 2GB of cats when I have time :-).

  3. what_i_am_thinking_rightnow says:

    Hello Miss Hilary Mason, What book would you recommend for complete newbies to get started in Data Science and Predictive Analytics?

    Thank You!

  4. Aman says:

    Thanks for this list, Hilary. 

    I started my own recently and will have to add to it a couple from your list that are especially interesting to me. :)

  5. Jonathan Cachat says:

    I am curious what you would do with a massive collection of heterogeneous scientific data – say

  6. Daniel Smith says:

    A source of health expenditure data is the MEPS survey:

  7. Arturo says:

    The main source of data on microfinance (financial and social indicators) is  No password is required,

  8. datapants says:

    Hi Hilary. How ’bout collecting all your datasets and wearing them in your datapants. Would you be interested in the domain name, or alan@nothing.con

  9. Guest says:

    I am going to use that belly button biodiversity dataset!

  10. John Yetter says:

    I am going to use that belly button biodiversity dataset! I will probably take a look at the loan dataset, too, but it is not as fun.

  11. kekline says:

    Very nice! I also point people to this post on Quora where a ton of public, governmental and NGO data sets are cataloged:

  12. Martin says:

    I’m looking for a dataset formatted for MongoDB. Is anyone aware of somewhere I could find one?

    Thank you!