What Mugshots Mean For Public Data


The New York Times has a story this morning on the growing use of mugshot data for, essentially, extortion. These sites scrape mugshots from public records databases, use SEO techniques to rank highly in Google searches for people’s names, and then charge the people pictured to have the pages removed. Many of the people featured were never even convicted of a crime.

What the mugshot story demonstrates but never says explicitly is that data is no longer just private or public, but often exists in an in-between state, where the public-ness of the data is a function of how much work is required to find it.

Let’s say you’re actually doing a background check on someone you are going on a date with (one of the use cases the operators of these sites claim is common). Before online systems, you had to go in person to the various records offices, sometimes one in each town, to request information about that person. Given that there are ~20,000 municipalities in the United States, a single check would take an unreasonable investment of days.

Before mugshot sites, you had to actually visit each state’s database, figure out how to query it, and assemble the results. Now we’re looking at an investment of hours, instead of days. It’s possible, but you must be highly motivated.

Now you just search, and the information is there. It is just as public as it was before, but the cost of access has dropped to seconds rather than hours or days. You might be googling your date to find something else about him and instead stumble on the mugshot image. The cost of accessing the data is so trivial that it can come up as part of an adjacent task.

The debate around fixing this problem has focused on whether the data should be removed from the public entirely. I’d like to see this conversation reframed around how we maintain the friction and cost of accessing technically public data, so that it is no longer economically feasible to run these sorts of aggregated extortion sites, while still preserving the ability of journalists and concerned citizens to explore the records as necessary for their work.

20 Comments on “What Mugshots Mean For Public Data”

  1. This debate only reinforces what Jim Adler has been preaching for a while: we need to regulate data use rather than data access.


  2. Hilary Mason says:

    On a related note, I just read Matt Waite’s article on building a conscientious mug shot service: upvot.es/1ckBTny

  3. marcoscarreira says:

    I’m somewhat skeptical that friction and cost will deter those who see this as a business if the cost is still low enough that journalists and/or citizens can access it. I think that at some point countermeasures will have to be developed (either regulation against profiteering from this information or sites that reinforce positive reputations).

  4. Paul says:

    Why should this data be public? It is private data about someone and not really the business of anyone else, no? What am I missing?

    • Ken says:

      They have entered the public arena by being arrested for a crime. The problem is that the crime may be very minor or they may also not be found guilty or even go to trial. Even if they are convicted it may be an offence where any other public records are removed after a period of time.

      I don’t mind that, for certain crimes, the information is available forever. Pedophiles, rapists, murderers, yes; smoking dope, no.

    • Joel Grus says:

      Indeed, it seems like the root problem here is that mugshot data is being made publicly available for people who haven’t been convicted of crimes. Being arrested is a far cry from being guilty of something.

      I’m less concerned about the right “friction” around accessing these data, and more concerned about what “public interest” is served by making these available in the first place.

    • Nicholas Doiron says:

      Because if police arrest records were hidden from the public, they wouldn’t be accountable for whom they arrest or why.

  5. Joni says:

    Good point!

  6. Andrew Weeks says:

    You’re definitely right about friction, but I think context also matters a lot. (Because, of course, a lack of friction is exactly what’s let so much cool stuff exist on the web, and we all benefit hugely from that.)

    Helen Nissenbaum has done a lot of great work on this subject:

    The standard explanation for privacy freakouts is that people get upset because they’ve “lost control” of data about themselves or there is simply too much data available. Nissenbaum argues that the real problem “is the inappropriateness of the flow of information due to the mediation of technology.” In her scheme, there are senders and receivers of messages, who communicate different types of information with very specific expectations of how it will be used. Privacy violations occur not when too much data accumulates or people can’t direct it, but when one of the receivers or transmission principles change. The key academic term is “context-relative informational norms.” Bust a norm and people get upset.

    Source: http://www.theatlantic.com/technology/archive/2012/03/the-philosopher-whose-fingerprints-are-all-over-the-ftcs-new-approach-to-privacy/254365/

    Another example would perhaps be the Girls Around Me app: here women were happy sharing updates on Facebook and on FourSquare (and perhaps happy for these to be shared publicly), but were pretty horrified to see them used in this rather different context.

    I don’t believe Nissenbaum has answers (and I certainly don’t), but it’s going to become an increasingly important discussion.

  7. Ken says:

    The last part of the story is important. Google has altered their PageRank to close to zero. Credit card companies and PayPal won’t accept payments for them. Presumably too many people in America have a minor arrest, so it is good business practice to look after them.

  8. Mike Sokolov says:

    If you can solve that problem (add friction while maintaining public accessibility), you can also fix content piracy for the entertainment and publishing industries, no?

  9. geargrinder says:

    This is certainly something that needs to be debated (do we do that any longer?).

    But I am concerned by your last sentence: “…maintaining the ability of journalists and concerned citizens to explore the records as necessary” leaves room for interpretation by some gatekeeper who “knows better.” Too often that turns into “only those who agree with me.”

  10. […] scientist at Bitly, noted in a blog post, mugshot sites take advantage of information that is in a weird kind of public-private gray area: it is theoretically public, and comes from official sources, but in the past it was […]

  11. Jonathan Hochman says:

    The single question to ask is whether the publisher asks for money to take down information. That is always uncool. If paid unpublishing were illegal, these abuses would cease.

  12. marcoscarreira says:

    Here’s the take from Megan McArdle (Bloomberg): http://www.bloomberg.com/news/2013-10-09/mugshots-shouldn-t-be-extortion-bait.html

  13. […] writes Hilary Mason, the data on these sites sits in a kind of “gray zone”: it is public, in […]

  14. progretarian. says:

    Most of the things discussed try to address symptoms, whereas you only get rid of such things by dealing with the cause.
    And the cause is not a web 2.0 issue of grey data, but a legal glitch that’s being exploited. In most European jurisdictions such a mugshot Google scenario wouldn’t even be possible, let alone legally making money with it. It would get you a sentence for privacy violations and extortion.

  15. tb says:

    Could you design a computer program that finds mug shots of people who look like you? I’m thinking of something like Google image search, but only using mug shots. I would like to know who my criminal doppelgangers are.