What Mugshots Mean For Public Data


The New York Times has a story this morning on the growing use of mugshot data for, essentially, extortion. These sites scrape mugshots off of public records databases, use SEO techniques to rank highly in Google searches for people’s names, and then charge those featured in the image to have the pages removed. Many of the people featured were never even convicted of a crime.

What the mugshot story demonstrates but never says explicitly is that data is no longer just private or public, but often exists in an in-between state, where the public-ness of the data is a function of how much work is required to find it.

Let’s say you’re actually doing a background check on someone you are going on a date with (one of the use cases the operators of these sites claim is common). Before online systems, you could physically go to the various records offices, sometimes in each town, to request information about them. Given that there are ~20,000 municipalities in the United States, just doing a check would take the unreasonable investment of days.

Before mugshot sites, you had to actually visit each state’s database, figure out how to query it, and assemble the results. Now we’re looking at an investment of hours, instead of days. It’s possible, but you must be highly motivated.

Now you just search, and this information is there. It is just as public as it was before, but the cost to access it has become a matter of seconds, not hours or days, and we could imagine that you might be googling your date to find something else about him and instead stumble on the mugshot image. The cost of accessing the data is so trivial that it can come up as part of an adjacent task.

The debate around fixing this problem has focused on whether the data should be removed from the public record entirely. I'd like to see this conversation reframed around how we maintain enough friction and cost to access technically public data that running these sorts of aggregated extortion sites is no longer economically feasible, while still preserving the ability of journalists and concerned citizens to explore the records as necessary for their work.

  • http://thenoisychannel.com/ Daniel Tunkelang

    This debate only reinforces what Jim Adler has been preaching for a while: we need to regulate data use rather than data access.


  • http://www.hilarymason.com Hilary Mason

    On a related note, I just read Matt Waite’s article on building a conscientious mug shot service: upvot.es/1ckBTny

  • marcoscarreira

    I’m somewhat skeptical that friction and cost will deter those who see this as a business if the cost is still low enough that journalists and/or citizens can access it. I think that at some point countermeasures will have to be developed (either regulation against profiteering from this information or sites that reinforce positive reputations).

  • Paul

    Why should this data be public? It is private data about someone and not really the business of anyone else, no? What am I missing?

    • Ken

      They have entered the public arena by being arrested for a crime. The problem is that the crime may be very minor or they may also not be found guilty or even go to trial. Even if they are convicted it may be an offence where any other public records are removed after a period of time.

      I don’t mind that for certain crimes the information is available forever. Pedophiles, rapists, murderers, yes; smoking dope, no.

    • Joel Grus

      Indeed, it seems like the root problem here is that mugshot data is being made publicly available for people who haven’t been convicted of crimes. Being arrested is a far cry from being guilty of something.

      I’m less concerned about the right “friction” around accessing these data, and more concerned about what “public interest” is served by making these available in the first place.

    • Nicholas Doiron

      Because if police arrest records were hidden from the public, they wouldn’t be accountable for whom they arrest or why.

  • Joni

    Good point!

  • http://meloncholy.com/ Andrew Weeks

    You’re definitely right about friction, but I think context also matters a lot. (Because, of course, a lack of friction is exactly what’s let so much cool stuff exist on the web, and we all benefit hugely from that.)

    Helen Nissenbaum has done a lot of great work on this subject:

    The standard explanation for privacy freakouts is that people get upset because they’ve “lost control” of data about themselves or there is simply too much data available. Nissenbaum argues that the real problem “is the inappropriateness of the flow of information due to the mediation of technology.” In her scheme, there are senders and receivers of messages, who communicate different types of information with very specific expectations of how it will be used. Privacy violations occur not when too much data accumulates or people can’t direct it, but when one of the receivers or transmission principles changes. The key academic term is “context-relative informational norms.” Bust a norm and people get upset.

    Source: http://www.theatlantic.com/technology/archive/2012/03/the-philosopher-whose-fingerprints-are-all-over-the-ftcs-new-approach-to-privacy/254365/

    Another example would perhaps be the Girls Around Me app: here women were happy sharing updates on Facebook and on Foursquare (and perhaps happy for these to be shared publicly), but were pretty horrified to see them used in this rather different context.

    I don’t believe Nissenbaum has answers (and I certainly don’t), but it’s going to become an increasingly important discussion.

    • http://meloncholy.com/ Andrew Weeks

      Though, that said, understanding a site’s / an app’s context and information norms sounds like a very interesting ML problem.

  • Ken

    The last part of the story is important. Google has altered its rankings to push these sites close to zero. Credit card companies and PayPal won’t accept payments for them. Presumably too many people in America have a minor arrest, so it is good business practice to look after them.

  • Mike Sokolov

    If you can solve that problem (add friction while maintaining public accessibility), you can also fix content piracy for the entertainment and publishing industries, no?

  • geargrinder

    This is certainly something that needs to be debated (do we do that any longer?).

    But I am concerned by your last sentence: “…maintaining the ability of journalists and concerned citizens to explore the records as necessary” leaves room for interpretation by some gatekeeper who “knows better.” Too often that turns into “only those who agree with me.”

  • Pingback: First they came for the mugshot websites, but I said nothing… | 8ballbilliard

  • Jonathan Hochman

    The single question to ask is whether the publisher asks for money to take down information. That is always uncool. If paid unpublishing were illegal, these abuses would cease.

  • Pingback: » Google Mugshot, l’algoritmo che ti salva la reputazione - AWD SeoWeb

  • progretarian.

    Most of the things discussed try to address symptoms, whereas you only get rid of such things by dealing with the cause.
    And the cause is not a web 2.0 issue of grey data, but a legal glitch that’s being exploited. In most European jurisdictions such a mugshot Google scenario wouldn’t even be possible, let alone legally making money from it. It would get you a sentence for privacy violations and extortion.

  • tb

    Could you design a computer program that finds mug shots of people who look like you? I’m thinking something like Google image search, but only using mug shots. I would like to know who my criminal doppelgangers are.
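    In principle, yes: this is a nearest-neighbor search over face embeddings. As a minimal sketch, the toy program below assumes each mugshot has already been reduced to a small feature vector; the three-dimensional vectors, record names, and the `find_doppelgangers` helper are all invented for illustration (a real system would extract much higher-dimensional embeddings from the images with a face-recognition model).

    ```python
    import math

    def cosine_similarity(a, b):
        """Cosine of the angle between two vectors; 1.0 means same direction."""
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm

    def find_doppelgangers(query, mugshots, top_k=2):
        """Return the top_k mugshot record names most similar to the query."""
        scored = sorted(mugshots.items(),
                        key=lambda item: cosine_similarity(query, item[1]),
                        reverse=True)
        return [name for name, _ in scored[:top_k]]

    # Hypothetical database of precomputed mugshot face embeddings.
    mugshots = {
        "record_001": [0.9, 0.1, 0.2],
        "record_002": [0.1, 0.9, 0.3],
        "record_003": [0.8, 0.2, 0.1],
    }

    my_face = [0.85, 0.15, 0.15]
    print(find_doppelgangers(my_face, mugshots))  # → ['record_001', 'record_003']
    ```

    The same structure scales up: swap the toy vectors for real embeddings and the linear scan for an approximate nearest-neighbor index, and you have exactly the reverse-image lookup tb describes.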