Where’s the API that can tell me that this photo contains a puppy and a can of Coke?

puppy and a can of coke

Photo by Ahmad van der Breggen on Flickr.

We’ve gotten very good at extracting and disambiguating entities from text data. You can license a commodity system, and there are APIs and even open-source tools that work fairly well.

However, a large percentage of content that people share is not primarily text (a back-of-the-envelope guess says around 18%), and we currently have very little automated insight into that content.

I know this is a very hard problem, but I’m continuously surprised by how few people seem to be working on it. Any ideas?


  • http://twitter.com/amy8492 Amy

    https://www.iqengines.com/ automatic image recognition + crowdsourcing

  • http://www.facebook.com/jaymz.campbell Jaymz Campbell

    It’s not the same, but it might be an interesting read for you: these Stack Overflow and Mathematica posts were really interesting in terms of quite focused pattern/feature identification – worth a skim.

    http://stackoverflow.com/questions/8479058/how-do-i-find-waldo-with-mathematica
    http://mathematica.stackexchange.com/questions/11819/help-find-a-bright-object-on-mars

  • http://twitter.com/jasebell Jason Bell

    Apart from crowdsourcing the info manually, the next step would be to mine EXIF data from the image itself. That’s fine if a professional photographer took it, marked it, and populated the caption data. So something like Apache Tika would fit the bill if the metadata was stored somewhere.

    After that, well, it’s a question I’ve asked myself for a long time. The APIs for Flickr and Instagram would be the ones to watch. But it comes down to the basic “garbage in, garbage out” when it comes to tagged information: it’s only as good as the person that put it there.

  • Anonymous

    It’s been mentioned down here in the comments already, but this seems to me like the type of problem that’s better suited to humans than it is to algorithms. I’m sure one could write some sort of recognition algorithm tuned to recognize Coke cans (or puppies), e.g.: http://www.slate.com/blogs/future_tense/2012/06/27/google_computers_learn_to_identify_cats_on_youtube_in_artificial_intelligence_study.html

    But personally, writing individual algorithms designed to identify every single possible thing in a photo that you’d want to identify seems like a lot of work and tweaking when you could just, well, ask someone.
    http://voxilate.blogspot.com/2009/10/batching-mechanical-turk-jobs-at.html

    • http://www.hilarymason.com Hilary Mason

      Unfortunately, we’re at a scale where even Mechanical Turk is unaffordable. I would love to find a hybrid system that uses human tagging when algorithms fail and then uses that data to retrain new classifiers, but I haven’t found it yet!

      • Emery Berger

        Hi Hilary,

        In principle, this sounds like a great fit for AutoMan (quite refined now, just appeared at OOPSLA 2012) – automan-lang.org. My student is busy on something else right now, but wrapping it with a REST API is on his list of things to do. The beauty of AutoMan is that it automatically handles quality control, so you get high confidence ground truth, which you could then use to train up qualifiers.

        – emery

  • John

    I point Google Goggles at the picture, and I get “Yorkie Dogs” and “Coca-Cola”.  Now just convince Google to make that available as an API. :)

  • http://twitter.com/gordonrios Gordon Rios

    This looks interesting: http://tineye.com/

  • George Purkins

    Curalate does this for brands: http://www.curalate.com/

  • Matt Mcknight

    We’ve done some work with piXlogic in this arena. http://www.pixlogic.com/

  • Dave

    MobileWorks, CrowdFlower, Mechanical Turk, and Samasource all have APIs that can handle image tagging work.

  • Matthew Krieger

    Mechanical Turk is that API for now.

  • Doug

    In a perfect world, are you looking for an answer to “Does this picture have a dog in it?” or “What’s in this picture?” Assuming the latter, are you looking for some sort of importance measurement too, e.g. “What are the 3 most prominent features of this picture?” Or the whole kit and caboodle: “What are the 1000 words that this picture is worth?”

    In retrospect, probably a silly question considering the “data scientist” facet, but I’ve already typed this, and am now in mid-mind-churn.

  • Sergey Vershinin

    Google’s search by image gives a best guess of “coke can puppy”.  Don’t know if the results can be scraped, but I don’t see why not.

  • Siderite Zackwehdex

    There are all sorts of possible problems, and I don’t mean technical. Remember Facebook with their idea to tag pictures with the names of the people in them based on face recognition and the privacy backlash that ensued.
    I doubt people would look for dog and Cola more than for naked chicks (you know, those just hatched and featherless)

  • Jason Bliss

    Amazon’s Mechanical Turk API
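
The hybrid approach Hilary Mason describes in the comments (let an algorithm tag what it can, send the rest to people, and keep the human answers for retraining) could be sketched roughly as follows. This is only an illustration under stated assumptions: `HybridTagger`, `toy_classifier`, `toy_human`, the image IDs, and the 0.8 confidence threshold are all hypothetical names and values invented here, not part of any real service or API mentioned in this thread.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Prediction = Tuple[str, float]  # (label, confidence between 0 and 1)


@dataclass
class HybridTagger:
    """Routes an image to a human only when the model is not confident."""
    classify: Callable[[str], Prediction]  # hypothetical model call
    ask_human: Callable[[str], str]        # hypothetical crowdsourcing call
    threshold: float = 0.8                 # assumed cutoff, tune for your data
    training_queue: List[Tuple[str, str]] = field(default_factory=list)

    def tag(self, image_id: str) -> Tuple[str, str]:
        """Return (label, source), where source is 'model' or 'human'."""
        label, confidence = self.classify(image_id)
        if confidence >= self.threshold:
            return label, "model"
        # The model is unsure: pay for a human label and keep it
        # so a later retraining job can learn from it.
        human_label = self.ask_human(image_id)
        self.training_queue.append((image_id, human_label))
        return human_label, "human"


# Toy stand-ins so the sketch runs without any external service.
def toy_classifier(image_id: str) -> Prediction:
    known = {"img-1": ("puppy", 0.95), "img-2": ("coke can", 0.40)}
    return known.get(image_id, ("unknown", 0.0))


def toy_human(image_id: str) -> str:
    answers = {"img-2": "coke can", "img-3": "puppy"}
    return answers.get(image_id, "unlabeled")


tagger = HybridTagger(classify=toy_classifier, ask_human=toy_human)
results = {image_id: tagger.tag(image_id)
           for image_id in ["img-1", "img-2", "img-3"]}

for image_id, (label, source) in results.items():
    print(image_id, label, source)
print("queued for retraining:", tagger.training_queue)
```

The cost argument in the thread shows up in the threshold: the higher it is set, the more images go to paid human review, while every human answer lands in `training_queue` so that fewer images should need review once the classifier is retrained. The retraining step itself is left out of this sketch.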