Taxonomy For Fun And (Google's) Profit?

By: Andrew Goodman

Google Image Labeler is eliciting intelligent commentary around the virtual campfire, as one might expect.

It seems Google needs to improve the quality of its Image Search by tagging the images. What better way to go about it than luring an army of volunteer taggers? Hey, where have we heard this story before? Remember ODP?

Accurately describing elements of an image in a few words isn't as complex as editing directory categories.

Today, sites like Flickr and YouTube thrive on tagging. First the people who upload images, and later other members of the community, tag the material as well as they can. It's a rough-and-ready form of classification that has attracted much interest and plenty of argument for and against, paralleling the general debate over whether Web 2.0 is really anything, let alone an advance over what came before.

Well, it is an advance, or Google wouldn't be doing this. Tags help users find images, there's no doubt about that.

And now begins the great experiment with different incentive systems and value systems. It looks as if properties like Flickr and YouTube have pretty accurate taggers, perhaps because those engaged in tagging genuinely get it and are genuinely trying to be helpful. At this juncture, by contrast, Google seems to be running into the odd problem with insincere and malicious taggers, at least if the "editorial comment" type tags I'm seeing on Google Video are any indication. But the random "double-verification" approach to tagging is ingenious compared to hierarchical command-and-control systems. Where editors and their "bosses" know one another, they can rig up a corruption scheme; this system pairs each tagger with a partner they don't know and cannot know. That isolates cheaters, Panopticon-like. I'm going to give it a try, just to check it out.

If accurate tagging requires the equivalent of professional editorial staff, but you're running it like a kind of community effort involving nebulous rewards, because professional staff could never get to everything... it seems likely that odd usage/contribution patterns will arise, as they have before. In ODP, there were "meta" editors and high-output editors who developed expertise and did much more work than most of the rest, but also ran the risk of developing blinders of sorts. *Why* did they do so much more than others?

When it comes to Wikipedia, the same phenomenon has occurred. The "spontaneous outpouring of community input" is driven by a cadre of prolific editors, followed by a long tail of occasional helpers. What does it all mean? I'm not sure, except that it speaks to the competitiveness of some people, even when they are trying to win at something that doesn't really benefit them and that benefits a "community" in a way that is as yet unproven.

In this case the mega-taggers probably can't wreck anything (especially with the random competitive tagging method tied to points), so the end result is better search. If Google Video tags currently stink, Google could perhaps award "points" to those folks who want to go in and clean up all those tags too. Google, of course, profits, but there is a certain inherent fascination in watching something work better as taggers get involved. Then again, I'm not 100% sure it's worth anyone's time to accurately tag a Japanese teenager singing "Barbie Girl" at karaoke.

We debated this subject here way back with the ODP case. To get truly professional editorial results consistently, in some cases you have to pay people; in other cases, you don't. With a poorly-thought-out incentive system (quality depends on commitment and skill level as well as incentives and sanctions for bad behavior), alternative (corrupt) compensation schemes can arise.

So, some thinking had to go into it. Google doesn't have a real "vertical" or "spontaneous face-to-face society" feel to it, but it does of course have the advantage of a lot of money and a willingness to experiment with various filtering and incentive systems. So it looks like a saw-off. They may well find a way to overcome the shortcomings of their bigness.

Either way, tagging is moving search forward. Probably the most intriguing nascent tagging experiment, for me, is Amazon's. Books are being tagged as we speak, first by authors, then by prolific reviewers... and... later, everyone else? Or not? Regardless, the result seems to be a parallel form of taxonomy that arises spontaneously out of community effort (assuming reasonable expertise in the community), as opposed to getting the Library of Congress category right, or some other method that might have existed in the past. From a tag, bringing up all known books about "beanstalks" *tagged as such* is only a click away. That's not the same as doing a raw keyword search for beanstalks. Tagging has shades of past information-science efforts, obviously, but it's happening here and now in a specific kind of way, and it would be a mistake to dismiss its impact.

One more thought: vis-à-vis PageRank and anchor text... hasn't linking always been like tagging? It's a mistake to say that Google eschewed metadata because they didn't look at meta keyword tags. They were just looking at different tags, and still do. :) For a long, long time, a high proportion of website publishers voluntarily "tagged" their links with something a little more informative than "click here"... just because the web gurus said it was a good thing to do.

Edit: after playing the "game," I ran across this excellent post on O'Reilly Radar, which explains that Image Labeler is based on Prof. Luis von Ahn's "ESP Game." On Search Engine Watch, Danny Sullivan confirms this in a postscript, having heard back directly from Prof. von Ahn. As an aid to tagging images, it's clear to me as a player that the type of "ESP" involved in playing the game optimally is not going to lead, all by itself, to the kind of thorough tagging we see on other sites. The best way to get the most points is to match your partner's labels as many times as possible in a timed session. And the only way to do that is to quickly type in the least complex words possible. Sure, Google might tuck away your unmatched, more complex words, but to get the most points, you and your partners will soon learn that you should aim for the least complicated word possible to describe some part of the photo: e.g., ocean, sky, people, woman, man, office, desk, etc. Screenshots of something complicated, such as a spreadsheet, are most easily matched when partners type in the heading of a column or any prominent word in the screenshot. A complex (but known) type of logo will be best matched with your partner if you both type in "logo." And so on.

On a final, final note: I suppose "tagging" is slang for "graffiti." This kind of tagging is something like the opposite of graffiti, especially when the sober, straight taggers are assigned to clean up the "Google Video Graffiti."


About the Author:
Andrew Goodman is Principal of Page Zero Media, a marketing consultancy which focuses on maximizing clients' paid search marketing campaigns.

In 1999 Andrew co-founded, an acclaimed "guide to portals" which foresaw the rise of trends such as paid search and semantic analysis.

