Thing 8 of 23: the tags don’t work

On 23 Things we were asked (last week!) to think about the use of tags (in the sense of informal user-created labels on internet objects). One of the problems with discussing tagging is that the same technology is used for tagging very different kinds of objects, in collections that vary widely in size, and can be applied both by the creators of a particular object and by other users. Trying to generalise about all of these is tricky, so I want to look at a few particular cases.

Let’s start with my blog, which has around 340 posts. If you’re looking for a specific topic, such as iGoogle or naked monks, then it’s best to search the blog, because there’s only one blog post which refers to each (though now there’s also this post). I use tags only as broader groupings: for example, to group all my posts on a particular conference or on themes such as ‘US politics’, where I may not use the specific phrase in the post. That’s partly why I have a ‘medieval’ tag but not a ‘Carolingian’ one: almost all my discussions of the Carolingian empire will use the term and so can be found via searches.

Even though I keep my tagging so simple and I’m an experienced librarian, my tagging of entries is still not of particularly high quality and suffers from biases. I’m inconsistent about the use of the tag ‘religion’, as contrasted with specific religions, such as Christianity. I have a tag for ‘homosexuality’, but not ‘heterosexuality’, even though I discuss both. And I periodically discover that there are useful themes that I haven’t tagged, and either have to go back and update the tags, or decide that it’s too big a task (as with tagging things as ‘Carolingian’).

My tagging doesn’t need to be very good, because I’m tagging objects that already have lots of searchable text. In contrast, tagging non-textual objects, such as images or bookmarks, is a lot harder, and the quality of tagging is very variable. Take something as simple and definite as a place name. I found a couple of pictures on Flickr from the tiny Sussex village where I grew up. One is tagged ‘bonfire, bonfire night, fireworks, madehurst’. The other is tagged ‘d40, 18-55mm f/3.5-56G, lenstagged, unmodified, 20081001, madehurst, church, madehurst church, west sussex, england, uk, 200810, 3008×2000’. If I wanted to find photos from West Sussex, I’d only find the first one by knowing that Madehurst was in West Sussex (and very few people have ever heard of Madehurst). And I suspect there are pictures on Flickr from Madehurst that haven’t been tagged or captioned with a place, and that therefore I can’t find at all.

Why, then, is there such enthusiasm among some internet gurus for tags? One of the articles we were pointed to for this week was Clay Shirky’s ‘Ontology is overrated: categories, links, tags’. Shirky has two main points. One is that methods of formal subject categorisation don’t work for something as big and varied as the internet. I’m working with Library of Congress Subject Headings myself at the moment in my job, and I know their many weaknesses. But it’s perfectly possible to admit that formal classification schemes often don’t work effectively while still pointing out that informal tagging has even more problems with inconsistency and inadequacy.

Shirky’s false step, it seems to me, is to assume that you can somehow generate adequate forms of categorisation by aggregating poor ones. I take this to be a variant of the ‘wisdom of crowds’ approach: averaging the views of many people can sometimes give a better answer than any individual one, as, for example, when guessing the weight of a cake. Unfortunately, aggregating answers only really works when people have similar levels of knowledge. If you need to ‘ask the audience’ in Who Wants to Be a Millionaire, you’ll almost certainly get the right answer for one of the early questions. For the million-pound question, they’re unlikely to be much help.

In the same way, Shirky is wrong to claim that ‘As long as at least one other person tags something the way you would, you’ll find it’. Eventually, maybe. But if there are 241 Delicious bookmarks for Edward II, how do you plod through them to find the ones about the king, as opposed to the play or the “mutant calypso/reggae/African style English dance band”? Or Edward Wells II?

Shirky starts with a contrast between Yahoo’s attempts at categorisation and Google’s lack of hierarchy. But Google doesn’t actually make much use of tags: it uses hyperlinks and clusters of interest. A site is ‘about’ something not just because of the terminology within it, but because lots of other sites point to it. When you look at the recommendation systems that work well at scale, they mostly rely not on tags but on such clusters of interest. Amazon’s and LibraryThing’s recommendations are based on the fact that people who buy or own one book also buy or own similar books. Delicious seems to work best when you can find a person whose interests mean they bookmark the kind of sites you’re interested in, even if they tag them slightly differently. Flickr’s group pools can link together specific themes more effectively than tags can.

The message I’d draw is that people are often poor at labelling things, but they’re a lot better at knowing what they like or find useful. Should librarians be using tags? They may have a limited role in blogs or on social media sites, but I’m not convinced they’re the right way forward for library catalogues. Why should we make users do the work of tagging, when we can provide far more useful information for them automatically via a ‘people who borrowed this also borrowed that’ button? (At the University of Huddersfield Library, they’ve got even more whizzy tricks than this, thanks to Dave Pattern.) Tagging for yourself may make sense if your needs are simple; using other people’s tags is often a waste of time.
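To show how little a ‘people who borrowed this also borrowed that’ button needs from users, here is a minimal sketch that works purely from loan records, with no tags at all. It assumes the loans are available as (borrower, item) pairs; the function name `also_borrowed` and the sample data are hypothetical. It is a simple co-occurrence count, not the method used at Huddersfield or by Amazon.

```python
from collections import Counter, defaultdict


def also_borrowed(loans, item, top_n=3):
    """Rank other items by how many borrowers of `item` also borrowed them.

    `loans` is an iterable of (borrower, item) pairs: a hypothetical loan log.
    """
    # Collect the set of items each borrower has taken out.
    items_by_borrower = defaultdict(set)
    for borrower, borrowed in loans:
        items_by_borrower[borrower].add(borrowed)

    # For every borrower who had `item`, count each other item they borrowed.
    counts = Counter()
    for items in items_by_borrower.values():
        if item in items:
            for other in items - {item}:
                counts[other] += 1

    # Most frequently co-borrowed first; ties broken alphabetically.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in ranked[:top_n]]


# Hypothetical loan records (item names invented for illustration).
loans = [
    ("ann", "king-biography"), ("ann", "marlowe-play"),
    ("bob", "king-biography"), ("bob", "queen-isabella"),
    ("cat", "king-biography"), ("cat", "queen-isabella"), ("cat", "marlowe-play"),
    ("dan", "king-biography"), ("dan", "queen-isabella"),
    ("eve", "marlowe-play"), ("eve", "dance-band-cd"),
]
print(also_borrowed(loans, "king-biography"))
```

Nobody had to label anything: the borrowing behaviour alone links the biography of the king to other books about his reign, and the dance band never enters the picture unless someone who borrowed the biography also borrowed its CD.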

4 thoughts on “Thing 8 of 23: the tags don’t work”

  1. This is a really interesting post, thank you! I do think there’s a role for user tagging alongside the formal catalogue, but agree that the recommendation system would be even more useful.

  2. Very interesting post, Magistra.

    I really enjoyed reading about Dave Pattern’s work at the University of Huddersfield Library; there are some really good ideas there using the vast amounts of data we already have (even if we don’t know it yet).

  3. This made me re-think my fondness for tags. I now suspect (read: guess) that tags will ultimately prove most valuable to other people when we start treating them as features for recommendation systems, rather than just doing brute-force searching on them. And some of the techniques (“latent semantic indexing” etc.) that help for text processing could also finesse the Madehurst/West Sussex issue.

    Also: thanks for linking to David Pattern’s very cool stuff.

    • I don’t know how much research has been done on tagging yet (though another cam23 participant pointed out an old but interesting article). Intuitively, I’d expect tagging to work better for scientific/technical topics, which tend to be about something very specific, than for the more nebulous concepts in many discussions in the humanities. People’s ability to tag well is also going to vary a lot, because tagging is essentially an analytical operation (what combination of things is this about?), and that kind of analytical thought, which comes very naturally to some people, is much harder for others. One thing it would be very interesting to research is whether people who tend to use very limited tags could provide more useful keywords if they instead wrote a free-form description of a resource, i.e. whether they find tagging hard because of the intellectual effort involved or because of its specific format.

      But you’re still stuck with the problem of scalability: if you have 100 resources and you use 10 tags, you probably have at most 20-30 things for every tag, and that’s manageable to browse through. If you then expand your collection to 1000 things, you either have to change your tags or you’re looking at a couple of hundred things with the same tag, and that’s losing its usefulness. There are some things tags probably do make sense for, but we need to think harder about when and how we use them.
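The scalability arithmetic can be made explicit. If a collection of N resources uses T tags and each resource carries k tags on average, an evenly used tag covers about N·k/T resources. The values of k here (2 to 3 tags per resource) are an assumption chosen to match the 20-30 figure, not a measurement, and real tag use is rarely even, so popular tags will be larger still.

```python
def items_per_tag(n_resources, n_tags, tags_per_resource):
    """Average resources per tag, assuming tags are spread evenly (an idealisation)."""
    return n_resources * tags_per_resource / n_tags


# Assumed: 10 tags, with each resource carrying between 2 and 3 of them.
for n in (100, 1000):
    low = items_per_tag(n, 10, 2)
    high = items_per_tag(n, 10, 3)
    print(f"{n} resources, 10 tags: about {low:.0f}-{high:.0f} per tag")
```

Growing the collection tenfold without adding tags makes each tag list ten times longer, which is the point at which browsing by tag stops being practical.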
