What can the vulgus do? Crowd-sourcing for medievalists

In 802, Alcuin blamed a riot at Tours on the ‘untaught crowd [vulgus indoctum], who are always accustomed to do unsuitable things without counsel’. Recently, however, there’s been an increasing interest in the ‘wisdom of crowds’, and this year I’ve kept on coming across mentions of crowd-sourcing projects for historical purposes. Dan Cohen, in his Arcadia Lecture at Cambridge in April, mentioned several such projects. The closing plenary session at a recent conference on digital humanities brought another (Digital Bentham). This summer, Oxford University launched Project Woruldhoard, a follow-up to a project on making a community collection of material on World War I. Oxford presumably think that crowd-sourcing has potential for work on the Middle Ages, but how else might medievalists be able to use such techniques?

One problem with any discussion of crowd-sourcing is that it covers so many different things. So, rather than getting into the details of projects, which I don’t really know enough about to do, I want to try and identify some broad themes. At the small-scale level, there’s getting quick answers to your problems via your Twitter followers: what might be called comitatus-sourcing. (As an aside for medievalists, studies of Twitter have suggested that it’s more hierarchical and less reciprocal than other social media, which has interesting implications if we’re using it as a model for opinion forming.)

A lot of historical crowd-sourcing projects are predominantly concerned with creating mass archives. Such projects existed even before the development of the internet, as seen in the Mass Observation project and BBC Domesday Project. But new social media technologies have made the process of creating and maintaining such archives much easier, with less chance of digital obsolescence. Dan Cohen’s Arcadia lecture talked about how rapidly he’d been able to set up a digital archive about the 9/11 attacks, and also about how the archive was now being used for purposes he’d never imagined at the time, such as for linguistic studies on teenspeak at the start of the twenty-first century.

It’s noticeable, however, that almost all these attempts at mass archiving have dealt with topics that can be seen as ‘people’s history’: oral history, local history or family history. That doesn’t mean to say that they’re only of interest to historians in these fields: an exercise such as the crowd-sourcing element of the BBC History of the World in 100 objects project can produce some material culture of interest to medievalists. But I’m still not clear who will respond to Project Woruldhoard. It seems to me to be pitched rather uneasily between academics (send us your lecture-notes) and the public (send us your living history/images).

A different approach to crowd-sourcing comes from attempts in various academic fields to make use of mass volunteers. Some of these projects have been very successful: Dan Cohen referred to Galaxy Zoo, a project for classifying images of galaxies. A number of projects are trying such crowd-sourcing techniques within museums and historical projects. To name just a few, there’s the Victoria and Albert Museum asking visitors to choose the best images of objects, the Digital Bentham project for transcriptions of Jeremy Bentham’s writing, and lots of museum tagging projects.

As I’ve discussed before, I’m unconvinced about the ability of tagging to produce good results. But my mind was changed somewhat by discovering Freebase, a project that aims at crowd-sourcing what are essentially authority files. That’s successful enough that Google has bought it.

That’s when it dawned on me: what you need for mass volunteer projects isn’t actually crowd-sourcing, but nerd-sourcing. You need to find, among the vast number of vaguely interested, not very analytical people who look at web sites, the small number of tidy-minded obsessives who care deeply about the ethnic origins of Freddie Mercury or want to analyse statistical data for fun and no profit. And then you need to persuade these people to do as much work for you as you can.

The success of mass volunteering, therefore, is going to depend heavily on the number of well-informed enthusiasts ‘out there’. Dan Cohen mentioned a crowd-sourcing transcription project he was involved in: the papers of the early US War Department. He thought that the number of amateur historians interested in early US history meant that they would be able to get enough volunteers to do this effectively. The Library of Congress has also had a lot of success with its picture identification requests on Flickr.

In contrast, whether there are really enough Bentham enthusiasts to do transcriptions for the Digital Bentham project seems to me far more dubious. And transcribing medieval texts or identifying medieval images is something that only the most hardcore amateurs are going to be able to help with, though the Your Archives project by the UK National Archives has one example.

Where does this leave crowd-sourcing for medievalists? There are a few possibilities I can see. One would be crowd-sourcing images of medieval buildings: there are already images on Flickr of extremely obscure medieval churches. Roger Pearse has also made the controversial suggestion that manuscript digitisation should be crowd-sourced. But all crowd-sourcing projects have costs, in terms of the time and money required to set up the project infrastructure, to monitor the input, motivate the crowd, and archive the results. I suspect that for most medieval topics, the vulgus is just too indoctum to make the effort worthwhile.


6 thoughts on “What can the vulgus do? Crowd-sourcing for medievalists”

  1. I’ve confronted the same issue in my own crowdsourced transcription project. Since the documents I’m hosting are mainly of interest to the authors’ descendants, the pool from which I may draw is quite constrained. It’s frustrating to look at site logs and learn that visitors are finding the site via searches for the specific people or events mentioned in the documents, know that those visitors are my fellow researchers (in some cases long-lost cousins), but be unable to engage them.

    That said, even finding one passionate user (“nerd” in your terminology) can pay off — my most active user has transcribed more than a thousand pages so far, and has even managed to locate missing volumes of the diaries we’re transcribing. Perhaps the key to acquiring such jewels is to identify existing online communities of subject-matter enthusiasts and engage them in the project?


    • Starting with online communities is definitely a good move, but even then you rapidly narrow down the potential field of helpers with the extra requirements for transcription projects. You need people who have decent display screens and fast broadband, are very meticulous, and have a lot of free time (particularly since it takes a while to get your eye in with a script). I suspect for anything with a lot of transcribing you also need someone who’s a touch-typist: I’ve got a little autobiography by my grandfather that I’d like to transcribe sometime, but two-fingered typing makes it very hard to do. You can get lucky, as you have obviously done, but it’s hard to know whether you will get lucky before you start a project. And medieval script, language and abbreviations make it ten times harder at least.


      • There are many kinds of hurdles to participation, but there’s some real potential for effective collaboration between two or more people, none of whom individually has expertise in all the disciplines required.

        For medieval transcription projects, imagine an online partnership between a medievalist with your technical limitations regarding touch typing and a tech-savvy enthusiast — say a re-enactor who’d had a couple of years of Latin and had been exposed to paleography through Drogin’s Medieval Calligraphy. I’d think that such a combination would be capable of accomplishing quite a lot, with the amateur providing the bulk of the labor and the specialist acting as a guide and editor. That combination would also scale well if the volunteers outnumbered the experts, or if the work being transcribed was written in a consistent hand so that the volunteer could build on their own experience.

        Regarding the narrow field of participants, do you think that the community of people like my hypothetical re-enactor is much smaller than the pool of participants interested in Bentham or your grandfather’s auto-biography?


