Over at A Corner of Tenth-Century Europe, a discussion that started off being about the trajectory of women’s history has mutated into one about why someone isn’t creating a system of unique identifiers for medieval texts. And while I’ve spent the last decade or so thinking about gender history, I’ve spent half my life thinking about databases and identifying references uniquely, because that is one of the things librarians do all day. So I wanted to start from Joan Vilaseca’s plea for “A public and standarized corpus of classical/ancient texts with external references to editions, versions, comments, articles, etc,etc.etc”, sketch out what I’m aware of as existing and explore why history seemingly can’t get its act together in the way that chemistry or taxonomy has.
There are actually some databases that do a fair amount of what Joan would want. As examples, there are:
1) Perseus Digital Library. This is a big and sophisticated free collection of classical texts, including some very neat tools, such as Greek and Latin word study tools (which I freely admit to using when I’m stumped on working out the root verb from a conjugated form). This doesn’t have identifying references, however.
2) Library of Latin Texts. This commercial database includes the full text of the whole corpus of Latin literature up to the second century AD (essentially taken from the Teubner editions), plus a lot of patristic and medieval Latin (largely, but not entirely, taken from the Corpus Christianorum series). Associated with this is the Clavis Patrum Latinorum, which provides a numbered list of all Christian Latin texts from Tertullian to Bede. (There are similar indexes which cover Greek patristic texts, apocrypha, and early medieval French authors.)
3) Thesaurus Linguae Graecae. This database includes most literary texts in Greek from Homer to the fall of the Byzantine Empire. It’s a subscription service, but it includes a free online canon database that provides unique identifying numbers for works and parts of works.
4) Bibliotheca Hagiographica Latina (BHL). This is a catalogue of ancient and medieval Latin hagiographical materials, produced by the Bollandists, which provides unique identifying numbers for different texts. There’s also a free online version.
5) Leuven Database of Ancient Books. This free database includes basic information on all literary texts preserved in manuscript from the fourth century BC to AD 800; each text is assigned a unique number. (It’s a subset of the Trismegistos project, which focuses on documents from Graeco-Roman Egypt, both literary and non-literary, and also provides identifying numbers.)
What this very brief overview reflects is one basic fact: producing a database and/or identifier system of any size takes time and money. As a result, there have to be enough people wanting the result to make that investment worthwhile. There are several different models for financing such projects: you can sell the resultant database (either for profit or at a break-even price), persuade funding bodies to support you, or rely on charitable donations. But in every case you need someone willing to pay.
It’s worth looking here to see why identifier projects in other fields have succeeded. A lot of large-scale identifier projects, for example, have come out of library science and publishing, both because these are huge and connected networks and because there’s the commercial driver of being able to identify something in your inventory quickly and accurately. So the Standard Book Number, developed for WH Smith in the 1960s, became the ISBN of today, followed in the 1970s by the ISSN for serials, etc. It’s noticeable that more than twenty years passed between the development of unique identifiers for serials and the development of unique identifiers for individual articles within those serials (the CrossRef project using DOIs). This wasn’t because no user ever wanted to read an individual article before then; it was because it was only with electronic journals that it became feasible to try and sell individual articles to people.
Most of the other really large-scale nomenclature/identifier projects have been in the sciences, for the simple reason that the same phenomena are being studied all over the world. We’re (mostly) looking at the same sky, hence the International Astronomical Union was formed in 1919. The International Union of Pure and Applied Chemistry, responsible for chemical nomenclature, also dates from a similar period. (One of the other main systems of chemical nomenclature, the CAS Registry Number, is an offshoot of the subscription index/database Chemical Abstracts.) Again, people are trying to do the same chemical reactions from Bombay to Los Angeles, so there’s a big demand for such systems. Biological classification has a very long history, dating back to Linnaeus (although unique identifiers are only just being developed), reacting to thousands of years of attempts to show how all species are related.
The classical/medieval database projects that I’ve mentioned above have essentially been possible because they have a sufficiently tightly-defined group of potential users who are all interested in the same sort of thing: classical literature or papyrology or hagiography. It’s therefore worth creating something for them to use. The problem with extending such a system to broader historical areas is that no-one cares about history.
That sounds ridiculous, but it’s a problem I’ve mentioned before: it’s not really clear that we’re doing the same thing as historians when we study vastly different periods and use completely different sorts of sources. Or to put it a different way, the Old Bailey database is a remarkable resource, but not of any professional use to me. I don’t care about all history, everywhere; I care specifically about early medieval European history. Historical sources, even just medieval sources, aren’t one thing, but a patchwork of different islands, and most researchers spend most of their time perched securely on a few of these, rarely venturing off them. I’ve spent years as an early medievalist and have never needed to cite Sawyer numbers, for example, because I don’t research or teach Anglo-Saxon history; I’d be almost equally baffled if I came across Corpus Iuris Canonici footnotes without the help of Edward Peters. The patchwork systems of identifying medieval documents remain because of the lack of overlap between the groups of researchers using them, and I can’t see any driving force that is going to change that. Crowd-sourcing has produced some remarkable things, but creating unique identifiers is a task peculiarly ill-suited to crowd-sourcing. Unless more people start caring about the history of everywhere at all times, Joan isn’t going to get the wide-ranging system he’d like.