Digital diplomatics 1: projects and possibilities

I am currently trying to get up to speed on some of the many projects involving charters online, drawing heavily on accounts from the Digital Diplomatics conferences (and also Jon Jarrett’s useful reports on the 2011 conference). I don’t claim to be an expert on charters, but I have been using (and sometimes developing) databases for 25 years, so some of the issues seem quite familiar from my experience as a librarian. What I want to do in this first post is give a sample of the types of project out there and also note what I consider to be some particularly interesting features.

It’s useful to start with a sketch of the origins of diplomatics (the study of charters) because that explains a lot about how digital developments have been shaped. The starting point was the attempts by early modernists to work out which charters of a particular religious institution were false and which were genuine. For this, the key ability was being able to compare charters with good evidence for being authentic (e.g. held as originals) to other more dubious versions. As a result, charter studies have often been organised either around particular collections/archives (e.g. editions of cartularies, charters of St Gall) or around rulers (e.g. the diplomas of Charles the Bald), because it’s easier to spot the dodgy stuff in a reasonably homogenous corpus.

Charters have also long been a key source for regional history, so eighteenth and nineteenth century scholars produced a lot of editions of regional collections of documents including charters, such as the Histoire générale de Languedoc. Where the corpus is small enough, these have then been extended to national collections or overviews, some of which I mention below.

We have now, however, begun moving from the purely print age into digital diplomatics, and there have been a variety of approaches.

1) Simple retro-digitisation
Because there’s been scholarly interest in diplomatics for several centuries, a lot of early editions are now out of copyright. Simple retro-digitisation of old editions doesn’t often get mentioned in discussions of digital diplomatics (though Georg Vogeler, “Digitale Urkundenbücher. Eine Bestandsaufnahme”, Archiv für Diplomatik, 56 (2010), pp. 363-392 has a useful discussion of them), but there are a lot of old charter editions being put online by projects such as Google, Internet Archive, Gallica etc. This data, however, is pretty hard for charter scholars to make use of unless they’re looking for a specific charter (or at most a specific edition). Is there any way in which this material could be dealt with more effectively?

Doing something with such data doesn’t strike me as a project that’s likely to be possible to fund (it’s not new and exciting enough). The most plausible way of organising it seems to me to be crowd-sourcing of OCR work on charter scans (or checking already OCR’d documents) along with adding some basic XML markup and then sticking them in a repository. Monasterium seems the obvious one to use. Whether there would be enough researchers interested in charters from more than one foundation to make the effort of doing this worthwhile, however, I’m not sure.

2) Databases based on the printed edition model
Printed editions of charters are normally either arranged chronologically or include a chronological index. (There are a few cartulary editions which don’t have this, and I have winced at having to look through hundreds of pages to spot if there are any Carolingian charters). The vast majority of printed editions also have indexes to personal names and place names. In contrast, content analysis of the charter is often fairly limited, in the form of headnotes plus a narrative introduction.

The indexes to printed charters, if they’re done properly, work pretty well for the needs of many people working with these sources. Or, to see it from a different angle, historians studying charters arrange their research into these kinds of categories. As a result, where such indexes don’t exist in the original edition, you’ll often find that someone creates them later (like Julius Schmincke doing an index to Dronke’s edition of the charters from Fulda).

A lot of charter databases are still essentially arranged around these traditional print access methods, with digitisation adding (often fairly basic) full text search and remote access. Many of the online charter projects that have got furthest have been digitisations of relatively small and coherent existing charter collections, which have already been published in a single print series. There are several based on national collections, such as Sean Miller’s database of Anglo-Saxon charters, Diplomatarium Norvegicum and Diplomatarium Fennicum. There are also some regional charter databases of the same type (such as the Württembergische Urkundenbuch), and the early twentieth-century edition of the Cluny charters has also been put in a database. And then, of course, there’s the charters section of the digital Monumenta Germaniae Historica.

3) Aggregator databases
There are also a few charter database projects which are based on aggregating multiple printed editions: the two most important are Monasterium and Chartae Burgundiae Medii Aevi.

4) Born digital/hybrid editions
In contrast to the substantial projects of digitising existing editions, most of the born digital (or moved to digital) charter databases seem to be fairly small scale. The one exception I’ve found so far is Codice diplomatico della Lombardia Medievale which has now put over 5,000 Lombard charters from the eighth to twelfth century online.

5) Databases of originals
There is also a slightly separate strand of digital diplomatics research, which has focused on charters which are preserved in the originals (rather than as cartulary copies, etc). Some of these databases just include the text, others focus on images of charters. Projects include ARTEM and the (basic) database now attached to the Chartae Latinae Antiquiores publishing project. I’m also aware of several more image-focused projects, such as the Marburg Lichtbildarchiv, and Pergamo Online, which contains images of parchments preserved in Pergamo.

I’m not going to discuss the image databases in any detail, because they’re a very different kettle of fish to the textual databases I’m used to working with, but it is worth noting how decisions made on how much detail is recorded for original documents can be fairly arbitrary. As Georg Vogeler points out, there’s an odd division for the St Gall charters between the early stuff that gets put in horrendously expensive printed ChLA editions and the material from the eleventh century onwards that is available free via Monasterium.

6) Linguistic projects
I also won’t say much about charter database projects that focus on linguistic analysis of texts, such as Corpus der altdeutschen Originalurkunden bis zum Jahr 1300, Langscape and the work being done by people like Rosanna Sornicola and Timo Korkiakangas. While this is interesting work, it seems to me of less immediate relevance to most historians.

7) Factoid model
As Patrick Sahle put it in a recent paper (“Vorüberlegungen zur Portalbildung in der Urkundenforschung”, Digitale Diplomatik: Neue Technologien in der historischen Arbeit mit Urkunden. Archiv für Diplomatik, Schriftgeschichte, Siegel- und Wappenkunde, Beiheft 12, edited by Georg Vogeler (Cologne, Böhlau Verlag, 2009), 325-341 at p. 338), the object of diplomatic research is the individual charter. Most database projects are structured in a way that reflects this focus on the charter as a unit.

A contrast is given by the factoid model adopted by a number of KCL projects, such as the Prosopography of Anglo-Saxon England and what will shortly become the People of Medieval Scotland project. Here, the key unit is the factoid, a statement of the form: “Source S claims Agents X1, X2, X3 etc carried out Action A1 connected with Possessions/Places P1, P2 at date D1.” A charter (or another source) can thus be broken down into a number of factoids, allowing finer-grained access to the content of charters. Although this may not seem an obvious approach to considering charters (and there are a number of practical problems), it does match surprisingly well to the “Who, What, Where, When, How do we know” model that I’ve mentioned before as one approach to working with charters.
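To make the factoid statement concrete, here is a minimal sketch of how one might be held as a record. It is not taken from any KCL project: the class name, the field names, the helper function and the sample data are all invented for illustration, and the real databases will certainly be richer than this.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Factoid:
    """One claim made by one source (hypothetical structure)."""
    source: str                    # S: the charter or other text making the claim
    agents: Tuple[str, ...]        # X1, X2, ...: the people or institutions involved
    action: str                    # A1: what the source says they did
    places: Tuple[str, ...] = ()   # P1, P2, ...: possessions or places concerned
    date: str = ""                 # D1: kept as text, since charter dates are often uncertain


def factoids_for_agent(factoids, agent):
    """Return every factoid in which the given agent appears."""
    return [f for f in factoids if agent in f.agents]


# Invented sample data: one charter broken into two factoids, plus a second charter.
factoids = [
    Factoid("Charter 1 (hypothetical)", ("Abbot A", "Count B"), "exchange of land", ("Villa C",), "800"),
    Factoid("Charter 1 (hypothetical)", ("Count B",), "witnessed", (), "800"),
    Factoid("Charter 2 (hypothetical)", ("Abbot A",), "received grant", ("Villa D",), "802"),
]

for f in factoids_for_agent(factoids, "Count B"):
    print(f.source, "|", f.action, "|", ", ".join(f.places) or "-")
```

The point of the sketch is that a query now starts from a person (or a place, or an action) rather than from the charter, while each result still carries the source that makes the claim.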

What works
As my overview suggests, there are already too many charter databases out there to make it easy to discuss them all in any more depth than “here’s another one that does X, Y and Z”. But there are some projects that seem to me to illuminate particularly important aspects of digital diplomatics:

1) DEEDS: full text done right
I’ve discussed before the problems of searching full-text databases of charters, but most projects don’t seem to respond to such problems. Instead they have very basic full-text facilities, and certainly nothing like the ability to use regular expressions that Jon Jarrett longs for.

The problem with regular expressions, of course, is that they still require an expert user. And as several generations of designers of library catalogues and other kinds of databases know, most users aren’t experts, and they don’t want to have to become so to be able to use your database. Even if you learn the right syntax, how do you know what spelling variations to try searching for before you’ve seen what might be lurking in the database? For example, if you know that the MGH edition of one of Charlemagne’s charters (DK 169) refers to a particular county as Drungaoe or Trungaoe, how on earth would it occur to you that the same charter in Monasterium would name the place as “Traungaev”?
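A small sketch shows the difficulty, using only the three spellings quoted above. The two patterns are my own illustrations, not anything offered by an existing charter database: the first is what a user might write knowing only the MGH spellings, the second can only be written once the Monasterium spelling has already been seen.

```python
import re

# The three spellings quoted in the text.
spellings = ["Drungaoe", "Trungaoe", "Traungaev"]

# Pattern built from the two MGH spellings alone: D or T, then "rungaoe".
narrow = re.compile(r"\b[DT]rungaoe\b", re.IGNORECASE)

# Pattern that also allows an optional "a" and the "-ev" ending.
# It presupposes that the searcher already knows the third variant.
broad = re.compile(r"\b[DT]ra?unga(?:oe|ev)\b", re.IGNORECASE)

for s in spellings:
    print(f"{s:10} narrow={bool(narrow.search(s))} broad={bool(broad.search(s))}")
```

The narrow pattern finds both MGH forms but misses “Traungaev”, which is exactly the hit the searcher did not know to look for.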

DEEDS is the only project I’ve seen so far that has really sophisticated analytical tools for full-text. Its method of shingles, for example, is currently being applied to dating documents, but it strikes me as something that might also very usefully be applied to identifying particular formularies used by someone drawing up a charter. By breaking a document down in this way, you can analyse multiple factors suggesting that a document is “nearer” to one model than another in a way that’s simply not practical with manual methods.
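As a rough illustration of the general idea (not of DEEDS’ own implementation, which I haven’t seen), a shingle can be taken as a run of k consecutive words, and two texts compared by how many shingles they share. The sketch below assumes three-word shingles and Jaccard similarity as the measure of “nearness”; both choices, the function names and the two model formulas are assumptions made up for the example.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word runs) in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Share of shingles common to both sets, from 0.0 (none) to 1.0 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# The opening formula quoted in this post, plus two invented model texts.
doc = "sciant presentes et futuri quod ego iohannes de halliwelle"
model_a = "sciant presentes et futuri quod ego willelmus de halliwelle"
model_b = "notum sit omnibus presentibus et futuris quod ego iohannes de halliwelle"

for name, model in [("model_a", model_a), ("model_b", model_b)]:
    score = jaccard(shingles(doc), shingles(model))
    print(f"{name}: {score:.2f}")
```

Here the document comes out nearer to the first model, because it shares the whole “sciant presentes et futuri quod ego” run with it, even though it shares the grantor’s name with the second.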

Even more useful, potentially, is DEEDS’ use of normalisation. Their alternative spelling option makes their search engine cope with a lot of the more common issues in searching Latin. But the really interesting part to me was their discussion of using normalisation to produce phonetic proxies. This takes a phrase such as “Sciant presentes et futuri quod ego Iohannes de Halliwelle” and reduces it to “scnt prsnt cj futr cj eg iohns pr hall”, the bare sounds of the key terms. A full-text search facility with phonetic proxy as an option strikes me as one of the few ways that you might be able to produce something that could find you the multiple possible Latin spellings of the Traungau, without you needing to sit down for a week to work them out…
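To show why this kind of reduction helps, here is a deliberately crude sketch. It is not DEEDS’ algorithm and will not reproduce their output for the phrase above: the three rules (treat d as t, treat v and w as u, drop vowels and collapse doubled letters) are assumptions chosen only to illustrate the principle, and the function names are hypothetical.

```python
import re
from collections import defaultdict


def crude_key(word):
    """Reduce a word to a rough consonant skeleton (illustrative rules only)."""
    w = word.lower()
    w = w.replace("d", "t").replace("v", "u").replace("w", "u")
    w = re.sub(r"[aeiouy]", "", w)    # drop vowels (including u/v/w, merged above)
    w = re.sub(r"(.)\1+", r"\1", w)   # collapse doubled letters
    return w


def build_index(words):
    """Map each crude key to the set of spellings that reduce to it."""
    index = defaultdict(set)
    for w in words:
        index[crude_key(w)].add(w)
    return index


# Spellings taken from this post; a real index would cover a whole corpus.
index = build_index(["Drungaoe", "Trungaoe", "Traungaev", "Halliwelle"])

# Searching under the modern form finds all three charter spellings.
print(sorted(index[crude_key("Traungau")]))
```

The obvious cost is over-matching: a key this short will also sweep in unrelated words with the same consonants, so a real system would need rather more careful rules than these.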

2) ARTEM: bringing in the users
ARTEM, the database of French original charters before 1121, is far from being the biggest or the most sophisticated charter database around. Where the project has succeeded, however, is in getting researchers actually to use the database. There have been several conference publications based on its work, e.g. Marie-José Gasse-Grandjean and Benoît-Michel Tock, eds. Les actes comme expression du pouvoir au haut Moyen âge: actes de la table ronde de Nancy, 26-27 novembre 1999. Atelier de recherches sur les textes médiévaux, 5. (Turnhout, Brepols, 2003).

What I’m not yet sure of is why ARTEM have been more successful than comparable projects in getting other scholars involved. Is it because they’ve been going longer, that they’re more pro-active in arranging roundtables, or is it because France has a weird early medieval charter distribution, with a large number of relatively small collections of charters, and thus researchers desperately need a multi-archive database?

3) Monasterium: Charters 2.0
Monasterium describes itself as a “collaborative archive” and it’s the only project I’m so far aware of that takes the idea of user participation seriously. As well as providing tools for working with and annotating individual charters (which I haven’t yet had the chance to try out), it’s also intended to provide a distributed infrastructure into which individual archives from across Europe can add their material. As a means for getting later medieval charters available online, especially for smaller archives, it looks ideal. In terms of data quantity and quality, however, it’s liable to the patchiness inherent in large-scale collaborative projects: some areas get very well-covered, some don’t get referred to at all.

4) CBMA: blending old and new
Chartae Burgundiae Medii Aevi isn’t unusual in its scope: it’s aiming to put online the 15,000 charters from the region of Burgundy. What’s more unusual is its method: it’s putting online both old editions and previously unedited cartularies. There are obvious issues here about whether they can get data consistency, but potentially it seems more practical to start with existing editions (however imperfect) and “grow” a database using them, than to wait for funding to re-edit everything from scratch.

5) DIY databases
All the databases I’ve discussed so far have been major research projects. However, the site created by Joan Vilaseca shows the possibility for a dedicated individual to produce their own web-based charter database, using easily available tools.
Joan uses a wiki format, which for the relatively small number of documents he has provides a neat way of showing links between people and places. The unstructured nature of the data may make it harder to search, but it also means that different genres of documents (not just charters, but hagiography etc) can be incorporated easily. It’s a useful reminder that charter information doesn’t have to be stored in relational databases. (For another example of this minimalist approach, see Project FAST, which is putting a Florentine archive online).
Joan’s site also raises an interesting point about audiences and the accessibility of charter databases. The site is in Catalan, which makes it far more suitable for what I presume is Joan’s main audience, people interested in the history of their own region. But for those of us who aren’t Catalans (and don’t specialise in Catalan history) the use of a relatively uncommon language is a disadvantage.

Preliminary conclusions
The databases I’ve so far read about or seen prove that there are lots of interesting projects going on, but I do slightly wonder if there’s too much variety. Different audiences and different aims can explain some of the variants, but I think we may need to start adapting more systematically from previous projects. I can see the components of really effective databases in some projects, but so far they’re not being pulled together into something that properly builds on the pioneering work. So, I finish with a question for the more experienced users: what do you like from particular charter database sites? What should the Charlemagne project be stealing from other projects?


2 thoughts on “Digital diplomatics 1: projects and possibilities”

  1. Thanks for such a useful review of online charter developments (for example, I was unaware of the existence of the DEEDS project), and for mentioning my small-scale initiative. Only two short comments:

    I’ve pondered adding an internationalized interface to the site, but it’s no small effort. Anyway, it would be a real usability enhancement if the site got some kind of institutional support/funding.

    About consulting the charter data in new ways, I would like to point to the associated blog, especially the posts named ‘Visualitzant el cens…’ and the hyperlinked SVG image files in them, which allow quite a different way to look at and navigate through the data on the site.


    • Dear Joan,

      Glad to include you in the list – there are a lot of projects going on, so it’s interesting to see how differently charters can be put online. I want to write another post at some point about mapping medieval social networks, so your blog post will be useful for that.

      As for multilingual interfaces, I appreciate how much extra work it is to include them. Even most of the academic projects, if they do include some foreign-language information, only have a small amount, so it wasn’t intended as a critique of your work. It’s very hard balancing the different needs of multiple types of users.


