What I’m doing: Hincmar and charters

As you will have noticed, things have been very quiet on this blog for ages, because I’ve been busy on a lot of other things. The busyness will be continuing for several more months, but some of the results of it are now becoming available.

1) The Making of Charlemagne’s Europe database is now online and our blog will continue to be updated. I will also be giving a seminar on Tuesday 3rd February at Leeds on the project, as part of the series Medieval Studies in the Digital Age. (This is a free event, but you need to register.)

2) The book of essays on Archbishop Hincmar of Rheims I have been co-editing with Charles West is now at the proof stage. According to Manchester University Press, Hincmar of Rheims: Life and Work will be appearing in July 2015. This contains the research of an international cast of scholars (British, French, German, Dutch, US and Canadian) and will answer almost all your Hincmar-related research needs. The rest will be answered by our forthcoming translation of De Divortio, also from MUP, which is making steady progress: this will replace the translation of this text we have currently made available on the Collaborative Hincmar project blog.

3) My new project (until July) is working on Charles’ Turbulent Priests project. I will be focusing on priests and their representation in original charters from the eighth and ninth centuries.

I hope to get back to more regular blogging in a couple of months, but until then, I hope some of this interests you.

Fifty years of historical database angst

The Making of Charlemagne’s Europe project website has now gone live, and includes a post by me on interconnecting charter databases. I mention in that a recent argument when we were trying to decide which of several different categories of transaction a particular document fell into. Just to show that such problems of coding documents are not new, here are some quotes from a recent article on Charles Tilly, a historical sociologist and a pioneer of using databases for historical research.

The Codebook for Intensive Sample of Disturbances guides more than 60 researchers in the minutiae of a herculean coding project of violent civil conflicts in French historical documents and periodicals between 1830–1860 and 1930–1960…The Codebook contains information about violent civic conflict events and charts the action and interaction sequences of various actors (called there formations) over time….we find fine-grained detail and frequent provision made for textual commentary on the thousands of computer punch cards involved.

(John Krinsky and Ann Mische, “Formations and Formalisms: Charles Tilly and the Paradox of the Actor”, Annual Review of Sociology, 39 (2013), p. 3)

The article then goes on to quote the Codebook on the issue of subformations (when political groups split up):

In the FORMATION SEQUENCE codes, treat the subformation as a formation for the period of its collective activity—but place 01 (“formation does not exist as such at this time”) in the intervals before and after. If two or more subformations comprise the entire membership of the formation from which they emerge, place 01 in that formation’s code for the intervals during which they are acting. But if a small fragment breaks off from a larger formation, continue to record the activities of the main formation as well as the new subformation.

If a formation breaks up, reforms and then breaks up in a different way, assign new subformation numbers the second time.

If fragments of different formations merge into new formations, hop around the room on one foot, shouting ILLEGITIMIS NON CARBORUNDUM.

(Krinsky and Mische, p. 4, citing Charles Tilly, Codebook for intensive sample of disturbances. Res.DataCollect. ICPSR 0051, Inter-Univ. Consort. Polit. Soc. Res., Ann Arbor, Mich. (1966), p. 95)

In nearly fifty years, we’ve gone from punch-cards to open source web application frameworks, but we still haven’t solved the problem of historical data (and the people behind it) not fitting neatly into the framework we create, however flexible we try and be.

Leeds 2013 report 3: charters and non-charters

My time at the International Medieval Congress at Leeds this year was a slightly strange one, alternating between thinking about the work I do in the day job (as research associate on the Making of Charlemagne’s Europe charters project) and going to papers about all the other things in which I’m still interested (gender and religion and culture and shiny stuff). And also a lot of meeting friends and making new ones. I forgot to mention in my first report on the IMC that I finished Monday evening with the bloggers’ meet-up, in which, according to Leeds tradition, some old-hand Leeds bloggers turned up (myself, Jonathan Jarrett and L’historien errant), others didn’t (Gesta, Another Damned Medievalist and Kathleen Neal) and we met some new bloggers: Karen Schousboe of Medieval Histories magazine and the Victorian Librarian (going medieval with a dash of pre-Raphaelite). I had to be reasonably restrained during the evening, however, since I was speaking on Tuesday.

On Tuesday, I started with one of my regular forays into what I am prone to call “Not My Millennium”, i.e. anything happening after the Year 1000. Technically, part of the session was my millennium, since Session 506 on “Law, Violence, and Social Bonds, I: Power, Conflict and Dispute Settlement” had one Carolingian paper. The speakers were:

Matthew McHaffie, Lordship and Authority in Anjou, c. 1000 – c. 1150

Kim Esmark, Power and Pressure: The Micropolitics of 11th-Century Aristocratic Networks

Warren C. Brown, Conflict and the Laity in Carolingian Europe

The first two papers were “things to do with charters” ones, but taking very different approaches. Matthew’s was a trawl through nearly 3000 charters from Anjou to find around 120 that dealt with warranty, and he was then focusing on what those could tell us about legal practice. Paul Hyams has argued that warranty provides protection against outside challenge to a donation and also compensation if this protection failed. Looking at the charters, Matthew found very variable diplomatic (probably relating to the oral context in which such warranties were originally given) and evidence that suggested that it wasn’t just legal protection that could be provided. It could be handier just to have someone show up at the court with an intimidating posse. But warrantors weren’t always useful for churches (we don’t have grants to laymen before 1150): some were ineffective, and they might even backfire. For example, Hubert de Tabal gave land to Marmoutier, which was then taken by St Urban. Since Hubert was unable to warrant his gift, he ended up seizing the land back himself.

From a more general point of view, Matthew’s paper was interesting in suggesting something about the frequency of events required to make a large-scale charter trawl worthwhile. In a PhD (lasting presumably 3 years), he’s found 120 warranty clauses out of nearly 3000 charters, a hit-rate of around 4%. He also said he’s found 3 out of 120 in which women act as warrantors and around 10% of the warranty clauses are for exchanges. When you’re getting down to that level of rarity of an event/type (less than 1% of your sources), it’s really not feasible to trawl just for them; it has to be done as an offshoot of other research. One of the questions in using charters is how we can more effectively find such rare but not unique events.
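The arithmetic behind that point can be sketched in a few lines of Python. The figures are the rounded ones quoted above (nearly 3000 charters, 120 warranty clauses, 3 women warrantors); the variable and function names are purely illustrative.

```python
# Rounded figures quoted from the paper; names are illustrative only.
total_charters = 3000
warranty_clauses = 120
women_warrantors = 3


def rate(hits: int, total: int) -> float:
    """Return hits as a percentage of total."""
    return 100 * hits / total


warranty_rate = rate(warranty_clauses, total_charters)
women_rate = rate(women_warrantors, total_charters)

# How many charters have to be read, on average, to find one example.
charters_per_warranty = total_charters / warranty_clauses
charters_per_woman = total_charters / women_warrantors

print(f"warranty clauses: {warranty_rate:.1f}% of charters")
print(f"women warrantors: {women_rate:.1f}% of charters")
print(f"charters read per warranty clause: {charters_per_warranty:.0f}")
print(f"charters read per woman warrantor: {charters_per_woman:.0f}")
```

Seen this way, the difference between reading 25 charters per hit and reading 1000 per hit is what separates a feasible targeted trawl from one that only makes sense as a by-product of other work.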

In contrast to this wide-range focus on a particular type of charter, Kim Esmark’s paper was using charters to look narrowly but deeply, carrying out a prosopographical study of Odo of Blaison, a lord in Anjou. He appears in around 70 charters, both as part of the entourage of the counts of Anjou and with his own entourage, settling disputes or consenting to their alienation of property. Kim was mapping Odo’s social networks and arguing that a lay lord like Odo couldn’t easily dominate an area even during the notoriously weak reign of Fulk IV of Anjou. Odo had to provide for his own dependants and this was sometimes tricky: Kim quoted a placitum from the mid 1080s settling a long dispute with the church of St Lezin in Angers. In this, Odo had to give up some revenues from land held by his own men to the canons of St Lezin; Rotaldus, one of his vavassors, refused to consent to the charter and was excommunicated. It took a year before he agreed to the charter. Kim thought we needed to pay more attention to charter witnesses and to look at constraints to lordly power from below as well as above.

Kim’s paper was also interesting for my own work because in theory, charter projects such as Charlemagne’s Europe should provide the possibility to locate and analyse multiple Odos quickly: important men below comital status who turn up in a number of different sources and whose dependents/connections we also want to trace. In particular, I think we need to make our database structures and schematics as openly available as possible, so that they can be reused by people working on charters for different periods. But how that could be done technically I don’t yet know.

After two papers on things to do with high medieval charters, we then had Warren Brown discuss things to do with early medieval formulae. Having found Warren’s work very useful in the past, I found this a slightly disappointing paper: it was mainly a tour of the formularies, pointing out some of the interesting topics they dealt with (and his paper made surprisingly little mention of the work of Alice Rio, who’s done ground-breaking work on these). But overall it was an enjoyable session, if in an over-crowded room.

The rest of the day was mainly giving and preparing for my own sessions. I did, however, get to Session 702 on early medieval queenship. I’ve already discussed the paper by Val Garver on textile working by queens. The other two papers were by Grzegorz Pac and Hailey La Voy. Grzegorz was talking about the C10 and C11 iconography of queenship, focusing on images of the Virgin Mary being crowned or crowning others.

His main point was that although the idea of Mary as a queen became a doctrine in the fifth century and images of her being crowned or crowning others were common from the Ottonian period onwards, we need to be careful considering the gendered implications of this: as the images he used showed, Mary could also be used in scenes as an indication of male ecclesiastical authority (e.g. her role in Bernward of Hildesheim’s Gospels) or depicted as crowning a king:

Virgin Crowning Otto III (or I?), folio 160v, Cod. LXXXVI, Biblioteca Capitolare, Ivrea, c. 966-1002.

(For more details on this sacramentary, see Evan Gatti’s article in Peregrinations vol 3 (2010), from which this image was taken).

Hailey, meanwhile, was focusing on letters from popes to queens and empresses, and in particular several letters from Popes Nicholas I and John VIII to queens at the Carolingian and Byzantine courts (many of which are available in translation via Epistolae). In particular, she was suggesting the importance of the model of Esther, as a royal wife being encouraged to give good advice to her husband, and warned of the evil consequences if she did not. In contrast, the Virgin Mary isn’t mentioned as an intercessor in such letters; Hailey was arguing that the image of her as a queen intervening with her Son developed as a result of earthly models of queenship, rather than the other way round.

I was involved in two sessions on Tuesday afternoon and evening: the first was session 808, organised by Johannes Preiser-Kapeller, who also gave us a typically erudite and high-speed trip through the possibilities of combining Bruno Latour’s Actor-Network Theory, object biography, spatial analysis via the new mapping tools available on the web and social network analysis. Johannes’ presentation, “Medieval entanglements: trans-border networks in Byzantium and China in comparison (300-900 CE)” is already available on the internet. My rather more low-key and downbeat paper, “Caught in Charlemagne’s web”, will also be available online shortly: its main point is that scaling up social network analysis of charters is going to be complicated, and will need a lot of careful thought about how we generate the networks.

If I was being somewhat sceptical about the possibility of using the “Making of Charlemagne’s Europe” database for social network analysis in this session, I was a lot more enthusiastic about its other possibilities in the final session of the day (910), when we were showing off our database alongside the Nomen et Gens database. This was definitely a session for early medieval charter nerds with a good sense of direction, since we were in one of the harder-to-find seminar rooms, but we got a surprisingly large audience and a positive reaction to our demonstration. Most of the presentation was pre-prepared PowerPoints (which again, will be up on the new project website soon), but we even managed a brief live link, quite impressive since the prototype user interface was still being built when the conference started. All in all, it was a good end to the first couple of days of the conference.

Medieval social networks 2: charters and connections

As a follow-up to my first post on social network analysis, I’m now gradually reading some of the many books and articles on historians’ use of network analysis that readers of my blog suggested. And having read a couple of chapters of Giovanni Ruffini, Social Networks in Byzantine Egypt, I’m coming to realise that one of the most difficult issues for those of us working with documentary sources is deciding what counts as a connection between two people and what links should therefore be included in the network.

The majority of the late antique/medieval network analysis studies that I’ve looked at work by hand-crafting links. Someone sits down, works their way through their sources and picks out by eye every link between two people (or two places). Often, they also categorise the link. For example, Elizabeth Clark, when studying conflicts between Jerome and Rufinus, divided links into seven different types: “marriage/kinship; religious mentorship; hospitality; travelling companionship; financial patronage, money, and gifts; literature written to, for, or against members of the network; and carriers of literature and information correspondence.”

(Elizabeth A. Clark, “Elite networks and heresy accusations: towards a social description of the Origenist controversy”, Semeia (56) 1991, 79-117 at p. 95).

Similarly, Judith Bennett did the same thing when looking at connections of families recorded in the Brigstock manorial court records:

The content of these transactions has been divided into six qualitative categories that collectively encompass all possible transactions. These categories are based upon whether the network subject interacted with another person by (1) receiving assistance, (2) giving assistance, (3) acting jointly, (4) receiving land, (5) giving land, or (6) engaging in a dispute.

(Judith M. Bennett, “The tie that binds: peasant marriages and families in late medieval England”, Journal of Interdisciplinary History 15 (1984), 111-129, at p. 115).

And for networks of places, Johannes Preiser-Kapeller, “Networks of border zones: multiplex relations of power, religion and economy in South-Eastern Europe, 1250-1453 AD”, in Revive the past: proceedings of the 39th conference on computer applications and quantitative methods in archaeology, Beijing, 12-16 April 2011, edited by Mingquan Zhou, Iza Romanowska, Zhongke Wu, Pengfei Xu and Philip Verhagen (Amsterdam, Pallas Publications, 2012), 381-393, combined existing geographical datasets on late antique land and sea routes with details of church and state administrative networks he’s compiled from documentary sources.

Such approaches create very reliable networks, but they’re hard to scale up. Clark looks at 26 people; Judith Bennett has 31 people and 1,965 appearances in extant records from 1287-1348. Preiser-Kapeller has around 270 nodes and 680 links in total. Rosé’s study of Odo of Cluny, which I discussed in the previous post, had 860 links. For charters, such hand-crafted networks would probably only allow the exploration of small archives or individual villages.

What is more, researchers often want to carry out social network analysis as an offshoot of more general prosopographical work, such as creating a charter database. But it’s hard to analyse links until you’ve first created a prosopography, because it’s only when you’ve been through all the charters that you have a decent idea of whether two people of the same name are actually the same person. (There’s a further issue here about whether you may end up with circular reasoning between prosopography and network analysis, but I’ll leave that for now). So in theory, you’d need to go through all the charters first to identify people and then have to go back to assess whether or not they are linked in a meaningful way, doubling your work.

As a result, some researchers have started trying to see if there are ways of automatically creating networks from existing databases or files, developing methods for analysing charters that (in theory) can be scaled up relatively easily. In the rest of the post I want to look at the relatively few projects I’m aware of attempting to do this and outline how we might approach the problem with the Making of Charlemagne’s Europe dataset.

The three projects I’m looking at are by Giovanni Ruffini, working on the village of Aphrodito in Egypt (see reference above), Joan Vilaseca, who’s been experimenting on creating graphs from the early medieval sources he’s collected at Cathalaunia.org and a controversial article by Romain Boulet, Bertrand Jouve, Fabrice Rossi, and Nathalie Villa, “Batch kernel SOM and related Laplacian methods for social network analysis”, Neurocomputing 71 (2008), 1257-1273.

Ruffini is explicit about how he’s creating his networks and the problems that may result from this (pp. 29-31). He’s taking documents and creating “affiliation networks”: all those who appear in the same document are regarded as connected to one another. As he points out, the immediate problem is that this method can introduce distortions if you have one or two documents with very large numbers of names. For example, one of the texts in his corpus is part of the Aphrodito fiscal register and has 455 names in it, while the average text names only eleven (p. 203). If such a disproportionately large text is included, analysis of connectivity is badly distorted, with all the people appearing in the fiscal register appearing at the top of connectivity lists.
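To make the distortion concrete, here is a minimal sketch of an affiliation network in Python. It is not Ruffini’s own code or data: the three documents and every name in them are invented, and the function `affiliation_degrees` is a hypothetical helper that simply links everyone named in the same document and counts each person’s distinct connections.

```python
from collections import defaultdict
from itertools import combinations


def affiliation_degrees(documents):
    """Link everyone named in the same document and
    return each person's number of distinct connections."""
    neighbours = defaultdict(set)
    for names in documents.values():
        for a, b in combinations(sorted(set(names)), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {person: len(linked) for person, linked in neighbours.items()}


# Invented toy corpus: two short contracts and one long register.
documents = {
    "contract_1": ["person_a", "person_b", "person_c"],
    "contract_2": ["person_a", "person_b", "person_d"],
    "register": [f"taxpayer_{i}" for i in range(50)],
}

with_register = affiliation_degrees(documents)
without_register = affiliation_degrees(
    {doc_id: names for doc_id, names in documents.items() if doc_id != "register"}
)

# The five best-connected people with and without the long list.
top_with = sorted(with_register, key=with_register.get, reverse=True)[:5]
top_without = sorted(without_register, key=without_register.get, reverse=True)[:5]
print(top_with)
print(top_without)
```

With the register included, every taxpayer has 49 connections from a single appearance, while someone who turns up in two separate contracts has only 3, so the connectivity ranking says more about the length of one document than about anyone’s social position.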

The same effect can be seen in Joan Vilaseca’s graphs. If you look at his first attempts at graphing documents from Catalonia between 898-914, they’re dominated by the famous judgement of Valfogona in 913.

But Joan’s graphs also show an additional problem. His first graphs also give great prominence to Charles the Simple and Louis the Stammerer, because they appear so often in dating clauses. When he starts looking for measures of centrality in his next post he initially finds the most connected people to be St Peter, the Virgin Mary and Judas Iscariot (who appear frequently in sanction clauses).

This brings us to the key question: what does it mean to be in the same charter as another person? The problem is that people are named in charters for so many different reasons: they may be saints, donors, witnesses, relatives to be commemorated, scribes or even the count whose pagus you are in. People may also appear as the objects of transactions: some of our early decisions on the Charlemagne project were deciding how we would treat the unfree (and possibly the free) who were being transferred between one party and another. Such unfree have an obvious connection to the donor and the recipient. But do they have any meaningful relationship to the witnesses or the scribe? At least with witnesses, there’s a reasonable chance in most cases that they all physically met at some point, but I don’t know of any evidence that the unfree would necessarily have been present when their ownership was transferred by a charter.

So simple affiliation networks, even when you eliminate disproportionately large documents and people mentioned only in dating or sanction clauses, can still be inaccurate representations of actual relationships. One possible response to this problem is to include as links only types of relationships that are themselves spelled out in the charters. Joan has some graphs showing only family and neighbourhood relationships, for example. Ruffini (p. 21) suggests the possibility of using data-sets where a link is defined as existing only when there is a clear connection between two parties in a document e.g. between a lessor and a lessee. But as he points out, we would then have much smaller data-sets. And for early medieval charters, in particular, focusing on the main parties to a transaction only would simply demonstrate that most transactions were about people donating or selling land to churches and monasteries, which is not exactly new information.

Are there any other ways to cut out “irrelevant” connections while keeping those we think are likely to show meaning? Another approach that Joan tries uses affiliation networks, but then removes links where two people occur together in only one document. For his interest in identifying key members of Catalan society, focusing on the most important links may well make sense. But they potentially distort the evidence on one question of wider interest: how significant are weak ties in charter-derived networks? Weak ties, where two people interact only occasionally, may paradoxically be more important for spreading information or practices. Given we have only a small subset of interactions preserved via charter data, significant weak ties may be lost if we start removing data from affiliation networks in this way.
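A small sketch shows what this kind of pruning does. It is an illustration under invented data, not Joan’s actual method or code: `cooccurrence_weights` and `prune` are hypothetical helpers, and the three documents are made up.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_weights(documents):
    """Count, for each pair of people, how many documents name both."""
    counts = Counter()
    for names in documents:
        for pair in combinations(sorted(set(names)), 2):
            counts[pair] += 1
    return counts


def prune(weights, min_shared=2):
    """Keep only pairs that appear together in at least min_shared documents."""
    return {pair: n for pair, n in weights.items() if n >= min_shared}


# Invented example: A and B appear together twice, every other pair once.
documents = [
    ["A", "B", "C"],
    ["A", "B"],
    ["C", "D"],
]

weights = cooccurrence_weights(documents)
strong = prune(weights)
dropped = sorted(set(weights) - set(strong))

print(strong)
print(dropped)
```

Only the A to B link survives. The single C to D document was the only thing connecting D to anyone else, so pruning removes D from the network entirely, which is exactly the kind of weak tie that might have mattered.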

Implicitly, at least, an alternative method for selecting links within what’s broadly an affiliation network is given by Boulet, Jouve, Rossi and Villa. As they explain in their study of thirteenth and fourteenth century notarial acts, they constructed a graph in the following manner (pp. 1264-1265):

First, nobles and notaries are removed from the analyzed graph because they are named in almost every contracts: they are obvious central individuals in the social relationships and could mask other important tendencies in the organization of the peasant society. Then, two persons are linked together if:

- they appear in a same contract,
- they appear in two different contracts which differ from less than 15 years and on which they are related to the same lord or to the same notary.

The three main lords of the area (Calstelnau Ratier II, III and Aymeric de Gourdon) are not taken into account for this last rule because almost all the peasants are related to one of these lords. The links are weighted by the number of contracts satisfying one of the specified conditions.

Though it’s not clear why people are regarded as linked if they use the same notary, the other criteria seem to be ways of trying to filter out distortions that potentially arise from notarial practices. If men are routinely described in terms of their affiliation to a lord e.g. “A the man of B”, then an affiliation network will derive from a sale between “A the man of B” and “C the man of D” not only the justified links A to B, C to D and A to C, but also links that in practice are unlikely to exist or at least are not proven to do so, i.e. A to D, C to B and B to D.
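The “A the man of B” example can be worked through directly. This is only a toy enumeration of the case described above, with the set of attested links written out by hand; it is not the procedure Boulet and his co-authors used.

```python
from itertools import combinations

# Everyone named in one sale between "A the man of B" and "C the man of D".
named = ["A", "B", "C", "D"]

# A plain affiliation network links every pair of names in the document.
affiliation_links = set(combinations(named, 2))

# Links that the wording of the document actually supports:
# each man to his lord, and the two parties to the sale to each other.
attested = {("A", "B"), ("C", "D"), ("A", "C")}

spurious = sorted(affiliation_links - attested)

print(len(affiliation_links), "links generated")
print("unsupported:", spurious)
```

Half of the six links generated from this one document (A to D, B to C and B to D) rest on nothing more than the way the parties happen to be described.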

So how might we balance distortions from applying the affiliation network model to charter data against loss of data or an unfeasibly high workload if we don’t use this method? The model for the Making of Charlemagne’s Europe database allows inputting of relationship factoids, which will catch explicit references to people as the relatives or neighbours of others. Graphs using such data will be relatively easy to construct.

We are also, however, recording “agent roles”, used to identify what role a person or an institution plays within an individual charter or transaction (e.g. witness, scribe, object of transaction, granter). At the minimum, any social network analysis application added to the system should probably allow a user to choose which of these roles they want included within the graphs to be created. There should also be some threshold (either chosen by us or user-defined) for excluding documents that contain “too many” different agents. We’re still not going to get the precision graphs that hand-crafting links will give, but we can hopefully still get something that will tell us something useful about how people interact.
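As a rough sketch of what such an option might look like, the following Python builds links only between agents in user-selected roles and skips any document with more agents than a chosen threshold. Everything here is hypothetical: the function `build_links`, the role labels, the threshold and the three charters are invented for illustration and do not reflect the actual structure of the Making of Charlemagne’s Europe database.

```python
from itertools import combinations


def build_links(charters, roles, max_agents):
    """Link agents who share a charter, keeping only the selected roles
    and ignoring charters that name more than max_agents people."""
    links = {}
    for charter_id, agents in charters.items():
        if len({name for name, _ in agents}) > max_agents:
            continue  # skip disproportionately large documents
        chosen = sorted({name for name, role in agents if role in roles})
        for pair in combinations(chosen, 2):
            links.setdefault(pair, []).append(charter_id)
    return links


# Invented charters: each agent is a (name, role) pair.
charters = {
    "ch1": [
        ("Granter1", "granter"),
        ("Abbey", "recipient"),
        ("Wit1", "witness"),
        ("Wit2", "witness"),
        ("Serf1", "object of transaction"),
        ("Scribe1", "scribe"),
    ],
    "ch2": [
        ("Granter1", "granter"),
        ("Abbey", "recipient"),
        ("Wit1", "witness"),
    ],
    "big": [(f"Name{i}", "witness") for i in range(40)],
}

links = build_links(
    charters,
    roles={"granter", "recipient", "witness"},
    max_agents=20,
)

for pair, sources in sorted(links.items()):
    print(pair, sources)
```

Keeping the list of source charters for each link, rather than just a count, means a user can always go back from a line on the graph to the documents that produced it.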

Digital diplomatics 1: projects and possibilities

I am currently trying to get up to speed on some of the many projects involving charters online, drawing heavily on accounts from the Digital Diplomatics conferences (and also Jon Jarrett’s useful reports on the 2011 conference). I don’t claim to be an expert on charters, but I have been using (and sometimes developing) databases for 25 years, so some of the issues seem quite familiar from my experience as a librarian. What I want to do in this first post is give a sample of the types of project out there and also note what I consider to be some particularly interesting features.

It’s useful to start with a sketch of the origins of diplomatics (the study of charters) because that explains a lot about how digital developments have been shaped. The starting point was the attempts by early modernists to work out which charters of a particular religious institution were false and which were genuine. For this, the key ability was being able to compare charters with good evidence for being authentic (e.g. held as originals) to other more dubious versions. As a result, charter studies have often been organised either around particular collections/archives (e.g. editions of cartularies, charters of St Gall) or around rulers (e.g. the diplomas of Charles the Bald), because it’s easier to spot the dodgy stuff in a reasonably homogenous corpus.

Charters have also long been a key source for regional history, so eighteenth and nineteenth century scholars produced a lot of editions of regional collections of documents including charters, such as the Histoire générale de Languedoc. Where the corpus is small enough, these have then been extended to national collections or overviews, some of which I mention below.

From the purely print age, we have now, however, begun moving into digital diplomatics and there have been a variety of approaches.

1) Simple retro-digitisation
Because there’s been scholarly interest in diplomatics for several centuries, a lot of early editions are now out of copyright. Simple retro-digitisation of old editions doesn’t often get mentioned in discussions of digital diplomatics (though Georg Vogeler, “Digitale Urkundenbücher. Eine Bestandsaufnahme”, Archiv für Diplomatik, 56 (2010), pp. 363-392 has a useful discussion of them), but there are a lot of old charter editions being put online by projects such as Google, Internet Archive, Gallica etc. This data, however, is pretty hard for charter scholars to make use of unless they’re looking for a specific charter (or at most a specific edition). Is there any way in which this material could be dealt with more effectively?

Doing something with such data doesn’t strike me as a project that’s likely to be possible to fund (it’s not new and exciting enough). The most plausible way of organising it seems to me to be crowd-sourcing of OCR work on charter scans (or checking already OCR’d documents) along with adding some basic XML markup and then sticking them in a repository. Monasterium seems the obvious one to use. Whether there would be enough researchers interested in charters from more than one foundation to make the effort of doing this worthwhile, however, I’m not sure.

2) Databases based on the printed edition model
Printed editions of charters are normally either arranged chronologically or include a chronological index. (There are a few cartulary editions which don’t have this, and I have winced at having to look through hundreds of pages to spot if there are any Carolingian charters). The vast majority of printed editions also have indexes to personal names and place names. In contrast, content analysis of the charter is often fairly limited, in the form of headnotes plus a narrative introduction.

The indexes to printed charters, if they’re done properly, work pretty well for the needs of many people working with these sources. Or, to see it from a different angle, historians studying charters arrange their research into these kinds of categories. As a result, where such indexes don’t exist in the original edition, you’ll often find that someone creates them later (like Julius Schmincke doing an index to Dronke’s edition of the charters from Fulda).

A lot of charter databases are still essentially arranged around these traditional print access methods, with digitisation essentially adding (often fairly basic) full text search and remote access. Many of the online charter projects that have got furthest have been digitisations of relatively small and coherent existing charter collections, which have already been published in a single print series. There are several based on national collections, such as Sean Miller’s database of Anglo-Saxon charters, Diplomatarium Norvegicum and Diplomatarium Fennicum. There are also some regional charter databases of the same type (such as the Württembergische Urkundenbuch), and the early twentieth-century edition of the Cluny charters has also been put in a database. And then, of course, there’s the charters section of the digital Monumenta Germaniae Historica.

3) Aggregator databases
There are also a few charter database projects which are based on aggregating multiple printed editions: the two most important are Monasterium and Chartae Burgundiae Medii Aevi.

4) Born digital/hybrid editions
In contrast to the substantial projects of digitising existing editions, most of the born digital (or moved to digital) charter databases seem to be fairly small scale. The one exception I’ve found so far is Codice diplomatico della Lombardia Medievale which has now put over 5,000 Lombard charters from the eighth to twelfth century online.

5) Databases of originals
There is also a slightly separate strand of digital diplomatics research, which has focused on charters which are preserved in the originals (rather than as cartulary copies, etc). Some of these databases just include the text, others focus on images of charters. Projects include ARTEM and the (basic) database now attached to the Chartae Latinae Antiquiores publishing project. I’m also aware of several more image-focused projects, such as the Marburg Lichtbildarchiv, and Pergamo Online, which contains images of parchments preserved in Pergamo.

I’m not going to discuss the image databases in any detail, because they’re a very different kettle of fish to the textual databases I’m used to working with, but it is worth noting how decisions made on how much detail is recorded for original documents can be fairly arbitrary. As Georg Vogeler points out, there’s an odd division for the St Gall charters between the early stuff that gets put in horrendously expensive printed ChLA editions and the material from the eleventh century onwards that is available free via Monasterium.

6) Linguistic projects
I also won’t say much about charter database projects that focus on linguistic analysis of texts, such as Corpus der altdeutschen Originalurkunden bis zum Jahr 1300, Langscape and the work being done by people like Rosanna Sornicola and Timo Korkiakangas. While this is interesting work, it seems to me of less immediate relevance to most historians.

7) Factoid model
As Patrick Sahle put it in a recent paper (“Vorüberlegungen zur Portalbildung in der Urkundenforschung”, Digitale Diplomatik: Neue Technologien in der historischen Arbeit mit Urkunden. Archiv für Diplomatik, Schriftgeschichte, Siegel- und Wappenkunde, Beiheft 12, edited by Georg Vogeler (Cologne, Böhlau Verlag, 2009), 325-341 at p. 338), the object of diplomatic research is the individual charter. Most database projects are structured in a way that reflects this focus on the charter as a unit.

A contrast is given by the factoid model adopted by a number of KCL projects, such as the Prosopography of Anglo-Saxon England and what will shortly become the People of Medieval Scotland project. Here, the key unit is the factoid, a statement of the form: “Source S claims Agents X1, X2, X3 etc carried out Action A1 connected with Possessions/Places P1, P2 at date D1.” A charter (or another source) can thus be broken down into a number of factoids, allowing finer-grained access to the content of charters. Although this may not seem an obvious approach to considering charters (and there are a number of practical problems), it does match surprisingly well to the “Who, What, Where, When, How do we know” model that I’ve mentioned before as one approach to working with charters.
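As a rough illustration of what a factoid might look like as a data structure, here is a minimal Python sketch. The class name, the field names and the sample charter are all invented for this example; this is not the schema that PASE or the Scottish project actually uses.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Factoid:
    """One claim made by a source: who did what, where and when."""
    source: str                # "How do we know": the charter or other source
    agents: tuple[str, ...]    # "Who"
    action: str                # "What"
    places: tuple[str, ...]    # "Where"
    date: str                  # "When", kept as text because charter dates are often vague


# A single (invented) charter broken down into two factoids.
factoids = [
    Factoid("Charter 1", ("Adalbert",), "donates land", ("Villa A",), "802"),
    Factoid("Charter 1", ("Bernard", "Cunigund"), "witnesses", ("Villa A",), "802"),
]


def factoids_for_agent(name: str, items: list[Factoid]) -> list[Factoid]:
    """Return every factoid in which the named agent takes part."""
    return [f for f in items if name in f.agents]


print([f.action for f in factoids_for_agent("Bernard", factoids)])
```

The point of the design is that a query starts from a person, place or action rather than from the charter, while each factoid still carries a pointer back to its source.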

What works
As my overview suggests, there are already too many charter databases out there to make it easy to discuss them all in any more depth than “here’s another one that does X, Y and Z”. But there are some projects that seem to me to illuminate particularly important aspects of digital diplomatics:

1) DEEDS: full text done right
I’ve discussed before the problems of searching full-text databases of charters, but most projects don’t seem to respond to such problems. Instead they have very basic full-text facilities, and certainly nothing like the ability to use regular expressions that Jon Jarrett longs for.

The problem with regular expressions, of course, is that they still require an expert user. And as several generations of designers of library catalogues and other kinds of databases know, most users aren’t experts, and they don’t want to have to become so to be able to use your database. Even if you learn the right syntax, how do you know what spelling variations to try searching for before you’ve seen what might be lurking in the database? For example, if you know that the MGH edition of one of Charlemagne’s charters (DK 169) refers to a particular county as Drungaoe or Trungaoe, how on earth would it occur to you that the same charter in Monasterium would name the place as “Traungaev”?

DEEDS is the only project I’ve seen so far that has really sophisticated analytical tools for full-text. Its method of shingles, for example, is currently being applied to dating documents, but it strikes me as something that might also very usefully be applied to identifying particular formularies used by someone drawing up a charter. By breaking a document down in this way, you can analyse multiple factors suggesting that a document is “nearer” to one model than another in a way that’s simply not practical with manual methods.
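To give a flavour of how shingling can measure “nearness”, here is a minimal Python sketch. It uses generic word shingles and a Jaccard score, which is an assumption on my part and not a description of DEEDS’ own implementation; the three sample texts are made up.

```python
from __future__ import annotations


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping word sequences ("shingles") of the given size."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: set, b: set) -> float:
    """Share of shingles common to both sets (0 = nothing shared, 1 = identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Invented texts: one charter and two candidate models.
charter = "sciant presentes et futuri quod ego iohannes dedi et concessi"
model_a = "sciant presentes et futuri quod ego willelmus dedi et concessi"
model_b = "in nomine domini ego iohannes dono atque trado"

score_a = jaccard(shingles(charter), shingles(model_a))
score_b = jaccard(shingles(charter), shingles(model_b))
print(score_a > score_b)  # the charter is "nearer" to model_a
```

Here the charter shares five of its three-word shingles with the first model and none with the second, so it scores as nearer to the first even though the donor’s name differs.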

Even more useful, potentially, is DEEDS’ use of normalisation. Their alternative spelling option makes their search engine cope with a lot of the more common issues in searching Latin. But the really interesting part to me was their discussion of using normalisation to produce phonetic proxies. This takes a phrase such as “Sciant presentes et futuri quod ego Iohannes de Halliwelle” and reduces it to “scnt prsnt cj futr cj eg iohns pr hall”, the bare sounds of the key terms. A full-text search facility with phonetic proxy as an option strikes me as one of the few ways that you might be able to produce something that could find you the multiple possible Latin spellings of the Traungau, without you needing to sit down for a week to work them out…

2) ARTEM: bringing in the users
ARTEM, the database of French original charters before 1121, is far from being the biggest or the most sophisticated charter database around. Where the project has succeeded, however, is in getting researchers actually to use the database. There have been several conference publications based on its work, e.g. Marie-José Gasse-Grandjean and Benoît-Michel Tock, eds. Les actes comme expression du pouvoir au haut Moyen âge: actes de la table ronde de Nancy, 26-27 novembre 1999. Atelier de recherches sur les textes médiévaux, 5. (Turnhout, Brepols, 2003).

What I’m not yet sure of is why ARTEM have been more successful than comparable projects in getting other scholars involved. Is it because they’ve been going longer, that they’re more pro-active in arranging roundtables, or is it because France has a weird early medieval charter distribution, with a large number of relatively small collections of charters, and thus researchers desperately need a multi-archive database?

3) Monasterium: Charters 2.0
Monasterium.net describes itself as a “collaborative archive” and it’s the only project I’m so far aware of that takes the idea of user participation seriously. As well as providing tools for working with and annotating individual charters (which I haven’t yet had the chance to try out), it’s also intended to provide a distributed infrastructure into which individual archives from across Europe can add their material. As a means for getting later medieval charters available online, especially for smaller archives, it looks ideal. In terms of data quantity and quality, however, it’s liable to the patchiness inherent to large-scale collaborative projects: some areas get very well-covered, some don’t get referred to at all.

4) CBMA: blending old and new
Chartae Burgundiae Medii Aevi isn’t unusual in its scope – it’s aiming to put online the 15,000 charters from the region of Burgundy. What’s more unusual is its methods – it’s putting online both old editions and previously unedited cartularies. There are obvious issues here about whether they can get data consistency, but potentially it seems more practical to start with existing editions (however imperfect) and “grow” a database using them, than to wait for funding to re-edit everything from scratch.

5) Cathalaunia.org: DIY databases
All the databases I’ve discussed so far have been major research projects. However, Cathalaunia.org, created by Joan Vilaseca, shows the possibility for a dedicated individual to produce their own web-based charter database, using easily available tools.
Joan uses a wiki format, which for the relatively small number of documents he has provides a neat way of showing links between people and places. The unstructured nature of the data may make it harder to search, but it also means that different genres of documents (not just charters, but hagiography etc) can be incorporated easily. It’s a useful reminder that charter information doesn’t have to be stored in relational databases. (For another example of this minimalist approach, see Project FAST, which is putting a Florentine archive online).

Cathalaunia.org also raises an interesting point about audiences and the accessibility of charter databases. The site is in Catalan, which makes it far more suitable for what I presume is Joan’s main audience, people interested in the history of their own region. But for those of us who aren’t Catalans (and don’t specialise in its history) the use of a relatively uncommon language is a disadvantage.

Preliminary conclusions
The databases I’ve so far read about or seen prove that there are lots of interesting projects going on, but I do slightly wonder if there’s too much variety. Different audiences and different aims can explain some of the variation, but I think we may need to start adapting more systematically from previous projects. I can see the components of really effective databases in some projects, but so far they’re not being pulled together into something that properly builds on the pioneering work. So, I finish with a question for the more experienced users: what do you like from particular charter database sites? What should the Charlemagne project be stealing from other projects?

By my own free uill I have zold and zell this to gou: on the full text of charters

This post is inspired by three things: a recent IHR paper given by Rosanna Sornicola, a paper given at the International Medieval Congress in 2011 by Peter Stokes of KCL and some of the comments on a previous post of mine about charters. It aims to ask a deceptively simple question: what do we mean by the full text of a charter?

To start with Rosanna’s paper, it was entitled “What the legal documents of the early middle ages can tell us about language: the case of 9th- and 10th-century charters from Southern Italy”, and was pretty much as it said in the title. She’s a professor of linguistics interested in the development of the Romance vernaculars out of Latin. It’s a question that’s been debated for more than a century, but the answers that are being suggested now are far more complicated than a simple change between two languages. Most of the models now are of the coexistence of Latin and the vernaculars, diglossia, with the locus of change not the language per se but the social groups who used a particular register of language. There was no unitary route between Latin and the vernaculars, but many different routes.

Rosanna was exploring one particular context for such change: southern Italy in the ninth and tenth century. Less attention has been paid to linguistic change there than in France or Spain, but it presents an interesting contrast. Unlike other areas, there isn’t the same cultural break as with the Merovingians in France or the Lombards in northern Italy, with the arrival of an essentially illiterate ruling class. Naples and Amalfi, in particular, had a rich and relatively autonomous cultural life. They were also much less influenced by Carolingian cultural reforms, which have sometimes been claimed to be key to developments elsewhere.

Instead, Rosanna was arguing for the persistence of late antique forms of Latin in the south, but this is a late antique Latin that is already substantially changed from ideas of “classical Latin”. The proliferation of the accusative in prepositional phrases, for example, such as “una cum alias terras meas”, is already visible in Pompeii graffiti and Ravenna papyri, as are plurals such as “campora” (fields).

Rosanna went on to discuss various other syntactic forms visible in the charter corpus: I think many of the examples may have been more striking to those whose Latin is better than mine to start with. But there was one particular quotation in her handout I want to give. It’s from a charter from Gaeta in 918 (CodCajet 1, XXIV, 43), where someone states:
“mea boluntatem bendidisse et bendidit bobis” (By my own free uill I have zold and zell this to gou).

My translation isn’t accurate, of course, but that’s the whole point. How do you translate something that’s lurking uneasily between Latin and something else like that? And what on earth can you do with free text spelt like that? For Rosanna’s purposes it’s ideal. For anyone who’s trying to track down all documents about sales, it’s a massive problem.

Which is where we backtrack six months to Peter Stokes talking about Anglo-Saxon Cluster and the problem of integrating different ideas of what a charter is. There’s already been a slightly bad-tempered post about this paper from Jon Jarrett, who I think for once got distracted from the key point. Which is that a lot of the difficulty of integrating four projects all talking about the same documents is that the charters can be conceptualised in very different ways.

What is a charter in terms of these projects’ focus?

1) In ESawyer it’s a document, with the main point being creating an index to help locate it and discussions of it.

2) In ASChart it’s a text (a string of words) with a date. (It’s worth noting here that this is specifically said to be a pilot project and to focus on marking up texts with XML, so it was not intended to be a replacement/equivalent of Sean Miller’s useful database).

3) In PASE a charter is a source, a set of factoids (X did Y). In fact it’s the old game of gutting sources for snippets of information.

4) Finally in Langscape a charter is a unique document (every manuscript is a different version, there’s no critical edition).

All this is reflected in very different attitudes to what form any “full text” included in the project takes. ESawyer includes for many records (but not all) the text of charters, taken from several different editions. ASChart, as already mentioned, includes (non-searchable) full text with certain sections (such as dispositive words) marked up. PASE doesn’t include the full text of charters, but does, in theory, include all the main data points from them. Finally, Langscape includes three different versions of each text: semi-diplomatic, edited (i.e. broken up into lexical units for analysis) and glossed (provided with a headword and translation).

So when we talk about a database including the full text of a charter, we’re potentially thinking about very different things, with varying amounts of editorial intervention. First of all, there’s the question of whether you’re editing the material from scratch (which is very time-consuming), or relying on existing editions, which may not be consistent (especially with large corpuses). Secondly there’s the possibility of using XML mark-up to highlight particular sections. Finally there’s the possibility of full-text search.

What Rosanna’s paper strongly suggested to me is that full-text search is something of a red herring in most cases. Short of the kind of extreme editing that Langscape includes, I can’t see how you can often find things reliably in texts where the spelling is so erratic. This is going way beyond problems of Latin stemming (which have been researched for at least 25 years). Full text search is only really likely to work effectively where you’ve got fairly standardised Latin AND consistent editorial practices. Or possibly for individual words/phrases which are sufficiently distinctive and not spelled in too many alternative ways: you might be able to find most examples of “friskingas” (suckling pigs) in a database of charters, for example, if you sit down and check half a dozen similar words. But I don’t see that you’re going to get very far trying to pick out sales, for example. And I was recently staring at a transcription of a St Gall charter for some while in bemusement before I worked out that “drado” meant someone was going to hand over (trado) some property.

Similarly, ASChart is, to my mind, an interesting exercise in showing that XML mark-up of a charter in terms of its diplomatic doesn’t really get you a whole heap further in its study (which may be the reason it didn’t get beyond the pilot project stage). It’s possible to use it to pull out a list of invocations, for example, but you get something that isn’t easily scalable to large collections, because so many invocations are marginally distinctive. There’s not a substantial difference, for example, between starting a charter “In nomine Domini nostri Iesu Christi mundi saluatoris” and “In nomine Domini nostri Iesu Christi saluatoris mundi”, but I can’t see how you can easily find an algorithm that would automatically conflate phrases that are “similar” in this way.
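For the narrow case of word-order variation only, one possible (and limited) approach is to compare phrases by their words regardless of order. The Python sketch below is purely illustrative: the function name is my own, and it does nothing for spelling variants or for invocations that differ by a word or two.

```python
from __future__ import annotations


def word_key(phrase: str) -> tuple[str, ...]:
    """Order-insensitive key: the phrase's lower-cased words, sorted."""
    return tuple(sorted(phrase.lower().split()))


a = "In nomine Domini nostri Iesu Christi mundi saluatoris"
b = "In nomine Domini nostri Iesu Christi saluatoris mundi"
c = "In nomine sanctae et individuae Trinitatis"  # a different invocation, for contrast

print(word_key(a) == word_key(b))  # same words, different order
print(word_key(a) == word_key(c))  # genuinely different wording
```

This conflates the two invocations quoted above while keeping a genuinely different one apart, but deciding how much difference still counts as “similar” across a large and erratically spelled collection remains the hard part.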

What, in theory, might be more helpful is using XML mark-up combined with full-text search, so that you search only in the dispositive words, say, for “vendo” or variants thereof. But I’m not yet convinced that with the kind of variability you have in early medieval charters, you would really end up saving enough of the users’ time to justify all the work of tagging this data in the first place. I’d be interested to hear from people who work more on diplomatic on this point – what do you think XML might do for you?
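To make the idea concrete, here is a minimal Python sketch of searching only within a marked-up section. The element names (charter, narrative, dispositive), the sample texts and the list of stems are all hypothetical; they are not ASChart’s actual mark-up scheme.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical mark-up: the element names are invented for this sketch.
SAMPLE = """
<charters>
  <charter id="c1">
    <dispositive>vendo et trado</dispositive>
  </charter>
  <charter id="c2">
    <narrative>terram quam olim vendidit</narrative>
    <dispositive>dono atque trado</dispositive>
  </charter>
  <charter id="c3">
    <dispositive>bendidisse et bendidit bobis</dispositive>
  </charter>
</charters>
"""

SALE_STEMS = ("vend", "bend")  # assumed stems: "vendo" plus a b-for-v variant


def sales(xml_text: str) -> list[str]:
    """Return ids of charters whose dispositive section contains a sale word."""
    root = ET.fromstring(xml_text)
    hits = []
    for charter in root.iter("charter"):
        section = charter.findtext("dispositive", default="").lower()
        if any(word.startswith(SALE_STEMS) for word in section.split()):
            hits.append(charter.get("id"))
    return hits


print(sales(SAMPLE))
```

The second sample charter mentions an earlier sale in its narrative but is itself a donation, so restricting the search to the dispositive section leaves it out. The catch is visible in the list of stems: someone still has to think of every variant spelling in advance.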

I said in discussing the Making of Charlemagne’s Europe project I’m now working on that we’re not providing the full text of the charters. It’s more accurate to say that we won’t be systematically providing the full text of them – we’ll link to the full text online, where it’s freely available, and provide references to printed sources otherwise (much as PASE does). The hope is that this gives users most of what they need, without the additional expense of either licensing full text from previous editions (it’s interesting to note that some publishers are now republishing nineteenth century cartularies) or having to spend large amounts of time scanning/OCRing material. But it’s fair to say that I’m starting to realise how much more there is to the “full text” of a charter than at first meets the eye.

Making charters useful

I finished at the Fitzwilliam Museum at the end of December and started a new job last week: as Postdoctoral Research Associate on the new King’s College London project The Making of Charlemagne’s Europe: 768-814. Officially the project is intended to create a database of the surviving documentary evidence from Charlemagne’s reign. Unofficially, I see it as a project to make charters useful.

There are a lot of people, of course, who already find early medieval charters very useful. If you’re doing regional studies (of e.g. Catalonia or Alsace or Brittany), charters are essential evidence. But if you’re doing a study that isn’t regionally focused in this way, then frankly charters are less ideal, because there are just too damn many of them. There are around 4,500 documents for Charlemagne’s reign alone. How do you find the ones that actually provide relevant information for your purposes?

This is why, potentially, our database will come in very handy, especially since it’s being designed by people who have considerable experience of previous similar database projects, such as Prosopography of Anglo-Saxon England (PASE) and Paradox of Medieval Scotland (POMS). The prosopographical side is thus very well-covered. However, the plan is to have more: both mapping facilities and statistical analysis. We’re not providing the full text of charters, but we will be providing structured data of various kinds. So one of the questions we need to ask right at the start is what information do researchers actually want to get out of the corpus of charters that they can’t get currently? Asking this question among the readers of this blog seems as good a place as any to start. I know you’re not all Carolingianists, but a lot of you will have worked with charters or bulk data of some kind. What research questions interest you for which such a database might be a help?

What follows is my first very rough list of possible research areas. All comments welcome; if you know of work that’s already been done, or if I’ve missed out something, please add it in. I’m still at the brainstorming stage at this point, and this post reflects this.

1) Studies on literacy
Graham Barrett is another researcher on the project, so this angle may be fairly well-covered anyhow. He’s already done studies with later Spanish charters, looking, for example, at affiliations of scribes and the number of documents that particular scribes wrote. This immediately ties into research questions about the professionalism of scribes, and the extent of lay literacy.

I also wonder whether we should make a special note of charters that include references to books as property, so we can get a picture of where they are mentioned.

2) Family/women
Most of the detailed studies of families will obviously be done on a regional basis. But the prosopographical side of the database will enable us to create biographies of individuals/families who have a transregional activity. What I’m not yet sure is what kind of data it would be useful to produce on such people. Given the strong spatial emphasis in the database, would it be useful to be able to map the activities of not just an individual, but a group of them?

One of the things we are definitely going to do is give the sex of every individual mentioned, which immediately makes possible a lot of the analysis about women’s land-holding etc. (It takes under a minute to dig out the 48 female witnesses from the POMS database, for example).

I think we need to have some kind of record of relatives being prayed for, though I’m not yet sure in how much detail. But this ties in usefully with discussions about which relatives “counted” in which situations.

It’d be nice to use charters for getting demographic data about families, as well, but that may be unrealistic. Has anyone seen this sort of thing done successfully?

3) Ethnicity
Despite all the problems with questions of ethnicity, it’s still interesting to see how the charters reflect this. We will probably be drawing on the work of Nomen et gens as far as ethnicity of personal names is concerned; what might also be useful to note is if specific ethnic terminology is used in charters to refer to people.

4) Legal practice
This is an area I know less about, so if anyone knows who’s doing interesting work on this, it’d be a help to know. My immediate thoughts for things it would be useful to record are number of witnesses to a document (so you could, say, pull out documents with fewer than the six witnesses Alemannic law said you were supposed to have) and references to law/laws within the charter (whether specific or general).

5) Monasticism
One useful piece of information would be to know how the collective membership of particular religious communities is described – are they “monachi” or “deo sacrata” or what? It’d be particularly interesting to learn more about references to canons/canonesses.

It’ll be possible to break down charters by date and region, so we can potentially get comparative data on the well-known idea of “waves of pious giving” – how long do people keep on making large donations to churches/monasteries after they’ve been founded?

I don’t know if early Carolingian charters have enough boundary clauses to make this work, but Barbara Rosenwein’s classic study of Cluny was collecting data on the extent to which a donated piece of land was adjacent to Cluny’s property already, which allowed seeing monastic land-acquiring strategies and how literally “being the neighbour of St Peter” was meant.

Looking at statistics for proportions of donations versus precaria for different monasteries/regions also contributes to the whole debate about pragmatic versus spiritual rewards for donors (which I always associate with Rosenwein on Cluny versus John Nightingale on Gorze). I also wonder whether there is any way of flagging up people who make donations to more than one foundation, given these may form particularly interesting test cases for studying how patronage decisions were made.

6) Military history
One of the questions we’re trying to work out at the moment is how much detail we go into about renders. Possibly we will just have a general term for animal renders, given the trade-off between precision in recording and time taken. But I do wonder if we should treat references to renders in horses separately, given their military importance. Any thoughts?

7) Price information
This is again an issue of how much detail we can put in without the project over-running, but how useful would it be to note if there are references to values in coinage? Wendy Davies did some promising studies on this for Spain.

8) Political history
One of the most useful possibilities that the mapping side of the project potentially allows us to explore is the nature of the Carolingian county. The arguments about “flat counties” versus “scattered counties” have been going on for decades: if we input the data right, we can explore in detail the geographical relationships that the sources themselves choose to mention.

It will also be useful to be able to map and contrast royal interventions between regions; while the data from royal charters is probably limited enough that this could be done manually, this project will potentially also allow us a transregional view of royal missi and vassi.

9) Social structure
Chris Wickham, in particular, has used charters from a number of regions for the comparative study of social structures, but of necessity, such work has normally drawn on syntheses of studies of a few locations. Potentially, this database allows wider comparisons, though both potential approaches to categorising social levels have their problems. The first possibility is using explicit references to office and social status within the charters: although there are problems in comparing these across the regions, they are potentially soluble. Perhaps even more intriguing is whether a social classification could be developed based on activity-derived status. In other words, could we find a way to mark all those who made more than a dozen donations, or witnessed over a geographical range of more than 10 miles, etc? This might show to what extent influential people exist who don’t obviously hold office or get called “nobilis” etc.
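As a sketch of how an activity-derived flag might be computed, the Python below counts donations per person and picks out those above a threshold. The records, names and the function are invented for illustration; the real database structure has not been settled, and the threshold of a dozen is simply the figure floated above.

```python
from collections import Counter

# Invented records: (person, action) pairs as they might come out of a database.
records = (
    [("Adalbert", "donation")] * 13
    + [("Bernard", "donation")] * 2
    + [("Bernard", "witness")]
)


def frequent_donors(rows, threshold=12):
    """Return, in alphabetical order, people who made more than `threshold` donations."""
    counts = Counter(person for person, action in rows if action == "donation")
    return sorted(person for person, n in counts.items() if n > threshold)


print(frequent_donors(records))
```

A flag based on geographical range of witnessing would work the same way in principle, but would depend on every place having usable coordinates, which is a much bigger assumption.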

10) Rural and landscape history
Again, this is an area where bulk comparative data is potentially useful, but we have to work out how much detail we can go into, especially for landscape features in charters. Should these be regarded as purely conventional and excluded or are some of them worth listing specifically? I’m inclined to think it’s worth mentioning mills, but not huts, for example.

Those, for now, are my ideas of what we might do with our data, given the limitation I’ve already mentioned, that we’re not going to have the full text of charters. Any obvious suggestions that I’ve overlooked will be gratefully received.

Trifle layers, puzzle boxes, and charter statistics

I’m around six months behind in blogging IHR seminars, and Jon Jarrett has already provided not only the text of his paper from June on “Managing Power in the Post-Carolingian Era: Rulers and Ruled in Frontier Catalonia”, but also pictures of the event. So instead, I want to use his paper and another much more recent one I heard at the IHR on Spanish charters to act as a springboard for thinking about how we might use charters to compare societies.

The second paper was Graham Barrett on “The Literate Mentality and the Textual Society in Early Medieval Spain”, and for me some of the most interesting parts of his talk were the statistical evidence. He was working from a corpus of around 4000 charters of 711-1031 from the Asturias-Leon and Navarra in northern Spain. And one of the key points he was making was that while the charter numbers went up from 850, there wasn’t an increase in the average number of royal charters per year, but there was one in ecclesiastical charters and the number of lay charters increased more or less steadily. In other words, this is a society where top-down notions of increasing literacy don’t work particularly well – the charter habit isn’t simply percolating down as a side-effect of governmental bureaucracy.

Similarly, Graham had statistics about scribes, showing how over the period there were an increasing number of scribes who were writing more charters, rather than most scribes only writing one or two charters, i.e. that we’ve got something that looks like the tentative start of royal and aristocratic chanceries. And he also thought it was possible to see different categories of scribes, in terms of who they wrote for: royal scribes, episcopal scribes, monastic scribes, aristocratic scribes and village scribes.

The point of both these lots of statistics is that in theory they’re region-independent. You could take statistics from a completely different area of Spain (or Germany or France or England) and compare them and see if the same patterns are visible. So it might be possible to see whether patterns of top-down literacy do seem more plausible elsewhere, or whether the “professionalisation” of scribes varies in time across different areas. You can start to do comparative history with charters in a way that you can’t easily with just anecdotal or case study evidence.

Well, that’s the charter statistics, but where do the trifles and the puzzle boxes come in? These are two metaphors that have been used for looking at the structure of medieval societies. The first is from Susan Reynolds, Fiefs and Vassals (OUP, 1994), p. 40:

the layers of [medieval] society were more like those of a trifle than a cake: its layers were blurred, and the sherry of accepted values soaked through. Taking the whole of society…one has to see it as a very rich and deep trifle with a lot of layers

The other is from Jon’s talk at the IHR, where he described power in Catalonia as akin to a puzzlebox, in which only some of the holes lead to the ground. In some areas counts are visible directly interacting with the lower levels of society, in others they have to go through intermediaries. Looking at when/where that happens is one of the key issues in ideas about the tenth/eleventh century feudal mutation and the “privatization of power”. But the other way round – when/where local networks start to connect into wider ones – is probably one of the main factors in the rise of the Carolingian empire. That’s Matthew Innes’ idea anyhow – that what the Carolingians succeed in doing is getting local societies connecting into court networks, without necessarily changing the families who are actually running these local areas.

So what would be very useful is if we can somehow start coming up with metrics or criteria for how important people are that, again, we can use for cross-regional comparisons. The problem is, a lot of titles are very regionally specific, such as the Visigothic saio or the Breton machtiern. And while there are other criteria we can use, a lot of them aren’t really helpful in practice with the evidence we have. For example, Chris Wickham wants to call people peasants only when they’re personally doing some agricultural work (Framing the Early Middle Ages p 386), which is next to impossible to prove either way in the majority of cases. In contrast, Chris’ ideas about the scale of a person’s control over land do sound like one of the most promising ways to start to distinguish some of the layers of the trifle.

Such a statistical approach isn’t the only way we can approach charters; people have got a lot of interesting stuff out of considering individual charters or small clusters, but it might be worth going back now to some of the pioneering statistical studies such as the work by David Herlihy and Barbara Rosenwein and seeing what else we can do now with vastly more computer power and web 2.0 technologies.