By my own free uill I have zold and zell this to gou: on the full text of charters

This post is inspired by three things: a recent IHR paper given by Rosanna Sornicola, a paper given at the International Medieval Congress in 2011 by Peter Stokes of KCL and some of the comments on a previous post of mine about charters. It aims to ask a deceptively simple question: what do we mean by the full text of a charter?

To start with Rosanna’s paper, it was entitled “What the legal documents of the early middle ages can tell us about language: the case of 9th- and 10th-century charters from Southern Italy”, and was pretty much as it said in the title. She’s a professor of linguistics interested in the development of the Romance vernaculars out of Latin. It’s a question that’s been debated for more than a century, but the answers that are being suggested now are far more complicated than a simple change between two languages. Most of the models now are of the coexistence of Latin and the vernaculars, diglossia, with the locus of change not the language per se but the social groups who used a particular register of language. There was no unitary route between Latin and the vernaculars, but many different routes.

Rosanna was exploring one particular context for such change: southern Italy in the ninth and tenth centuries. Less attention has been paid to linguistic change there than in France or Spain, but it presents an interesting contrast. Unlike in other areas, there isn’t the same cultural break as with the Merovingians in France or the Lombards in northern Italy, with the arrival of an essentially illiterate ruling class. Naples and Amalfi, in particular, had a rich and relatively autonomous cultural life. They were also much less influenced by Carolingian cultural reforms, which have sometimes been claimed to be key to developments elsewhere.

Instead, Rosanna was arguing for the persistence of late antique forms of Latin in the south, but this is a late antique Latin that is already substantially changed from ideas of “classical Latin”. The proliferation of the accusative in prepositional phrases, for example, such as “una cum alias terras meas”, is already visible in Pompeii graffiti and Ravenna papyri, as are plurals such as “campora” (fields).

Rosanna went on to discuss various other syntactic forms visible in the charter corpus: I think many of the examples may have been more striking to those whose Latin is better than mine to start with. But there was one particular quotation in her handout I want to give. It’s from a charter from Gaeta in 918 (CodCajet 1, XXIV, 43), where someone states:
“mea boluntatem bendidisse et bendidit bobis” (By my own free uill I have zold and zell this to gou).

My translation isn’t accurate, of course, but that’s the whole point. How do you translate something that’s lurking uneasily between Latin and something else like that? And what on earth can you do with free text spelt like that? For Rosanna’s purposes it’s ideal. For anyone who’s trying to track down all documents about sales, it’s a massive problem.

Which is where we backtrack six months to Peter Stokes talking about the Anglo-Saxon Cluster and the problem of integrating different ideas of what a charter is. There’s already been a slightly bad-tempered post about this paper from Jon Jarrett, who I think for once got distracted from the key point, which is that a lot of the difficulty of integrating four projects all talking about the same documents is that the charters can be conceptualised in very different ways.

What is a charter in terms of these projects’ focus?

1) In ESawyer it’s a document, with the main point being creating an index to help locate it and discussions of it.

2) In ASChart it’s a text (a string of words) with a date. (It’s worth noting here that this is specifically said to be a pilot project and to focus on marking up texts with XML, so it was not intended to be a replacement/equivalent of Sean Miller’s useful database).

3) In PASE a charter is a source, a set of factoids (X did Y). In fact it’s the old game of gutting sources for snippets of information.

4) Finally in Langscape a charter is a unique document (every manuscript is a different version, there’s no critical edition).

All this is reflected in very different attitudes to what form any “full text” included in the project takes. ESawyer includes for many records (but not all) the text of charters, taken from several different editions. ASChart, as already mentioned, includes (non-searchable) full text with certain sections (such as dispositive words) marked up. PASE doesn’t include the full text of charters, but does, in theory, include all the main data points from them. Finally, Langscape includes three different versions of each text: semi-diplomatic, edited (i.e. broken up into lexical units for analysis) and glossed (provided with a headword and translation).

So when we talk about a database including the full text of a charter, we’re potentially thinking about very different things, with varying amounts of editorial intervention. First of all, there’s the question of whether you’re editing the material from scratch (which is very time-consuming), or relying on existing editions, which may not be consistent (especially with large corpuses). Secondly there’s the possibility of using XML mark-up to highlight particular sections. Finally there’s the possibility of full-text search.

What Rosanna’s paper strongly suggested to me is that full-text search is something of a red herring in most cases. Short of the kind of extreme editing that Langscape includes, I can’t see how you can often find things reliably in texts where the spelling is so erratic. This is going way beyond problems of Latin stemming (which have been researched for at least 25 years). Full-text search is only really likely to work effectively where you’ve got fairly standardised Latin AND consistent editorial practices, or possibly for individual words/phrases which are sufficiently distinctive and not spelled in too many alternative ways: you might be able to find most examples of “friskingas” (suckling pigs) in a database of charters, for example, if you sit down and check half a dozen similar words. But I don’t see that you’re going to get very far trying to pick out sales. And I was recently staring at a transcription of a St Gall charter for some while in bemusement before I worked out that “drado” meant someone was going to hand over (trado) some property.
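To make the “check half a dozen similar words” approach concrete, here is a minimal sketch in Python. It is only an illustration: the variant spellings and the sample texts are invented for the example, not taken from any real charter database, and the function name is hypothetical.

```python
import re

# Hypothetical variant spellings of "friskingas", invented for illustration only.
FRISKINGA_VARIANTS = {
    "friskingas", "frischingas", "friscingas",
    "frisgingas", "friskengas", "friskinkas",
}

def find_variants(text, variants):
    """Return the words in `text` that match any listed spelling (case-insensitive)."""
    words = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters only
    return [w for w in words if w in variants]

# Invented sample texts, not real transcriptions.
charters = {
    "A": "et dedit duas friskingas ad festum",
    "B": "solvat frischingas tres",
    "C": "vendidit terram suam",
}

hits = {cid: find_variants(text, FRISKINGA_VARIANTS) for cid, text in charters.items()}
matching = sorted(cid for cid, found in hits.items() if found)
```

This only works as far as the list of variants goes: a spelling nobody thought to include is silently missed, which is why the approach suits a distinctive word much better than something as varied as the vocabulary of sales.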

Similarly, ASChart is, to my mind, an interesting exercise in showing that XML mark-up of a charter in terms of its diplomatic doesn’t really get you a whole heap further in its study (which may be the reason it didn’t get beyond the pilot project stage). It’s possible to use it to pull out a list of invocations, for example, but you get something that isn’t easily scalable to large collections, because so many invocations are marginally distinctive. There’s not a substantial difference, for example, between starting a charter “In nomine Domini nostri Iesu Christi mundi saluatoris” and “In nomine Domini nostri Iesu Christi saluatoris mundi”, but I can’t see how you can easily find an algorithm that would automatically conflate phrases that are “similar” in this way.
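For the narrow case of the two invocations just quoted, which differ only in word order, one partial approach would be to compare phrases as unordered collections of words. The sketch below is an assumption about how that might be done, not something ASChart does; the function name is hypothetical, and the third phrase is simply a different invocation formula added for contrast.

```python
def invocation_key(phrase):
    """Order-insensitive key: the sorted lowercase words of the phrase."""
    return tuple(sorted(phrase.lower().split()))

a = "In nomine Domini nostri Iesu Christi mundi saluatoris"
b = "In nomine Domini nostri Iesu Christi saluatoris mundi"
c = "In nomine sanctae et individuae Trinitatis"

# Phrases with the same words in a different order get the same key.
same_ab = invocation_key(a) == invocation_key(b)
same_ac = invocation_key(a) == invocation_key(c)
```

This conflates the first two phrases and keeps the third apart, but it does nothing for invocations that differ in spelling or that add or drop a word, so it leaves the wider problem of “similar” phrases open.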

What, in theory, might be more helpful is using XML mark-up combined with full-text search, so that you search only in the dispositive words, say, for “vendo” or variants thereof. But I’m not yet convinced that with the kind of variability you have in early medieval charters, you would really end up saving enough of the users’ time to justify all the work of tagging this data in the first place. I’d be interested to hear from people who work more on diplomatic on this point – what do you think XML might do for you?
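A rough sketch of what such a combined search might look like, using Python’s standard `xml.etree.ElementTree` and `re` modules. The element names (`charter`, `dispositive`, `sanction` and so on) are hypothetical and do not follow ASChart’s or any other project’s actual schema; apart from the Gaeta quotation, the sample Latin is invented.

```python
import re
import xml.etree.ElementTree as ET

# Invented sample; the element names are hypothetical, not a real schema.
SAMPLE = """
<charters>
  <charter id="c1">
    <invocation>In nomine Domini</invocation>
    <dispositive>mea boluntatem bendidisse et bendidit bobis</dispositive>
  </charter>
  <charter id="c2">
    <invocation>In nomine Domini</invocation>
    <dispositive>dono atque trado terram meam</dispositive>
    <sanction>si quis hoc vendere praesumpserit</sanction>
  </charter>
</charters>
"""

# Words starting "vend-" or "bend-", to allow for b/v confusion.
SALE_VERB = re.compile(r"\b[bv]end\w*", re.IGNORECASE)

def sale_in_dispositive(xml_text):
    """IDs of charters whose dispositive section contains a sale verb."""
    root = ET.fromstring(xml_text.strip())
    ids = []
    for charter in root.iter("charter"):
        if any(SALE_VERB.search(d.text or "") for d in charter.iter("dispositive")):
            ids.append(charter.get("id"))
    return ids

def sale_anywhere(xml_text):
    """IDs of charters with a sale verb anywhere in their text."""
    root = ET.fromstring(xml_text.strip())
    return [c.get("id") for c in root.iter("charter")
            if SALE_VERB.search("".join(c.itertext()))]

scoped_ids = sale_in_dispositive(SAMPLE)
unscoped_ids = sale_anywhere(SAMPLE)
```

In this toy example the scoped search returns only the charter that actually records a sale, while the unscoped one also picks up a charter that merely mentions selling in its sanction clause. Whether that gain would repay the tagging effort on a real corpus is exactly the open question.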

I said in discussing the Making of Charlemagne’s Europe project I’m now working on that we’re not providing the full text of the charters. It’s more accurate to say that we won’t be systematically providing the full text of them – we’ll link to the full text online, where it’s freely available, and provide references to printed sources otherwise (much as PASE does). The hope is that this gives users most of what they need, without the additional expense of either licensing full text from previous editions (it’s interesting to note that some publishers are now republishing nineteenth-century cartularies) or having to spend large amounts of time scanning/OCRing material. But it’s fair to say that I’m starting to realise how much more there is to the “full text” of a charter than at first meets the eye.

6 thoughts on “By my own free uill I have zold and zell this to gou: on the full text of charters”

  1. If only to stop spellings like boluntatem seeming weird, I’d have thought full-text is still a good idea. Too much Cicero, and too many classicizing Carolingians (aka MGH editors), have given a very skewed impression of “Latin” for some time now. As you say the Pompeii graffiti show that “classical” Latin was never the language of the ordinary person in the street. As an example many moons ago I collected over 50 different spellings of “vixit” (e.g. vexit, bissit, bixsit, etc, etc) on early medieval epitaphs.

    I guess ultimately it comes down to the question of why curtail the possible utilities of your database before even starting? Have we all got so lazy that running six different searches for variant spellings is seen as a bad thing?

  2. This is, in some ways, a weird digital resonation of the whole diplomatic debate over what the original document is, which is at least nicely circular. My post, meanwhile, certainly was bad-tempered, but on the other hand the point of that part of it was not just that ASChart duplicated functions of ASCharters, but that ASChart was being advertised as a functional resource at a time when it simply wasn’t, pilot or otherwise, five years after its supposed completion.

    It’s worth noting here that this is specifically said to be a pilot project and to focus on marking up texts with XML.

    That is, of course, not a new idea; there are already two competing standards for doing it, in fact, the Charters Encoding Initiative (mentioned in a later post) and the Text Encoding Initiative (likewise) that everyone seems to be using instead to maximise interoperability.

    I can’t see how you can often find things reliably in texts where the spelling is so erratic

    Regular expressions! (Or, to the Microsoft-familiar, wildcards.) William Whitaker’s Latin dictionary program, Words, will, if you feed it a medieval Latin term, try riffing on the variant spellings until it comes up with something plausible. (Sometimes it’s gibberish, of course, but it does try.) That’s server-end, but the user of such texts in digital form rapidly gets used to accounting for the most common variations, like the betacism you use as an example—which Isidore of Seville claims is an African habit, it’s very very common—so, a search like ‘[b|v]end[o|id]’ ought to catch most of the transactional verbs and not too much else, for example. (Please note, that’s not any real regular expression syntax as far as I know, but it makes clearer than that would what kind of thing I mean.) It would be nice if this could some day be programmed in server-side, and Words shows it’s technically feasible, but I can imagine that in a large-scale corpus it becomes prohibitive very quickly with the current available hardware. Of course that will change… but not soon.

    I’d be interested to hear from people who work more on diplomatic on this point – what do you think XML might do for you?

    I tend to loathe XML philosophically; it is the equivalent of using a Dremel for everything because one doesn’t have the correct tool, but this is the direction the culture is heading so my attitude just marks me a dinosaur. In more practical terms, at user end, I don’t really see the point. With just the text of an edition, I can carry out ‘fuzzy’ searches that ought to let me do most of the same sort of things with the added confidence of having covered all the variants I can think of and express regularly. With a decent database, I can do the same thing. One might need a complex relational structure to make sure that when I search for ‘Gerosolima’ I also get ‘Gerosolyma’, for example, but since I won’t trust any person doing data entry at speed, especially if from OCR, with a large corpus, to get all those right anyway, I personally don’t mind if there isn’t one; I’ll be checking the variants either way. Where XML seems to pay off is at the back-end, because so many other technologies plug into it or can work with that kind of data now. I think server and database people like XML because it makes their lives easier, not because of user benefits. But we’ve already talked about this, and it’s not what you asked, so I’ll leave it there.
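For what it is worth, the pseudo-pattern given above can be written as a real regular expression. A minimal sketch in Python’s `re` syntax follows; the exact patterns are an assumption about what such a search is meant to catch, and the sample words come from the post and this thread.

```python
import re

# b or v, then "end", then either "o" or a form beginning "id"
# (vendo, bendidit, vendidisse and so on).
SALE = re.compile(r"\b[bv]end(?:o|id\w*)\b", re.IGNORECASE)

# i/y variation in a place name.
JERUSALEM = re.compile(r"\bGerosol[iy]ma\b", re.IGNORECASE)

text = "mea boluntatem bendidisse et bendidit bobis"
sale_hits = SALE.findall(text)
```

Here `sale_hits` holds the two verb forms from the Gaeta charter. The pattern deliberately stays narrow: it does not match an infinitive such as “vendere”, and a spelling like “drado” for “trado” would need its own pattern.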

  3. XML is just a useful representation methodology; it can ease the sharing or transformation of information, nothing more.

    About ‘searches’. Yes, regular expressions are a must, but they require some training on the user’s side. Another useful technology is the use of word-distance metrics (the number of letters mismatched or displaced between two words) on the server side, so that if you search for ‘thing’ you could also get results for ‘think’. In the default implementations all letters tend to be weighted the same way (0 = equal, 1 = not equal), but with the help of medieval Latin philology the metric can be tweaked to consider that, e.g., the distance between ‘b’ and ‘v’ has to be 0.1 (they are almost the same). This kind of search could catch things like ‘Wigo’, ‘Guigo’ and ‘Uuigo’ being the same word.

    So I advocate the integration of philological expertise within software frameworks for textual data management.
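A minimal sketch of such a weighted distance in Python. The 0.1 cost for b/v is the figure suggested above; the other costs are invented for illustration, and the function names are hypothetical. It handles only single-letter substitutions, so treating ‘Wigo’, ‘Guigo’ and ‘Uuigo’ as one word would need extra multi-letter rules that are not shown here.

```python
# Hypothetical substitution costs: 0.1 for b/v as suggested above,
# the rest invented for illustration.
CHEAP = {
    frozenset("bv"): 0.1,
    frozenset("iy"): 0.1,
    frozenset("dt"): 0.3,
}

def sub_cost(a, b):
    """Cost of replacing letter a with letter b (0 if identical, 1 by default)."""
    if a == b:
        return 0.0
    return CHEAP.get(frozenset((a, b)), 1.0)

def weighted_distance(s, t):
    """Levenshtein distance with per-letter-pair substitution costs."""
    s, t = s.lower(), t.lower()
    prev = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        curr = [float(i)]
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1.0,                 # delete a
                curr[j - 1] + 1.0,             # insert b
                prev[j - 1] + sub_cost(a, b),  # substitute, or match at cost 0
            ))
        prev = curr
    return prev[-1]

d_sale = weighted_distance("bendidit", "vendidit")
d_plain = weighted_distance("thing", "think")
```

With these costs ‘bendidit’ sits very close to ‘vendidit’, while ‘thing’ and ‘think’ stay a full step apart, so a search with a threshold below 1 would accept the first pair and reject the second.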

  4. Seminar ketchup: CXVII-CXXI. If I mean to get this blog back up to some reasonable frequency of posting and currency, I have obviously got to do something about the massive backlog of seminars I want or intended to report on, so it’s time for drastic measures. For a start, I…

  5. Seminar CLI: Spain and Africa’s earliest Romance. Let me make clear straight away, this post is about the Romance languages, not the literary genre. In fact, it is specifically about the birth of Romance in Spain, and with work on that of course comes indelibly associated the name of Professor Roger W…
