Class Notes (5): Schema and standards


These are the class notes for the “Introduction to XML and editing ancient documents” seminar I am doing this summer semester at LMU, Munich.

XML should be both well-formed and valid. Last week we used our XML editors to write well-formed XML. We encoded words like:

<word type="funny">fickleness</word>
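As a minimal sketch of what "well-formed" means in practice, the snippet below uses Python's standard-library XML parser to test the word example above. The helper name `is_wellformed` and the deliberately broken second string are made up for illustration. Note that this only checks well-formedness; checking validity against a schema such as TEI needs a schema-aware validator, which this parser is not.

```python
import xml.etree.ElementTree as ET


def is_wellformed(xml_text: str) -> bool:
    """Return True if the string parses as XML (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # Raised for mismatched tags, unclosed elements, bad attributes, etc.
        return False


# The example from the notes: opening and closing tags match.
good = '<word type="funny">fickleness</word>'

# An invented broken variant: the closing tag is misspelled.
bad = '<word type="funny">fickleness</wrod>'

print(is_wellformed(good))
print(is_wellformed(bad))
```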

This week I will use the quote from Jane Austen’s Persuasion that we produced last week to show how to encode the text so that it also validates against a schema, in this case a TEI schema. I will show the students how to use the TEI Guidelines and explain how, with such a vast standard, we can still end up with different XML for the same text. Then I will introduce them to EpiDoc to show how, for ancient documents, we can all use the same subset of TEI.

First I marked up the Persuasion quote using the TEI Guidelines. I went back and forth between the element descriptions in the Guidelines, the text and the TEI by Example validator.

I then talked about how vast guidelines such as TEI are and why (just as with the earlier conventions of brackets and symbols) projects can end up using different mark-up for the same meaning. Here I used the <supplied> tag (see VTO2 291 –salutem). In some printed editions supplied text could be marked with round brackets (abc), while in others square brackets [abc] might indicate the same thing. I used the examples below:

Vindolanda Tablet 291

British Library Fragment 2 on gandhari.org

The TEI guidelines for the tag <supplied> suggest that the attribute @reason is used with “any phrase describing the difficulty”. One scholar may use the tag:

<supplied reason="faded-ink">abc</supplied>

and the next may use:

<supplied reason="ink-lost">abc</supplied>

Both scholars are using the <supplied> tag and both encodings are valid TEI. However, it is difficult to compare the two, because even though they essentially mean the same thing they don’t use the same value for the @reason attribute.
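To see why this matters for comparison, here is a small sketch that tallies the @reason values across the two encodings. The `<ab>` wrapper, the variable names and the helper `reason_counts` are assumptions for illustration, and the snippets leave out the TEI namespace that a real file would declare.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Two invented mini-editions using the reason values from the notes.
edition_a = '<ab>text <supplied reason="faded-ink">abc</supplied></ab>'
edition_b = '<ab>text <supplied reason="ink-lost">abc</supplied></ab>'


def reason_counts(*documents: str) -> Counter:
    """Count each @reason value on <supplied> across the documents."""
    counts = Counter()
    for doc in documents:
        root = ET.fromstring(doc)
        for element in root.iter("supplied"):
            counts[element.get("reason")] += 1
    return counts


counts = reason_counts(edition_a, edition_b)

# Each value appears once, so a search for one of them misses the other
# edition, even though both editors meant the same thing.
print(counts)
```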

This is where a subset of TEI such as EpiDoc (developed particularly for ancient documents) comes into the picture. EpiDoc is built and used by scholars wanting to digitally publish ancient documents. For the above example, EpiDoc suggests a small set of five reasons, of which “lost” is the most commonly used:

<supplied reason="lost">abc</supplied>

In other words, using EpiDoc as the schema for our encoding gives us better guidelines and support relevant to ancient documents. It will also make it easier for us to understand, share or cross-search datasets in the future.
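A shared vocabulary also makes simple consistency checks possible. The sketch below flags any @reason value outside an agreed list. Only "lost" is taken from these notes; the second value in `ALLOWED_REASONS` is a placeholder, so the actual list should be taken from the EpiDoc Guidelines. The helper name and the sample markup are likewise invented for illustration.

```python
import xml.etree.ElementTree as ET

# Illustrative only: "lost" comes from the notes, "omitted" is a placeholder.
# Replace this set with the values listed in the EpiDoc Guidelines.
ALLOWED_REASONS = {"lost", "omitted"}


def unexpected_reasons(xml_text: str, allowed=ALLOWED_REASONS) -> list:
    """Return the @reason values on <supplied> that are not in `allowed`."""
    root = ET.fromstring(xml_text)
    return [
        element.get("reason")
        for element in root.iter("supplied")
        if element.get("reason") not in allowed
    ]


sample = (
    '<ab><supplied reason="lost">abc</supplied> '
    '<supplied reason="faded-ink">def</supplied></ab>'
)

flagged = unexpected_reasons(sample)
print(flagged)
```

In a real project this kind of restriction would normally live in the schema itself, so the XML editor reports it during validation; the script is only a way to picture what the schema is doing.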

Here are some links that you might want to check out before the next class, which will be in two weeks’ time, on 28 May.
