Class Notes (6): Transformations


These are the class notes for the “Introduction to XML and editing ancient documents” seminar I am teaching this summer semester at LMU, Munich.

This week’s focus is on XSLT transformations. We will use XSLT to transform the Persuasion citation from last week. I will describe how participants can run transformations in Oxygen and how they can link an XML document to a particular XSLT stylesheet. We will look at three types of transformation: into a plain-text document, into an XML document, and into HTML. Along the way we will transform parts of the text while leaving the rest as it is, and we will pull out certain elements of the text to make a list. Finally, we will see how transformations let us use the information in other programs (e.g. Excel or BibDesk) through formats such as CSV and BibTeX.

So in class we began by looking at the citation from Persuasion again. It had been a couple of weeks since we last looked at it, so we went quickly through the class notes from last week and I mentioned the example with the <supplied> tag again.

<?xml version="1.0" encoding="UTF-8"?>
<?oxygen RNGSchema="http://www.stoa.org/epidoc/schema/8.16/tei-epidoc.rng" type="xml"?> 
<!--<?xml-stylesheet type="text/xsl" href="xsltforpersuasionhtml.xsl"?>-->
<TEI xmlns="http://www.tei-c.org/ns/1.0">
<teiHeader>
 <fileDesc>
  <titleStmt>
   <title>A citation from <title type="book">Persuasion</title> by <name type="author">Jane Austen</name></title>
  </titleStmt>
 <publicationStmt>
 <p>This document is as a part of the Introduction to XML seminar, 2013</p>
 </publicationStmt>
 <sourceDesc>
 <p>From Persuasion</p>
 </sourceDesc>
 </fileDesc>
</teiHeader>

<text>
 <body>
<ab> “I do not think I ever opened a book in my life which had not something to say upon <w type="gender" function="female">woman's</w> inconstancy. 
Songs and proverbs, all talk of <w type="gender" function="female">woman's</w> fickle<supplied reason="lost">ness</supplied>. 
But perhaps you will say, these were all written by <w type="gender" function="male">men</w>." 

"Perhaps I shall. Yes, yes, if you please, no reference to examples in books. 
<w type="gender" function="male">Men</w> have had every <w lemma="advant">advantage</w> of us in telling their own story. 
Education has been theirs in so much higher a degree; the pen has been in their <w type="nothing">hands</w>. 
I will not allow books to prove anything.” 

<name type="author">Jane Austen</name>, <title>Persuasion</title>
</ab>
</body>
</text>
</TEI>
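
In the document above, the line that links the XML file to a stylesheet is commented out. As a minimal sketch, assuming the stylesheet xsltforpersuasionhtml.xsl sits in the same folder as the XML file, the active version of that processing instruction would look like this (the body of the document is left out here):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The processing instruction is no longer wrapped in a comment, so tools that honour it can find the stylesheet -->
<?xml-stylesheet type="text/xsl" href="xsltforpersuasionhtml.xsl"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <!-- teiHeader and text exactly as in the full document above -->
</TEI>
```

Whether the link is actually used depends on the tool. Some browsers, for instance, refuse to apply stylesheets to files opened from the local disk.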


The first XSLT example we looked at was one for displaying our XML document as plain text.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:t="http://www.tei-c.org/ns/1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="text" encoding="utf-8" indent="yes" />
    
    <xsl:template match="//t:TEI">
        <xsl:apply-templates/>
    </xsl:template>
    
    <xsl:template match="//t:w">
        <xsl:text>"</xsl:text><xsl:value-of select="."/><xsl:text>"</xsl:text>
    </xsl:template>
    
    <xsl:template match="//t:supplied[@reason='lost']">
        <xsl:text>[</xsl:text><xsl:value-of select="."/><xsl:text>]</xsl:text>
    </xsl:template>
</xsl:stylesheet>

As you can see, the output method is “text”, so the stylesheet basically outputs the entire text without XML tags. It has three templates. The first matches the <TEI> root element and processes everything inside it. The second takes every instance of the <w> tag and outputs its value surrounded by quotation marks. The third takes all <supplied> tags with the attribute reason="lost" and outputs their value surrounded by square brackets. At this point we also experimented with outputting the value of the <supplied> tag enclosed in parentheses, as we saw last week.

 A citation from Persuasion by Jane Austen
 
This document is as a part of the Introduction to XML seminar, 2013 
 
 From Persuasion
 
 “I do not think I ever opened a book in my life which had not something to say upon "woman's" inconstancy. 
Songs and proverbs, all talk of "woman's" fickle[ness]. 
But perhaps you will say, these were all written by "men"." 

"Perhaps I shall. Yes, yes, if you please, no reference to examples in books. 
"Men" have had every "advantage" of us in telling their own story. 
Education has been theirs in so much higher a degree; the pen has been in their "hands". 
I will not allow books to prove anything.” 

Jane Austen, Persuasion
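
The parentheses experiment mentioned above is not shown in these notes. A minimal sketch of what it could look like, keeping only the <supplied> template and swapping the brackets, is:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:t="http://www.tei-c.org/ns/1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="text" encoding="utf-8"/>

    <!-- Same idea as the square-bracket template, but with parentheses.
         The leading // used in the examples above is optional in a match pattern. -->
    <xsl:template match="t:supplied[@reason='lost']">
        <xsl:text>(</xsl:text><xsl:value-of select="."/><xsl:text>)</xsl:text>
    </xsl:template>
</xsl:stylesheet>
```

Because there is no template for <w> here, the words are output without quotation marks, and the supplied ending appears as fickle(ness). Everything else is handled by XSLT's built-in rules, which simply output the text.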

The second example outputs XML.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:t="http://www.tei-c.org/ns/1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="xml" encoding="utf-8" indent="yes" />
    
    <xsl:template match="//t:TEI">
        <mylist> <xsl:apply-templates select="//t:w[@type='gender']"/></mylist>
    </xsl:template>
    
    <xsl:template match="//t:w[@type='gender']">
        <genderreference><xsl:value-of select="."/></genderreference>
    </xsl:template>
    
</xsl:stylesheet>

Instead of printing the entire text, this transformation applies templates only to the nodes we select. In this case the template wraps the value of each <w> tag whose attribute is type="gender" in a <genderreference> tag, and the <mylist> tag encloses them all. (These tags are made up!)

<?xml version="1.0" encoding="utf-8"?>
<mylist xmlns:t="http://www.tei-c.org/ns/1.0">
   <genderreference>woman's</genderreference>
   <genderreference>woman's</genderreference>
   <genderreference>men</genderreference>
   <genderreference>Men</genderreference>
</mylist>
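
The stylesheet above pulls out a list. The other approach mentioned in the introduction, changing one part of the text while leaving the rest as it is, is commonly done with a so-called identity template. The following is only a sketch of that pattern applied to our citation, not something we ran in class:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:t="http://www.tei-c.org/ns/1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="xml" encoding="utf-8"/>

    <!-- Identity template: copy every node and attribute unchanged -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- The one exception: replace <supplied reason="lost"> by its text in square brackets -->
    <xsl:template match="t:supplied[@reason='lost']">
        <xsl:text>[</xsl:text><xsl:value-of select="."/><xsl:text>]</xsl:text>
    </xsl:template>
</xsl:stylesheet>
```

The result is the same TEI document as before, except that the <supplied> tag is gone and the text reads fickle[ness]. The more specific template wins over the identity template wherever both match.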

The third example outputs HTML.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:t="http://www.tei-c.org/ns/1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="html" encoding="utf-8" indent="yes" />
    
  <xsl:template match="//t:TEI">
      <html>
          <head></head>
          <body>  
                <ul>   
                    <xsl:apply-templates select="//t:w[@type='gender']"/>
                </ul>
          </body>
      </html>  
    </xsl:template>
    
    <xsl:template match="//t:w[@type='gender']">
    <li>    
        <xsl:choose>
            <xsl:when test="@function='female'">
           <i> <xsl:value-of select="."/></i>
            </xsl:when>
            <xsl:otherwise>
                <b><xsl:value-of select="."/></b>
            </xsl:otherwise>
        </xsl:choose>
    </li>
    </xsl:template>
    
</xsl:stylesheet>

For this example we add the basic HTML tags <html>, <head> and <body>. Again we display our list of gender references, but this time using the HTML tag for unordered lists, <ul>, around <li> tags. When outputting HTML we can also play around with the style of the output. In this case I added a test: when the stylesheet comes across a <w> tag with function="female", the value is enclosed in <i> tags (i.e. italics). If the function is anything but female, the value of the <w> tag is enclosed in <b> tags (i.e. bold). The resulting HTML looks like this:

<html xmlns:t="http://www.tei-c.org/ns/1.0">
   <head>
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
   </head>
   <body>
      <ul>
         <li><i>woman's</i></li>
         <li><i>woman's</i></li>
         <li><b>men</b></li>
         <li><b>Men</b></li>
      </ul>
   </body>
</html>

and in a browser it is rendered like this:

[Image: the HTML list rendered in a browser]

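
You may have noticed the xmlns:t declaration on the <html> element in the output above (and on <mylist> in the XML example). It is carried over from the namespace declaration in the stylesheet. A minimal sketch of one way to suppress it, assuming nothing in the output needs the t prefix, is to add exclude-result-prefixes="t" to the <xsl:stylesheet> element:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:t="http://www.tei-c.org/ns/1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    exclude-result-prefixes="t" version="1.0">
    <xsl:output method="html" encoding="utf-8" indent="yes"/>

    <xsl:template match="t:TEI">
        <!-- With exclude-result-prefixes, <html> is written without xmlns:t -->
        <html>
            <head></head>
            <body>
                <ul>
                    <xsl:apply-templates select="//t:w[@type='gender']"/>
                </ul>
            </body>
        </html>
    </xsl:template>

    <!-- Simplified item template: no italics/bold test, just the word -->
    <xsl:template match="t:w[@type='gender']">
        <li><xsl:value-of select="."/></li>
    </xsl:template>
</xsl:stylesheet>
```
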
Finally, I did a quick example where I simply output the value of the <w> tags with type="gender" as a comma-separated values (CSV) file. The CSV file can then be opened in your favourite spreadsheet program.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:t="http://www.tei-c.org/ns/1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="text" encoding="utf-8" indent="yes" />
    
    <xsl:template match="//t:TEI">
        <xsl:apply-templates select="//t:w[@type='gender']"/>
    </xsl:template>
    
    <xsl:template match="//t:w[@type='gender']">
<xsl:value-of select="."/>,</xsl:template>
    
</xsl:stylesheet>

[Image: the resulting CSV output]
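
The stylesheet above writes all values on a single line, each followed by a comma. If you would rather have one row per word with a second column, a sketch along the following lines might do. The column names word and function are made up for this example, and it assumes the values themselves contain no commas or quotation marks:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:t="http://www.tei-c.org/ns/1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="text" encoding="utf-8"/>

    <xsl:template match="t:TEI">
        <!-- Header row; the column names are hypothetical. &#10; is a line break. -->
        <xsl:text>word,function&#10;</xsl:text>
        <xsl:apply-templates select="//t:w[@type='gender']"/>
    </xsl:template>

    <!-- One row per gender reference: the word, a comma, the function attribute, a line break -->
    <xsl:template match="t:w[@type='gender']">
        <xsl:value-of select="."/>
        <xsl:text>,</xsl:text>
        <xsl:value-of select="@function"/>
        <xsl:text>&#10;</xsl:text>
    </xsl:template>
</xsl:stylesheet>
```

For our citation this gives the header row followed by four rows: woman's,female twice, then men,male and Men,male.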

The idea with the last example was to show how you can display XML in different formats with a little bit of XSLT.
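
BibTeX, which I mentioned at the start, works the same way, since it is also just text. As a rough sketch (the citation key austen_persuasion is made up, and a real entry would normally need more fields such as year and publisher), the author and book title from our teiHeader could be turned into a BibTeX entry like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:t="http://www.tei-c.org/ns/1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="text" encoding="utf-8"/>

    <xsl:template match="t:TEI">
        <!-- "austen_persuasion" is a hypothetical citation key -->
        <xsl:text>@book{austen_persuasion,&#10;</xsl:text>
        <!-- Author: the <name type="author"> inside the title statement -->
        <xsl:text>  author = {</xsl:text>
        <xsl:value-of select="t:teiHeader//t:titleStmt/t:title/t:name[@type='author']"/>
        <xsl:text>},&#10;</xsl:text>
        <!-- Title: the nested <title type="book"> -->
        <xsl:text>  title = {</xsl:text>
        <xsl:value-of select="t:teiHeader//t:titleStmt/t:title/t:title[@type='book']"/>
        <xsl:text>}&#10;}&#10;</xsl:text>
    </xsl:template>
</xsl:stylesheet>
```

The output is a single @book entry with author = {Jane Austen} and title = {Persuasion}, which a program such as BibDesk should be able to read.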

Next time we will look more into HTML and how we can publish our data online with XML and HTML.
