Stylesheet from.xsl
Documentation

Description

TEI stylesheet for converting Word docx files to TEI

This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version. This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details. You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

Author: See AUTHORS

Id: $Id: from.xsl 7206 2010-02-22 18:44:23Z rahtz $

Copyright: 2008, TEI Consortium

Imported modules
Included modules
Imported from
Template /
Documentation

Description

The main template that starts the conversion from docx to TEI

IMPORTING STYLESHEETS AND OVERRIDING MATCHED TEMPLATES:

When a stylesheet is imported (xsl:import), all templates in the imported stylesheet get a lower import precedence than those in the importing stylesheet. Suppose the importing stylesheet wants to override a general template, say the one that matches all <w:p> elements for which no more specialized rule applies. It cannot do so directly, because a new template matching w:p would automatically override every w:p[somepredicate] template in the imported stylesheet as well. For this reason the processing of the general template has been outsourced to a named template, and all the matching template in the imported stylesheet does is call that named template. The importing stylesheet can then simply override the named template, and the specialized rules stay in force.

See templates: - w:p (mode: paragraph)
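
The pattern described above can be sketched as follows. This is a minimal illustration rather than the actual source: the file name imported.xsl, the named template paragraph-wp, the 'Quote' style and the output elements are all assumptions made for the example. The imported stylesheet keeps its specialized rule and lets the general rule do nothing but delegate:

```xml
<!-- imported.xsl (hypothetical file name) -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">

  <!-- Specialized rule: wins over the general rule below because of its
       predicate, and is untouched when paragraph-wp is overridden -->
  <xsl:template match="w:p[w:pPr/w:pStyle/@w:val='Quote']" mode="paragraph">
    <quote><xsl:apply-templates/></quote>
  </xsl:template>

  <!-- General rule: only delegates to the named template -->
  <xsl:template match="w:p" mode="paragraph">
    <xsl:call-template name="paragraph-wp"/>
  </xsl:template>

  <!-- Default behaviour (name assumed for this sketch) -->
  <xsl:template name="paragraph-wp">
    <p><xsl:apply-templates/></p>
  </xsl:template>
</xsl:stylesheet>
```

The importing stylesheet then overrides only the named template:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:import href="imported.xsl"/>

  <!-- Higher import precedence: replaces the imported paragraph-wp,
       while the w:p[...] rule in imported.xsl still applies -->
  <xsl:template name="paragraph-wp">
    <p rend="custom"><xsl:apply-templates/></p>
  </xsl:template>
</xsl:stylesheet>
```

Had the importing stylesheet declared its own match="w:p" template in mode paragraph instead, that template would have taken precedence over the 'Quote' rule too, which is the problem the named template avoids.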

Modes:

  • part0: a normalization process for styles. Can also detect illegal styles.
  • part2: templates that are relevant in the second stage of the conversion are defined in mode "part2"
  • inSectionGroup: Defines a template that works on a group of consecutive elements (w:p or w:tbl elements) that form a section (a normal section, not to be confused with w:sectPr).
  • paragraph: Defines that the template works on an individual element (usually starting with a w:p element).
  • iden: simply copies the content
Namespace No namespace
Match /
Mode #default
References
Import precedence 17
Source
<xsl:template match="/">
  <!-- Do an initial normalization and store everything in $part0 -->
  <xsl:variable name="part0">
    <xsl:apply-templates mode="part0"/>
  </xsl:variable>
  <!-- Do the main transformation and store everything in the variable part1 -->
  <xsl:variable name="part1">
    <xsl:for-each select="$part0">
      <xsl:apply-templates/>
    </xsl:for-each>
  </xsl:variable>
  <!-- Do the final parse and create valid TEI -->
  <xsl:apply-templates select="$part1" mode="part2"/>
  <xsl:call-template name="fromDocxFinalHook"/>
</xsl:template>
Stylesheet location ../../../docx/from/from.xsl
Template fromDocxFinalHook
Namespace No namespace
Overriding
Import precedence 17
Source
<xsl:template name="fromDocxFinalHook"/>
Stylesheet location ../../../docx/from/from.xsl
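
Since fromDocxFinalHook is declared empty and is called at the end of the main template, an importing stylesheet can presumably use it to run something once the conversion has finished. A minimal sketch, assuming from.xsl is reachable at the relative path shown (the path and the message text are illustrative only):

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Assumed relative path; adjust to where from.xsl actually lives -->
  <xsl:import href="from.xsl"/>

  <!-- Replaces the empty hook; called once, after the part2 pass -->
  <xsl:template name="fromDocxFinalHook">
    <xsl:message>docx to TEI conversion finished</xsl:message>
  </xsl:template>
</xsl:stylesheet>
```

Anything the hook writes to the result tree would be emitted after the output of the part2 pass, so a message is the safer choice for a sketch like this.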
Template w:document
Documentation

Description

Main document template
Namespace No namespace
Match w:document
Mode #default
References
Import precedence 17
Source
<xsl:template match="w:document">
  <TEI>
    <!-- create teiHeader -->
    <xsl:call-template name="create-tei-header"/>
    <!-- convert main and back matter -->
    <xsl:apply-templates select="w:body"/>
  </TEI>
</xsl:template>
Stylesheet location ../../../docx/from/from.xsl
Template w:body
Documentation

Description

Create the basic text; worry later about dividing it up
Namespace No namespace
Match w:body
Mode #default
References
Import precedence 17
Source
<xsl:template match="w:body">
  <text>
    <!-- Create forme work -->
    <xsl:call-template name="extract-forme-work"/>
    <!-- create TEI body -->
    <body>
      <!-- 
					group all paragraphs that form a first level section.
				-->
      <xsl:for-each-group select="w:sdt|w:p|w:tbl" group-starting-with="w:p[teidocx:is-firstlevel-heading(.)]">
        <xsl:choose>
          <!-- We are dealing with a first level section, we now have
						to further divide the section into subsections that we can then
						finally work on -->
          <xsl:when test="teidocx:is-heading(.)">
            <xsl:call-template name="group-by-section"/>
          </xsl:when>
          <!-- We have found some loose paragraphs. These are most probably
						front matter paragraphs. We can simply convert them without further
						trying to split them up into subsections. -->
          <xsl:otherwise>
            <xsl:apply-templates select="." mode="inSectionGroup"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
      <!-- I have no idea why I need this, but I apparently do. 
				//TODO: find out what is going on-->
      <xsl:apply-templates select="w:sectPr" mode="paragraph"/>
    </body>
  </text>
</xsl:template>
Stylesheet location ../../../docx/from/from.xsl
Template w:bookmarkStart|w:bookmarkEnd (mode: inSectionGroup)
Documentation

Description

Ignore bookmarks

There are certain elements that we don't really care about, but that force us to regroup everything from the next sibling on. See the grouping used in the construction of the heading outline.

Namespace No namespace
Match w:bookmarkStart|w:bookmarkEnd
Mode inSectionGroup
Import precedence 17
Source
<xsl:template match="w:bookmarkStart|w:bookmarkEnd" mode="inSectionGroup">
  <xsl:for-each-group select="current-group() except ." group-adjacent="1">
    <xsl:apply-templates select="." mode="inSectionGroup"/>
  </xsl:for-each-group>
</xsl:template>
Stylesheet location ../../../docx/from/from.xsl
Template w:tbl|w:p (mode: inSectionGroup)
Documentation

Description

Grouping consecutive elements that belong together

We are now working on the group of all elements bounded by headings. These need to be split further into smaller groups for figures, lists etc., and into individual groups for simple paragraphs.

Namespace No namespace
Match w:tbl|w:p
Mode inSectionGroup
References
Templates listSection; tocSection
Import precedence 17
Source
<xsl:template match="w:tbl|w:p" mode="inSectionGroup">
  <!-- 
			We are looking for:
				- Lists -> 1
				- Table of Contents -> 2
			Anything else is assigned a number of position()+100. This should be
			sufficient even if we find lots more things to group.
		-->
  <xsl:for-each-group select="current-group()"
      group-adjacent="if (contains(w:pPr/w:pStyle/@w:val,'List')) then 1 else
                      if (starts-with(w:pPr/w:pStyle/@w:val,'toc')) then 2 else
                      position() + 100">
    <!-- For each defined grouping call a specific template. If there is no
				grouping defined, apply templates with mode paragraph -->
    <xsl:choose>
      <xsl:when test="current-grouping-key()=1">
        <xsl:call-template name="listSection"/>
      </xsl:when>
      <xsl:when test="current-grouping-key()=2">
        <xsl:call-template name="tocSection"/>
      </xsl:when>
      <!-- it is not a defined grouping .. apply templates -->
      <xsl:otherwise>
        <xsl:apply-templates select="." mode="paragraph"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:for-each-group>
</xsl:template>
Stylesheet location ../../../docx/from/from.xsl
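
The grouping-key trick used in this template can be tried in isolation. The following is a self-contained XSLT 2.0 sketch, not part of from.xsl: the input vocabulary (doc, p, @style) and the output elements are invented for the example. Items whose style contains 'List' share the key 1 and fall into one group when adjacent, while every other item gets a key derived from its position, so it always ends up in a group of its own.

```xml
<!-- Assumed input:
     <doc>
       <p style="ListBullet">a</p>
       <p style="ListBullet">b</p>
       <p style="Normal">c</p>
       <p style="Normal">d</p>
     </doc>
-->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/doc">
    <body>
      <!-- Key 1 for list items; position() + 100 is different for every
           other item, so adjacent non-list items are never merged -->
      <xsl:for-each-group select="p"
          group-adjacent="if (contains(@style,'List')) then 1
                          else position() + 100">
        <xsl:choose>
          <xsl:when test="current-grouping-key() = 1">
            <list>
              <xsl:for-each select="current-group()">
                <item><xsl:value-of select="."/></item>
              </xsl:for-each>
            </list>
          </xsl:when>
          <xsl:otherwise>
            <p><xsl:value-of select="."/></p>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </body>
  </xsl:template>
</xsl:stylesheet>
```

With the assumed input, this should produce one list holding the items a and b, followed by two separate p elements for c and d.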
Template group-by-section
Documentation

Description

Groups the document by headings, thereby creating the document structure.
Namespace No namespace
Used by
References
Import precedence 17
Source
<xsl:template name="group-by-section">
  <xsl:variable name="Style" select="w:pPr/w:pStyle/@w:val"/>
  <xsl:variable name="NextHeader" select="teidocx:get-nextlevel-header($Style)"/>
  <div>
    <!-- generate the head -->
    <xsl:call-template name="generate-section-heading">
      <xsl:with-param name="Style" select="$Style"/>
    </xsl:call-template>
    <!-- Process subheadings -->
    <xsl:for-each-group select="current-group() except ." group-starting-with="w:p[w:pPr/w:pStyle/@w:val=$NextHeader]">
      <xsl:choose>
        <xsl:when test="teidocx:is-heading(.)">
          <xsl:call-template name="group-by-section"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:apply-templates select="." mode="inSectionGroup"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each-group>
  </div>
</xsl:template>
Stylesheet location ../../../docx/from/from.xsl
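
The recursion in group-by-section can be illustrated with a reduced, self-contained XSLT 2.0 sketch. It is not the actual implementation: the input vocabulary (doc, p, @style with values h1, h2, ...), the template name section and the level parameter are assumptions that stand in for the teidocx:is-heading and teidocx:get-nextlevel-header helpers.

```xml
<!-- Assumed input:
     <doc>
       <p style="h1">One</p>
       <p style="Normal">intro</p>
       <p style="h2">One.A</p>
       <p style="Normal">text</p>
       <p style="h1">Two</p>
     </doc>
-->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/doc">
    <body>
      <!-- Every h1 paragraph starts a new first-level group -->
      <xsl:for-each-group select="p" group-starting-with="p[@style='h1']">
        <xsl:call-template name="section">
          <xsl:with-param name="level" select="1"/>
        </xsl:call-template>
      </xsl:for-each-group>
    </body>
  </xsl:template>

  <!-- Hypothetical stand-in for group-by-section -->
  <xsl:template name="section">
    <xsl:param name="level"/>
    <xsl:choose>
      <!-- The group starts with a heading of the expected level:
           open a div and regroup the rest by the next level down -->
      <xsl:when test="@style = concat('h', $level)">
        <div>
          <head><xsl:value-of select="."/></head>
          <xsl:for-each-group select="current-group() except ."
              group-starting-with="p[@style = concat('h', $level + 1)]">
            <xsl:call-template name="section">
              <xsl:with-param name="level" select="$level + 1"/>
            </xsl:call-template>
          </xsl:for-each-group>
        </div>
      </xsl:when>
      <!-- Loose paragraphs that precede the first heading of this level -->
      <xsl:otherwise>
        <xsl:for-each select="current-group()">
          <p><xsl:value-of select="."/></p>
        </xsl:for-each>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

As in the original template, the sketch relies on the current group remaining available inside a template reached through xsl:call-template. For the assumed input, the h2 section should end up nested inside the first h1 div, and the second h1 should open a sibling div.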
Template extract-forme-work
Documentation

Description

Looks through the document to find forme work related sections.

Creates a <fw> element for each forme work related section. These include running headers and footers. The corresponding elements in OOXML are w:headerReference and w:footerReference. These elements only hold a reference to a header or footer definition file. The reference itself is resolved through the file word/_rels/document.xml.rels.

Namespace No namespace
Used by
Template w:body
References
Parameter word-directory
Import precedence 17
Source
<xsl:template name="extract-forme-work">
  <xsl:if test="preserveWordHeadersFooters='true'">
    <xsl:for-each-group select="//w:headerReference|//w:footerReference" group-by="@r:id">
      <fw>
        <xsl:attribute name="xml:id">
          <xsl:value-of select="@r:id"/>
        </xsl:attribute>
        <xsl:attribute name="type">
          <xsl:choose>
            <xsl:when test="self::w:headerReference">header</xsl:when>
            <xsl:otherwise>footer</xsl:otherwise>
          </xsl:choose>
        </xsl:attribute>
        <xsl:variable name="rid" select="@r:id"/>
        <xsl:variable name="h-file">
          <xsl:value-of select="document(concat($word-directory,'/word/_rels/document.xml.rels'))//rel:Relationship[@Id=$rid]/@Target"/>
        </xsl:variable>
        <!-- for the moment, just copy content -->
        <xsl:if test="doc-available(concat($word-directory,'/word/', $h-file))">
          <xsl:for-each-group select="document(concat($word-directory,'/word/', $h-file))/*[1]/w:*" group-adjacent="1">
            <xsl:apply-templates select="." mode="inSectionGroup"/>
          </xsl:for-each-group>
        </xsl:if>
      </fw>
    </xsl:for-each-group>
  </xsl:if>
</xsl:template>
Stylesheet location ../../../docx/from/from.xsl
Template w:hyperlink
Namespace No namespace
Match w:hyperlink
Mode #default
Import precedence 17
Source
Stylesheet location ../../../docx/from/from.xsl
Template w:instrText
Namespace No namespace
Match w:instrText
Mode #default
Overriding
Template w:instrText
Import precedence 17
Source
<xsl:template match="w:instrText">
  <xsl:choose>
    <xsl:when test="contains(.,'REF _Ref')"/>
    <xsl:when test="starts-with(.,'HYPERLINK')"/>
    <xsl:otherwise>
      <xsl:value-of select="."/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
Stylesheet location ../../../docx/from/from.xsl
Variable processor
Namespace No namespace
Source
<xsl:variable name="processor">
  <xsl:value-of select="system-property('xsl:vendor')"/>
</xsl:variable>
Stylesheet location ../../../docx/from/from.xsl
Variable lowercase
Namespace No namespace
Source
<xsl:variable name="lowercase">abcdefghijklmnopqrstuvwxyz</xsl:variable>
Stylesheet location ../../../docx/from/from.xsl
Variable uppercase
Namespace No namespace
Source
<xsl:variable name="uppercase">ABCDEFGHIJKLMNOPQRSTUVWXYZ</xsl:variable>
Stylesheet location ../../../docx/from/from.xsl
Variable digits
Namespace No namespace
Source
<xsl:variable name="digits">1234567890</xsl:variable>
Stylesheet location ../../../docx/from/from.xsl
Variable characters
Namespace No namespace
Source
<xsl:variable name="characters">~!@#$%^&amp;*()&lt;&gt;{}[]|:;,.?`'"=+-_</xsl:variable>
Stylesheet location ../../../docx/from/from.xsl
Output (default)
Namespace No namespace
Output properties
method  encoding  indent
xml     UTF-8     yes
Source
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
Stylesheet location ../../../docx/from/from.xsl