Construction of an XML Version of the TEI DTD

C. M. Sperberg-McQueen

7 July 1999

This unpublished document is distributed privately for comment by friends and colleagues; it is not now a formal publication and should not be quoted in published material.

This document has not yet been reviewed by both editors of the TEI; what it says about the beliefs of the editors should be taken as a proposal by the author for the approval of his co-editor.

1 Introduction
2 Tag omissibility information
3 Normalizing parameter-entity references
4 Ampersand connectors
5 Normalizing mixed-content models
6 Exceptions
7 Exclusions
8 Inclusions
9 The problem of the dictionary chapter
10 Open questions and checklists
11 Miscellaneous Housekeeping

A Notation

Abstract

This document describes issues involved in creating an XML version of the SGML document type definition (DTD) created by the Text Encoding Initiative, and proposes solutions. It defines a TEI extensions file which incorporates those solutions, in order to allow experimentation.

The discussion of inclusion exceptions defines a method of rewriting SGML content models so as to achieve effects similar to those provided by inclusion exceptions. To make an SGML document type definition compatible with XML, inclusion exceptions must be eliminated. The simplest method of ensuring that this change does not invalidate existing documents is to modify the content model of every element which can occur as a descendant of any element with inclusion exceptions in its content model, in the manner described here. That will ensure that elements named in inclusion exceptions remain legal in all the locations where they are currently legal.

The methods of changing content models described in this paper are believed to preserve determinism (what ISO 8879 calls lack of ambiguity) and to simulate the effects of inclusion exceptions properly. At this point, however, no proof of either conjecture is offered.

1 Introduction

1.1 XML and DTDs

The Extensible Markup Language (XML) defines a syntax for document type definitions similar to that provided by the Standard Generalized Markup Language (SGML), but more restrictive. In particular, XML allows neither inclusion nor exclusion exceptions, and prohibits the ampersand connector.

Modifying an existing SGML document type definition (DTD), such as the TEI DTD, to conform to XML thus involves:

removing tag omissibility information
normalizing references to parameter entities by ensuring that they always end with a semicolon
removing & connectors
normalizing mixed-content models to the canonical form prescribed by XML (#PCDATA must come first, the list of sub-elements must be flat, and the occurrence indicator must be a star)
removing exclusion exceptions
removing inclusion exceptions

1.2 Modifying the TEI DTD for XML

This document describes in detail the changes necessary to perform these modifications on the TEI DTD. The changes take the form of TEI modifications files suitable for use as the files TEI.extensions.ent and TEI.extensions.dtd.

The modifications have different degrees of difficulty. Some affect the technical content of the TEI DTD in serious ways, and therefore require review by the TEI's Technical Review Committee before being formally integrated into TEI P3, while others do not affect the technical content of the TEI at all, or affect it only in minor ways. Changes of this latter type may be regarded as corrections of obvious simple errors, and may be performed by the editors under their authority to correct corrigible errors in the text of the Guidelines. (The concept of corrigible error is defined in document TEI ED W46 (?); in brief, a corrigible error is one which both editors agree is an error, which has an obvious fix, and the fix for which will not affect any existing data.) Each change proposed in this paper is identified as either a correction to a corrigible error, which the editors expect to fix in the course of preparing a revised and corrected reprint of TEI P3, or else a substantive change requiring review by the Technical Review Committee.

1.3 Overview of changes to the TEI DTD

Not all of the changes to the DTD are handled by this document.[1] Those that are handled here are summarized in the following overviews of the extensions files.

1.4 Intended use of this document

The immediate goal of this document is to allow experimentation with the TEI DTD and XML processors, by providing the extensions files needed to make the full TEI P3 DTD work with XML processors. To use the extensions files created by this document with other extensions files (e.g. those of TEI Lite), manual merger of the extensions files is required. The editors plan to automate this merger as soon as possible; the following stages of development are anticipated:

produce extensions files from this document
modify these extensions files to allow suppression or modification of individual elements, using the naming convention XML. + GI (e.g. XML.num, XML.recordingStmt, etc.)
modify Carthage and the Pizza Chef web site to automate the merger of the extensions files. The following calculations will be needed:
- if the user's TEI.extensions.ent file suppresses an element type e, generate an entity declaration of the form <!ENTITY % XML.e 'IGNORE'> so as to suppress the XML version of that element. (Strictly speaking, this is unnecessary for elements not declared here, but working out whether such a declaration is needed looks like more work than we want to put into a short-term system.)
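As a rough illustration of the first calculation, the Python sketch below scans the text of a user's TEI.extensions.ent for suppression declarations and emits the corresponding XML-version suppressions. It is hypothetical (the function name xml_suppressions is invented, and it is not part of Carthage or the Pizza Chef); it assumes suppressions are written in the simple form <!ENTITY % e 'IGNORE' >, and it uses the XML. prefix that appears in the marked-section keywords of the declarations in this document (e.g. XML.cit).

```python
import re

# Assumed input form: <!ENTITY % e 'IGNORE' > (either quote style, any spacing).
SUPPRESS = re.compile(
    r"""<!ENTITY\s+%\s+([A-Za-z][A-Za-z0-9.\-]*)\s+(['"])IGNORE\2\s*>"""
)


def xml_suppressions(user_ent_text):
    """Return one <!ENTITY % XML.e 'IGNORE'> declaration for each element
    type e suppressed in the user's TEI.extensions.ent text."""
    suppressed = []
    for match in SUPPRESS.finditer(user_ent_text):
        gi = match.group(1)
        # Skip duplicates and suppressions that are already XML.* switches.
        if gi not in suppressed and not gi.startswith("XML."):
            suppressed.append(gi)
    return [f"<!ENTITY % XML.{gi} 'IGNORE'>" for gi in suppressed]


user_file = """<!ENTITY % num 'IGNORE' >
<!ENTITY % recordingStmt "IGNORE">"""
for decl in xml_suppressions(user_file):
    print(decl)
```

Checking whether the element is declared in the XML extensions at all is left out, as the text suggests it may be for a short-term system.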

A list of open questions is included at the end of the document.

2 Tag omissibility information

Removing tag omissibility information is a trivial task which can be accomplished by a DTD pretty printer, or even a simple editor script. The strings - -, - O, O -, and O O are legal in a DTD only as tag omissibility information, within comments, or within literals. In the TEI DTDs, they do not occur within literals or comments, so a global change in an editor would handle the problem.

To enable the necessary changes to be made with a minimum of manual intervention, however, it is probably better to add a run-time option to a DTD pretty printer, to make it suppress this information, or replace it with a reference to one of the parameter entities om.RR, om.RO, om.OR, or om.OO. If the run-time flag is set, the following entities will be added to the beginning of the DTD:

<!ENTITY % om.RR '- -'>
<!ENTITY % om.RO '- O'>
<!ENTITY % om.OR 'O -'>
<!ENTITY % om.OO 'O O'>

The program carthago has accordingly been outfitted with two run-time options to suppress the omissibility markers, or to replace them with entity references.
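The following Python sketch shows what the two options might do to an element declaration. It is not carthago's code: the function name rewrite_omissibility and its mode values are hypothetical, and the regular expression assumes that each declaration names a single element (not a name group) and that the two markers follow the name directly, as they do throughout the TEI DTD.

```python
import re

# Omissibility markers and the parameter entities proposed for them.
OM_ENTITY = {"- -": "om.RR", "- O": "om.RO", "O -": "om.OR", "O O": "om.OO"}
OM_DECLS = "".join(
    f"<!ENTITY % {name} '{marker}'>\n" for marker, name in OM_ENTITY.items()
)

# <!ELEMENT, then a single element name or a reference such as %n.cit;,
# then the start-tag and end-tag markers, each '-' or 'O'.
ELEMENT_OM = re.compile(
    r"(<!ELEMENT\s+(?:%[A-Za-z][A-Za-z0-9.\-]*;?|[A-Za-z][A-Za-z0-9.\-]*))"
    r"\s+([-O])\s+([-O])(?=\s)"
)


def rewrite_omissibility(dtd_text, mode="strip"):
    """mode='strip' deletes the markers; mode='entity' replaces them with
    references to the om.* entities and prepends their declarations."""

    def replace(match):
        if mode == "strip":
            return match.group(1)
        marker = f"{match.group(2)} {match.group(3)}"
        return f"{match.group(1)} %{OM_ENTITY[marker]};"

    result = ELEMENT_OM.sub(replace, dtd_text)
    return OM_DECLS + result if mode == "entity" else result


print(rewrite_omissibility("<!ELEMENT %n.cit;       - -  (a | b) >"))
print(rewrite_omissibility("<!ELEMENT s - O (#PCDATA) >", mode="entity"))
```

Because the pattern is anchored on <!ELEMENT and the element name, an exclusion such as -(s) later in the declaration is left alone.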

3 Normalizing parameter-entity references

In the short term, we will normalize parameter-entity references using the pretty printer mentioned above (or else eliminate them entirely, by running the test DTD through a pre-processor like Carthage, which expands all parameter-entity references).

In the long run, we will systematically normalize all content models in the tagdocs of TEI P3 by adding semicolons to parameter-entity references which currently do not have them. N.B. the editors regard this as a correction of a corrigible error, and this normalization will be performed in the text of TEI P3 as soon as possible.
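A regular-expression pass can do the short-term normalization on the raw DTD text. The Python sketch below is a minimal version; PE_REF and add_semicolons are illustrative names, and it assumes that no comment in the DTD contains a % immediately followed by a name that should be left alone.

```python
import re

# '%' directly followed by a name. In "<!ENTITY % name ...", the '%' is
# followed by a space, so entity declarations are not touched. The negative
# lookahead rejects references already ending in ';' and stops the match
# from ending in the middle of a name.
PE_REF = re.compile(r"%([A-Za-z][A-Za-z0-9.\-]*)(?![A-Za-z0-9.\-;])")


def add_semicolons(dtd_text):
    """Terminate every parameter-entity reference with a semicolon."""
    return PE_REF.sub(r"%\1;", dtd_text)


print(add_semicolons("((%n.pubPlace)?, (%n.address)?, (%n.idno)*)"))
# ((%n.pubPlace;)?, (%n.address;)?, (%n.idno;)*)
print(add_semicolons("<!ENTITY % phrase '(#PCDATA | %m.phrase)' >"))
# <!ENTITY % phrase '(#PCDATA | %m.phrase;)' >
```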

4 Ampersand connectors

Removing ampersand connectors involves either rewriting the content model as a set of alternative sequence groups (thus retaining strict equivalence with the existing model) or revising the content model entirely. In the case of the TEI, the editors both agree that most uses of & have proven to be design errors, so we propose simply to revise the content models.
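The strictly equivalent route is mechanical: (F & G) means the same as ((F, G) | (G, F)), and in general an &-group of n members becomes n! alternative orderings. The Python sketch below (unroll_and is a hypothetical helper) performs that expansion on operands given as strings.

```python
from itertools import permutations


def unroll_and(operands):
    """Rewrite (F1 & F2 & ... & Fn) as an or-group of sequence groups,
    one for each order in which the operands can occur."""
    sequences = ["(" + ", ".join(order) + ")" for order in permutations(operands)]
    return "(" + " | ".join(sequences) + ")"


print(unroll_and(["a", "b"]))        # ((a, b) | (b, a))
print(unroll_and(["a", "b", "c"]))   # six alternatives
```

Apart from the factorial growth, the result is not guaranteed to be deterministic: if an operand can be empty, as (%n.arc;)* in <graph> can, two of the generated alternatives may begin with the same element. That is one more reason to prefer revising the models.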

The following content models use ampersand connectors in TEI P3:

<cit> (part of the core)
<respStmt> (part of the core)
<publicationStmt> (part of the header)
<graph> (part of the additional tag set for networks and graphs)

In this section, we provide alternate declarations for each of them. In the entity extensions file we must first suppress all of them:

< 3 Suppress definitions of elements with ampersand > =

<!ENTITY % cit 'IGNORE' >
<!ENTITY % respStmt 'IGNORE' >
<!ENTITY % publicationStmt 'IGNORE' >
<!ENTITY % graph 'IGNORE' >

And in the DTD extensions file we must redefine them all:

< 4 New definitions of elements with ampersand > =

< New cit declaration 5 >
< Define new respStmt 8 >
< New publicationStmt 9 >
< New graph element 10 >

N.B. All the ampersand-eliminating content-model changes in this section are regarded by the editors as corrections of corrigible errors, and will be integrated into the text of TEI P3 as soon as possible.

4.1 The <cit> element

The standard declaration for <cit> is as follows:

<!ELEMENT %n.cit;       - -  ((%n.q; | %n.quote;) & (%m.bibl; |
                             %m.loc;))                          >

We will redefine it with a slightly more general content model (well, almost -- see below):

< 5 New cit declaration > =

<!ENTITY % XML.cit "INCLUDE" >
<![%XML.cit;[
<!ELEMENT %n.cit; - - ((%n.q; | %n.quote; | %m.bibl; | %m.loc;
                        | %m.Incl;)+) >
<!ATTLIST %n.cit;
          %a.global;
          TEIform CDATA 'cit' >
]]>

(The Incl class included here has to do with inclusion exceptions; see below.) If we wished to replicate precisely the original content model, without the ampersand, we could define <cit> thus:

<!ELEMENT %n.cit;       - -  (((%n.q; | %n.quote;),
                               (%m.bibl; | %m.loc;))
                             | ((%m.bibl; | %m.loc;),
                               (%n.q; | %n.quote;)))            >

As it turns out, however, the declaration proposed above is ambiguous, since <link> is a member of both the loc and Incl classes. We'll have to unroll one or the other of these two classes; a coin toss decides that we should unroll loc.

After further investigation (i.e. further attempts to use the DTD produced by a draft of this paper), however, it becomes clear that loc is a subclass of phrase, so that every content model which uses both the phrase class and the Incl class is going to run into the same ambiguity. So instead of unrolling each case individually, we take a harsher approach, and remove <link> from the loc class.

< 7 New loc class > =

<!ENTITY % x.loc '' >
<!ENTITY % m.loc '%x.loc; %n.ptr; | %n.ref; | %n.xptr; | %n.xref;' >

This should not cause problems for any existing data, since <link> is still a member of the class Incl, which is (after all) allowed virtually everywhere.
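The clash can be found mechanically by expanding the classes in the or-group and looking for an element type that occurs in more than one alternative. In the Python sketch below the class memberships are deliberately partial: only members named in this document are listed, and the cit alternatives other than loc and Incl are represented by stand-in names.

```python
from collections import Counter

# Deliberately partial class memberships, limited to members named in this
# document; the real TEI classes have more members.
LOC_OLD = ["ptr", "ref", "xptr", "xref", "link"]  # loc in TEI P3
LOC_NEW = ["ptr", "ref", "xptr", "xref"]          # loc as redefined above
INCL = ["anchor", "link"]                         # partial Incl class
OTHER = ["q", "quote", "bibl"]                    # stand-ins for the rest of cit


def clashes(*alternatives):
    """Element types that occur in more than one alternative of an
    or-group such as (A | B | C)+, which makes the group ambiguous."""
    counts = Counter(gi for members in alternatives for gi in set(members))
    return sorted(gi for gi, n in counts.items() if n > 1)


print(clashes(OTHER, LOC_OLD, INCL))  # ['link']
print(clashes(OTHER, LOC_NEW, INCL))  # []
```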

4.2 The <respStmt> element

Similarly, we could replicate the original definition of <respStmt> if we wished, but it's probably better regarded as a design error to be fixed:

<!ELEMENT %n.respStmt;  - O  ((%n.resp; & %n.name;), (%n.resp;
                             | %n.name;)*)                      >

We give it a simpler and looser declaration instead:

< 8 Define new respStmt > =

<!ENTITY % XML.respStmt "INCLUDE" >
<![%XML.respStmt;[
<!ELEMENT %n.respStmt; - O (%n.resp; | %n.name; | %m.Incl;)+ >
<!ATTLIST %n.respStmt;
          %a.global;
          TEIform CDATA 'respStmt' >
]]>

The prose should make clear that in principle, a <respStmt> should have at least one <resp> and at least one <name>. Enforcing that with the content model may be more pedantic than we want to be, though; a content model that did enforce it would look like this:

<!ELEMENT %n.respStmt;  - O  (((%n.resp;)+,
                             (%n.name;, (%n.resp; | %n.name;)*))
                             | ((%n.name;)+,
                             (%n.resp;, (%n.resp; | %n.name;)*)))  >
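As a sanity check that this stricter model says exactly what the prose says, the Python sketch below transcribes it by hand into a regular expression over child element names (ignoring %m.Incl;) and compares it with the informal rule on every sequence of up to six children. The transcription and the helper names are assumptions of the sketch.

```python
import re
from itertools import product

# Hand transcription of the stricter model (ignoring %m.Incl;)
#   ((resp)+, (name, (resp | name)*)) | ((name)+, (resp, (resp | name)*))
# as a Python regular expression over space-terminated element names.
STRICT = re.compile(
    r"(?:resp )+name (?:resp |name )*"
    r"|(?:name )+resp (?:resp |name )*"
)


def strict_accepts(children):
    """True if the sequence of child element names matches the model."""
    return STRICT.fullmatch("".join(gi + " " for gi in children)) is not None


def informal_rule(children):
    """At least one <resp> and at least one <name>, in any order."""
    return "resp" in children and "name" in children


# Compare the two on every sequence of up to six children.
for length in range(7):
    for seq in product(["resp", "name"], repeat=length):
        assert strict_accepts(seq) == informal_rule(seq), seq
print("model and rule agree")
```

The model stays deterministic because the first child alone selects the alternative: a <resp> selects the first, a <name> the second.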

4.3 The <publicationStmt> element

The content model for <publicationStmt> includes an editorial error I am glad to have the occasion to fix. (In normal bibliographic practice, when place and publisher are both given, the place is given first. I don't know what got into me that morning.)

<!ELEMENT %n.publicationStmt;
                        - O  ((%n.p;)+ | ( (%n.publisher; |
                             %n.distributor; | %n.authority;) &
                             ((%n.pubPlace)?, (%n.address)?,
                             (%n.idno)*, (%n.availability)?,
                             (%n.date)?)+ )+ )                  >

Rather than simply replace the current content model with an equivalent ampersand-less expression, we'll change it. For compatibility with existing data, we'll make the new expression loose rather than tight.

4.4 The <graph> element

The <graph> element uses the content model to require that graphs be encoded nodes-first or arcs-first, but not mixed hugger-mugger. We'll retain that characteristic. The old declaration is this:

<!ELEMENT %n.graph;     - -  ((%n.node;)+ & (%n.arc;)*)         >

We could require arbitrarily that all nodes come first; it's not clear whether any legacy data using <graph> actually exists. But in the interests of backward compatibility, the new content model might as well allow precisely what the old one did, even if that now seems like a design error:

< 10 New graph element > =

<![%TEI.nets;[
<!ENTITY % XML.graph "INCLUDE" >
<![%XML.graph;[
<!ELEMENT %n.graph; - -
          (((%n.node;, (%m.Incl;)*)+, (%n.arc;, (%m.Incl;)*)*)
          | ((%n.arc;, (%m.Incl;)*)+, (%n.node;, (%m.Incl;)*)+)) >
<!ATTLIST %n.graph;
          %a.global;
          type CDATA #IMPLIED
          label CDATA #IMPLIED
          order NUMBER #IMPLIED
          size NUMBER #IMPLIED
          TEIform CDATA 'graph' >
]]>
]]>
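The second alternative begins with (%n.arc;)+ rather than the (%n.arc;)* that a literal unrolling of the & group would give. The Python sketch below, a minimal first-set computation over a hypothetical model representation that ignores %m.Incl;, shows why: with arc* both alternatives can begin with <node>, whereas with arc+ the first child alone decides which alternative applies. Disjoint first sets are only one part of determinism, so this is not a full check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sym:
    name: str


@dataclass(frozen=True)
class Rep:
    body: object
    at_least: int  # 0 for F*, 1 for F+


@dataclass(frozen=True)
class Seq:
    items: tuple


def nullable(e):
    if isinstance(e, Sym):
        return False
    if isinstance(e, Rep):
        return e.at_least == 0 or nullable(e.body)
    return all(nullable(item) for item in e.items)


def first(e):
    """Element types that can begin a match of e."""
    if isinstance(e, Sym):
        return {e.name}
    if isinstance(e, Rep):
        return first(e.body)
    result = set()
    for item in e.items:
        result |= first(item)
        if not nullable(item):
            break
    return result


node, arc = Sym("node"), Sym("arc")
# Literal unrolling of ((node)+ & (arc)*): (node+, arc*) | (arc*, node+)
literal = [Seq((Rep(node, 1), Rep(arc, 0))), Seq((Rep(arc, 0), Rep(node, 1)))]
# The model used above: (node+, arc*) | (arc+, node+)
used = [Seq((Rep(node, 1), Rep(arc, 0))), Seq((Rep(arc, 1), Rep(node, 1)))]

for label, (left, right) in [("literal", literal), ("used", used)]:
    print(label, sorted(first(left) & first(right)))
# literal ['node']
# used []
```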

5 Normalizing mixed-content models

5.1 Individual elements

The following elements use the keyword #PCDATA in ways that must be changed to be legal in XML:

<sense> (dictionaries)
<re> (dictionaries)
<persName> (names and dates)
<placeName> (names and dates)
<geogName> (names and dates)
<dateStruct> (names and dates)
<timeStruct> (names and dates)
<dateline> (default text structure)

In most of these cases, the #PCDATA keyword is given last, not first, in the content model; in one or two, it's neither first nor last. For example:

<!ELEMENT %n.sense;     - -  (%n.sense; | %m.dictionaryTopLevel
                             | %m.phrase | #PCDATA)*            >

In one or two cases, the group also has a plus operator instead of a star operator.

<!ELEMENT %n.timeStruct;
                        - -  ((%m.temporalExpr; | #PCDATA)+)    >

We must redeclare each of them, which means first of all that we must suppress their standard declarations:

< 11 Suppress some mixed content elements > =

<!ENTITY % sense 'IGNORE' >
<!ENTITY % re 'IGNORE' >
<!ENTITY % persName 'IGNORE' >
<!ENTITY % placeName 'IGNORE' >
<!ENTITY % geogName 'IGNORE' >
<!ENTITY % dateStruct 'IGNORE' >
<!ENTITY % timeStruct 'IGNORE' >
<!ENTITY % dateline 'IGNORE' >

and separately we must redefine them:

< 12 Redeclare elements with mixed content elements > =

<![%TEI.dictionaries;[
< New mixed content elements for dictionaries 13 >
]]>
<![%TEI.names.dates;[
< New mixed content elements for names and dates 15 >
]]>
< New mixed content elements for structure 20 >

Since the normalization is purely mechanical, there seems to be no need to reproduce the original declarations here. The new declarations are given below.
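The mechanical part of the normalization can be sketched in Python as follows. normalize_mixed is a hypothetical helper: it assumes the group is flat, unwraps one redundant outer pair of parentheses (as in <timeStruct>), moves #PCDATA to the front, forces the * operator, and appends any extra alternatives, such as %m.Incl;, supplied by the caller. It assumes parameter-entity references have already been given their semicolons.

```python
import re


def normalize_mixed(model, extra=()):
    """Rewrite a flat SGML mixed-content group into XML's canonical form:
    #PCDATA first, a flat or-group, and the * operator. Alternatives in
    'extra' (e.g. '%m.Incl;') are appended at the end."""
    text = model.strip()
    # Unwrap one redundant outer pair, as in ((%m.temporalExpr; | #PCDATA)+).
    outer = re.fullmatch(r"\((\(.*\)[*+]?)\)", text, re.S)
    if outer:
        text = outer.group(1)
    group = re.fullmatch(r"\(([^()]*)\)[*+]?", text, re.S)
    if group is None or "," in group.group(1) or "&" in group.group(1):
        raise ValueError("not a flat or-group: " + model)
    names = [token.strip() for token in group.group(1).split("|")]
    if "#PCDATA" not in names:
        raise ValueError("no #PCDATA in " + model)
    others = [n for n in names if n != "#PCDATA"]
    return "(" + " | ".join(["#PCDATA", *others, *extra]) + ")*"


print(normalize_mixed("((%m.temporalExpr; | #PCDATA)+)", ["%m.Incl;"]))
# (#PCDATA | %m.temporalExpr; | %m.Incl;)*
```

The printed result is the content model given for <timeStruct> in the declarations that follow.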

N.B. All the mixed-content normalization changes in this section are regarded by the editors as corrections of corrigible errors, and will be integrated into the text of TEI P3 as soon as possible.

Two elements in this group are from the dictionary tag set:

< 13 New mixed content elements for dictionaries > =

<!ENTITY % XML.sense "INCLUDE" >
<![%XML.sense;[
<!ELEMENT %n.sense; - - (#PCDATA | %n.sense; | %m.dictionaryTopLevel;
                         | %m.phrase; | %m.Incl;)* >
<!ATTLIST %n.sense;
          %a.global;
          %a.dictionaries;
          level NUMBER #IMPLIED
          TEIform CDATA 'sense' >
]]>

< 14 New mixed content elements for dictionaries 13 (cont'd) > =

<!ENTITY % XML.re "INCLUDE" >
<![%XML.re;[
<!ELEMENT %n.re; - O (#PCDATA | %n.sense; | %m.dictionaryTopLevel;
                      | %m.phrase; | %m.Incl;)* >
<!ATTLIST %n.re;
          %a.global;
          %a.dictionaries;
          type CDATA #IMPLIED
          TEIform CDATA 're' >
]]>

Note that the standard declaration for <re> also has an exclusion exception, which has been dropped silently here. N.B. Elimination of exclusion exceptions is not the correction of a corrigible error; the version of this declaration which will go into TEI P3 without review is this:

<!ELEMENT %n.re;        - O  (#PCDATA | %n.sense;
                             | %m.dictionaryTopLevel;
                             | %m.phrase;)*      -(%n.re;)      >

The other elements in this group are from the tag set for names and dates.

< 15 New mixed content elements for names and dates > =

<!ENTITY % XML.persName "INCLUDE" >
<![%XML.persName;[
<!ELEMENT %n.persName; - - (#PCDATA | %m.personPart; | %m.phrase; | %m.Incl;)* >
<!ATTLIST %n.persName;
          %a.global;
          %a.names;
          type CDATA #IMPLIED
          TEIform CDATA 'persName' >
]]>

< 16 New mixed content elements for names and dates 15 (cont'd) > =

<!ENTITY % XML.placeName "INCLUDE" >
<![%XML.placeName;[
<!ELEMENT %n.placeName; - - (#PCDATA | %m.placePart; | %m.phrase; | %m.Incl;)* >
<!ATTLIST %n.placeName;
          %a.global;
          type CDATA #IMPLIED
          full (yes | abb | init) yes
          %a.names;
          TEIform CDATA 'placeName' >
]]>

< 17 New mixed content elements for names and dates 15 (cont'd) > =

<!ENTITY % XML.geogName "INCLUDE" >
<![%XML.geogName;[
<!ELEMENT %n.geogName; - - (#PCDATA | %n.geog; | %n.name; | %m.Incl;)* >
<!ATTLIST %n.geogName;
          %a.global;
          %a.placePart;
          TEIform CDATA 'geogName' >
]]>

< 18 New mixed content elements for names and dates 15 (cont'd) > =

<!ENTITY % XML.dateStruct "INCLUDE" >
<![%XML.dateStruct;[
<!ELEMENT %n.dateStruct; - - (#PCDATA | %m.temporalExpr; | %m.Incl;)* >
<!ATTLIST %n.dateStruct;
          %a.global;
          %a.temporalExpr;
          calendar CDATA #IMPLIED
          exact CDATA #IMPLIED
          TEIform CDATA 'dateStruct' >
]]>

< 19 New mixed content elements for names and dates 15 (cont'd) > =

<!ENTITY % XML.timeStruct "INCLUDE" >
<![%XML.timeStruct;[
<!ELEMENT %n.timeStruct; - - (#PCDATA | %m.temporalExpr; | %m.Incl;)* >
<!ATTLIST %n.timeStruct;
          %a.global;
          %a.temporalExpr;
          zone CDATA #IMPLIED
          TEIform CDATA 'timeStruct' >
]]>

The <dateline> element (from the default text-structure tag set) is the last one needing a mixed-content fix:

< 20 New mixed content elements for structure > =

<!ENTITY % XML.dateline "INCLUDE" >
<![%XML.dateline;[
<!ELEMENT %n.dateline; - O (#PCDATA | %n.date; | %n.time; | %n.name;
                            | %n.address; | %m.Incl;)* >
<!ATTLIST %n.dateline;
          %a.global;
          TEIform CDATA 'dateline' >
]]>

5.2 The entities phrase and phrase.seq

The XML rules for mixed-content models also require that the declarations for phrase and phrase.seq be changed slightly. The current definitions are:

<!ENTITY % phrase '(#PCDATA | %m.phrase)'                       >
<!ENTITY % phrase.seq '(%phrase;)*'                             >

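These give us one level too many of parentheses: phrase.seq expands to ((#PCDATA | ...))*, a nested group, which XML does not allow in a mixed-content model. We need to remove the parentheses from the entity phrase: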

< 21 New declaration for phrase and phrase.seq > =

<!ENTITY % phrase '#PCDATA | %m.phrase;' >
<!ENTITY % phrase.seq '(%phrase;)*' >
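The extra level of parentheses is easy to see by expanding the entities textually. The Python sketch below does naive, repeated parameter-entity substitution; it ignores the single spaces XML attaches around parameter-entity replacement text in the DTD, and for simplicity it writes the old value of phrase with a semicolon after %m.phrase.

```python
import re

PE_REF = re.compile(r"%([A-Za-z][A-Za-z0-9.\-]*);")


def expand(text, entities):
    """Substitute known %name; references until nothing changes; unknown
    references (here %m.phrase;) are left as they are. Assumes no entity
    refers to itself, which XML forbids anyway."""
    while True:
        expanded = PE_REF.sub(lambda m: entities.get(m.group(1), m.group(0)), text)
        if expanded == text:
            return text
        text = expanded


old_decls = {"phrase": "(#PCDATA | %m.phrase;)", "phrase.seq": "(%phrase;)*"}
new_decls = {"phrase": "#PCDATA | %m.phrase;", "phrase.seq": "(%phrase;)*"}
print(expand("%phrase.seq;", old_decls))  # ((#PCDATA | %m.phrase;))*
print(expand("%phrase.seq;", new_decls))  # (#PCDATA | %m.phrase;)*
```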

N.B. This change to the declaration of phrase is regarded by the editors as the correction of a corrigible error, and will be integrated into the text of TEI P3 as soon as possible.

Unfortunately, integrating this particular fix into the XML modifications file for testing will require that we either hard-code the effective value of m.phrase, or that we recreate the entire sequence of class declarations for phrase in the modifications file. (Sigh.) While we are here, we will introduce some fixes to the declarations of some classes:

add <geogName>, <persName>, <placeName> to data class
add new class editIncl
(this one's not done yet) add editIncl class and <anchor> element to globIncl class
remove members of editIncl from edit, in order to avoid non-determinism in the content models
remove <anchor> from seg class, in order to avoid non-determinism (it's already in Incl)
remove <link> from loc class, in order to avoid non-determinism (it's already in Incl)

The element <dictAnomaly> is new; for a description, see section 9 below, The problem of the dictionary chapter.

We need to declare the name of <dictAnomaly>.

< 23 Declare new GIs > =

<!ENTITY % n.dictAnomaly 'dictAnomaly' >

5.3 Elements using phrase.seq and paraContent

Note that neither phrase.seq nor paraContent may be combined with other elements in a content model, in XML, because of the XML requirement that mixed content models not have nested groups. This affects the declarations for

<castItem> (in drama)
<docImprint> (in front matter)
<catDesc> (in the header)
<byline> (in default text structure)
<opener> (in default text structure)
<closer> (in default text structure)
<form> (in dictionaries)
<gramGrp> (in dictionaries)
<trans> (in dictionaries)
<etym> (in dictionaries)
<xr> (in dictionaries)
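The rule these elements violate is XML 1.0 production [51], Mixed. The Python sketch below checks a content model against a simplified version of that production (ASCII names only, with parameter-entity references assumed to be already expanded); the element names in the usage lines are merely examples.

```python
import re

# XML 1.0 production [51]:
#   Mixed ::= '(' S? '#PCDATA' (S? '|' S? Name)* S? ')*'
#           | '(' S? '#PCDATA' S? ')'
# Name is simplified here to ASCII letters, digits and . - _ :
NAME = r"[A-Za-z_:][A-Za-z0-9._:\-]*"
MIXED = re.compile(
    rf"\(\s*#PCDATA(?:\s*\|\s*{NAME})*\s*\)\*|\(\s*#PCDATA\s*\)"
)


def is_xml_mixed(model):
    """True if the (expanded) content model is legal XML mixed content."""
    return MIXED.fullmatch(model.strip()) is not None


print(is_xml_mixed("(#PCDATA | hi | term)*"))       # True
print(is_xml_mixed("((#PCDATA | hi)*, docAuthor)"))  # False: nested group
print(is_xml_mixed("(#PCDATA | hi)+"))              # False: must be *
```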

These must be suppressed, in order to be redeclared:

< 24 Suppress users of phrase.seq > =

<!ENTITY % castItem 'IGNORE' >
<!ENTITY % docImprint 'IGNORE' >
<!ENTITY % catDesc 'IGNORE' >
<!ENTITY % byline 'IGNORE' >
<!ENTITY % opener 'IGNORE' >
<!ENTITY % closer 'IGNORE' >
<!ENTITY % form 'IGNORE' >
<!ENTITY % gramGrp 'IGNORE' >
<!ENTITY % trans 'IGNORE' >
<!ENTITY % etym 'IGNORE' >
<!ENTITY % xr 'IGNORE' >

And they need to be redefined, tag set by tag set. (We put elements from each tag set into separate scraps to simplify production of specialized modification files.)

< 25 New declarations for users of phrase.seq > =

< New castItem 26 >
< New docImprint 27 >
< New catDesc 28 >
< New opener and closer 29 >
< New phrase.seq elements for dictionaries 32 >

First, the base tag set for drama:

Next the tag set for front matter:

< 27 New docImprint > =

<!ENTITY % XML.docImprint "INCLUDE" >
<![%XML.docImprint;[
<!ELEMENT %n.docImprint; - O (#PCDATA | %m.phrase; | %n.pubPlace; | %n.docDate;
                              | %n.publisher; | %m.Incl;)* >
<!ATTLIST %n.docImprint;
          %a.global;
          TEIform CDATA 'docImprint' >
]]>

Then, the header:

< 28 New catDesc > =

<!ENTITY % XML.catDesc "INCLUDE" >
<![%XML.catDesc;[
<!ELEMENT %n.catDesc; - O (#PCDATA | %m.phrase; | %n.textDesc;)* >
<!ATTLIST %n.catDesc;
          %a.global;
          TEIform CDATA 'catDesc' >
]]>

And the default text-structure tag set:

< 29 New opener and closer > =

<!ENTITY % XML.byline "INCLUDE" >
<![%XML.byline;[
<!ELEMENT %n.byline; - O (#PCDATA | %m.phrase; | %n.docAuthor; | %m.Incl;)* >
<!ATTLIST %n.byline;
          %a.global;
          TEIform CDATA 'byline' >
]]>

< 31 New opener and closer 29 (cont'd) > =

<!ENTITY % XML.closer "INCLUDE" >
<![%XML.closer;[
<!ELEMENT %n.closer; - O (#PCDATA | %m.phrase; | %n.signed; | %n.dateline;
                          | %n.salute; | %m.Incl;)* >
<!ATTLIST %n.closer;
          %a.global;
          TEIform CDATA 'closer' >
]]>

And finally, the base tag set for dictionaries; unlike the preceding elements, these all use paraContent, not phrase.seq. N.B. These content models will require further changes before publication; see section 9 below, The problem of the dictionary chapter.

< 32 New phrase.seq elements for dictionaries > =

<![%TEI.dictionaries;[
<!ENTITY % XML.form "INCLUDE" >
<![%XML.form;[
<!ELEMENT %n.form; - - (#PCDATA | %m.phrase; | %m.inter; | %m.formInfo;
                        | %m.Incl;)* >
<!ATTLIST %n.form;
          %a.global;
          %a.dictionaries;
          type CDATA #IMPLIED
          TEIform CDATA 'form' >
]]>

< 33 New phrase.seq elements for dictionaries 32 (cont'd) > =

<!ENTITY % XML.gramGrp "INCLUDE" >
<![%XML.gramGrp;[
<!ELEMENT %n.gramGrp; - - (#PCDATA | %m.phrase; | %m.inter; | %m.gramInfo;
                           | %m.Incl;)* >
<!ATTLIST %n.gramGrp;
          %a.global;
          %a.dictionaries;
          TEIform CDATA 'gramGrp' >
]]>

< 34 New phrase.seq elements for dictionaries 32 (cont'd) > =

<!ENTITY % XML.trans "INCLUDE" >
<![%XML.trans;[
<!ELEMENT %n.trans; - O (#PCDATA | %m.phrase; | %m.inter; | %m.dictionaryParts;
                         | %m.Incl;)* >
<!ATTLIST %n.trans;
          %a.global;
          %a.dictionaries;
          TEIform CDATA 'trans' >
]]>

< 36 New phrase.seq elements for dictionaries 32 (cont'd) > =

<!ENTITY % XML.xr "INCLUDE" >
<![%XML.xr;[
<!ELEMENT %n.xr; - O (#PCDATA | %m.phrase; | %m.inter; | %n.usg; | %n.lbl;
                      | %m.Incl;)* >
<!ATTLIST %n.xr;
          %a.global;
          %a.dictionaries;
          type CDATA #IMPLIED
          TEIform CDATA 'xr' >
]]>
]]>

Since paraContent also occurs in the definition of specialPara, in a form not legal in XML, the specialPara entity must also be redefined; see below, The problem of specialPara elements.

6 Exceptions

Removing inclusion and exclusion exceptions typically involves changing the set of documents accepted by the DTD.[2] In the discussion which follows, I assume that our goal is to ensure that every document legal in the original DTD remains legal in the modified DTD. The changes will cause the modified DTD to accept some other documents which are not valid instances of the original DTD. That is, if the original DTD is taken as an absolutely correct definition of a language, the revised DTD will overgenerate.[3] We will wish to keep the overgeneration to a minimum, but in general we cannot eliminate it entirely, since inclusion and exclusion exceptions do extend the expressive power of the DTD notation.[4]

7 Exclusions

Rewriting declarations without exclusion exceptions simply involves removing the exception and adding an application-specific constraint, to be checked outside the SGML parser, stating that the excluded element types must not occur within the element type that excluded them. Thus, for example, the TEI <s> element (for end-to-end segmentation on the level of the orthographic sentence) is currently declared thus:

<!ELEMENT s  - -  (%phrase.seq)  -(s) >

An XML-compatible TEI DTD would replace this with:

<!ELEMENT s %phrase.seq;  >

<!--* CONSTRAINT:  <s> must not occur within
    * an <s>, i.e. Ancestor(1,s) = NIL
    *-->

The important change here, for present purposes, is the removal of the exclusion exception. In addition, we have removed the tag omissibility indicators and the parentheses around phrase.seq, for reasons that should be clear from other portions of this document.

It would be possible to simulate the effect of exclusion exceptions by modifying the content models of possible descendants of <s>, so as to remove <s> from their content models. For elements which can occur both as parents and as descendants of <s>, however, this change would render some existing documents illegal, so it is not pursued further here.

The following elements have exclusion exceptions in TEI P3:

<s> (excludes <s>)
<speaker> (excludes <speaker>)
<stage> (excludes <stage>)
<hom> (excludes <entry>)
<re> (excludes <re>)
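Once the exclusions are gone from the DTD, the constraints they expressed have to be checked by other software. The Python sketch below, using the standard library's xml.etree.ElementTree and a table transcribed from the list above, reports every element whose type is excluded by one of its ancestors. It is an illustration only, not part of any TEI software, and it assumes the document is already available as XML.

```python
import xml.etree.ElementTree as ET

# The exclusions listed above: element type -> element types it excludes.
EXCLUSIONS = {
    "s": {"s"},
    "speaker": {"speaker"},
    "stage": {"stage"},
    "hom": {"entry"},
    "re": {"re"},
}


def exclusion_violations(root, exclusions=EXCLUSIONS):
    """Return (ancestor, descendant) pairs where the descendant's type is
    excluded by an ancestor: the checks the parser no longer makes."""
    found = []

    def walk(elem, ancestors):
        for anc in ancestors:
            if elem.tag in exclusions.get(anc, ()):
                found.append((anc, elem.tag))
        for child in elem:
            walk(child, ancestors + [elem.tag])

    walk(root, [])
    return found


doc = ET.fromstring("<p><s>One <s>nested</s></s><s>fine</s></p>")
print(exclusion_violations(doc))  # [('s', 's')]
```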

The new declarations are precisely the same as the old declarations, only without the exclusions:

< 37 New declarations for exclusion exceptions > =

<![ %TEI.analysis; [
<!ENTITY % XML.s "INCLUDE" >
<![%XML.s;[
<!ELEMENT %n.s; - - %phrase.seq; >
<!ATTLIST %n.s;
          %a.global;
          %a.seg;
          TEIform CDATA 's' >
]]>
]]>

< 38 New declarations for exclusion exceptions 37 (cont'd) > =

<!ENTITY % XML.speaker "INCLUDE" >
<![%XML.speaker;[
<!ELEMENT %n.speaker; - O %phrase.seq; >
<!ATTLIST %n.speaker;
          %a.global;
          TEIform CDATA 'speaker' >
]]>

< 39 New declarations for exclusion exceptions 37 (cont'd) > =

<!ENTITY % XML.stage "INCLUDE" >
<![%XML.stage;[
<!ELEMENT %n.stage; - - %specialPara; >
<!ATTLIST %n.stage;
          %a.global;
          type CDATA 'mix'
          TEIform CDATA 'stage' >
]]>

And they have to be excluded from the base DTD:

< 40 Suppress element declarations with exclusions > =

<!ENTITY % s 'IGNORE' >
<!ENTITY % speaker 'IGNORE' >
<!ENTITY % stage 'IGNORE' >

A new definition of <re> has already been given above, in the context of normalizing mixed-content models. The new definition of <hom> would be as follows:

<!ELEMENT %n.hom;       - O  (%n.sense; |
                             %m.dictionaryTopLevel)*            >

The actual form to be used for <hom> in an XML DTD, however, differs from this, as described below in The problem of the dictionary chapter.

8 Inclusions

Removing inclusion exceptions requires simulating their effect in the content model of each element type which can occur as a descendant of the element type bearing the inclusions. This section discusses

the effect of inclusions on the language accepted by a content model
gaining that effect by modifying a finite-state automaton
gaining that effect by modifying a content-model group
examples

A brief note on the notation used is given in an appendix.

8.1 The Effect of Inclusions

Inclusions make included elements legal at any location in a content model, without however changing the requirements of the basic content model, which must still be fulfilled. (For now, I make the simplifying assumption that the set of included elements and the set of elements named in the content model are disjoint. When they are not, special considerations will apply, because of SGML's requirement that content models be deterministic.)

We can summarize the effect of inclusions very simply if we think of an FSA recognizing a content model: included elements do not change the state of the FSA. So to change an FSA without inclusions to an FSA that accepts the same language, except that it also allows the inclusion of any element i in the set of inclusions I,

    for each state s in the FSA {
       for each element i in I {
          add a transition from s to s, on i
       }
    }
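A runnable version of this loop, in Python, might look like the following. The transition-table representation (a dict from (state, element) to state) and the helper names are assumptions of the sketch; following the simplifying assumption above, it refuses an included element that already labels a transition.

```python
def add_inclusions(delta, start, inclusions):
    """delta maps (state, element) -> state for a deterministic FSA.
    Return a new table in which every state loops to itself on each
    included element, so that inclusions never change the state."""
    states = {start} | {s for (s, _) in delta} | set(delta.values())
    new_delta = dict(delta)
    for s in states:
        for i in inclusions:
            if (s, i) in delta:
                # Included element also named in the model: outside the
                # simplifying assumption made in the text.
                raise ValueError(f"{i!r} already labels a transition from {s!r}")
            new_delta[(s, i)] = s
    return new_delta


def accepts(delta, start, finals, children):
    """Run the FSA over a sequence of child element names."""
    state = start
    for gi in children:
        if (state, gi) not in delta:
            return False
        state = delta[(state, gi)]
    return state in finals


# FSA for the content model (a, b): 0 --a--> 1 --b--> 2, final state 2.
ab = {(0, "a"): 1, (1, "b"): 2}
ab_incl = add_inclusions(ab, 0, {"x"})
print(accepts(ab_incl, 0, {2}, ["x", "a", "x", "x", "b", "x"]))  # True
print(accepts(ab_incl, 0, {2}, ["a", "x"]))                      # False
```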

8.2 The Function imf()

We can characterize the language recognized using inclusion exceptions this way. Let us construct a function imf(E,I) which maps from a regular expression E and a set of inclusions I to a new regular expression E'. Ideally we want the following to be true:

E' is deterministic if E is deterministic.
L(E) is a subset of L(E')

In general, for sequences of terminals x, y in Sigma*:

If x is in L(E) then x is in L(E').
If xy is in L(E) and i is in I then xiy is in L(E').

My best cut so far at defining such a function relies in some places on a couple of auxiliary functions. So let us define functions imf(E), mf(E), and m(E) (where i is for 'initial', m for 'medial', f for 'final').[5] imf(E) makes the claim about xiy true for all x, y in Sigma*. mf(E) makes it true for x in Sigma+ and y in Sigma*. m(E) makes it true for x, y in Sigma+. Equivalently, we can say that any element i in I can appear initially, medially, or finally in imf(E), medially or finally (but not initially) in mf(E), and medially (but not initially or finally) in m(E).

The care we have to take with initial and final positions results from the SGML rules about determinism, but also helps keep the resulting expressions simpler than they'd be if we just slapped (I*) in everywhere in the content model.

Here is a first cut at defining the functions. In a number of circumstances, they are undefined; it might perhaps be useful, therefore, to define a simple normalization on (ampersand-free) content models, which would ensure that the functions are always defined.

If E is the empty set, then the content model in question cannot be satisfied; this would be the case if a DTD which lacked any element called <nonesuch> nevertheless included an element which required it as a subelement:

<!ELEMENT impossible - - (nonesuch) >

Given that we want L(E) to be a subset of L(E'), we must define imf etc. thus for this case:

imf(E) = the empty set
mf(E) = the empty set
m(E) = the empty set

An element may accept the empty string as its content in either of two ways. First, the element may be declared EMPTY: in this case, inclusions are not legal inside the element.

imf(E) = the empty string
mf(E) = the empty string
m(E) = the empty string

Second, the element's content model may accept the empty string, either because all subelements are optional or because the content model may be satisfied by #PCDATA: in this case, inclusions are legal within the element.

imf(E) = I*
mf(E) is undefined
m(E) is undefined

If E is an atomic symbol, e.g. a, then

m(E) = E
mf(E) = (m(E), I*)
imf(E) = (I*, mf(E))

If E has the form F?, and F is not nullable (does not accept the empty string), then

m(E) = m(F)?
mf(E) = (m(F), I*)?
imf(E) = I*, mf(E) = I*, (m(F), I*)?

Note that we require F to be non-nullable in order to preserve determinism.

If E has the form F?, and F is nullable, then

m(E) = m(F)
mf(E) = mf(F)
imf(E) = imf(F)

In other words, if F is nullable, the ? is redundant and may be stripped without loss of information.

If E has the form F+, and F is not nullable, then

m(E) = (m(F), (I*, m(F))*)
mf(E) = (m(F), I*)+
imf(E) = (I*, mf(E)) = (I*, (m(F), I*)+)

If E has the form F+, and F is nullable, then

m(E) = m(F*)
mf(E) = mf(F*)
imf(E) = imf(F*)

If E has the form F*, and F is not nullable, then

m(E) = (m(F), (I*, m(F))*)?
mf(E) = (m(F), I*)*
or, alternatively, mf(E) = (mf(F))*
imf(E) = (m(F) | I)*

If E has the form F*, and F is nullable, then

m(E) is undefined
mf(E) is undefined
imf(E) = (m(F) | I)*

If E has the form (F,G), then

m(E) = (mf(F), m(G)) if G is not nullable; otherwise m(E) is undefined
mf(E) = (mf(F), mf(G))
imf(E) = (imf(F), mf(G))
or, alternatively, imf(E) = (I*, mf(E)) = (I*, mf(F), mf(G))

If E has the form (F|G), then

m(E) = (m(F)|m(G))
mf(E) = ((m(F)|m(G)), I*)
or, alternatively, mf(E) = (mf(F)|mf(G))
imf(E) = (I*, mf(E)), i.e. (I*, (m(F)|m(G)), I*) or (I*, (mf(F)|mf(G)))

If E has the form (F&G), then

m(E) = m(F,G)|m(G,F)
mf(E) = m(F&G), I*
imf(E) = I*, m(F&G), I*

8.3 Examples

Let's do some simple examples, abstracted from the TEI.

8.3.1 Simple Examples

(a,b) ==> (I*, a, I*, b, I*) (<TEI.2> has this structure.)
(a,b+) ==> (I*, a, I*, (b, I*)+) (<teiCorpus.2> has this structure.)
(a*) ==> (a | I)* (<spanGrp> and many other elements have this structure.)
(#PCDATA | a | b | c | d)* (%paraContent; et al.)
==>(m(#PCDATA | a | b | c | d) | I)*
==>((m(#PCDATA) | m(a) | m(b) | m(c) | m(d)) | I)*
==>((#PCDATA | a | b | c | d) | I)*
==>(#PCDATA | a | b | c | d | I)*
a+ ==> (I*, (a, I*)+)
(a|b)+ ==> (I*, ((a|b), I*)+)
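The rules above are mechanical enough to transcribe. What follows is a minimal Python sketch, offered only as an illustration and not as part of any TEI tool. It assumes ampersand-free content models, treats #PCDATA as an ordinary non-nullable name, leaves out the EMPTY and empty-set cases, and represents "undefined" as None. Groups of more than two items are split as (F, (G, H, ...)) so that the binary rules apply, and where two definitions are given (for F?, F*, and alternations) the sketch tries the first and falls back to the alternate only when the first is undefined. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

@dataclass(frozen=True)
class Sym:                      # an element name (#PCDATA treated as a plain name)
    name: str
@dataclass(frozen=True)
class Seq:                      # (F, G, ...), two or more items
    items: Tuple["Expr", ...]
@dataclass(frozen=True)
class Alt:                      # (F | G | ...), two or more items
    items: Tuple["Expr", ...]
@dataclass(frozen=True)
class Rep:                      # F?, F*, or F+
    body: "Expr"
    op: str

Expr = Union[Sym, Seq, Alt, Rep]
I = Sym("I")                    # stands for the whole set of inclusions
ISTAR = Rep(I, "*")

def nullable(e: Expr) -> bool:
    if isinstance(e, Sym):
        return False
    if isinstance(e, Seq):
        return all(nullable(x) for x in e.items)
    if isinstance(e, Alt):
        return any(nullable(x) for x in e.items)
    return e.op != "+" or nullable(e.body)

def split(e):                   # (F, G, H) -> F and (G, H), so binary rules apply
    rest = e.items[1:]
    return e.items[0], (rest[0] if len(rest) == 1 else type(e)(rest))

# Constructors that propagate "undefined", represented as None.
def seq(*xs): return None if any(x is None for x in xs) else Seq(xs)
def alt(*xs): return None if any(x is None for x in xs) else Alt(xs)
def rep(x, op): return None if x is None else Rep(x, op)

def m(e: Expr) -> Optional[Expr]:
    """Inclusions allowed medially only."""
    if isinstance(e, Sym):
        return e
    if isinstance(e, Seq):
        f, g = split(e)
        return None if nullable(g) else seq(mf(f), m(g))
    if isinstance(e, Alt):
        return alt(*(m(x) for x in split(e)))
    f = e.body
    if nullable(f):             # F? -> m(F); F+ -> m(F*); F* -> undefined
        return m(f) if e.op == "?" else (m(Rep(f, "*")) if e.op == "+" else None)
    if e.op == "?":
        return rep(m(f), "?")
    loop = seq(m(f), rep(seq(ISTAR, m(f)), "*"))     # m(F), (I*, m(F))*
    return loop if e.op == "+" else rep(loop, "?")

def mf(e: Expr) -> Optional[Expr]:
    """Inclusions allowed medially and finally."""
    if isinstance(e, Sym):
        return Seq((e, ISTAR))
    if isinstance(e, Seq):
        return seq(*(mf(x) for x in split(e)))
    if isinstance(e, Alt):
        f, g = split(e)
        first = seq(alt(m(f), m(g)), ISTAR)          # ((m(F)|m(G)), I*)
        return first if first is not None else alt(mf(f), mf(g))
    f = e.body
    if nullable(f):             # F? -> mf(F); F+ -> mf(F*); F* -> undefined
        return mf(f) if e.op == "?" else (mf(Rep(f, "*")) if e.op == "+" else None)
    first = rep(seq(m(f), ISTAR), e.op)              # (m(F), I*) with F's operator
    if first is None and e.op in ("?", "*"):
        return rep(mf(f), e.op)                      # alternate: (mf(F))? or (mf(F))*
    return first

def imf(e: Expr) -> Optional[Expr]:
    """Inclusions allowed initially, medially, and finally."""
    if isinstance(e, Seq):
        f, g = split(e)
        return seq(imf(f), mf(g))
    if isinstance(e, Rep):
        if e.op == "*":
            return rep(alt(m(e.body), I), "*")       # (m(F) | I)*
        if nullable(e.body):    # F? -> imf(F); F+ -> imf(F*)
            return imf(e.body) if e.op == "?" else imf(Rep(e.body, "*"))
    return seq(ISTAR, mf(e))    # atoms, alternations, non-nullable F? and F+

def show(e: Expr) -> str:
    """Print a model, splicing nested groups of the same kind: (a, (b, c)) -> (a, b, c)."""
    if isinstance(e, Sym):
        return e.name
    if isinstance(e, Rep):
        return show(e.body) + e.op
    def flat(x):
        return [y for z in x.items for y in flat(z)] if type(x) is type(e) else [x]
    sep = ", " if isinstance(e, Seq) else " | "
    return "(" + sep.join(show(x) for x in flat(e)) + ")"

a, b = Sym("a"), Sym("b")
print(show(imf(Seq((a, b)))))                # (I*, a, I*, b, I*)
print(show(imf(Seq((a, Rep(b, "+"))))))      # (I*, a, I*, (b, I*)+)
print(show(imf(Rep(Alt((a, b)), "+"))))      # (I*, ((a | b), I*)+)
```

Run on the simple examples above, it reproduces the results given there, apart from spacing.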

8.3.2 A Complex Example: back

The element <back> is defined thus:

<!ELEMENT %n.back;      - O
  ( (%m.front)*,
    ( ( (%m.divtop),
        (%m.divtop | %n.titlePage;)*
      )
    | ( (%n.div;),
        (%n.div; | (%m.front))*
      )
    | ( (%n.div1;),
        (%n.div1; | (%m.front))*
      )
    )?
  )     >

Removing the parameter entities and using single-letter identifiers, we can rewrite the content model this way to show its structure a little more clearly:

( (a | b | c)*,
  ( ( (d | e | f),
      (d | e | f | g)*
    )
  | ( (h),
      (h | (a | b | c))*
    )
  | ( (i),
      (i | (a | b | c))*
    )
  )?
)

Or more compactly:

( (a | b | c)*,
  ( ( (d | e | f), (d | e | f | g)* )
  | ( h, (h | a | b | c)* )
  | ( i, (i | a | b | c)* )
  )?
)

That is, E has the form (F, G), where F = (a|b|c)* and G = (((d|e|f) ... (i|a|b|c)*))?. So imf(E) = imf(F), mf(G).

Now, F is simple: imf((a|b|c)*) = (a | b | c | I)*

But mf(G) requires more work.

G = H? where H =

     ( ( (d | e | f), (d | e | f | g)* )
     | ( h, (h | a | b | c)* )
     | ( i, (i | a | b | c)* )
     )

So mf(G) = (m(H), I*)?

H in turn is an alternation of three sequences, each of the form (x, (y|z)*). This leads to a problem, because the final term in each sequence is nullable; we will have a determinism conflict with the trailing I*.

So we add an alternative definition of mf(E) for E = F?: mf(F?) = (mf(F))?

Applied to G, we have: mf(G) = (mf(H))?, with H = (J | K | L).

By the first definition given for alternations, mf(H) would be ((m(J) | m(K) | m(L)), I*).

But J, K, and L don't have m() forms, since their final term is nullable. So we use the alternate definition:

mf(H) = (mf(J) | mf(K) | mf(L))

We have the following:

J = ( (d | e | f), (d | e | f | g)* )
mf(J) = ( (d | e | f), (d | e | f | g | I)* )
K = ( h, (h | a | b | c)* )
mf(K) = ( h, (h | a | b | c | I)* )
L = ( i, (i | a | b | c)* )
mf(L) = ( i, (i | a | b | c | I)* )

Applying the rules literally gives, for example, mf(J) = ( (d | e | f), I*, ((d | e | f | g), I*)* ); the forms shown accept the same sequences and are simpler. What we must not write is ( (d | e | f), I*, (d | e | f | g | I)* ): an inclusion immediately after the first term could then match either I, and the content model would be ambiguous.

So mf(H) =

        ( ( (d | e | f), (d | e | f | g | I)*)
        | ( h, (h | a | b | c | I)* )
        | ( i, (i | a | b | c | I)* )
        )

Recall that mf(G) = (mf(H))?.

So mf(G) =

        ( ( (d | e | f), (d | e | f | g | I)*)
        | ( h, (h | a | b | c | I)* )
        | ( i, (i | a | b | c | I)* )
        )?

and imf(E) = imf(F), mf(G) =

         ( (a | b | c | I)*,
           ( ( (d | e | f), (d | e | f | g | I)*)
           | ( h, (h | a | b | c | I)* )
           | ( i, (i | a | b | c | I)* )
           )?
         )

Or, in content model terms (using the usual TEI conventions for names of element classes):

<!ELEMENT %n.back;      - O
  ( (%m.front; | %m.I;)*,
    ( ( (%m.divtop;),
        (%m.divtop; | %n.titlePage; | %m.I;)*
      )
    | ( (%n.div;),
        (%n.div; | %m.front; | %m.I;)*
      )
    | ( (%n.div1;),
        (%n.div1; | %m.front; | %m.I;)*
      )
    )?
  )     >

I think we've got a system we can use manually, though I don't know for sure how to make it a program, given the problems we have defining some of the functions.
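Part of such a program is easy to write: checking whether a rewritten model is deterministic, that is, free of conflicts like the one with the trailing I* described above. The minimal Python sketch below computes first and follow sets of positions (the approach outlined in Appendix E of the XML 1.0 Recommendation) and rejects a model if any of those sets contains two positions with the same name. The tuple encoding and function names are hypothetical; I stands for any included element.

```python
from itertools import count

# Content models as nested tuples (a hypothetical encoding).
def sym(n): return ("sym", n)
def seq(*xs): return ("seq", xs)
def alt(*xs): return ("alt", xs)
def star(x): return ("*", x)


def analyse(e, ids, follow, labels):
    """Return (nullable, first, last) position sets for e, filling in follow sets."""
    kind = e[0]
    if kind == "sym":
        p = next(ids)
        labels[p], follow[p] = e[1], set()
        return False, {p}, {p}
    if kind == "seq":
        null, first, last = True, set(), set()
        for child in e[1]:
            n, fs, ls = analyse(child, ids, follow, labels)
            for q in last:              # anything that can end the prefix is followed by fs
                follow[q] |= fs
            if null:
                first |= fs
            last = (last | ls) if n else ls
            null = null and n
        return null, first, last
    if kind == "alt":
        parts = [analyse(c, ids, follow, labels) for c in e[1]]
        return (any(p[0] for p in parts),
                set().union(*(p[1] for p in parts)),
                set().union(*(p[2] for p in parts)))
    n, fs, ls = analyse(e[1], ids, follow, labels)   # kind is '?', '*' or '+'
    if kind in ("*", "+"):
        for q in ls:                    # repetition: the end can loop back to the start
            follow[q] |= fs
    return n or kind != "+", fs, ls


def deterministic(model) -> bool:
    """False if some first or follow set holds two positions with the same name."""
    follow, labels = {}, {}
    _, first, _ = analyse(model, count(), follow, labels)
    for reachable in [first, *follow.values()]:
        names = [labels[p] for p in reachable]
        if len(names) != len(set(names)):
            return False
    return True


x, y, I = sym("x"), sym("y"), sym("I")
print(deterministic(seq(seq(x, star(alt(y, I))), star(I))))   # False
print(deterministic(seq(x, star(alt(y, I)))))                 # True
```

The first model has the shape that produces the conflict: after x, an inclusion could belong either to (y | I)* or to the trailing I*. Dropping the trailing I* removes the conflict.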

8.4 Removing inclusions in TEI P3

The following elements have inclusion exceptions in TEI P3 (as of September 1994):

<entry> (includes <anchor>)
<entryFree> (includes %m.dictionaryParts; | %m.phrase; | %m.inter;)
<eg> (includes %m.dictionaryParts; | %m.formPointers;)
<orgName> (includes <orgtitle>, <orgtype>, and <orgdivn>)
<text> (includes %m.globincl;, i.e. <alt>, <altGrp>, <cb>, <certainty>, <fLib>, <fs>, <fsLib>, <fvLib>, <index>, <interp>, <interpGrp>, <join>, <joinGrp>, <lb>, <link>, <linkGrp>, <milestone>, <pb>, <respons>, <span>, <spanGrp>, and <timeline>)
<lem> (includes %m.fragmentary;, i.e. <lacunaEnd>, <lacunaStart>, <witEnd>, and <witStart>)
<rdg> (includes %m.fragmentary;)
<termEntry> (the version in the nested DTD includes %m.terminologyInclusions;, i.e. <date>, <dateStruct>, <note>, <ptr>, <ref>, <xptr>, and <xref>)

The inclusions on <entry>, <entryFree>, and <eg> will be taken care of separately, in the section on the dictionary chapter.

The inclusions on <orgName> were dropped in October 1994 (though this change has not been propagated to any public version of the DTD), and so we will ignore them.

The inclusions on <text> must be propagated to all potential descendants of <text>.

The inclusions on <lem> and <rdg> must be propagated to all potential descendants; it might be possible to do without these, but it's probably not worth the effort.

Note that in the case of terminologyInclusions, the set of inclusions is not disjoint from the set of children named directly in content models.

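Study of the full TEI DTD shows that the sets of possible descendants of <text>, <lem>, <rdg>, and <termEntry> are all identical. This is not surprising given that <text> is recursive: if an element can occur among the descendants of <text> and can itself contain <text> at some depth, then each is a descendant of the other, and the two must have exactly the same set of possible descendants.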
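The reachability claim is easy to check mechanically. In the minimal Python sketch below, the child map TOY is a small invented fragment, not the real TEI DTD (in particular, the edges from <q> to <text> and from <lem> and <rdg> to <q> are illustrative only); with the real DTD the map would be built from the element declarations after parameter-entity expansion.

```python
from typing import Dict, Set

# A small invented fragment, not the real TEI declarations: each element
# maps to the element types its content model names directly.
TOY: Dict[str, Set[str]] = {
    "text": {"body"},
    "body": {"div", "p"},
    "div": {"head", "p"},
    "head": {"hi"},
    "p": {"hi", "q", "app"},
    "q": {"text", "p"},          # <text> reachable again from inside <q>: the recursion
    "app": {"lem", "rdg"},
    "lem": {"hi", "q"},
    "rdg": {"hi", "q"},
    "hi": set(),
}


def descendants(children: Dict[str, Set[str]], root: str) -> Set[str]:
    """All element types that can occur at any depth below `root`."""
    seen: Set[str] = set()
    stack = [root]
    while stack:
        for child in children.get(stack.pop(), set()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


for gi in ("text", "lem", "rdg"):
    print(gi, sorted(descendants(TOY, gi)))
```

In the toy map, <text>, <lem>, and <rdg> come out with identical descendant sets, for the reason just given.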

The 263 elements in this set fall into the following groups:

52 elements declared EMPTY: addSpan, alt, anchor, any, arc, caesura, cb, certainty, delSpan, dft, divGen, eLeaf, event, gap, handShift, index, iNode, interp, join, kinesic, lacunaEnd, lacunaStart, lb, leaf, link, milestone, minus, move, msr, nbr, node, none, null, oRef, pause, pb, plus, pRef, ptr, rate, respons, root, shift, space, span, sym, uncertain, vocal, when, witEnd, witStart, and xptr
16 elements declared with (#PCDATA): att, day, gi, hour, idno, minute, month, offset, postBox, postCode, second, str, tag, val, week, and year. (Of these, note that <att>, <gi>, <tag>, and <val> aren't actually in the main DTD, so they won't be handled here. Perhaps all these lists need to be checked once more in a calm moment.)
57 elements declared with (%phrase.seq;): abbr, actor, addrLine, author, authority, biblScope, cl, date, dateRange, del, distance, distinct, distributor, docAuthor, docDate, edition, editor, expan, extent, funder, fw, gloss, headItem, headLabel, label, measure, mentioned, name, num, occasion, orgDivn, orgName, orgTitle, orgType, orig, phr, principal, publisher, pubplace, reg, resp, restore, role, roleDesc, rs, s, salute, signed, soCalled, speaker, sponsor, street, term, time, timeRange, trailer, and wit [Supernumerary in ND: surname, forename, genName, nameLink, addName, roleName, settlement, bloc.] [Supernumerary in Corpus: channel, constitution, derivation, domain, factuality, interaction, preparedness, purpose, birth, firstLang, langKnown, residence, education, affiliation, occupation, socecstatus, locale, activity.] [Supernumerary in Header: symbol, creation, language, classCode] [6]
one element declared with (%component.seq;): epigraph
35 elements declared with (%paraContent;): admin, camera, caption, cell, country, damage, descrip, docEdition, emph, figDesc, foreign, gram, head, hi, imprimatur, l, lang, x lem, meeting, otherForm, p, x rdg, ref, region, seg, sound, supplied, tech, title, titlePart, unclear, witDetail, witness, writing, and xref. N.B. this list does not include elements from the dictionary tag set, the feature system declaration, or the tag set declaration.[7] The dictionary tag set presents problems of its own, and the others are not part of the main TEI DTD.
nine elements declared with (%specialPara;): add, corr, item, note, q, quote, sic, stage, and view
95 elements with non-standard content models requiring manual changes: address, altgrp, analytic, app, argument, availability, back, bibl, biblfull, biblstruct, body, broadcast, byline, c, castgroup, castitem, castlist, cit, closer, dateline, datestruct, div, div0, div1, div2, div3, div4, div5, div6, div7, docimprint, doctitle, editionStmt, epilogue, equipment, etree, f, falt, figure, flib, formula, front, fs, fslib, fvlib, graph, group, imprint, interpgrp, joingrp, lg, lg1, lg2, lg3, lg4, lg5, linkgrp, list, listbibl, m, monogr, notesStmt, ofig, opener, ovar, performance, prologue, publicationStmt, pvar, rdggrp, recording, recordingStmt, respStmt, row, scriptStmt, series, seriesStmt, set, sourcedesc, sp, spangrp, table, termentry, text, tig, timeline, timestruct, titlepage, titleStmt, tree, triangle, u, valt, w, and witlist

Note that this list excludes most element types from the dictionary tag set, since they need special treatment anyway. (It does not exclude all of them, though, which puzzles me.)

Empty elements need no changes.

The other groups of elements do require changes to the DTD, which are described in the following sections.

8.4.1 The m.Incl element class

In order to simplify the process of adding inclusions to the content models of the DTD, we define a new class for use in content models, namely m.Incl. This consists of:

globincl (included by <text>)
fragmentary, if the additional tag set for text-critical apparatus is selected (included by <lem> and <rdg>)

For now, we ignore the problems posed by the <termEntry> element. In the long run, they mean the terminology tag set is going to need to be rewritten. (Of course, it needs rewriting anyway, to align it with more recent ISO work.)

< 41 Element class m.Incl > =

<!ENTITY % x.Incl ''>
<![%TEI.textcrit;[
  <!ENTITY % m.Incl '%x.Incl; %m.globincl; | %m.editIncl; | %m.fragmentary; | %n.anchor;' >
]]>
<!ENTITY % m.Incl '%x.Incl; %m.globincl; | %m.editIncl; | %n.anchor;' >

We have to reproduce the standard declarations for the inclusion classes:

8.4.2 Changing `#PCDATA` elements

Each element which now has a content model of #PCDATA should, for compatibility, be revised to have a content model of (#PCDATA | %m.Incl;)*.

In some cases, it might be preferable to leave the content model alone: it's not clear that it's really useful to allow index entries, feature-structure libraries, and joins to occur within attribute names, generic identifiers, and the components of structured times and dates. Even within generic identifiers and the like, line breaks, page breaks, or other milestones might occur, so perhaps we should define at least some of these elements as (#PCDATA | %m.refsys;)*.

For now, for purposes of the experimental XML DTD, I propose to use the first form given.

First, we suppress the standard declarations of all of these elements:

< 43 Suppress standard definitions of PCDATA elements > =

<!ENTITY % day 'IGNORE' >
<!ENTITY % hour 'IGNORE' >
<!ENTITY % minute 'IGNORE' >
<!ENTITY % month 'IGNORE' >
<!ENTITY % offset 'IGNORE' >
<!ENTITY % second 'IGNORE' >
<!ENTITY % week 'IGNORE' >
<!ENTITY % year 'IGNORE' >
<!ENTITY % idno 'IGNORE' >
<!ENTITY % postBox 'IGNORE' >
<!ENTITY % postCode 'IGNORE' >
<!ENTITY % str 'IGNORE' >

Then we supply the new declarations:

< 44 New definitions for PCDATA elements > =

<![%TEI.names.dates;[
  <!ENTITY % XML.day "INCLUDE" >
  <![%XML.day;[
    <!ELEMENT %n.day; - - (#PCDATA | %m.Incl;)* >
    <!ATTLIST %n.day; %a.global; %a.temporalExpr; TEIform CDATA 'day' >
  ]]>
  <!ENTITY % XML.hour "INCLUDE" >
  <![%XML.hour;[
    <!ELEMENT %n.hour; - - (#PCDATA | %m.Incl;)* >
    <!ATTLIST %n.hour; %a.global; %a.temporalExpr; TEIform CDATA 'hour' >
  ]]>
  <!ENTITY % XML.minute "INCLUDE" >
  <![%XML.minute;[
    <!ELEMENT %n.minute; - - (#PCDATA | %m.Incl;)* >
    <!ATTLIST %n.minute; %a.global; %a.temporalExpr; TEIform CDATA 'minute' >
  ]]>
  <!ENTITY % XML.month "INCLUDE" >
  <![%XML.month;[
    <!ELEMENT %n.month; - - (#PCDATA | %m.Incl;)* >
    <!ATTLIST %n.month; %a.global; %a.temporalExpr; TEIform CDATA 'month' >
  ]]>
  <!ENTITY % XML.offset "INCLUDE" >
  <![%XML.offset;[
    <!ELEMENT %n.offset; - - (#PCDATA | %m.Incl;)* >
    <!ATTLIST %n.offset; %a.global; value CDATA #IMPLIED %a.placePart; TEIform CDATA 'offset' >
  ]]>
  <!ENTITY % XML.second "INCLUDE" >
  <![%XML.second;[
    <!ELEMENT %n.second; - - (#PCDATA | %m.Incl;)* >
    <!ATTLIST %n.second; %a.global; %a.temporalExpr; TEIform CDATA 'second' >
  ]]>
  <!ENTITY % XML.week "INCLUDE" >
  <![%XML.week;[
    <!ELEMENT %n.week; - - (#PCDATA | %m.Incl;)* >
    <!ATTLIST %n.week; %a.global; %a.temporalExpr; TEIform CDATA 'week' >
  ]]>
  <!ENTITY % XML.year "INCLUDE" >
  <![%XML.year;[
    <!ELEMENT %n.year; - - (#PCDATA | %m.Incl;)* >
    <!ATTLIST %n.year; %a.global; %a.temporalExpr; TEIform CDATA 'year' >
  ]]>
]]>
<!ENTITY % XML.idno "INCLUDE" >
<![%XML.idno;[
  <!ELEMENT %n.idno; - - (#PCDATA | %m.Incl;)* >
  <!ATTLIST %n.idno; %a.global; type CDATA #IMPLIED TEIform CDATA 'idno' >
]]>
<!ENTITY % XML.postBox "INCLUDE" >
<![%XML.postBox;[
  <!ELEMENT %n.postBox; - - (#PCDATA | %m.Incl;)* >
  <!ATTLIST %n.postBox; %a.global; TEIform CDATA 'postBox' >
]]>
<!ENTITY % XML.postCode "INCLUDE" >
<![%XML.postCode;[
  <!ELEMENT %n.postCode; - - (#PCDATA | %m.Incl;)* >
  <!ATTLIST %n.postCode; %a.global; TEIform CDATA 'postCode' >
]]>
<![%TEI.fs;[
  <!ENTITY % XML.str "INCLUDE" >
  <![%XML.str;[
    <!ELEMENT %n.str; - - (#PCDATA | %m.Incl;)* >
    <!ATTLIST %n.str; %a.global; rel (eq | ne | sb | ns | lt | le | gt | ge) eq TEIform CDATA 'str' >
  ]]>
]]>
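Declarations this regular need not be typed by hand. A minimal Python sketch of a generator for them follows; the helper name pcdata_decl and the EXTRA table are hypothetical, and the attribute strings are copied from the scrap above. It does not produce the enclosing TEI.names.dates and TEI.fs marked sections, which would still have to be added by hand.

```python
def pcdata_decl(name: str, atts: str = "") -> str:
    """Build the scrap that redeclares <name> with content (#PCDATA | %m.Incl;)*.

    `atts` holds any attribute definitions that sit between %a.global; and
    TEIform; if non-empty it must end with a space.
    """
    flag = f"XML.{name}"                     # e.g. XML.day, switchable to IGNORE
    return "\n".join([
        f'<!ENTITY % {flag} "INCLUDE" >',
        f"<![%{flag};[",
        f"  <!ELEMENT %n.{name}; - - (#PCDATA | %m.Incl;)* >",
        f"  <!ATTLIST %n.{name}; %a.global; {atts}TEIform CDATA '{name}' >",
        "]]>",
    ])


# Extra attribute definitions per element, copied from the scrap above.
TEMPORAL = "%a.temporalExpr; "
EXTRA = {"day": TEMPORAL, "hour": TEMPORAL, "idno": "type CDATA #IMPLIED ",
         "postBox": "", "postCode": ""}

print("\n".join(pcdata_decl(gi, atts) for gi, atts in EXTRA.items()))
```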

8.4.3 Changing phrase.seq

The parameter entity phrase.seq should be redefined as follows:

< 45 New declaration for phrase and phrase.seq > =

<!ENTITY % phrase '#PCDATA | %m.phrase; | %m.Incl;' >
<!ENTITY % phrase.seq '(%phrase;)*' >

(This supersedes the redefinition given earlier. Adding the inclusions to the class phrase (i.e. to the entity m.phrase) might enable some of the redefinitions already given above to stand unchanged, but for now, at least, I propose to keep the inclusions logically separate from the original element classes.) Note that the entity phrase is used only once, in the definition of <u>.

No changes to the actual content models are needed. (Ah, the joys of indirection.)

(Note, 14 May 1999.) No, wait, actually, that's not true. Many of these declarations read

<!ELEMENT %n.foo;       - O  (%phrase.seq;)                     >

which, expanded, would be

<!ELEMENT %n.foo;       - O  ((#PCDATA | %m.phrase; | %m.Incl;)*)>

which is illegal. The content models do need to be changed, to

<!ELEMENT %n.foo;       - O  %phrase.seq;                       >

This is only required if we wish to allow the extensions file to work with the current (1994-09) production DTDs. Since those are what I currently have on this laptop, I do wish. But since we will shortly be releasing corrected versions, we want to make this part of the extensions file optional. We'll do so using a conditional inclusion on the parameter entity base9409, which by default will be defined IGNORE.

The same logic applies to paraContent and (for now) specialPara.

(Note, 30 May 1999.) No, no, wait. Doesn't carthage already normalize these correctly by omitting extra parentheses? I've already spent several hours making the scraps below, and now realize we may not need them after all. (17 June 1999.) I've removed them, since carthage actually does produce legal XML.
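The normalization attributed to carthage here amounts to removing parentheses that merely wrap another complete group, since XML's mixed-content production does not allow them. carthage's own code is not reproduced; the following is a minimal Python sketch of the idea, assuming the content model is available as a string with parameter entities already expanded. The function names are hypothetical.

```python
def group_end(s: str) -> int:
    """Index of the parenthesis closing the one at s[0], or -1 if there is none."""
    depth = 0
    for i, ch in enumerate(s):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return i
    return -1


def is_single_group(s: str) -> bool:
    """True for '(...)' optionally followed by one occurrence indicator."""
    if not s.startswith("("):
        return False
    end = group_end(s)
    return end == len(s) - 1 or (end == len(s) - 2 and s[-1] in "?*+")


def strip_redundant_parens(model: str) -> str:
    """Peel outer parentheses that merely wrap another complete group."""
    s = model.strip()
    while s.startswith("(") and group_end(s) == len(s) - 1:
        inner = s[1:-1].strip()
        if not is_single_group(inner):
            break
        s = inner
    return s


print(strip_redundant_parens("((#PCDATA | %m.phrase; | %m.Incl;)*)"))
```

Models in which the outer parentheses do more than wrap a single group, such as ((a), (b)), are left alone.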

8.4.4 Changing component.seq

The entity component.seq must be redefined to allow inclusions between any two components. In the long run, the changes should be made directly within the various declarations which go into component.seq, but those declarations are among the most complicated of the entire TEI DTD, since there are variant versions for each of the two hundred or so possible combinations of base tag sets.

The quick and dirty approach most suitable for use in the experimental XML DTD is to include the Incl class as a subclass of common, thus:

< 46 New declaration for x.common > =

<!ENTITY % x.common '%m.Incl; |'>

If this proves to introduce ambiguity in the content model, we'll have to find a slower, cleaner way to do it.

Experiment shows that it does indeed introduce ambiguity in content models, notably those for <body> and text divisions. Rather than hack at those content models, I am going to take the longer and slower approach.

< 47 New declaration for component and component.seq > =

<!ENTITY % x.common '' >
<!ENTITY % m.common '%x.common %m.bibl; | %m.chunk; | %m.hqinter; | %m.lists; | %m.notes; | %n.stage;' >
< Reproduce standard component declarations 48 >
<!ENTITY % component.seq '((%component;), (%m.Incl;)*)*' >

< 48 Reproduce standard component declarations > =

<!ENTITY % mix.verse '' >
<!ENTITY % mix.drama '' >
<!ENTITY % mix.spoken '' >
<!ENTITY % mix.dictionaries '' >
<!ENTITY % mix.terminology '' >
<![ %TEI.mixed; [
  <!ENTITY % TEI.singleBase 'IGNORE' >
  <!ENTITY % component '(%m.common; %mix.verse; %mix.drama; %mix.spoken; %mix.dictionaries; %mix.terminology;)' >
]]>
<![ %TEI.general; [
  <!ENTITY % TEI.singleBase 'IGNORE' >
  <!ENTITY % component '(%m.common; %mix.verse; %mix.drama; %mix.spoken; %mix.dictionaries; %mix.terminology;)' >
  <![ %TEI.verse; [
    <!ENTITY % gen.verse '((%m.comp.verse;), (%m.common; | %m.comp.verse; | %m.Incl;)*) |' >
  ]]>
  <![ %TEI.drama; [
    <!ENTITY % gen.drama '((%m.comp.drama;), (%m.common; | %m.comp.drama; | %m.Incl;)*) |' >
  ]]>
  <![ %TEI.spoken; [
    <!ENTITY % gen.spoken '((%m.comp.spoken;), (%m.common; | %m.comp.spoken; | %m.Incl;)*) |' >
  ]]>
  <![ %TEI.dictionaries; [
    <!ENTITY % gen.dictionaries '((%m.comp.dictionaries;), (%m.common; | %m.comp.dictionaries; | %m.Incl;)*) |' >
  ]]>
  <![ %TEI.terminology; [
    <!ENTITY % gen.terminology '((%m.comp.terminology;), (%m.common; | %m.comp.terminology; | %m.Incl;)*) |' >
  ]]>
  <!ENTITY % gen.verse '' >
  <!ENTITY % gen.drama '' >
  <!ENTITY % gen.spoken '' >
  <!ENTITY % gen.dictionaries '' >
  <!ENTITY % gen.terminology '' >
  <!ENTITY % component.seq '((%m.common;), (%m.Incl;)*)*, (%gen.verse; %gen.drama; %gen.spoken; %gen.dictionaries; %gen.terminology; TEI...end)?' >
  <!ENTITY % component.plus '(%gen.verse; %gen.drama; %gen.spoken; %gen.dictionaries; %gen.terminology; TEI...end) | ( ((%m.common;), (%m.Incl;)*)+, (%gen.verse; %gen.drama; %gen.spoken; %gen.dictionaries; %gen.terminology; TEI...end)?)' >
]]>
<![ %TEI.prose; [
  <!ENTITY % component '(%m.common;)' >
  <!ENTITY % TEI.singleBase 'INCLUDE' >
]]>
<![ %TEI.verse; [
  <!ENTITY % component '(%m.common; | %m.comp.verse;)' >
  <!ENTITY % TEI.singleBase 'INCLUDE' >
]]>
<![ %TEI.drama; [
  <!ENTITY % component '(%m.common; | %m.comp.drama;)' >
  <!ENTITY % TEI.singleBase 'INCLUDE' >
]]>
<![ %TEI.spoken; [
  <!ENTITY % component '(%m.common; | %m.comp.spoken;)' >
  <!ENTITY % TEI.singleBase 'INCLUDE' >
]]>
<![ %TEI.dictionaries; [
  <!ENTITY % component '(%m.common; | %m.comp.dictionaries;)' >
  <!ENTITY % TEI.singleBase 'INCLUDE' >
]]>
<![ %TEI.terminology; [
  <!ENTITY % component '(%m.common; | %m.comp.terminology;)' >
  <!ENTITY % TEI.singleBase 'INCLUDE' >
]]>
<!ENTITY % component '(%m.common;)' >
<!ENTITY % TEI.singleBase 'INCLUDE' >

8.4.5 Changing paraContent

The parameter entity paraContent must be changed as follows:

< 49 New declaration for paraContent > =

<!ENTITY % paraContent '(#PCDATA | %m.phrase; | %m.inter; | %m.Incl;)*' >

No change to actual content models is needed.

(Note, 14 May 1999.) No, wait, actually, that's not true. Many of these declarations read

<!ELEMENT %n.p;         - O  (%paraContent;)                    >

which, expanded, would be

<!ELEMENT %n.p;         - O  ((#PCDATA | %m.phrase; | %m.inter;
                             | %m.Incl;)*)                      >

which is illegal. The content models do need to be changed, to

<!ELEMENT %n.p;         - O  %paraContent;                      >

For now, though, we can rely on carthage to do the job, so I've deleted the long boring scraps that used to be here.

8.4.6 The problem of specialPara elements

In TEI P3, the entity specialPara is defined thus:

<!ENTITY % specialPara '(((%m.chunk), (%component.seq)) |
(%paraContent))'                                                >

It allows an element to contain either a series of chunks or the same content as a paragraph. It is intended for elements like notes and list items: the normal case, in which the item consists of a single paragraph, can be tagged simply (<item> ... </item>) and the multi-paragraph case can be accommodated using nested paragraphs or other chunk-level elements (<item><p> ... </p><p> ... </p></item>). In practice, the multi-paragraph form has proven very disconcerting to users, since it is not intuitively obvious that no white space may appear between the paragraphs.[8] The current definition and use of specialPara are thus acknowledged by the editors to be an error. Since there is no obvious solution, however, it is not a corrigible error.

In changing specialPara to meet the requirements of XML, there are three obvious possible solutions. We can overgenerate, so as to allow all existing data to remain valid:

<!ENTITY % specialPara '(#PCDATA | %m.phrase; | %m.inter;
| %m.chunk;)*' >

This has the drawback of allowing paragraphs and other chunk-level elements to float within character data, thus violating one of the few consistently followed rules of the TEI DTD.

Alternatively, we can bite the bullet and require that list items and notes which consist of a single paragraph be marked as such:

<!ENTITY % specialPara '%component.seq;' >

This has the advantage of being relatively clean, but it has the major disadvantage of requiring retagging for almost all current list items and notes. What is now tagged <item> ... </item> would have to be retagged <item><p> ... </p></item>. The best that can be said is that such retagging could in principle be automated.

A third approach would be to have distinct element types for simple list items and notes, and compound ones. The simple form could be defined as containing paraContent, and the compound ones as containing component.seq. This would also require retagging (of all compound list items and notes), but not as much as the previous approach.

For purposes of the experimental XML DTD, we take the first approach.

The following element types are defined as containing specialPara:

<q> (in the core)
<quote> (in the core)
<sic> (in the core)
<corr> (in the core)
<add> (in the core) -- but not <del>!
<item> (in the core)
<note> (in the core)
<stage> (in the core)
<set> (in drama -- needs manual fix)
<view> (in drama)
<equiv> (in tag set documentation)

All but one of these can be fixed simply by redefining specialPara thus:

< 50 New specialPara > =

<!ENTITY % specialPara '(#PCDATA | %m.phrase; | %m.inter; | %m.chunk; | %m.Incl;)*' >

In order to redefine specialPara, we must first reproduce a number of class declarations from teiclas2.ent:

The <ab> element is new, so we need to declare its generic identifier:

< 52 Declare new GIs 23 (cont'd) > =

<!ENTITY % n.ab 'ab' >

Only one content model must be redefined by hand, to flatten the group: that of <set> in the drama tag set. The current definition is this:

<!ELEMENT set           - -  ((head)?, %specialPara;)           >
<!ATTLIST set                %a.global;
          TEIform            CDATA               'set'          >

If we flatten this in the expected way, we get this (Version 1):

<!ELEMENT %n.set;       - -  (#PCDATA | %m.phrase; | %m.inter;
                             | %m.chunk; | %m.Incl;
                             | %n.head;)*                       >
<!ATTLIST %n.set;            %a.global;
          TEIform            CDATA               'set'          >

This has the unfortunate result of allowing <head> elements at random locations; it might be better, in this case, to tighten the content model instead.[9] Version 2 of the new model is this:

< 53 New definition of set element > =

<![%TEI.drama;[
  <!ENTITY % XML.set "INCLUDE" >
  <![%XML.set;[
    <!ELEMENT %n.set; - - ((%n.head;)?, %component.seq;) >
    <!ATTLIST %n.set; %a.global; TEIform CDATA 'set' >
  ]]>
]]>

Version 2 is not strictly compatible with the old version: to be fully compatible we have to allow inclusions up front (Version 3):

<!ELEMENT %n.set;       - -  ((%m.Incl;)*, (%n.head;)?,
                             %component.seq;)                   >

For now, the experimental XML version of the DTD will use Version 2 of this declaration.

8.5 Elements requiring manual intervention

(Scraps suppressing and redeclaring the remaining elements to be supplied here.)

The elements to be treated here are: address, altgrp, analytic, app, argument, availability, back, bibl, biblfull, biblstruct, body, broadcast, byline, c, castgroup, castitem, castlist, cit, closer, dateline, datestruct, div, div0, div1, div2, div3, div4, div5, div6, div7, docimprint, doctitle, editionStmt, epilogue, equipment, etree, f, falt, figure, flib, formula, front, fs, fslib, fvlib, graph, group, imprint, interpgrp, joingrp, lg, lg1, lg2, lg3, lg4, lg5, linkgrp, list, listbibl, m, monogr, notesStmt, ofig, opener, ovar, performance, prologue, publicationStmt, pvar, rdggrp, recording, recordingStmt, respStmt, row, scriptStmt, series, seriesStmt, set, sourcedesc, sp, spangrp, table, termentry, text, tig, timeline, timestruct, titlepage, titleStmt, tree, triangle, u, valt, w, and witlist.

The following sections provide the DTD fragments necessary for suppressing the existing declarations for these elements and declaring them with new content models.

8.5.1 Core tag set

< 54 Suppress definitions in core tag set > =

<!ENTITY % address 'IGNORE' >
<!ENTITY % analytic 'IGNORE' >
<!ENTITY % bibl 'IGNORE' >
<!ENTITY % biblFull 'IGNORE' >
<!ENTITY % biblStruct 'IGNORE' >
<!ENTITY % cit 'IGNORE' >
<!ENTITY % imprint 'IGNORE' >
<!ENTITY % lg 'IGNORE' >
<!ENTITY % list 'IGNORE' >
<!ENTITY % listBibl 'IGNORE' >
<!ENTITY % monogr 'IGNORE' >
<!ENTITY % respStmt 'IGNORE' >
<!ENTITY % series 'IGNORE' >
<!ENTITY % sp 'IGNORE' >

The existing declarations are these:

<!ELEMENT %n.address;   - O  ((%n.addrLine)+ | (%m.addrPart)*)  >
<!ELEMENT %n.analytic;  - O  (%n.author; | %n.editor; |
                             %n.respStmt; | %n.title;)*         >
<!ELEMENT %n.bibl;      - O  (#PCDATA | %m.phrase; |
                             %m.biblPart;)*                     >
<!ELEMENT %n.biblFull;  - O  (%n.titleStmt;, 
                             (%n.editionStmt)?,
                             (%n.extent)?, 
                             %n.publicationStmt;,
                             (%n.seriesStmt)?, 
                             (%n.notesStmt)?,
                             (%n.sourceDesc)*)                  >
<!ELEMENT %n.biblStruct;
                        - O  ((%n.analytic)?, 
                              (%n.monogr;,
                              (%n.series)*)+, 
                              (%n.note; | %n.idno;)*)           >
<!ELEMENT %n.cit;       - -  ((%n.q; | %n.quote;) 
                             & (%m.bibl; | %m.loc;))            >
<!ELEMENT %n.imprint;   - O  (%n.pubPlace; | %n.publisher;
                             | %n.date; | %n.biblScope;)*       >
<!ELEMENT %n.lg;        - O  ((%m.divtop)*, (%n.l; | %n.lg;)+,
                             (%m.divbot)*)                      >
<!ELEMENT %n.list;      - -  ( (%n.head)?,
                               ( ( (%n.item)+ )
                                 | ( (%n.headLabel)?,
                                     (%n.headItem)?,
                                     (%n.label;, %n.item;)+)))  >
<!ELEMENT %n.listBibl;  - -  ((%n.head)?, (%n.bibl; |
                             %n.biblStruct; | %n.biblFull;)+,
                             (%n.trailer)?)                     >
<!ELEMENT %n.monogr;    - O  ( ( ( (%n.author; | %n.editor; |
                                    %n.respStmt;)+, 
                                   (%n.title)+,
                                   (%n.editor; | %n.respStmt;)*) 
                                 |
                                 ( (%n.title)+, 
                                   (%n.author; | %n.editor; 
                                   | %n.respStmt;)*))?,
                               (%n.note; | %n.meeting;)*,
                               (%n.edition;, 
                                 (%n.editor; | %n.respStmt;)*)*, 
                               %n.imprint;,
                               (%n.imprint; | %n.extent; 
                                 | %n.biblScope;)* )            >
<!ELEMENT %n.respStmt;  - O  ((%n.resp; & %n.name;), 
                              (%n.resp; | %n.name;)*)           >
<!ELEMENT %n.series;    - O  (%n.title; | %n.editor; |
                             %n.respStmt; | %n.biblScope;)*     >
<!ELEMENT %n.sp;        - O  ((%n.speaker)?, (%n.p; | %n.l; |
                             %n.lg; | %n.seg; | %n.stage;)+)    >

The new definitions are these; note that <cit> and <respStmt> have already been declared above.

< 55 New definitions for core tag set > =

<!ENTITY % XML.address "INCLUDE" >
<![%XML.address;[
  <!ELEMENT %n.address; - O ((%m.Incl;)*,
                             ( (%n.addrLine;, (%m.Incl;)*)+
                             | ((%m.addrPart;), (%m.Incl;)*)*)) >
  <!ATTLIST %n.address; %a.global; TEIform CDATA 'address' >
]]>

< 56 New definitions for core tag set 55 (cont'd) > =

<!ENTITY % XML.analytic "INCLUDE" >
<![%XML.analytic;[
  <!ELEMENT %n.analytic; - O (%n.author; | %n.editor; | %n.respStmt; | %n.title; | %m.Incl;)* >
  <!ATTLIST %n.analytic; %a.global; TEIform CDATA 'analytic' >
]]>

< 57 New definitions for core tag set 55 (cont'd) > =

<!ENTITY % XML.bibl "INCLUDE" >
<![%XML.bibl;[
  <!ELEMENT %n.bibl; - O (#PCDATA | %m.phrase; | %m.biblPart; | %m.Incl;)* >
  <!ATTLIST %n.bibl; %a.global; %a.declarable; TEIform CDATA 'bibl' >
]]>

< 58 New definitions for core tag set 55 (cont'd) > =

<!ENTITY % XML.biblFull "INCLUDE" >
<![%XML.biblFull;[
  <!ELEMENT %n.biblFull; - O ((%m.Incl;)*,
                              (%n.titleStmt;, (%m.Incl;)*),
                              (%n.editionStmt;, (%m.Incl;)*)?,
                              (%n.extent;, (%m.Incl;)*)?,
                              (%n.publicationStmt;, (%m.Incl;)*),
                              (%n.seriesStmt;, (%m.Incl;)*)?,
                              (%n.notesStmt;, (%m.Incl;)*)?,
                              (%n.sourceDesc;, (%m.Incl;)*)* ) >
  <!ATTLIST %n.biblFull; %a.global; %a.declarable; TEIform CDATA 'biblFull' >
]]>

< 59 New definitions for core tag set 55 (cont'd) > =

<!ENTITY % XML.biblStruct "INCLUDE" >
<![%XML.biblStruct;[
  <!ELEMENT %n.biblStruct; - O ((%m.Incl;)*,
                                (%n.analytic;, (%m.Incl;)*)?,
                                ( (%n.monogr;, (%m.Incl;)*),
                                  (%n.series;, (%m.Incl;)*)* )+,
                                ( (%n.note; | %n.idno;), (%m.Incl;)*)*) >
  <!ATTLIST %n.biblStruct; %a.global; %a.declarable; TEIform CDATA 'biblStruct' >
]]>

< 60 New definitions for core tag set 55 (cont'd) > =

<!ENTITY % XML.imprint "INCLUDE" >
<![%XML.imprint;[
  <!ELEMENT %n.imprint; - O (%n.pubPlace; | %n.publisher; | %n.date; | %n.biblScope; | %m.Incl;)* >
  <!ATTLIST %n.imprint; %a.global; TEIform CDATA 'imprint' >
]]>

< 61 New definitions for core tag set 55 (cont'd) > =

<!ENTITY % XML.lg "INCLUDE" >
<![%XML.lg;[
  <!ELEMENT %n.lg; - O ((%m.divtop; | %m.Incl;)*,
                        (%n.l; | %n.lg;),
                        (%n.l; | %n.lg; | %m.Incl;)*,
                        ((%m.divbot;), (%m.Incl;)*)*) >
  <!ATTLIST %n.lg; %a.global; %a.divn; %a.metrical; TEIform CDATA 'lg' >
]]>

< 62 New definitions for core tag set 55 (cont'd) > =

<!ENTITY % XML.list "INCLUDE" >
<![%XML.list;[
  <!ELEMENT %n.list; - - ((%m.Incl;)*,
                          (%n.head;, (%m.Incl;)*)?,
                          ( ((%n.item;, (%m.Incl;)*)*)
                          | ( (%n.headLabel;, (%m.Incl;)*)?,
                              (%n.headItem;, (%m.Incl;)*)?,
                              (%n.label;, (%m.Incl;)*, %n.item;, (%m.Incl;)*)+))) >
  <!ATTLIST %n.list; %a.global; type CDATA simple TEIform CDATA 'list' >
]]>

< 63 New definitions for core tag set 55 (cont'd) > =

<!ENTITY % XML.listBibl "INCLUDE" > <![%XML.listBibl;[ <!ELEMENT %n.listBibl; - - ((%m.Incl;)*, (%n.head;, (%m.Incl;)*)?, (%n.bibl; | %n.biblStruct; | %n.biblFull;), (%n.bibl; | %n.biblStruct; | %n.biblFull; | %m.Incl;)*, (%n.trailer;, (%m.Incl;)*)?) > <!ATTLIST %n.listBibl; %a.global; %a.declarable; TEIform CDATA 'listBibl' > ]]>

< 65 New definitions for core tag set 55 (cont'd) > =

<!ENTITY % XML.series "INCLUDE" > <![%XML.series;[ <!ELEMENT %n.series; - O (%n.title; | %n.editor; | %n.respStmt; | %n.biblScope; | %m.Incl;)* > <!ATTLIST %n.series; %a.global; TEIform CDATA 'series' > ]]>

< 66 New definitions for core tag set 55 (cont'd) > =

<!ENTITY % XML.sp "INCLUDE" > <![%XML.sp;[ <!ELEMENT %n.sp; - O ((%m.Incl;)*, (%n.speaker;, (%m.Incl;)*)?, ((%n.p; | %n.l; | %n.lg; | %n.seg; | %n.ab; | %n.stage;), (%m.Incl;)*)+) > <!ATTLIST %n.sp; %a.global; who IDREFS #IMPLIED TEIform CDATA 'sp' > ]]>

8.5.2 Basic text-structure tag set

< 67 Suppress definitions in text-structure tag set > =

<!ENTITY % argument 'IGNORE' > <!ENTITY % back 'IGNORE' > <!ENTITY % body 'IGNORE' > <!ENTITY % byline 'IGNORE' > <!ENTITY % closer 'IGNORE' > <!ENTITY % dateline 'IGNORE' > <!ENTITY % div 'IGNORE' > <!ENTITY % div0 'IGNORE' > <!ENTITY % div1 'IGNORE' > <!ENTITY % div2 'IGNORE' > <!ENTITY % div3 'IGNORE' > <!ENTITY % div4 'IGNORE' > <!ENTITY % div5 'IGNORE' > <!ENTITY % div6 'IGNORE' > <!ENTITY % div7 'IGNORE' > <!ENTITY % group 'IGNORE' > <!ENTITY % opener 'IGNORE' > <!ENTITY % text 'IGNORE' >

The current definitions are these:

<!ELEMENT %n.argument;  - -  ((%n.head)?, %component.seq;)      >
<!ELEMENT %n.back;      - O  ( (%m.front)*, ( ( (%m.divtop),
                             (%m.divtop | %n.titlePage;)*) | (
                             (%n.div;), (%n.div; |
                             (%m.front))*) | ( (%n.div1;),
                             (%n.div1; | (%m.front))*) )? )     >
<!ELEMENT %n.body;      - O  ((%m.divtop;)*, ( ( (%n.divGen)*,
                             ( (%n.div;, (%n.div; |
                             %n.divGen;)*) | (%n.div0;,
                             (%n.div0; | %n.divGen;)*) |
                             (%n.div1;, (%n.div1; |
                             %n.divGen;)*) ) ) | (
                             (%component)+, ((%n.divGen)*, (
                             (%n.div;, (%n.div; | %n.divGen;)*)
                             | (%n.div0;, (%n.div0; |
                             %n.divGen;)*) | (%n.div1;,
                             (%n.div1; | %n.divGen;)*) )? ))),
                             (%m.divbot;)*)                     >
<!ELEMENT %n.byline;    - O  (%phrase.seq; | %n.docAuthor;)*    >
<!ELEMENT %n.closer;    - O  (%n.signed; | %n.dateline; |
                             %n.salute; | %phrase.seq;)*        >
<!ELEMENT %n.dateline;  - O  (%n.date; | %n.time; | %n.name; |
                             #PCDATA | %n.address;)*            >
<!ELEMENT %n.div;       - O  ((%m.divtop;)*, ((%n.div; |
                             %n.divGen;)+ | ((%component;)+,
                             (%n.div; | %n.divGen;)*)),
                             (%m.divbot;)*)                     >
<!ELEMENT %n.div0;      - O  ((%m.divtop;)*, ( (%n.div1; |
                             %n.divGen;)+ | ( (%component;)+,
                             (%n.div1; | %n.divGen;)*)),
                             (%m.divbot;)*)                     >
<!ELEMENT %n.div1;      - O  ((%m.divtop;)*, ( (%n.div2; |
                             %n.divGen;)+ | ((%component;)+,
                             (%n.div2; | %n.divGen;)*)),
                             (%m.divbot;)*)                     >
<!ELEMENT %n.div2;      - O  ((%m.divtop;)*, ( (%n.div3; |
                             %n.divGen;)+ | ((%component;)+,
                             (%n.div3; | %n.divGen;)*)),
                             (%m.divbot;)*)                     >
<!ELEMENT %n.div3;      - O  ((%m.divtop;)*, ( (%n.div4; |
                             %n.divGen;)+ | ((%component;)+,
                             (%n.div4; | %n.divGen;)*)),
                             (%m.divbot;)*)                     >
<!ELEMENT %n.div4;      - O  ((%m.divtop;)*, ( (%n.div5; |
                             %n.divGen;)+ | ((%component;)+,
                             (%n.div5; | %n.divGen;)*)),
                             (%m.divbot;)*)                     >
<!ELEMENT %n.div5;      - O  ((%m.divtop;)*, ( (%n.div6; |
                             %n.divGen;)+ | ((%component;)+,
                             (%n.div6; | %n.divGen;)*)),
                             (%m.divbot;)*)                     >
<!ELEMENT %n.div6;      - O  ((%m.divtop;)*, ((%n.div7; |
                             %n.divGen;)+ | ((%component;)+,
                             (%n.div7; | %n.divGen;)*)),
                             (%m.divbot;)*)                     >
<!ELEMENT %n.div7;      - O  ((%m.divtop;)*, (%component;)+,
                             (%m.divbot;)*)                     >
<!ELEMENT %n.group;     - O  ((%m.divtop;)*, (%n.text; |
                             %n.group;)+, (%m.divbot;)*)        >
<!ELEMENT %n.opener;    - O  (%n.signed; | %n.dateline; |
                             %n.salute; | %phrase.seq;)*        >
<!ELEMENT %n.text;      - -  ((%n.front)?, (%n.body; |
                             %n.group;), (%n.back)?)
                                                +(%m.globincl;) >

The new definitions are as follows:

< 68 New definitions for text-structure tag set > =

<!ENTITY % XML.argument "INCLUDE" > <![%XML.argument;[ <!ELEMENT %n.argument; - - ((%m.Incl;)*, (%n.head;, %component.seq;)?) > <!ATTLIST %n.argument; %a.global; TEIform CDATA 'argument' > ]]>

< 71 New definitions for text-structure tag set 68 (cont'd) > =

 <!ENTITY % XML.div "INCLUDE" > <![%XML.div;[ <!ELEMENT %n.div; - O ( (%m.divtop; | %m.Incl;)*, ( ((%n.div; | %n.divGen;), (%m.Incl;)*)+ | ( (%component;, (%m.Incl;)*)+, ((%n.div; | %n.divGen;), (%m.Incl;)*)*) ), ((%m.divbot;), (%m.Incl;)*)*) > <!ATTLIST %n.div; %a.global; %a.declaring; %a.divn; TEIform CDATA 'div' > ]]>

< 72 New definitions for text-structure tag set 68 (cont'd) > =

<!ENTITY % XML.div0 "INCLUDE" > <![%XML.div0;[ <!ELEMENT %n.div0; - O ((%m.divtop; | %m.Incl;)*, ( ((%n.div1; | %n.divGen;), (%m.Incl;)*)+ | ( (%component;, (%m.Incl;)*)+, ((%n.div1; | %n.divGen;), (%m.Incl;)*)*)), ((%m.divbot;), (%m.Incl;)*)*) > <!ATTLIST %n.div0; %a.global; %a.declaring; %a.divn; TEIform CDATA 'div0' > ]]>

< 73 New definitions for text-structure tag set 68 (cont'd) > =

<!ENTITY % XML.div1 "INCLUDE" > <![%XML.div1;[ <!ELEMENT %n.div1; - O ((%m.divtop; | %m.Incl;)*, ( ((%n.div2; | %n.divGen;), (%m.Incl;)*)+ | ((%component;, (%m.Incl;)*)+, ((%n.div2; | %n.divGen;), (%m.Incl;)*)*)), ((%m.divbot;), (%m.Incl;)*)*) > <!ATTLIST %n.div1; %a.global; %a.declaring; %a.divn; TEIform CDATA 'div1' > ]]>

< 74 New definitions for text-structure tag set 68 (cont'd) > =

<!ENTITY % XML.div2 "INCLUDE" > <![%XML.div2;[ <!ELEMENT %n.div2; - O ((%m.divtop; | %m.Incl;)*, ( ((%n.div3; | %n.divGen;), (%m.Incl;)*)+ | ((%component;, (%m.Incl;)*)+, ((%n.div3; | %n.divGen;), (%m.Incl;)*)*)), ((%m.divbot;), (%m.Incl;)*)*) > <!ATTLIST %n.div2; %a.global; %a.declaring; %a.divn; TEIform CDATA 'div2' > ]]>

< 75 New definitions for text-structure tag set 68 (cont'd) > =

<!ENTITY % XML.div3 "INCLUDE" > <![%XML.div3;[ <!ELEMENT %n.div3; - O ((%m.divtop; | %m.Incl;)*, ( ((%n.div4; | %n.divGen;), (%m.Incl;)*)+ | ((%component;, (%m.Incl;)*)+, ((%n.div4; | %n.divGen;), (%m.Incl;)*)*)), ((%m.divbot;), (%m.Incl;)*)*) > <!ATTLIST %n.div3; %a.global; %a.declaring; %a.divn; TEIform CDATA 'div3' > ]]>

< 76 New definitions for text-structure tag set 68 (cont'd) > =

<!ENTITY % XML.div4 "INCLUDE" > <![%XML.div4;[ <!ELEMENT %n.div4; - O ((%m.divtop; | %m.Incl;)*, ( ((%n.div5; | %n.divGen;), (%m.Incl;)*)+ | ((%component;, (%m.Incl;)*)+, ((%n.div5; | %n.divGen;), (%m.Incl;)*)*)), ((%m.divbot;), (%m.Incl;)*)*) > <!ATTLIST %n.div4; %a.global; %a.declaring; %a.divn; TEIform CDATA 'div4' > ]]>

< 77 New definitions for text-structure tag set 68 (cont'd) > =

<!ENTITY % XML.div5 "INCLUDE" > <![%XML.div5;[ <!ELEMENT %n.div5; - O ((%m.divtop; | %m.Incl;)*, ( ((%n.div6; | %n.divGen;), (%m.Incl;)*)+ | ((%component;, (%m.Incl;)*)+, ((%n.div6; | %n.divGen;), (%m.Incl;)*)*)), ((%m.divbot;), (%m.Incl;)*)*) > <!ATTLIST %n.div5; %a.global; %a.declaring; %a.divn; TEIform CDATA 'div5' > ]]>

< 78 New definitions for text-structure tag set 68 (cont'd) > =

<!ENTITY % XML.div6 "INCLUDE" > <![%XML.div6;[ <!ELEMENT %n.div6; - O ((%m.divtop; | %m.Incl;)*, ( ((%n.div7; | %n.divGen;), (%m.Incl;)*)+ | ((%component;, (%m.Incl;)*)+, ((%n.div7; | %n.divGen;), (%m.Incl;)*)*)), ((%m.divbot;), (%m.Incl;)*)*) > <!ATTLIST %n.div6; %a.global; %a.declaring; %a.divn; TEIform CDATA 'div6' > ]]>

< 79 New definitions for text-structure tag set 68 (cont'd) > =

<!ENTITY % XML.div7 "INCLUDE" > <![%XML.div7;[ <!ELEMENT %n.div7; - O ((%m.divtop; | %m.Incl;)*, (%component;, (%m.Incl;)*)+, ((%m.divbot;), (%m.Incl;)*)*) > <!ATTLIST %n.div7; %a.global; %a.declaring; %a.divn; TEIform CDATA 'div7' > ]]>
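The declarations for <div1> through <div6> are identical except for the element names, so they could be generated from a single template instead of being maintained by hand. The Python sketch below does this. The helper div_decl is hypothetical and is not part of the TEI distribution. For n from 1 to 6 its output is meant to reproduce chunks 73 to 78 exactly. For n = 0 it differs from chunk 72 by a single space.

```python
def div_decl(n: int) -> str:
    """Rebuild the XML-version declaration for div<n>, for 0 <= n <= 6.

    Hypothetical helper: div<n+1> is the permitted subdivision, and members
    of m.Incl may follow every child, as in chunks 72 to 78.
    """
    if not 0 <= n <= 6:
        raise ValueError("div7 has no numbered subdivision; n must be 0..6")
    d, c = f"div{n}", f"div{n + 1}"
    return (
        f'<!ENTITY % XML.{d} "INCLUDE" > <![%XML.{d};[ '
        f"<!ELEMENT %n.{d}; - O ((%m.divtop; | %m.Incl;)*, "
        f"( ((%n.{c}; | %n.divGen;), (%m.Incl;)*)+ | "
        f"((%component;, (%m.Incl;)*)+, ((%n.{c}; | %n.divGen;), (%m.Incl;)*)*)), "
        "((%m.divbot;), (%m.Incl;)*)*) > "
        f"<!ATTLIST %n.{d}; %a.global; %a.declaring; %a.divn; "
        f"TEIform CDATA '{d}' > ]]>"
    )
```

A template of this kind makes it harder for the declarations to drift apart when one of them is edited by hand.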

< 80 New definitions for text-structure tag set 68 (cont'd) > =

<!ENTITY % XML.group "INCLUDE" > <![%XML.group;[ <!ELEMENT %n.group; - O ((%m.divtop; | %m.Incl;)*, ((%n.text; | %n.group;), (%n.text; | %n.group; | %m.Incl;)*), ((%m.divbot;), (%m.Incl;)*)*) > <!ATTLIST %n.group; %a.global; %a.declaring; TEIform CDATA 'group' > ]]>

< 81 New definitions for text-structure tag set 68 (cont'd) > =

<!ENTITY % XML.text "INCLUDE" > <![%XML.text;[ <!ELEMENT %n.text; - - ((%m.Incl;)*, (%n.front;, (%m.Incl;)*)?, (%n.body; | %n.group;), (%m.Incl;)*, (%n.back;, (%m.Incl;)*)?) > <!ATTLIST %n.text; %a.global; %a.declaring; TEIform CDATA 'text' > ]]>

8.5.3 Front-matter tag set

< 82 Suppress definitions in front-matter tag set > =

 <!ENTITY % docTitle 'IGNORE' > <!ENTITY % front 'IGNORE' > <!ENTITY % titlePage 'IGNORE' >

The existing declarations are these:

<!ELEMENT %n.front;     - O  ( (%m.front;)*, ( ( (%m.divtop;),
                             (%m.divtop; | %n.titlePage;)*) | (
                             (%n.div;), (%n.div; | (%m.front;)
                             )*) | ( (%n.div1;), (%n.div1; |
                             (%m.front;) )*) )? )               >
<!ELEMENT %n.titlePage; - O  (%m.tpParts;)+                     >
<!ELEMENT %n.docTitle;  - O  ((%n.titlePart)+)                  >

The new definitions are these. The definition for <front> has been changed to use fmchunk instead of divtop.

< 84 New definitions for front-matter tag set 83 (cont'd) > =

<!ENTITY % XML.titlePage "INCLUDE" > <![%XML.titlePage;[ <!ELEMENT %n.titlePage; - O ((%m.Incl;)*, (%m.tpParts;), (%m.tpParts; | %m.Incl;)*) > <!ATTLIST %n.titlePage; %a.global; type CDATA #IMPLIED TEIform CDATA 'titlePage' > ]]>

< 85 New definitions for front-matter tag set 83 (cont'd) > =

<!ENTITY % XML.docTitle "INCLUDE" > <![%XML.docTitle;[ <!ELEMENT %n.docTitle; - O ((%m.Incl;)*, (%n.titlePart;, (%m.Incl;)*)+) > <!ATTLIST %n.docTitle; %a.global; TEIform CDATA 'docTitle' > ]]>

8.5.4 Header tag set

< 86 Suppress definitions in header tag set > =

<!ENTITY % availability 'IGNORE' > <!ENTITY % broadcast 'IGNORE' > <!ENTITY % editionStmt 'IGNORE' > <!ENTITY % equipment 'IGNORE' > <!ENTITY % notesStmt 'IGNORE' >  <!ENTITY % recording 'IGNORE' > <!ENTITY % recordingStmt 'IGNORE' > <!ENTITY % scriptStmt 'IGNORE' > <!ENTITY % seriesStmt 'IGNORE' > <!ENTITY % sourceDesc 'IGNORE' > <!ENTITY % titleStmt 'IGNORE' >

The current definitions are these:

<!ELEMENT %n.availability;
                        - O  ((%n.p;)+)                         >
<!ELEMENT %n.broadcast; - -  ((%n.p)+ | %n.bibl; |
                             %n.biblStruct; | %n.biblFull; |
                             %n.recording;)                     >
<!ELEMENT %n.editionStmt;
                        - O  ( (%n.edition;, (%n.respStmt)*) |
                             (%n.p;)+ )                         >
<!ELEMENT %n.equipment; - O  ((%n.p;)+)                         >

<!ELEMENT %n.notesStmt; - O  ((%n.note)+)                       >
<!ELEMENT %n.recording; - -  ((%n.p)+ | (%n.respStmt; |
                             %n.equipment; | %n.broadcast; |
                             %n.date;)*)                        >
<!ELEMENT %n.recordingStmt;
                        - -  ((%n.p)+ | (%n.recording)+ )       >
<!ELEMENT %n.scriptStmt;
                        - -  ((%n.p)+ | %n.bibl; | %n.biblFull;
                             | %n.biblStruct;)                  >
<!ELEMENT %n.seriesStmt;
                        - O  ( (%n.title;, (%n.idno; |
                             %n.respStmt;)*) | (%n.p)+ )        >
<!ELEMENT %n.sourceDesc;
                        - -  (%n.p; | %n.bibl; | %n.biblFull; |
                             %n.biblStruct; | %n.listBibl; |
                             %n.scriptStmt; |
                             %n.recordingStmt;)+                >
<!ELEMENT %n.titleStmt; - O  (((%n.title)+, (%n.author; |
                             %n.editor; | %n.sponsor; |
                             %n.funder; | %n.principal; |
                             %n.respStmt;)*))                   >

The new definitions are as follows. For some element types we have also changed the language accepted, in parallel with changes to TEI P3:

<availability> may now be empty (in P3 it required at least one <p>).

< 87 New definitions for header tag set > =

<!ENTITY % XML.availability "INCLUDE" > <![%XML.availability;[ <!ELEMENT %n.availability; - O (%n.p; | %m.Incl;)* > <!ATTLIST %n.availability; %a.global; status (free | unknown | restricted) #IMPLIED TEIform CDATA 'availability' > ]]>

< 88 New definitions for header tag set 87 (cont'd) > =

<!ENTITY % XML.broadcast "INCLUDE" > <![%XML.broadcast;[ <!ELEMENT %n.broadcast; - - ((%m.Incl;)*, ((%n.p;, (%m.Incl;)*)+ | ((%n.bibl; | %n.biblStruct; | %n.biblFull; | %n.recording;), (%m.Incl;)*))) > <!ATTLIST %n.broadcast; %a.global; %a.declarable; TEIform CDATA 'broadcast' > ]]>

< 89 New definitions for header tag set 87 (cont'd) > =

<!ENTITY % XML.editionStmt "INCLUDE" > <![%XML.editionStmt;[ <!ELEMENT %n.editionStmt; - O ((%m.Incl;)*, ((%n.edition;, (%n.respStmt; | %m.Incl;)*) | (%n.p;, (%m.Incl;)*)+) ) > <!ATTLIST %n.editionStmt; %a.global; TEIform CDATA 'editionStmt' > ]]>

< 90 New definitions for header tag set 87 (cont'd) > =

<!ENTITY % XML.equipment "INCLUDE" > <![%XML.equipment;[ <!ELEMENT %n.equipment; - O ((%m.Incl;)*, (%n.p;, (%m.Incl;)*)+) > <!ATTLIST %n.equipment; %a.global; %a.declarable; TEIform CDATA 'equipment' > ]]>

< 91 New definitions for header tag set 87 (cont'd) > =

<!ENTITY % XML.notesStmt "INCLUDE" > <![%XML.notesStmt;[ <!ELEMENT %n.notesStmt; - O ((%m.Incl;)*, (%n.note;, (%m.Incl;)*)+) > <!ATTLIST %n.notesStmt; %a.global; TEIform CDATA 'notesStmt' > ]]>

< 92 New definitions for header tag set 87 (cont'd) > =

<!ENTITY % XML.recording "INCLUDE" > <![%XML.recording;[ <!ELEMENT %n.recording; - - ((%m.Incl;)*, ((%n.p;, (%m.Incl;)*)+ | ((%n.respStmt; | %n.equipment; | %n.broadcast; | %n.date;), (%m.Incl;)*)*)) > <!ATTLIST %n.recording; %a.global; %a.declarable; type (audio | video) audio dur CDATA #IMPLIED TEIform CDATA 'recording' > ]]>

< 93 New definitions for header tag set 87 (cont'd) > =

<!ENTITY % XML.recordingStmt "INCLUDE" > <![%XML.recordingStmt;[ <!ELEMENT %n.recordingStmt; - - ((%m.Incl;)*, ((%n.p;, (%m.Incl;)*)+ | (%n.recording;, (%m.Incl;)*)+ ))> <!ATTLIST %n.recordingStmt; %a.global; TEIform CDATA 'recordingStmt'> ]]>

< 94 New definitions for header tag set 87 (cont'd) > =

<!ENTITY % XML.scriptStmt "INCLUDE" > <![%XML.scriptStmt;[ <!ELEMENT %n.scriptStmt; - - ((%m.Incl;)*, ((%n.p;, (%m.Incl;)*)+ | ((%n.bibl; | %n.biblStruct; | %n.biblFull;), (%m.Incl;)*))) > <!ATTLIST %n.scriptStmt; %a.global; %a.declarable; TEIform CDATA 'scriptStmt' > ]]>

< 95 New definitions for header tag set 87 (cont'd) > =

<!ENTITY % XML.seriesStmt "INCLUDE" > <![%XML.seriesStmt;[ <!ELEMENT %n.seriesStmt; - O ((%m.Incl;)*, ((%n.title;, (%n.idno; | %n.respStmt; | %m.Incl;)* ) | (%n.p;, (%m.Incl;)*)+) ) > <!ATTLIST %n.seriesStmt; %a.global; TEIform CDATA 'seriesStmt' > ]]>

< 96 New definitions for header tag set 87 (cont'd) > =

<!ENTITY % XML.sourceDesc "INCLUDE" > <![%XML.sourceDesc;[ <!ELEMENT %n.sourceDesc; - - ((%m.Incl;)*, ((%n.p; | %n.bibl; | %n.biblFull; | %n.biblStruct; | %n.listBibl; | %n.scriptStmt; | %n.recordingStmt;), (%m.Incl;)*)+) > <!ATTLIST %n.sourceDesc; %a.global; %a.declarable; TEIform CDATA 'sourceDesc' > ]]>

< 97 New definitions for header tag set 87 (cont'd) > =

<!ENTITY % XML.titleStmt "INCLUDE" > <![%XML.titleStmt;[ <!ELEMENT %n.titleStmt; - O ( (%m.Incl;)*, (%n.title;, (%m.Incl;)*)+, ( (%n.author; | %n.editor; | %n.sponsor; | %n.funder; | %n.principal; | %n.respStmt;), (%m.Incl;)*)* ) > <!ATTLIST %n.titleStmt; %a.global; TEIform CDATA 'titleStmt' > ]]>

8.5.5 Verse tag set

< 98 Suppress definitions in verse tag set > =

<!ENTITY % lg1 'IGNORE' > <!ENTITY % lg2 'IGNORE' > <!ENTITY % lg3 'IGNORE' > <!ENTITY % lg4 'IGNORE' > <!ENTITY % lg5 'IGNORE' >

The current definitions are these:

<!ELEMENT %n.lg1;       - O  ((%n.head)?, (%n.l; | %n.lg2;)+)   >
<!ELEMENT %n.lg2;       - O  ((%n.head)?, (%n.l; | %n.lg3;)+)   >
<!ELEMENT %n.lg3;       - O  ((%n.head)?, (%n.l; | %n.lg4;)+)   >
<!ELEMENT %n.lg4;       - O  ((%n.head)?, (%n.l; | %n.lg5;)+)   >
<!ELEMENT %n.lg5;       - O  ((%n.head)?, (%n.l)+)              >

The new definitions are as follows:

< 99 New definitions for verse tag set > =

<![%TEI.verse;[ <!ENTITY % XML.lg1 "INCLUDE" > <![%XML.lg1;[ <!ELEMENT %n.lg1; - O ((%m.Incl;)*, (%n.head;, (%m.Incl;)*)?, ((%n.l; | %n.lg2;), (%m.Incl;)*)+) > <!ATTLIST %n.lg1; %a.global; %a.divn; %a.metrical; TEIform CDATA 'lg1' > ]]>

< 100 New definitions for verse tag set 99 (cont'd) > =

<!ENTITY % XML.lg2 "INCLUDE" > <![%XML.lg2;[ <!ELEMENT %n.lg2; - O ((%m.Incl;)*, (%n.head;, (%m.Incl;)*)?, ((%n.l; | %n.lg3;), (%m.Incl;)*)+) > <!ATTLIST %n.lg2; %a.global; %a.divn; %a.metrical; TEIform CDATA 'lg2' > ]]>

< 101 New definitions for verse tag set 99 (cont'd) > =

<!ENTITY % XML.lg3 "INCLUDE" > <![%XML.lg3;[ <!ELEMENT %n.lg3; - O ((%m.Incl;)*, (%n.head;, (%m.Incl;)*)?, ((%n.l; | %n.lg4;), (%m.Incl;)*)+) > <!ATTLIST %n.lg3; %a.global; %a.divn; %a.metrical; TEIform CDATA 'lg3' > ]]>

< 102 New definitions for verse tag set 99 (cont'd) > =

<!ENTITY % XML.lg4 "INCLUDE" > <![%XML.lg4;[ <!ELEMENT %n.lg4; - O ((%m.Incl;)*, (%n.head;, (%m.Incl;)*)?, ((%n.l; | %n.lg5;), (%m.Incl;)*)+) > <!ATTLIST %n.lg4; %a.global; %a.divn; %a.metrical; TEIform CDATA 'lg4' > ]]>

< 103 New definitions for verse tag set 99 (cont'd) > =

<!ENTITY % XML.lg5 "INCLUDE" > <![%XML.lg5;[ <!ELEMENT %n.lg5; - O ((%m.Incl;)*, (%n.head;, (%m.Incl;)*)?, (%n.l;, (%m.Incl;)*)+) > <!ATTLIST %n.lg5; %a.global; %a.divn; %a.metrical; TEIform CDATA 'lg5' > ]]> ]]>
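The verse declarations show the rewriting rule used throughout this section in its simplest form. Given a P3 sequence model, prefix (%m.Incl;)* and wrap each particle x with occurrence indicator o as (x, (%m.Incl;)*)o, so that members of m.Incl may appear before and after every particle. The following Python sketch applies that rule to flat sequences only. Particle and propagate are illustrative names, and nested groups (as in <div> or <list>) would need a recursive version. The result stays deterministic only if no particle names a member of m.Incl.

```python
from dataclasses import dataclass
from typing import List

INCL = "%m.Incl;"  # parameter-entity reference for the class of former inclusions


@dataclass
class Particle:
    """One item of a P3 sequence: an element name, or a choice among several."""
    names: List[str]
    occ: str = ""  # occurrence indicator: "", "?", "*" or "+"

    def body(self) -> str:
        if len(self.names) == 1:
            return self.names[0]
        return "(" + " | ".join(self.names) + ")"


def propagate(seq: List[Particle]) -> str:
    """Rewrite a flat sequence model so m.Incl may occur before and after every particle."""
    parts = [f"({INCL})*"]
    for p in seq:
        parts.append(f"({p.body()}, ({INCL})*){p.occ}")
    return "(" + ", ".join(parts) + ")"
```

Applied to the P3 model of <lg5>, ((%n.head)?, (%n.l)+), the sketch yields the content model given in chunk 103; applied to <notesStmt>, ((%n.note)+), it yields the model in chunk 91.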

8.5.6 Drama tag set

< 104 Suppress definitions in drama tag set > =

<!ENTITY % castGroup 'IGNORE' >  <!ENTITY % castList 'IGNORE' > <!ENTITY % epilogue 'IGNORE' > <!ENTITY % performance 'IGNORE' > <!ENTITY % prologue 'IGNORE' > <!ENTITY % set 'IGNORE' >

The current definitions are these:

<!ELEMENT %n.castGroup; - -  (
                                (%n.head;)?, 
                                (%n.castItem; | %n.castGroup;)+, 
                                (%n.trailer;)?)    >
<!ELEMENT %n.castItem;  - O  (%n.role; | %n.roleDesc; |
                             %n.actor; | (%phrase.seq))*        >
<!ELEMENT %n.castList;  - -  (  (%m.divtop;)*, 
                                (%component;)*,
                                (%n.castItem; | %n.castGroup;)+,
                                (%component;)*)                 >
<!ELEMENT %n.epilogue;  - -  ((%m.divtop)*, (%component)+,
                             (%m.divbot)*)                      >
<!ELEMENT %n.performance;
                        - -  ((%m.divtop)*, (%component)+,
                             (%m.divbot)*)                      >
<!ELEMENT %n.prologue;  - -  ((%m.divtop)*, (%component)+,
                             (%m.divbot)*)                      >
<!ELEMENT %n.set;       - -  ((%n.head)?, %specialPara)         >

The new definitions are as follows:

< 105 New definitions for drama tag set > =

<![%TEI.drama;[ <!ENTITY % XML.castGroup "INCLUDE" > <![%XML.castGroup;[ <!ELEMENT %n.castGroup; - - ((%m.Incl;)*, (%n.head;, (%m.Incl;)*)?, ((%n.castItem; | %n.castGroup;), (%m.Incl;)*)+, (%n.trailer;, (%m.Incl;)*)?) > <!ATTLIST %n.castGroup; %a.global; TEIform CDATA 'castGroup' > ]]>

< 106 New definitions for drama tag set 105 (cont'd) > =

<!ENTITY % XML.castList "INCLUDE" > <![%XML.castList;[ <!ELEMENT %n.castList; - - ( (%m.divtop; | %m.Incl;)*, ((%component;), (%m.Incl;)*)*, ((%n.castItem; | %n.castGroup;), (%m.Incl;)*)+, ((%component;), (%m.Incl;)*)*) > <!ATTLIST %n.castList; %a.global; TEIform CDATA 'castList' > ]]>

< 107 New definitions for drama tag set 105 (cont'd) > =

<!ENTITY % XML.epilogue "INCLUDE" > <![%XML.epilogue;[ <!ELEMENT %n.epilogue; - - ((%m.divtop; | %m.Incl;)*, ((%component;), (%m.Incl;)*)+, ((%m.divbot;), (%m.Incl;)*)*) > <!ATTLIST %n.epilogue; %a.global; TEIform CDATA 'epilogue' > ]]>

< 108 New definitions for drama tag set 105 (cont'd) > =

<!ENTITY % XML.performance "INCLUDE" > <![%XML.performance;[ <!ELEMENT %n.performance; - - ((%m.divtop; | %m.Incl;)*, ((%component;), (%m.Incl;)*)+, ((%m.divbot;), (%m.Incl;)*)*) > <!ATTLIST %n.performance; %a.global; TEIform CDATA 'performance' > ]]>

< 109 New definitions for drama tag set 105 (cont'd) > =

<!ENTITY % XML.prologue "INCLUDE" > <![%XML.prologue;[ <!ELEMENT %n.prologue; - - ((%m.divtop; | %m.Incl;)*, ((%component;), (%m.Incl;)*)+, ((%m.divbot;), (%m.Incl;)*)*) > <!ATTLIST %n.prologue; %a.global; TEIform CDATA 'prologue' > ]]>  ]]>

8.5.7 Spoken-text tag set

< 110 Suppress definitions in spoken-text tag set > =

<!ENTITY % u 'IGNORE' >

The current definition is this:

<!ELEMENT %n.u;         - -  ((%phrase | %m.comp.spoken)+)      >

The new definitions are as follows:

< 111 New definitions for spoken-text tag set > =

<![%TEI.spoken;[ <!ENTITY % XML.u "INCLUDE" > <![%XML.u;[ <!ELEMENT %n.u; - - (#PCDATA | %m.phrase; | %m.comp.spoken; | %m.Incl;)* > <!ATTLIST %n.u; %a.global; %a.timed; %a.declaring; trans (smooth | latching | overlap | pause) smooth who IDREF %INHERITED; TEIform CDATA 'u' > ]]> ]]>

8.5.8 Dictionary tag set

We handle the dictionary tag set below, not here. (The list above does contain <oVar> and <pVar>, but that must be a mistake.)

8.5.9 Terminology tag set

< 112 Suppress definitions in terminology tag set > =

<!ENTITY % ofig 'IGNORE' > <!ENTITY % termEntry 'IGNORE' > <!ENTITY % tig 'IGNORE' >

The current definitions in the nested tag set are these:

<!ELEMENT %n.ofig;      - O  ((%m.terminologyMisc)*,
                             (%n.otherForm;, (%n.gram)*),
                             (%m.terminologyMisc)*)             >
<!ELEMENT %n.termEntry; - O  ((%m.terminologyMisc)*, (%n.tig)+)
                                                 +(%m.terminologyInclusions)
                                                                >
<!ELEMENT %n.tig;       - O  ((%m.terminologyMisc)*, (%n.term;,
                             (%n.gram)*),
                             (%m.terminologyMisc)*, (%n.ofig)*)
                                                                >

Note that <termEntry> has inclusions of its own. These do not require special treatment in our propagation of inclusions, since the set of legal descendants of <termEntry> is the same as the set of legal descendants of <text>. The set of terminology inclusions does, however, need to be revised for future versions of the DTD, since it is not disjoint from the set of elements named explicitly in content models: it includes elements normally allowed in any phrase-level content model. We do not want to add them to m.Incl, since that would make those content models ambiguous. All the terminology content models should therefore be rewritten for TEI P4, or even P3.5.

The new definitions are as follows:

< 113 New definitions for terminology tag set > =

<![%TEI.terminology;[ <!ENTITY % XML.ofig "INCLUDE" > <![%XML.ofig;[ <!ELEMENT %n.ofig; - O ((%m.terminologyMisc; | %m.Incl;)*, (%n.otherForm;, (%n.gram; | %m.Incl;)*), ((%m.terminologyMisc;), (%m.Incl;)*)*) > <!ATTLIST %n.ofig; %a.global; type CDATA #IMPLIED TEIform CDATA 'ofig' > ]]>

< 114 New definitions for terminology tag set 113 (cont'd) > =

<!ENTITY % XML.termEntry "INCLUDE" > <![%XML.termEntry;[ <!ELEMENT %n.termEntry; - O ((%m.terminologyMisc; | %m.terminologyInclusions; | %m.Incl;)*, (%n.tig;, (%m.Incl; | %m.terminologyInclusions;)*)+) > <!ATTLIST %n.termEntry; %a.global; type CDATA #IMPLIED TEIform CDATA 'termEntry' > ]]>

< 115 New definitions for terminology tag set 113 (cont'd) > =

<!ENTITY % XML.tig "INCLUDE" > <![%XML.tig;[ <!ELEMENT %n.tig; - O ((%m.terminologyMisc; | %m.terminologyInclusions; | %m.Incl;)*, (%n.term;, (%n.gram; | %m.terminologyInclusions; | %m.Incl;)*), ((%m.terminologyMisc;), (%m.terminologyInclusions; | %m.Incl;)*)*, (%n.ofig;, (%m.terminologyInclusions; | %m.Incl;)*)*) > <!ATTLIST %n.tig; %a.global; type CDATA #IMPLIED TEIform CDATA 'tig' > ]]> ]]>

In the flat version of the terminology tag set, there is no <ofig> and no <tig> element. The current definition of <termEntry> is this one:

<!ELEMENT %n.termEntry; - O  ( (%m.terminologyMisc |
                             %n.otherForm; | %n.gram; |
                             %m.terminologyInclusions)*,
                             (%n.term;, (%m.terminologyMisc |
                             %n.otherForm; | %n.gram; |
                             %m.terminologyInclusions)* )+ )    >

The new definition is as follows. Since we need both versions in the extensions file, we invent a new parameter entity (TEI.terminology.flat) to signal the difference between the nested and flat terminology element sets.

< 116 New definitions for flat terminology tag set > =

<![%TEI.terminology;[ <!ENTITY % TEI.terminology.flat 'IGNORE'> <![%TEI.terminology.flat;[ <!ENTITY % XML.termEntry "INCLUDE" > <![%XML.termEntry;[ <!ELEMENT %n.termEntry; - O ( (%m.terminologyMisc; | %n.otherForm; | %n.gram; | %m.terminologyInclusions; | %m.Incl;)*, (%n.term;, (%m.terminologyMisc; | %n.otherForm; | %n.gram; | %m.terminologyInclusions; | %m.Incl;)* )+ ) > <!ATTLIST %n.termEntry; %a.global; type CDATA #IMPLIED TEIform CDATA 'termEntry' > ]]> ]]> ]]>
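The switch relies on two rules shared by SGML and XML: the first declaration of a parameter entity is binding and later ones are ignored, and a marked section nested inside an IGNORE section is ignored whatever its own keyword. A user who wants the flat element set therefore declares TEI.terminology.flat as INCLUDE before the extensions file is read, typically in the document type declaration subset. The Python sketch below models just those two rules. The function names are illustrative, and it is not a DTD parser; it also assumes the user always declares TEI.terminology.

```python
from typing import Dict


def declare(entities: Dict[str, str], name: str, value: str) -> None:
    """Record a parameter-entity declaration; the first declaration is binding."""
    entities.setdefault(name, value)


def section_active(entities: Dict[str, str], *keywords: str) -> bool:
    """Nested marked sections are processed only if every enclosing keyword is INCLUDE."""
    return all(entities[k] == "INCLUDE" for k in keywords)


def flat_term_entry_active(user_decls: Dict[str, str]) -> bool:
    """Is the flat <termEntry> declaration of chunk 116 in force?"""
    entities: Dict[str, str] = {}
    # The user's declarations are read first.
    for name, value in user_decls.items():
        declare(entities, name, value)
    # The extensions file is read later. Its declarations are recorded
    # unconditionally here; that cannot change the result, since they only
    # matter when TEI.terminology is INCLUDE anyway.
    declare(entities, "TEI.terminology.flat", "IGNORE")
    declare(entities, "XML.termEntry", "INCLUDE")
    return section_active(entities, "TEI.terminology",
                          "TEI.terminology.flat", "XML.termEntry")
```

Because the extensions file declares TEI.terminology.flat as IGNORE, the nested element set remains the default; the user's earlier declaration is the only way to select the flat one.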

8.5.10 Segmentation and alignment tag set

< 117 Suppress definitions in segmentation and alignment tag set > =

<!ENTITY % altGrp 'IGNORE' > <!ENTITY % joinGrp 'IGNORE' > <!ENTITY % linkGrp 'IGNORE' > <!ENTITY % timeline 'IGNORE' >

The current definitions are these:

<!ELEMENT %n.altGrp;    - -  ((%n.alt; | %n.ptr; | %n.xptr;)*)  >
<!ELEMENT %n.joinGrp;   - -  ((%n.join; | %n.ptr; | %n.xptr;)*)
                                                                >
<!ELEMENT %n.linkGrp;   - -  (%n.link; | %n.ptr; | %n.xptr;)+   >
<!ELEMENT %n.timeline;  - -  ((%n.when;)+)                      >

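The new definitions are as follows. We take the opportunity to level the declarations of the three pointer-group elements by using a star, instead of a plus sign, on all of them (in P3 only <linkGrp> used a plus). This has the drawback of allowing a link group to contain no links (only pointers or other members of m.Incl), but the advantage of dramatically simplifying the content model. The <alt>, <join>, and <link> elements are no longer named explicitly: they are members of m.Incl, and naming them as well would make the models ambiguous.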

< 118 New definitions for segmentation and alignment tag set > =

<![%TEI.linking;[ <!ENTITY % XML.altGrp "INCLUDE" > <![%XML.altGrp;[ <!ELEMENT %n.altGrp; - - ((%n.ptr; | %n.xptr; | %m.Incl;)*) > <!ATTLIST %n.altGrp; %a.global; %a.pointerGroup; mode (excl | incl) excl wScale (perc | real) perc TEIform CDATA 'altGrp' > ]]>

< 119 New definitions for segmentation and alignment tag set 118 (cont'd) > =

<!ENTITY % XML.joinGrp "INCLUDE" > <![%XML.joinGrp;[ <!ELEMENT %n.joinGrp; - - ((%n.ptr; | %n.xptr; | %m.Incl;)*) > <!ATTLIST %n.joinGrp; %a.global; %a.pointerGroup; result CDATA #IMPLIED desc CDATA #IMPLIED TEIform CDATA 'joinGrp' > ]]>

< 120 New definitions for segmentation and alignment tag set 118 (cont'd) > =

<!ENTITY % XML.linkGrp "INCLUDE" > <![%XML.linkGrp;[ <!ELEMENT %n.linkGrp; - - (%n.ptr; | %n.xptr; | %m.Incl;)* > <!ATTLIST %n.linkGrp; %a.global; %a.pointerGroup; TEIform CDATA 'linkGrp' > ]]>

< 121 New definitions for segmentation and alignment tag set 118 (cont'd) > =

<!ENTITY % XML.timeline "INCLUDE" > <![%XML.timeline;[ <!ELEMENT %n.timeline; - - ((%n.when;), (%m.Incl;)*)+ > <!ATTLIST %n.timeline; %a.global; origin IDREF #REQUIRED unit NMTOKEN #IMPLIED interval NUTOKEN #IMPLIED TEIform CDATA 'timeline' > ]]> ]]>

We have included m.Incl within these content models in the interests of consistency: this document is intended to provide an XML-compatible DTD which accepts all valid TEI P3 documents, and does not change the language unnecessarily. In the long run, however, it seems unlikely that we need to allow any m.Incl elements within any of these content models. Page breaks really and truly do not occur within link groups. Allowing timelines to nest within timelines is daft. And as we have seen, adding m.Incl to the original content models introduces ambiguity, since some members of that class were already named in the models. Removing the explicit mention avoids the ambiguity, but renders the content models misleading.

It is the editors' view that in P4, the m.Incl class should not appear in these models; they should revert to the form given in P3.
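Because a content model is a regular expression over the names of an element's children, the effect of replacing the plus with a star can be checked directly. The sketch below transcribes the P3 and the new <linkGrp> models as Python regular expressions. The member list used for m.Incl is a hypothetical abbreviation, and the P3 pattern models only the content model itself, not the inclusion exceptions P3 would also allow.

```python
import re
from typing import List

# Hypothetical, abbreviated stand-in for the members of m.Incl.
INCL = r"(?:pb|link|join|alt|span|interp)"

# P3: (%n.link; | %n.ptr; | %n.xptr;)+ ; inclusion exceptions are not modelled
P3_LINKGRP = r"(?:(?:link|ptr|xptr) )+"
# New: (%n.ptr; | %n.xptr; | %m.Incl;)* ; <link> is now reached through m.Incl
XML_LINKGRP = rf"(?:(?:ptr|xptr|{INCL}) )*"


def matches(model: str, children: List[str]) -> bool:
    """Test a sequence of child element names against a content-model regex.

    Each name is followed by one space, so that adjacent names cannot run together.
    """
    return re.fullmatch(model, "".join(name + " " for name in children)) is not None
```

The empty sequence is the case described above: the P3 model rejects a <linkGrp> with no children, while the new model accepts it.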

8.5.11 Analysis and interpretation tag set

< 122 Suppress definitions in analysis tag set > =

<!ENTITY % c 'IGNORE' > <!ENTITY % interpGrp 'IGNORE' > <!ENTITY % m 'IGNORE' > <!ENTITY % spanGrp 'IGNORE' > <!ENTITY % w 'IGNORE' >

The current definitions are these:

<!ELEMENT %n.c;         - -  (#PCDATA)                          >
<!ELEMENT %n.interpGrp; - -  ((%n.interp;)*)                    >
<!ELEMENT %n.m;         - -  ((#PCDATA | %n.seg; | %n.c;)*)     >
<!ELEMENT %n.spanGrp;   - -  ((%n.span;)*)                      >
<!ELEMENT %n.w;         - -  ((#PCDATA | %n.seg; | %n.w; |
                             %n.m; | %n.c;)*)                   >

The new definitions are as follows:

< 123 New definitions for analysis tag set > =

<![%TEI.analysis;[ <!ENTITY % XML.c "INCLUDE" > <![%XML.c;[ <!ELEMENT %n.c; - - (#PCDATA) > <!ATTLIST %n.c; %a.global; %a.seg; TEIform CDATA 'c' > ]]>

Since <interp> is a member of class Incl, we cannot name it directly in the content model, on pain of ambiguity. (Sigh.)

< 124 New definitions for analysis tag set 123 (cont'd) > =

<!ENTITY % XML.interpGrp "INCLUDE" > <![%XML.interpGrp;[  <!ELEMENT %n.interpGrp; - - (%m.Incl;)* > <!ATTLIST %n.interpGrp; %a.global; %a.interpret; TEIform CDATA 'interpGrp' > ]]>
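This ambiguity, like the one noted above for the terminology inclusions, can be detected mechanically. If an element is both named explicitly in a choice group and a member of a class referenced in the same group, it can match two different tokens, and the model is not deterministic. The Python sketch below reports such names for a single choice group. The function name is illustrative, and the class membership used in the example is an abbreviated, hypothetical stand-in for the real m.Incl.

```python
from collections import Counter
from typing import Dict, Iterable, List


def overlapping_names(alternatives: Iterable[str],
                      classes: Dict[str, Iterable[str]]) -> List[str]:
    """Return element names matched by more than one alternative of a choice group.

    An alternative is either a class name (expanded via `classes`) or a plain
    element name, which matches only itself. Any name reported here makes a
    repeatable choice such as (a | b | class)* nondeterministic.
    """
    counts: Counter = Counter()
    for alt in alternatives:
        counts.update(set(classes.get(alt, [alt])))
    return sorted(name for name, n in counts.items() if n > 1)
```

For example, overlapping_names(["interp", "m.Incl"], {"m.Incl": ["interp", "span", "pb"]}) reports ["interp"], which is why the new <interpGrp> model names only %m.Incl;.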

< 125 New definitions for analysis tag set 123 (cont'd) > =

<!ENTITY % XML.m "INCLUDE" > <![%XML.m;[ <!ELEMENT %n.m; - - (#PCDATA | %n.seg; | %n.c; | %m.Incl;)* > <!ATTLIST %n.m; %a.global; %a.seg; baseform CDATA #IMPLIED TEIform CDATA 'm' > ]]>

Like <interpGrp>, <spanGrp> now has a content model that is close to meaningless unless one knows that the element is meant to contain <span> elements, which are members of m.Incl.

< 126 New definitions for analysis tag set 123 (cont'd) > =

<!ENTITY % XML.spanGrp "INCLUDE" > <![%XML.spanGrp;[  <!ELEMENT %n.spanGrp; - - (%m.Incl;)* > <!ATTLIST %n.spanGrp; %a.global; %a.interpret; TEIform CDATA 'spanGrp' > ]]>

< 127 New definitions for analysis tag set 123 (cont'd) > =

<!ENTITY % XML.w "INCLUDE" > <![%XML.w;[ <!ELEMENT %n.w; - - (#PCDATA | %n.seg; | %n.w; | %n.m; | %n.c; | %m.Incl;)* > <!ATTLIST %n.w; %a.global; %a.seg; lemma CDATA #IMPLIED TEIform CDATA 'w' > ]]> ]]>

8.5.12 Feature structures tag set

The arguments given above against propagating global inclusions to the segmentation and alignment element types apply with equal or greater force to the feature-structures element types. But we resist the siren song of common sense and press on doggedly toward our goal of an upward-compatible experimental XML DTD.

< 128 Suppress definitions in feature-structures tag set > =

<!ENTITY % f 'IGNORE' >
<!ENTITY % falt 'IGNORE' >
<!ENTITY % flib 'IGNORE' >
<!ENTITY % fs 'IGNORE' >
<!ENTITY % fslib 'IGNORE' >
<!ENTITY % fvlib 'IGNORE' >
<!ENTITY % valt 'IGNORE' >

The current definitions are these:

<!ELEMENT %n.f;         - O  (%n.null; | (%n.plus; | %n.minus;
                             | any | %n.none; | %n.dft; |
                             %n.uncertain; | %n.sym; | %n.nbr;
                             | %n.msr; | %n.rate; | %n.str; |
                             %n.vAlt; | %n.alt; | %n.fs;)*)     >
<!ELEMENT %n.fAlt;      - -  ((%n.f; | %n.fs; | %n.fAlt;),
                             (%n.f; | %n.fs; | %n.fAlt;)+)      >
<!ELEMENT %n.fLib;      - -  ((%n.f; | %n.fAlt;)*)              >
<!ELEMENT %n.fs;        - -  ((%n.f; | %n.fAlt; | %n.alt;)*)    >
<!ELEMENT %n.fsLib;     - -  ((%n.fs; | %n.vAlt;)*)             >
<!ELEMENT %n.fvLib;     - -  ((%n.plus; | %n.minus; | any |
                             %n.none; | %n.dft; | %n.uncertain;
                             | %n.null; | %n.sym; | %n.nbr; |
                             %n.msr; | %n.rate; | %n.str; |
                             %n.vAlt;)*)                        >
<!ELEMENT %n.vAlt;      - -  ((%n.plus; | %n.minus; | any |
                             %n.none; | %n.dft; | %n.uncertain;
                             | %n.null; | %n.sym; | %n.nbr; |
                             %n.msr; | %n.rate; | %n.str; |
                             %n.vAlt; | %n.fs;), (%n.plus; |
                             %n.minus; | any | %n.none; |
                             %n.dft; | %n.uncertain; | %n.null;
                             | %n.sym; | %n.nbr; | %n.msr; |
                             %n.rate; | %n.str; | %n.vAlt; |
                             %n.fs;)+)                          >

The new definitions are as follows:

< 130 New definitions for feature-structures tag set 129 (cont'd) > =

<!ENTITY % XML.fAlt "INCLUDE" >
<![%XML.fAlt;[
<!ELEMENT %n.fAlt; - - ((%n.f; | %n.fs; | %n.fAlt;),
                       (%n.f; | %n.fs; | %n.fAlt;)+) >
<!ATTLIST %n.fAlt;
          %a.global;
          mutExcl (Y | N) #IMPLIED
          TEIform CDATA 'fAlt' >
]]>

< 131 New definitions for feature-structures tag set 129 (cont'd) > =

<!ENTITY % XML.fLib "INCLUDE" >
<![%XML.fLib;[
<!ELEMENT %n.fLib; - - ((%n.f; | %n.fAlt;)*) >
<!ATTLIST %n.fLib;
          %a.global;
          type CDATA #IMPLIED
          TEIform CDATA 'fLib' >
]]>

< 132 New definitions for feature-structures tag set 129 (cont'd) > =

<!ENTITY % XML.fs "INCLUDE" >
<![%XML.fs;[
<!ELEMENT %n.fs; - - ((%n.f; | %n.fAlt; | %n.alt;)*) >
<!ATTLIST %n.fs;
          %a.global;
          type CDATA #IMPLIED
          feats IDREFS #IMPLIED
          rel (eq | ne | sb | ns) sb
          TEIform CDATA 'fs' >
]]>

< 133 New definitions for feature-structures tag set 129 (cont'd) > =

<!ENTITY % XML.fsLib "INCLUDE" >
<![%XML.fsLib;[
<!ELEMENT %n.fsLib; - - ((%n.fs; | %n.vAlt;)*) >
<!ATTLIST %n.fsLib;
          %a.global;
          type CDATA #IMPLIED
          TEIform CDATA 'fsLib' >
]]>

It will be noted that the new versions are identical to the old versions. Common sense has won out, and in this experimental XML version of the TEI DTD, global inclusions are not propagated into these feature-structure element types.

8.5.13 Names and dates tag set

The <dateStruct> and <timeStruct> element types have already been rewritten above.

8.5.14 Text-criticism tag set

< 136 Suppress definitions in text-criticism tag set > =

<!ENTITY % app 'IGNORE' >
<!ENTITY % rdgGrp 'IGNORE' >
<!ENTITY % witList 'IGNORE' >

The current definitions are these:

<!ELEMENT %n.app;       - O  ((%n.lem)?, ((%n.rdg;, (%n.wit)?)
                             | (%n.rdgGrp;, (%n.wit)?))+)       >
<!ELEMENT %n.rdgGrp;    - O  (%n.rdgGrp; | (%n.rdg;,
                             (%n.wit)?))+                       >
<!ELEMENT %n.witList;   - O  ((%n.witness)+)                    >

The new definitions are as follows. We take the opportunity to address one of Peter Robinson's long-standing concerns, and allow witnesses to the lemma to be listed. Note that the model for <rdgGrp> seems bizarre. Why are readings and reading groups treated similarly in <app> entries and not in <rdgGrp> elements?

< 137 New definitions for text-criticism tag set > =

<![%TEI.textcrit;[
<!ENTITY % XML.app "INCLUDE" >
<![%XML.app;[
<!ELEMENT %n.app; - O ( (%m.Incl;)*,
                        (%n.lem;, (%m.Incl;)*, (%n.wit;, (%m.Incl;)*)?)?,
                        ( (%n.rdg;, (%m.Incl;)*, (%n.wit;, (%m.Incl;)*)?)
                        | (%n.rdgGrp;, (%m.Incl;)*, (%n.wit;, (%m.Incl;)*)?) )+ ) >
<!ATTLIST %n.app;
          %a.global;
          type CDATA #IMPLIED
          from IDREF #IMPLIED
          to IDREF #IMPLIED
          loc CDATA #IMPLIED
          TEIform CDATA 'app' >
]]>

< 138 New definitions for text-criticism tag set 137 (cont'd) > =

<!ENTITY % XML.rdgGrp "INCLUDE" >
<![%XML.rdgGrp;[
<!ELEMENT %n.rdgGrp; - O ((%m.Incl;)*,
                          (((%n.rdgGrp;, (%m.Incl;)*)
                           | (%n.rdg;, (%m.Incl;)*, (%n.wit;, (%m.Incl;)*)?)))+) >
<!ATTLIST %n.rdgGrp;
          %a.global;
          %a.readings;
          TEIform CDATA 'rdgGrp' >
]]>

< 139 New definitions for text-criticism tag set 137 (cont'd) > =

<!ENTITY % XML.witList "INCLUDE" >
<![%XML.witList;[
<!ELEMENT %n.witList; - O ((%m.Incl;)*, (%n.witness;, (%m.Incl;)*)+) >
<!ATTLIST %n.witList;
          %a.global;
          TEIform CDATA 'witList' >
]]>
]]>
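Most of the rewritten models in this section follow one pattern: a run of inclusions may occur before the first token and after every token, as in the new model for <witList>. The sketch below mechanizes that pattern on a toy representation of content models. The nested tuples and the function names are my own, and note and index stand in for the members of class Incl. It is not the procedure used to produce this extensions file. Like the rest of this paper, it offers no proof that the output stays deterministic, and it assumes that I shares no names with E.

```python
import re

# Content models as nested tuples (a representation invented for this sketch):
#   ("tok", name) | ("seq", [m, ...]) | ("alt", [m, ...]) | ("rep", m, op)
# where op is "?", "*" or "+".

def with_inclusions(model, incl):
    """Return I*, r(E): an I* group first and an I* group after every token.
    `incl` must be non-empty."""
    i_star = ("rep", ("alt", [("tok", n) for n in incl]), "*")

    def r(m):
        if m[0] == "tok":
            return ("seq", [m, i_star])
        if m[0] in ("seq", "alt"):
            return (m[0], [r(child) for child in m[1]])
        return ("rep", r(m[1]), m[2])

    return ("seq", [i_star, r(model)])

def to_dtd(m):
    """Render a model in DTD content-model syntax."""
    if m[0] == "tok":
        return m[1]
    if m[0] == "seq":
        return "(" + ", ".join(to_dtd(c) for c in m[1]) + ")"
    if m[0] == "alt":
        return "(" + " | ".join(to_dtd(c) for c in m[1]) + ")"
    return to_dtd(m[1]) + m[2]

def to_regex(m):
    """Translate a model into a Python regex over space-terminated names."""
    if m[0] == "tok":
        return re.escape(m[1]) + " "
    if m[0] == "seq":
        return "(?:" + "".join(to_regex(c) for c in m[1]) + ")"
    if m[0] == "alt":
        return "(?:" + "|".join(to_regex(c) for c in m[1]) + ")"
    return "(?:" + to_regex(m[1]) + ")" + m[2]

def accepts(m, children):
    """Does a sequence of child element names match the model?"""
    text = "".join(name + " " for name in children)
    return re.fullmatch(to_regex(m), text) is not None

wit_list = ("rep", ("tok", "witness"), "+")        # the old ((%n.witness)+)
rewritten = with_inclusions(wit_list, ["note", "index"])
print(to_dtd(rewritten))  # ((note | index)*, (witness, (note | index)*)+)
print(accepts(rewritten, ["note", "witness", "index", "witness"]))  # True
print(accepts(rewritten, ["note"]))                                  # False
```

The output has the same shape as the model declared for <witList> above, with (note | index)* in place of (%m.Incl;)*.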

8.5.15 Graphs and digraphs tag set

< 140 Suppress definitions in graphs tag set > =

<!ENTITY % eTree 'IGNORE' >
<!ENTITY % forest 'IGNORE' >
<!ENTITY % forestGrp 'IGNORE' >
<!ENTITY % graph 'IGNORE' >
<!ENTITY % tree 'IGNORE' >
<!ENTITY % triangle 'IGNORE' >

The current definitions are these:

<!ELEMENT %n.graph;     - -  ((%n.node;)+ & (%n.arc;)*)         >
<!ELEMENT %n.tree;      - -  ((%n.leaf; | %n.iNode;)*,
                             %n.root;, (%n.leaf; | %n.iNode;)*)
                                                                >
<!ELEMENT %n.eTree;     - -  ((%n.eTree; | %n.triangle; |
                             %n.eLeaf; )*)                      >
<!ELEMENT %n.triangle;  - -  ((%n.eTree; | %n.triangle; |
                             %n.eLeaf;)*)                       >
<!ELEMENT %n.forest;    - -  ((%n.tree; | %n.eTree; |
                             %n.triangle;)+)                    >
<!ELEMENT %n.forestGrp; - -  ((%n.forest;)+)                    >

The new definitions are as follows:

< 141 New definitions for graphs tag set > =

<![%TEI.nets;[
<!ENTITY % XML.tree "INCLUDE" >
<![%XML.tree;[
<!ELEMENT %n.tree; - - ((%n.leaf; | %n.iNode; | %m.Incl;)*, %n.root;,
                       (%n.leaf; | %n.iNode; | %m.Incl;)*) >
<!ATTLIST %n.tree;
          %a.global;
          label CDATA #IMPLIED
          arity NUMBER #IMPLIED
          ord (Y | N | partial) Y
          order NUMBER #IMPLIED
          TEIform CDATA 'tree' >
]]>

< 142 New definitions for graphs tag set 141 (cont'd) > =

<!ENTITY % XML.eTree "INCLUDE" >
<![%XML.eTree;[
<!ELEMENT %n.eTree; - - ((%n.eTree; | %n.triangle; | %n.eLeaf; | %m.Incl;)*) >
<!ATTLIST %n.eTree;
          %a.global;
          label CDATA #IMPLIED
          value IDREF #IMPLIED
          TEIform CDATA 'eTree' >
]]>

< 143 New definitions for graphs tag set 141 (cont'd) > =

<!ENTITY % XML.triangle "INCLUDE" >
<![%XML.triangle;[
<!ELEMENT %n.triangle; - - ((%n.eTree; | %n.triangle; | %n.eLeaf; | %m.Incl;)*) >
<!ATTLIST %n.triangle;
          %a.global;
          label CDATA #IMPLIED
          value IDREF #IMPLIED
          TEIform CDATA 'triangle' >
]]>

< 144 New definitions for graphs tag set 141 (cont'd) > =

<!ENTITY % XML.forest "INCLUDE" >
<![%XML.forest;[
<!ELEMENT %n.forest; - - ((%n.tree; | %n.eTree; | %n.triangle; | %m.Incl;)*) >
<!ATTLIST %n.forest;
          %a.global;
          type CDATA #IMPLIED
          TEIform CDATA 'forest' >
]]>

< 145 New definitions for graphs tag set 141 (cont'd) > =

<!ENTITY % XML.forestGrp "INCLUDE" >
<![%XML.forestGrp;[
<!ELEMENT %n.forestGrp; - - ((%m.Incl;)*, (%n.forest;, (%m.Incl;)*)+) >
<!ATTLIST %n.forestGrp;
          %a.global;
          type CDATA #IMPLIED
          TEIform CDATA 'forestGrp' >
]]>
]]>
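The models for <tree>, <eTree>, <triangle> and <forest> do not spell out an I* after every token. Instead they fold %m.Incl; into an existing starred alternation. A brute-force comparison over short strings is a cheap sanity check on such shortcuts. In the sketch below, each element type is abbreviated to a single letter (hypothetical abbreviations of my own), and Python's re module stands in for a content-model matcher.

For <tree>, the check finds no difference up to length 6. For <forest>, the folded form also accepts a forest that contains no tree at all. That is overgeneration, so it does not invalidate existing documents.

```python
import re
from itertools import product

def differences(pattern_a, pattern_b, alphabet, max_len):
    """Every string over `alphabet` of length <= max_len that exactly one
    of the two regular expressions matches in full."""
    found = []
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            a = re.fullmatch(pattern_a, s) is not None
            b = re.fullmatch(pattern_b, s) is not None
            if a != b:
                found.append(s)
    return found

# l = leaf, n = iNode, r = root, i = any member of class Incl.
tree_mechanical = r"i*(?:li*|ni*)*ri*(?:li*|ni*)*"   # I* after every token
tree_folded = r"[lni]*r[lni]*"                       # Incl folded into the stars
print(differences(tree_mechanical, tree_folded, "lnri", 6))   # []

# t = tree, e = eTree, a = triangle.
forest_mechanical = r"i*(?:[tea]i*)+"
forest_folded = r"[teai]*"
print(differences(forest_mechanical, forest_folded, "teai", 2))  # ['', 'i', 'ii']
```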

8.5.16 Tables tag set

< 146 Suppress definitions in tables tag set > =

<!ENTITY % figure 'IGNORE' >
<!ENTITY % formula 'IGNORE' >
<!ENTITY % row 'IGNORE' >
<!ENTITY % table 'IGNORE' >

The current definitions are these:

<!ELEMENT %n.table;     - -  ((%n.head)*, (%n.row)+)            >
<!ELEMENT %n.row;       - O  ((%n.cell; | %n.table;)+)          >
<!ELEMENT %n.figure;    - -  ((%n.head)?, (%n.p)*,
                             (%n.figDesc)?, (%n.text)?)         >
<!ELEMENT %n.formula;   - -  %formulaContent;                   >

The new definitions are as follows:

< 147 New definitions for tables tag set > =

<![%TEI.figures;[
<!ENTITY % XML.table "INCLUDE" >
<![%XML.table;[
<!ELEMENT %n.table; - - ((%n.head; | %m.Incl;)*, (%n.row;, (%m.Incl;)*)+) >
<!ATTLIST %n.table;
          %a.global;
          rows NUMBER #IMPLIED
          cols NUMBER #IMPLIED
          TEIform CDATA 'table' >
]]>

< 148 New definitions for tables tag set 147 (cont'd) > =

<!ENTITY % XML.row "INCLUDE" >
<![%XML.row;[
<!ELEMENT %n.row; - O ((%m.Incl;)*, ((%n.cell; | %n.table;), (%m.Incl;)*)+) >
<!ATTLIST %n.row;
          %a.global;
          role CDATA data
          TEIform CDATA 'row' >
]]>

< 149 New definitions for tables tag set 147 (cont'd) > =

<!ENTITY % XML.figure "INCLUDE" >
<![%XML.figure;[
<!ELEMENT %n.figure; - - ((%m.Incl;)*, (%n.head;, (%m.Incl;)*)?,
                         (%n.p;, (%m.Incl;)*)*,
                         (%n.figDesc;, (%m.Incl;)*)?,
                         (%n.text;, (%m.Incl;)*)?) >
<!ATTLIST %n.figure;
          %a.global;
          entity ENTITY #IMPLIED
          TEIform CDATA 'figure' >
]]>

< 150 New definitions for tables tag set 147 (cont'd) > =

<!ENTITY % XML.formula "INCLUDE" >
<![%XML.formula;[
<!ELEMENT %n.formula; - - %formulaContent; >
<!ATTLIST %n.formula;
          %a.global;
          notation %formulaNotations; #REQUIRED
          TEIform CDATA 'formula' >
]]>
]]>

9 The problem of the dictionary chapter

The TEI base tag set for dictionaries cannot be made XML conformant using the methods described here. That tag set distinguishes two top-level elements for dictionary entries: <entry>, which has a relatively well-defined structure, and <entryFree>, which has no prescribed structure at all: any element used in tagging dictionary entries may appear, within any other element, at any level of nesting. The desired freedom for <entryFree> entries is guaranteed by the inclusion exception on <entryFree>. The standard declaration for the element is this:

<!ELEMENT %n.entryFree; - O  (#PCDATA)
                             +(%m.dictionaryParts | %m.phrase |
                             %m.inter)                          >

If we use the techniques described above, all of the members of the classes dictionaryParts, phrase, and inter will be made legal at every point within any member of any of those classes. Apart from the havoc this would wreak on the core tag set, it would wholly erase the distinction between <entry> and <entryFree> elements.

So some other method of handling anomalous dictionary entries is needed in an XML version of the TEI DTD. Borrowing ideas from B. Tommie Usdin and Deborah A. Lapeyre, and with thanks also to David J. Birnbaum, I propose a new approach to the problem.

The basic idea is to define an element for anomalous structures in dictionary entries. In this discussion, I'll assume this element is called <dictAnomaly> (for `dictionary anomaly'). For every element in the normal structure of a dictionary, the content model is changed so that the existing model becomes one alternative and <dictAnomaly> becomes the other. Thus the element <superentry> currently has the following declaration:

<!ELEMENT %n.superentry;
                        - O  ((%n.form)?, (%n.entry)+)          >

After the change, it will have the declaration:

<!ELEMENT %n.superentry;
                        - O  (((%n.form;)?, (%n.entry;)+)
                             | %n.dictAnomaly;)                 >

That is, a superentry is either normal (an optional <form> element followed by one or more <entry> elements), or else it is anomalous. The <dictAnomaly> element itself is defined as allowing any sequence of character data, dictionary elements, inter-level elements, or phrase-level elements:

<!ELEMENT %n.dictAnomaly; - O (#PCDATA | %m.dictionaryParts;
                             | %m.phrase; | %m.inter;)*         >

An anomalous superentry contains a single <dictAnomaly> element, and nothing else.

For elements which are currently defined with mixed content, <dictAnomaly> is simply added to the list of elements which can occur within them; this lets us evade the mixed-content problem. The simplest way to do this is to define <dictAnomaly> as a phrase-level element in the dictionary tag set. Doing so also allows anomalies to occur within the generic phrase-level and inter-level elements used in dictionary entries.
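The per-element rule just described can be sketched as a string transformation on content models. add_anomaly is a hypothetical helper, not part of any TEI software. It assumes that mixed content models have already been normalized to the XML 1.0 form (#PCDATA | a | b)*, and it treats every other model as element content.

```python
ANOMALY = "%n.dictAnomaly;"

def add_anomaly(model, anomaly=ANOMALY):
    """Add the anomaly element to a content model written as a string."""
    model = model.strip()
    if model == "(#PCDATA)":
        # Pure character data: XML allows elements only in the form (#PCDATA | a)*.
        return "(#PCDATA | " + anomaly + ")*"
    if model.startswith("(#PCDATA"):
        # Mixed content (#PCDATA | a | b)*: the anomaly joins the alternation.
        if not model.endswith(")*"):
            raise ValueError("mixed content not in XML 1.0 form: " + model)
        return model[:-2] + " | " + anomaly + ")*"
    # Element content: the normal model becomes one alternative.
    return "(" + model + " | " + anomaly + ")"

print(add_anomaly("((%n.form;)?, (%n.entry;)+)"))
# (((%n.form;)?, (%n.entry;)+) | %n.dictAnomaly;)
print(add_anomaly("(#PCDATA | %n.hi;)*"))
# (#PCDATA | %n.hi; | %n.dictAnomaly;)*
```

The first result matches the revised <superentry> model given above, apart from line breaks.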

In principle, the extensions file should handle this thus:

<!ENTITY % x.phrase 'dictAnomaly |' >

But since we have to include new declarations for the entire phrase-level class system in the extensions file anyway (to fix the problems with phrase.seq), we can simply add <dictAnomaly> to phrase, as was done above.

10 Open questions and checklists

This list brings together in one place a number of open questions mentioned above.

Should entities for omissibility indications be introduced into the TEI Odd files? Or should they be introduced only in the DTD output from odddtd? (Current leaning: only in the odddtd output; entification would make the DTD fragments in the Guidelines too hard to read.)
Should <graph> be defined as proposed here, or more loosely?
Should the failure to parameterize exclusion exceptions be regarded as a corrigible error? (N.B. parameterizing them will require the creation of new entDoc elements for each of them.)
How many of the current class of global inclusions should actually be globally legal? Particularly to be considered here are the elements now defined as taking only #PCDATA.
What should we do in the short term (experimental XML version of the DTD) about specialPara?
What should we do in the long term (TEI P3.5 and P4) about specialPara?

Corrigible errors identified in this document are:

absence of semicolons in parameter-entity references
use of ampersand connectors in four content models
use of #PCDATA not as prescribed in XML 1.0
excess parentheses in definition of phrase
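The first of these errors is easy to detect mechanically. The sketch below lists parameter-entity references that lack their closing semicolon. It is a rough regular-expression heuristic of my own, not part of any TEI tool, and it does not skip comments. The first sample is adapted from the current declaration of <table>.

```python
import re

# '%' followed by a name (a letter, then letters, digits, '.', '-'), then an
# optional ';'.  SGML lets the ';' be dropped before a character that cannot
# continue the name; XML always requires it.
PE_REF = re.compile(r"%([A-Za-z][A-Za-z0-9.-]*)(;?)")

def unterminated_pe_refs(text):
    """Names of parameter-entity references in `text` that lack the ';'."""
    return [m.group(1) for m in PE_REF.finditer(text) if not m.group(2)]

print(unterminated_pe_refs("<!ELEMENT %n.table; - - ((%n.head)*, (%n.row)+) >"))
# ['n.head', 'n.row']
print(unterminated_pe_refs("<!ENTITY % m.I '%x.I; %m.globincl;'>"))
# []
```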

11 Miscellaneous Housekeeping

A few scraps necessary for housekeeping have no obvious home in this document; I'll put them here.

Before we define component, we need to embed all the entity files for the selected tag sets:

< 151 Embed tag-set-specific ent files > =

<![ %TEI.verse; [ <!ENTITY % TEI.verse.ent SYSTEM 'teivers2.ent' > %TEI.verse.ent; ]]>
<![ %TEI.drama; [ <!ENTITY % TEI.drama.ent SYSTEM 'teidram2.ent' > %TEI.drama.ent; ]]>
<![ %TEI.spoken; [ <!ENTITY % TEI.spoken.ent SYSTEM 'teispok2.ent' > %TEI.spoken.ent; ]]>
<![ %TEI.dictionaries; [ <!ENTITY % TEI.dictionaries.ent SYSTEM 'teidict2.ent' > %TEI.dictionaries.ent; ]]>
<![ %TEI.terminology; [
<!ENTITY % x.common '' >
<!ENTITY % m.common '%x.common; %m.bibl; | %m.chunk; | %m.hqinter; | %m.lists; | %m.notes; | %n.stage;' >
<!ENTITY % TEI.terminology.ent SYSTEM 'teiterm2.ent' >
%TEI.terminology.ent;
]]>
<![ %TEI.linking; [ <!ENTITY % TEI.linking.ent SYSTEM 'teilink2.ent' > %TEI.linking.ent; ]]>
<![ %TEI.analysis; [ <!ENTITY % TEI.analysis.ent SYSTEM 'teiana2.ent' > %TEI.analysis.ent; ]]>
<![ %TEI.transcr; [ <!ENTITY % TEI.transcr.ent SYSTEM 'teitran2.ent' > %TEI.transcr.ent; ]]>
<![ %TEI.textcrit; [ <!ENTITY % TEI.textcrit.ent SYSTEM 'teitc2.ent' > %TEI.textcrit.ent; ]]>
<![ %TEI.names.dates; [ <!ENTITY % TEI.names.dates.ent SYSTEM 'teind2.ent' > %TEI.names.dates.ent; ]]>
<![ %TEI.figures; [ <!ENTITY % TEI.figures.ent SYSTEM 'teifig2.ent' > %TEI.figures.ent; ]]>

Note that the terminology entity file unwisely refers to the class common, so we must declare x.common and m.common here in an ad hoc way.

Before we do that, we have to provide default values for all the tagset entities:

< 152 Provide default tagset declarations > =

<!ENTITY % TEI.prose 'IGNORE' >
<!ENTITY % TEI.verse 'IGNORE' >
<!ENTITY % TEI.drama 'IGNORE' >
<!ENTITY % TEI.spoken 'IGNORE' >
<!ENTITY % TEI.dictionaries 'IGNORE' >
<!ENTITY % TEI.terminology 'IGNORE' >
<!ENTITY % TEI.general 'IGNORE' >
<!ENTITY % TEI.mixed 'IGNORE' >
<!ENTITY % TEI.linking 'IGNORE' >
<!ENTITY % TEI.analysis 'IGNORE' >
<!ENTITY % TEI.fs 'IGNORE' >
<!ENTITY % TEI.certainty 'IGNORE' >
<!ENTITY % TEI.transcr 'IGNORE' >
<!ENTITY % TEI.textcrit 'IGNORE' >
<!ENTITY % TEI.names.dates 'IGNORE' >
<!ENTITY % TEI.nets 'IGNORE' >
<!ENTITY % TEI.figures 'IGNORE' >
<!ENTITY % TEI.corpus 'IGNORE' >
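Declaring every tag-set entity as 'IGNORE' works as a default because of two XML 1.0 rules. First, the first declaration of an entity is binding. Second, the internal subset is considered to occur before the external subset. So a document that declares, for example, TEI.verse as 'INCLUDE' in its internal subset overrides the default. The sketch below models only those two rules. It lists just four of the tag sets and .ent files named in the embedding scrap above, and embedded_files is a hypothetical name.

```python
# A few tag-set keywords and their entity files, for illustration only.
ENT_FILES = {
    "TEI.verse": "teivers2.ent",
    "TEI.drama": "teidram2.ent",
    "TEI.spoken": "teispok2.ent",
    "TEI.dictionaries": "teidict2.ent",
}

def declare(entities, name, value):
    """XML 1.0, section 4.2: the first declaration of an entity is binding."""
    entities.setdefault(name, value)

def embedded_files(internal_subset):
    """Which .ent files would be embedded, given the (name, value) pairs
    that a document declares in its internal subset."""
    entities = {}
    # The internal subset counts as occurring before the external subset,
    for name, value in internal_subset:
        declare(entities, name, value)
    # so these 'IGNORE' defaults apply only to tag sets the user did not select.
    for name in ENT_FILES:
        declare(entities, name, "IGNORE")
    return [ENT_FILES[name] for name in ENT_FILES if entities[name] == "INCLUDE"]

print(embedded_files([("TEI.verse", "INCLUDE")]))  # ['teivers2.ent']
print(embedded_files([]))                          # []
```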

And we need to define the TEI keywords and default generic identifiers:

< 153 Define TEI keywords > =

<!ENTITY % INHERITED '#IMPLIED' >
<!ENTITY % ISO-date 'CDATA' >
<!ENTITY % extPtr 'CDATA' >
<!ENTITY % TEI.elementNames SYSTEM 'teigis2.ent' >
%TEI.elementNames;

A Notation

The notation in this paper is fairly simple:

E, E' (E-prime), F, G are regular expressions. For purposes of this discussion, they are also content-model groups.
L(E) is the language accepted by E
Sigma is the alphabet (set) of atomic symbols used in the expressions E, etc.
Sigma* is the set of all strings of symbols in Sigma, including the empty string
I is the set of symbols named in the relevant (active) inclusion exceptions; in the context of a regular expression E', I should be taken to stand for an alternation of all the symbols i in the set I. In an actual content model, the expression written here as I* will normally be written %Istar; or (%m.I;)*, where the parameter entities are declared along these lines:
```
<!ENTITY % x.I ''>
<!ENTITY % m.I '%x.I; %m.globincl;'>
<!ENTITY % Istar '(%m.I;)*'>
```
i is an arbitrary symbol in the set I
x, y are strings of atomic symbols (members of Sigma*)
xy is the concatenation of x and y.
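To make the notation concrete, here is one natural reading of "E' simulates the inclusions I", assuming I and Sigma are disjoint. A string is in L(E') exactly when deleting its symbols from I leaves a string in L(E). The sketch below checks that property by brute force over all strings up to a fixed length. It uses one-character symbols and Python regular expressions in place of content models. It is a test harness, not a proof, and the function name and examples are illustrative only.

```python
import re
from itertools import product

def simulates_inclusions(e, e_prime, sigma, incl, max_len):
    """True if, for every string over sigma + incl of length <= max_len,
    e_prime accepts it exactly when e accepts it with the incl symbols deleted."""
    for n in range(max_len + 1):
        for chars in product(sigma + incl, repeat=n):
            s = "".join(chars)
            stripped = "".join(c for c in s if c not in incl)
            in_e = re.fullmatch(e, stripped) is not None
            in_e_prime = re.fullmatch(e_prime, s) is not None
            if in_e != in_e_prime:
                return False
    return True

# E = (a, b+) and I = {i}; E' has I* before the first token and after each token.
print(simulates_inclusions(r"ab+", r"i*ai*(?:bi*)+", "ab", "i", 6))  # True
# Without the leading I*, strings such as "iab" are wrongly rejected.
print(simulates_inclusions(r"ab+", r"ai*(?:bi*)+", "ab", "i", 6))    # False
```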

Notes

[1] In particular, this document does not suppress the tag-omissibility indicators in the TEI DTD; that job is left to special-purpose software. In its current form, this document also does not completely normalize all mixed content models to the form required by XML. I started to make it do so, and have just realized that carthage may already do what is necessary. I need to find out for sure whether carthage does the job, and either complete or remove the partial sets of changes described for the mass redeclaration of all phrase.seq and paraContent elements.

[2] If the set of inclusions and the set of exclusions on the exception stack are always the same for every possible occurrence of every element type in the DTD, then an exception-free DTD can be created which accepts exactly the same set of documents as the original DTD. A DTD which had exceptions only on the root element type, for example, could be replicated without changing the language it accepts. I am not aware of any production DTDs which fall into this class.

[3] One could take the converse goal of ensuring that the revised DTD be at least as selective as the original DTD, i.e. that it undergenerate with respect to the original language. This would be interesting as an exercise, but if applied to the TEI DTD it would invalidate existing TEI data, which makes it unacceptable as an approach to creating an XML-conformant version of the TEI DTD.

[4] This is clearly established by Wood and Kilpeläinen, though they inexplicably claim to have proven the opposite.

[5] Strictly speaking, these ought perhaps to be imf(E,I), mf(E,I), and m(E,I), but for purposes of this paper we will never need different sets of inclusions I. So if it matters, we can define imf(E) formally as imf(E,I), etc.

[6] What is wrong with these lists, and why are they not complete? The Names and Dates tag set may not have been selected, or the DTD I used may (indeed, almost surely did) have the bug that makes much of that tag set unreachable. The Corpus tags are for the header, and may in fact not be descendants of <text>.

[7] The dictionary tag set includes orth, pron, hyph, syll, stress, gram, gen, number, case, per, tns, mood, itype, pos, subc, colloc, def, tr, lang, usg, lbl.

[8] This is a classic example of what is known in DTD design circles as the Mixed-Content Gotcha; the problems associated with it led the XML design group to restrict the form of mixed-content models in order to forbid content models which are subject to the problem. This restriction, in turn, makes it essential to revise specialPara in an XML version of the TEI DTD.

[9] An inquiry on TEI-L might usefully reveal whether anyone is actually using <set> and whether they would be inconvenienced by this tighter model.

HTML generated 7 Jul 1999
