Naming Conventions in the TEI scheme
Following the decision to implement support for XML namespaces in TEI P5, we have an opportunity to review naming conventions and practice at all levels of the TEI scheme. This document sketches out the decisions taken, or proposed, so far, and will be revised as necessary.
The namespace
To begin at the beginning, we have the namespace itself. It is currently proposed that there should be only a single TEI namespace (rather than, say, a different one for each base or for each module). There was some discussion on TEI-L (http://listserv.brown.edu/archives/cgi-bin/wa?A1=ind0312&L=tei-l&m=4716#22) of what form of identifier that namespace should take, and a consensus emerged in favour of a URI-style identifier including a version number. The current proposal is therefore that the namespace for TEI P5 will be http://www.tei-c.org/ns/1.0; this is the string which our current P5 processing utilities expect to find as the value of an xmlns attribute on the root element of a TEI document.
Note that the version number in the namespace is not the version number of the TEI Guidelines, but the version of the namespace declaration itself. We therefore also propose a new attribute on <TEI> to identify the release of the Guidelines that a document uses. It has long been impossible to identify fixed or enhanced versions of the DTDs and schemas simply, and this will be an important new tool. We propose the name teiversion, with a simple floating-point number as its datatype. It has also been proposed that the version number should be derived from the date of release, but on balance we prefer simply to increment the number.
A TEI document using this namespace and the first release of TEI P5 will thus commence <TEI xmlns="http://www.tei-c.org/ns/1.0" teiversion="5.0">
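A minimal Python sketch of the check just described: it confirms that the root element is TEI or teiCorpus in the proposed namespace and reads the proposed teiversion attribute. The function name check_tei_root is hypothetical; this is an illustration, not the code of the existing P5 processing utilities.

```python
import xml.etree.ElementTree as ET

# Proposed P5 namespace (see above).
TEI_NS = "http://www.tei-c.org/ns/1.0"
# ElementTree reports namespaced names as "{namespace}localname".
ROOT_NAMES = {f"{{{TEI_NS}}}TEI", f"{{{TEI_NS}}}teiCorpus"}


def check_tei_root(xml_text: str) -> float:
    """Return the teiversion of a P5 document, or raise ValueError.

    Hypothetical helper: assumes teiversion is the attribute name
    proposed in this document and that it holds a floating-point number.
    """
    root = ET.fromstring(xml_text)
    if root.tag not in ROOT_NAMES:
        # Catches P4-style roots such as TEI.2, and TEI without the namespace.
        raise ValueError(f"not a TEI P5 root element: {root.tag}")
    version = root.get("teiversion")
    if version is None:
        raise ValueError("missing teiversion attribute")
    return float(version)
```

A document that still uses the P4 root element TEI.2, or omits the xmlns declaration, is rejected before the version is even read.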
Elements and classes
We have already made the decision to rename the root element of a TEI document to TEI (or teiCorpus), without the .2 suffix which has puzzled users ever since P3. It may be worth noting that these two element names will remain the only ones to flout the general principles underlying the naming of TEI elements, as stated in TEI document ML W26, which concludes as follows: ‘The following recommendations should be applied where possible in generating names for TEI DTDs, and should govern usage in TEI documents and examples:
- TEI documents and examples should give all one-word tag and attribute names in lowercase; phrasal names should uppercase the initial character of each word but the first.
- Names should be natural-language words or phrases.
- Avoid abbreviation except for very common items.
- Where possible, avoid forming names from phrases.
- Avoid collisions among names of different types.
- Use nouns and adjectives for tag and attribute names; avoid verbs.’
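The case rule in the first recommendation is mechanical enough to sketch in code. The Python below assumes a name is built from a list of already chosen words, and shows how a phrasal name such as encodingDesc is formed and how a name can be checked against the rule. tei_name and follows_case_rule are hypothetical helpers; the other recommendations (natural-language words, avoiding abbreviation, and so on) still need human judgement.

```python
import re

# A lowercase first word, then any number of words with an uppercase
# initial (digits are allowed after the first letter of each word).
CAMEL_CASE = re.compile(r"[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)*")


def tei_name(words: list[str]) -> str:
    """Join words into a name: first word lowercase, later words capitalized."""
    first, *rest = [w.lower() for w in words]
    return first + "".join(w[:1].upper() + w[1:] for w in rest)


def follows_case_rule(name: str) -> bool:
    """True if name is one lowercase word or a phrase in the form above."""
    return CAMEL_CASE.fullmatch(name) is not None
```

As expected, follows_case_rule rejects the root element name TEI, one of the exceptions mentioned above.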
These principles still seem valid, and are applied more or less consistently across the Guidelines. However, they do not cover all the conventions which developed subsequently, with varying degrees of consistency, during the authoring and maintenance of the Guidelines. This document proposes a new list of specific naming conventions for all components of the TEI scheme. In P4, the following conventions were in use:
- elements and attributes e.g. <encodingDesc>, id
- parameter entities for the names of elements e.g. n.encodingDesc
- model classes e.g. m.bibl
- attribute classes e.g. a.global
- extension classes e.g. x.bibl
- common content models e.g. phrase.seq
In P5, with RelaxNG as the underlying formalism, it becomes much simpler to express the notion of class membership. Moreover, RelaxNG patterns fit the requirements of the TEI abstract model much better than the rather idiosyncratic use of SGML parameter entities in P4 did. In P5, indirection is provided by patterns, and the class system is supported directly.
In P5, for every defined element blort, there is a pattern with the same name which carries the element's definition. A second pattern, named content.blort, defines its content model by reference to macros, element classes, or other element patterns (though the latter is deprecated). This indirection makes elements easy to rename, since other definitions refer to the pattern blort and only the element name inside that pattern needs to change.
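To make the indirection concrete, the following Python sketch writes out the two patterns in RelaxNG compact syntax. The function element_patterns is hypothetical, attribute declarations are omitted, and the generated text is an illustration rather than the exact form produced by the TEI's ODD processing.

```python
def element_patterns(pattern: str, element: str, content: str) -> str:
    """Return RelaxNG compact syntax for an element pattern and its content pattern.

    pattern: the pattern name that other content models refer to
    element: the element name actually used in documents
    content: the content model, expressed with macros or classes
    """
    return (
        f"{pattern} = element {element} {{ content.{pattern} }}\n"
        f"content.{pattern} = {content}\n"
    )


# The normal case: pattern and element share the name blort.
print(element_patterns("blort", "blort", "macro.phraseSeq"))
# Renaming the element changes only the element name; every content model
# that refers to the pattern blort is unaffected.
print(element_patterns("blort", "frob", "macro.phraseSeq"))
```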
Our intention is that every element should be a member of at least one (possibly several) model classes, and that its content should generally be defined in terms of element classes rather than specific elements. A knowledge of element classes thus becomes rather more important than it was in P4, where the class system was really only relevant when defining user extensions. In P5, the class system also matters for interoperability with other XML vocabularies and namespaces. The naming conventions proposed for P5 are as follows:
- an unadorned name (e.g. <encodingDesc>, id) is an element or an attribute, or a pattern defining an element
- a name starting content. (e.g. content.encodingDesc) is a pattern defining the content model of the named element
- a name starting class. (e.g. class.biblPart, class.phrase) is a model class, members of which have common hierarchic or semantic properties, for use in content models
- a name starting attributes.class. (e.g. attributes.class.linking, attributes.class.global) is an attribute class, defining the set of attributes shared by all members of the class
- a name starting macro. (e.g. macro.phraseSeq) is a pattern defining a common content model
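Because each kind of component is distinguished by its prefix, a name can be classified mechanically. A minimal Python sketch, assuming the prefixes listed above are the only ones in use; component_kind is a hypothetical helper:

```python
def component_kind(name: str) -> str:
    """Classify a P5 name by its prefix, following the conventions above."""
    prefixes = [
        ("content.", "content model of an element"),
        ("class.", "model class"),
        ("attributes.class.", "attribute class"),
        ("macro.", "macro (common content model)"),
    ]
    for prefix, kind in prefixes:
        if name.startswith(prefix):
            return kind
    # No recognised prefix: an element, an attribute, or an element pattern.
    return "element, attribute, or element pattern"
```

An unadorned name cannot be resolved further without consulting the schema itself, since element, attribute and element-pattern names share the same form.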
Architectural components
In P4, the TEI scheme is organized as a number of tagsets, each characterized as core, base, additional, or auxiliary. To build a view of the TEI DTD, you take one base tagset, the core tagsets, and zero or more additional tagsets. Auxiliary tagsets are free-standing DTDs which re-use some of the other tagsets but cannot be combined with them in the same way.
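The rule for assembling a P4 view can be stated as a small check. In the Python sketch below the tagset names are an illustrative subset chosen for the example (not the full P4 inventory), and build_view is a hypothetical helper rather than part of any TEI tool:

```python
from typing import Iterable

# Illustrative subset of P4 tagset names, not the complete inventory.
CORE = {"core", "header"}
BASES = {"prose", "verse", "drama"}
ADDITIONAL = {"linking", "analysis", "figures"}
AUXILIARY = {"fsd", "wsd"}


def build_view(base: str, additional: Iterable[str] = ()) -> frozenset:
    """Return the tagsets making up a P4 view of the TEI DTD.

    A view is one base tagset, all the core tagsets, and zero or more
    additional tagsets; auxiliary tagsets cannot be combined this way.
    """
    extra = set(additional)
    auxiliary = ({base} | extra) & AUXILIARY
    if auxiliary:
        raise ValueError(f"auxiliary tagsets are free-standing: {sorted(auxiliary)}")
    if base not in BASES:
        raise ValueError(f"{base!r} is not a base tagset")
    unknown = extra - ADDITIONAL
    if unknown:
        # This also rejects a second base passed as if it were additional.
        raise ValueError(f"not additional tagsets: {sorted(unknown)}")
    return frozenset(CORE | {base} | extra)
```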
Each tagset is actually generated from the ODD files by processing a number of <dtdFrag> elements in a rather complicated way: attributes on these elements control, in a somewhat non-obvious manner, how these schema fragments are assembled and cross-referenced.
In P5, we have dropped the misnomer ‘tagset’ in favour of the neutral term module. With the advent of support for multiple namespaces, auxiliary modules are no longer needed, so we propose to drop the term from P5 (in any case, of the five existing auxiliary tagsets, only one, the FSD, seems to have any future). For the moment we are retaining the distinction between core, base, and additional modules, but we are seriously questioning whether the distinction between base and additional modules is necessary. Further simplification will probably become possible while the chapter on basic TEI structure is revised.
The way in which modules are generated from ODDs has been completely changed, and as a consequence several names have changed. However, as the original ODD format (http://www.tei-c.org/Vault/ED/edw29.sgm) was never formally published or approved by the TEI Council, it is probably unnecessary to describe these changes in full detail. A brief summary is given in the next section; full details are provided in the ChangeLog.
Simplifying and renaming the ODDs
The original ODD format defined large numbers of different documentation crystals for different kinds of object: these included <tagDoc>s, which document elements and attributes; <peDoc>s, which document parameter entities; <entDoc>s, which document entities; and <classDoc>s, which document classes. Corresponding with each of these, the ODD system provided a range of declaration elements (<tagDecl>, <entDecl>, <claDecl>, etc.), which produced the formal declaration for the object concerned, and a number of description elements (<tagDesc>, etc.), which generated its prose documentation. The principal changes made in simplifying this system are as follows:
- <tagDecl> and <patternDecl> are removed, and replaced with the corresponding contents of <tagDoc> and <patternDoc> (as entities). The level of indirection they provided served no useful purpose, and placing the <tagDoc> and <patternDoc> inline makes ODD processing easier to understand. <claDecl> is replaced by an inline <classDoc> only at its first occurrence for a given class. More work may be needed here.
- <dataDesc> is demoted to a comment (which should be scanned, and any interesting parts put into <remarks>)
- change <gi>, <attName>, <entName> and <class> to <ident>, for consistency
- change <name> to <gloss> in <entDoc>, <classDoc>, <tagDoc> and <attDef>
- add a <schemaContent> wrapper around the RelaxNG content in <tagDoc> and <classDoc>
- change the names of <dtdRef> and <dtdFrag> to <chunkRef> and <chunk>
- disallow <rs> as an alternative to <gloss> in <tagDoc>
- <entDoc> renamed to <patternDoc>
- in <valList>, change from a <val>, <desc> pair to <valitem>, containing a standard <ident>, <equiv> and <gloss> trio
- get rid of @contin in <dtdFrag> (for each <dtdFrag>, find the other fragments which have it as their @contin value, and make a <dtdRef> to them)
- take each <eg> and, if its contents look well-formed, put them into an <xmleg> element. The contents of <xmleg> are in the namespace http://www.tei-c.org/P5/Examples/. This has the advantage that examples are parsed and can be validated.
- get rid of <part> in <classDoc>, as it is not used and its contents are wrong
- get rid of <string> elements containing IGNORE, INCLUDE and CDATA in favour of RelaxNG elements
- add a new type "epe" for <patternDoc>, for use when it provides attributes
- redo the <string> elements which contain public IDs, replacing them with <publicID file="..."...>
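The well-formedness test for examples can be sketched with the Python standard library's ElementTree parser. This sketch assumes that ‘looks well-formed’ means the example parses as XML once wrapped in a single element; wrapping in an element that declares a default namespace places every unprefixed element of the example in the Examples namespace. The helper name eg_to_xmleg is hypothetical, and the namespace of <xmleg> itself is left unspecified here.

```python
import xml.etree.ElementTree as ET
from typing import Optional

EXAMPLES_NS = "http://www.tei-c.org/P5/Examples/"


def eg_to_xmleg(eg_text: str) -> Optional[ET.Element]:
    """Return an <xmleg> element holding the parsed example, or None."""
    # The temporary wrapper allows several top-level elements or mixed text,
    # and its default namespace applies to every unprefixed element inside.
    wrapped = f'<wrapper xmlns="{EXAMPLES_NS}">{eg_text}</wrapper>'
    try:
        parsed = ET.fromstring(wrapped)
    except ET.ParseError:
        # Not well-formed: the example stays as an <eg>.
        return None
    xmleg = ET.Element("xmleg")
    xmleg.text = parsed.text        # leading character data, if any
    xmleg.extend(list(parsed))      # children keep their tails
    return xmleg
```

Examples that rely on SGML features such as omitted end tags, or on undeclared entity references, fail to parse and would remain as <eg>.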
Introduce the <module> element (and <moduleref>) as a replacement for <chunk file="">, <dtdFrag>, <peRef>, etc.
Introduce the <patternDoc> element to replace those <entDoc> elements which define patterns, and remove those which defined the system entities used for TEI modules. This involved moving information about public identifiers from the <entDoc> to the <module> element; it also means that <module> becomes the only place to reference a filename.
Rename the following patterns:
- component to macro.component
- component.plus to macro.componentPlus
- component.seq to macro.componentSeq
- paraContent to macro.paraContent
- phrase.seq to macro.phraseSeq
- phrase to macro.phrasegroup
- seq to macro.seq
- specialPara to macro.specialPara
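Applying these renamings to existing schema text amounts to a token-by-token substitution. A naive Python sketch, assuming names are matched as whole tokens (so that phrase.seq is never treated as phrase followed by .seq) and ignoring comments and quoted strings; rename_macros is hypothetical:

```python
import re

# The renamings listed above.
MACRO_RENAMES = {
    "component": "macro.component",
    "component.plus": "macro.componentPlus",
    "component.seq": "macro.componentSeq",
    "paraContent": "macro.paraContent",
    "phrase.seq": "macro.phraseSeq",
    "phrase": "macro.phrasegroup",
    "seq": "macro.seq",
    "specialPara": "macro.specialPara",
}

# A name token: letters, digits, underscores, dots and hyphens, matched greedily
# so that dotted names are looked up as a whole.
NAME = re.compile(r"[A-Za-z_][\w.\-]*")


def rename_macros(schema_text: str) -> str:
    """Replace each whole-token old macro name with its new name."""
    return NAME.sub(lambda m: MACRO_RENAMES.get(m.group(0), m.group(0)), schema_text)
```

Because matching is on whole tokens, names such as class.phrase or already renamed names such as macro.seq pass through unchanged.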
Rename all <classDoc> elements of type "model" to include the "class." prefix within their <ident> element; similarly, for all <classDoc> elements of type A, rename the "a." prefix to "attributes.class.", so that "a.global" becomes "attributes.class.global".
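The same renaming can be applied to the <ident> of each <classDoc>. In this Python sketch the mapping from <classDoc> type values to the two kinds of class is passed in, because the exact type values used in the ODD sources are an assumption here; it also assumes that model-class idents carry no prefix (e.g. bibl), that attribute-class idents carry the a. prefix, and that the ODD elements are in no namespace. Both function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def new_class_ident(kind: str, ident: str) -> str:
    """Return the P5 name for a class ident; kind is 'model' or 'attribute'."""
    if kind == "model" and not ident.startswith("class."):
        return "class." + ident
    if kind == "attribute" and ident.startswith("a."):
        return "attributes.class." + ident[len("a."):]
    return ident  # already renamed, or not a class this sketch handles


def rename_class_docs(root: ET.Element, type_to_kind: dict[str, str]) -> None:
    """Rewrite, in place, the <ident> child of every <classDoc> under root."""
    for class_doc in root.iter("classDoc"):
        kind = type_to_kind.get(class_doc.get("type", ""))
        ident = class_doc.find("ident")
        if kind and ident is not None and ident.text:
            ident.text = new_class_ident(kind, ident.text.strip())
```

Because new_class_ident leaves already renamed idents alone, running the conversion twice does no harm.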