Rules for Use of TEI Lite in CIC E-Text Projects


C. M. Sperberg-McQueen

Mark Olsen

John Price-Wilkin

Perry Willett

For the CIC Working Group on Electronic Texts

31 October 1996

This unpublished document is distributed privately for comment by friends and colleagues; it is not now a formal publication and should not be quoted in published material.

Status of This Document

This document is a draft specification of the encoding scheme to be used in electronic texts created by cooperative projects of the Committee on Institutional Cooperation (CIC). It has not yet been approved by the CIC or anyone else. It has been drafted by the authors named on the title page for consideration by the appropriate bodies, and should not be taken as a final product until those bodies have revised and approved it, and this note is removed.

On some topics, specific proposals are made in this document. Like the document as a whole, these proposals have not yet been approved formally and are subject to discussion and change. They are not final. Other topics are identified explicitly as open issues, on which discussion and decisions are needed; no proposals are made for dealing with open issues. A list of open issues appears in the appendix for convenient consultation.

The current partial draft of this document was prepared by C. M. Sperberg-McQueen on the basis of discussions with the other named authors. In completing and finishing the document, the following steps are expected:

The following things remain to be done before this document is approved by the CIC working group on e-texts:


1 Introduction

[Discuss background: CIC American Corpus project, CIC, TEI, audience to be served by the e-text collection, expected work processes...]

The CICTEI encoding scheme is based on the TEI Lite subset of TEI, documented in Lou Burnard and C. M. Sperberg-McQueen, "TEI Lite: An Introduction to Text Encoding for Interchange" (document TEI U5), which can be found at http://www-tei.uic.edu/orgs/tei/intros/teiu5.tei or http://www-tei.uic.edu/orgs/tei/intros/teiu5.html. Familiarity with the basics of SGML encoding and the TEI encoding scheme is assumed; readers who wish to learn the basics are referred to the TEI home page, especially to the list of tutorials and introductions.

2 Tag Sets

The following TEI tag sets are selected:

Full documentation of these tag sets may be found in the TEI guidelines: Association for Computers and the Humanities (ACH), Association for Computational Linguistics (ACL), and Association for Literary and Linguistic Computing (ALLC). Guidelines for Electronic Text Encoding and Interchange, ed. C. M. Sperberg-McQueen and Lou Burnard. Chicago, Oxford: Text Encoding Initiative, 1994.

3 Encoding Practice

This section describes CIC encoding practice in particular areas. In many areas, several different levels of practice are defined; unless otherwise stated, all electronic texts created by CIC projects will adhere at least to the lowest defined level. In some cases, different minimum standards may apply to electronic texts acquired by the CIC from other sources and edited into SGML form. Such differences are always noted; if nothing is said, no distinction is made between work created by the CIC and work acquired from other sources and mounted by the CIC on CIC-wide servers. Documents acquired from other sources in SGML form may or may not be modified to meet minimal CIC standards; if they are not so modified, they may be described as being at `encoding level 0'.

No global encoding levels are defined; a text may be at level 1 with respect to notes, and level 4 with respect to quotations. A full description of the encoding of a given text thus requires that its level be specified for each area defined here.

If documents need to be characterized in terms of single numbers, then the following overall characterizations may be used:

If the text is at different levels in different respects, its overall level is its lowest level in any individual area: a text which is at level 1 in any one area, and at level 1 or higher in all others, will be classified as level 1, and so on.

In general, higher levels either are more exhaustive in identifying occurrences of specific textual features (e.g. quotation), or more complete in describing them, or both. The definition of the different levels thus always specifies both the recognition criteria for the element in question and the analytic detail to be supplied at each level.

The CICTEI encoding scheme requires that the encoding practice of each electronic text be documented in its TEI header. This document describes CIC encoding practice in prose, and also gives examples of elements in the TEI header which are to be used to document CIC practice. Since the encoding practice will be consistent for most texts produced by the CIC, the <encodingDesc> elements of most CIC texts can be substantially the same; the same boilerplate text can be used for all texts at a particular level. This document defines SGML entities for use in headers of CIC documents, which will make it easier to create the TEI headers. See examples below.

This document describes two versions of the CICTEI DTD: a `level-1' version which includes only the elements required by level-1 tagging, and a full version which includes all the elements selected. The level-1 DTD is intended only for training purposes.

3.1 TEI Header

In CIC electronic documents, the TEI header will have somewhat less flexibility than in the full TEI scheme. The primary goal is to enforce a higher minimum standard of documentation, and to make it easier for unintelligent software to identify an author, title, date, and source edition for any text. Headers which conform to the CICTEI specification will automatically also conform to the TEI Lite specification and the base TEI specification.

It would also be possible to rewrite the header and provide new tags with unique names for what we regard as the critical bits of information. This is not done here because it seems unnecessary to depart from the basic TEI DTD in this way.

At the top level, all four parts of the header will be required, rather than optional. (Since much of the content will be constant across CIC documents, this will not impose as much burden on encoders as might be supposed.) Formally:

< 1 CIC TEI Header > =

 
<!ELEMENT teiHeader  - - (fileDesc, encodingDesc+, profileDesc+,
                          revisionDesc) > 
<!ATTLIST teiHeader          %a.global;
          type               CDATA               text
          creator            CDATA               #IMPLIED
          status             (new | update)      new
          date.created       CDATA               #IMPLIED
          date.updated       CDATA               #IMPLIED
          TEIform            CDATA               'teiHeader'    >

The <encodingDesc> and <profileDesc> elements will repeat only in the case of corpora and collections, and probably not often then; in normal cases, only one of each will appear.

3.1.1 The File Description

The title statement, extent statement, publication statement, and source description are all required. For normal texts, a single source edition must be identified. CICTEI documents will not reflect collation of multiple sources; at least, not without revision of this document. An edition statement and series statement will be supplied whenever appropriate (reissues or revisions of CIC texts, CIC texts created as part of a series). A notes statement will be provided only in exceptional cases.

Within the title statement, CIC texts will invariably give at least one title and at least one author (unknown and anonymous authors will be given as "Unknown" and "Anonymous", respectively). Where multiple titles are given, the first <title> element will be suitable for use by software as a short title or main title for the work. Other statements of responsibility, for editors, for sponsors, funders, and principal investigators of the various projects which create CIC texts, and for other miscellaneous forms of responsibility, will be given in a fixed sequence. [Is this necessary?]

Edition statements will use the <edition> and <respStmt> elements, not the unstructured series of <p> elements; series statements will similarly use the structured, not the unstructured, encoding defined in TEI Lite.

Publication statements will give the publishing information in a rigid sequence; the place of publication, terms of availability, and the date of publication will always be given. CIC texts will normally be given identification numbers by the CIC, and the DTD requires at least one (if none applies, it should be left blank).

The source description will normally take the form of a <biblFull> element; in the case of texts created in electronic form and therefore without a non-electronic source, the source description will take the following form:

 
<sourceDesc>
<p>Created in electronic form.</p>
</sourceDesc>

A standard CIC entity may be declared, to allow this to be abbreviated:

 
<sourceDesc>
&newtext;
</sourceDesc>
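
The entity used in the abbreviated form must be declared before it is referenced, for example in the document type declaration subset or in a shared entity file. A minimal sketch, assuming the replacement text is simply the paragraph shown in the long form above; the entity name is taken from the example, and the placement of the declaration is an assumption:

```sgml
<!-- Sketch only: general entity for texts created in electronic form.
     The replacement text repeats the paragraph used in the long form. -->
<!ENTITY newtext "<p>Created in electronic form.</p>" >
```

With such a declaration in force, the two forms of the source description shown above should be equivalent to the parser.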

Formally:

< 2 Header Contents > =

 
<!ELEMENT fileDesc         - -  (titleStmt, editionStmt?, extent, 
                                publicationStmt, seriesStmt?, 
                                notesStmt?, sourceDesc) >
<!ATTLIST fileDesc           %a.global;
          TEIform            CDATA               'fileDesc'     >
<!ELEMENT titleStmt        - O  (title+, author+, 
                                (editor | sponsor | funder 
                                | principal | respStmt)*) >
<!ATTLIST titleStmt          %a.global;
          TEIform            CDATA               'titleStmt'    >
<!ELEMENT editionStmt      - O  (edition, respStmt*) >
<!ATTLIST editionStmt        %a.global;
          TEIform            CDATA               'editionStmt'    >
<!ELEMENT publicationStmt  - O  ((publisher | distributor | authority), 
                                 (pubPlace, address?, idno+, 
                                 availability, date)+)+>
<!ATTLIST publicationStmt    %a.global;
          TEIform            CDATA               'publicationStmt' >
<!ELEMENT seriesStmt       - O  (title, (idno | respStmt)*)>
<!ATTLIST seriesStmt         %a.global;
          TEIform            CDATA               'seriesStmt'   >
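
To make the content models above concrete, the following sketch shows a file description which should be valid against them. All values are invented placeholders, not taken from any CIC text. The source description uses the form prescribed for texts created in electronic form, purely to keep the example short; a text transcribed from print would normally carry a <biblFull> instead.

```sgml
<fileDesc>
<!-- At least one title and one author, then other responsibilities -->
<titleStmt>
<title>A Sample Pamphlet: an electronic edition</title>
<author>Anonymous</author>
<sponsor>Committee on Institutional Cooperation</sponsor>
</titleStmt>
<extent>Placeholder: size of the file</extent>
<!-- Fixed sequence: publisher, place, identifier, availability, date -->
<publicationStmt>
<publisher>Placeholder: name of publisher</publisher>
<pubPlace>Placeholder: place of publication</pubPlace>
<!-- Left blank because no identification number applies yet -->
<idno></idno>
<availability><p>Placeholder: terms of availability.</p></availability>
<date>1996</date>
</publicationStmt>
<sourceDesc>
<p>Created in electronic form.</p>
</sourceDesc>
</fileDesc>
```

The optional edition, series, and notes statements are omitted here; when present they would appear in the positions given by the <fileDesc> content model.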

3.1.2 Encoding Description

The encoding description will be as full as possible, to ensure that scholarly users of CIC texts can readily discover the editorial principles which have governed their creation. In practice, the encoding description in most texts will consist of a series of entity references which expand to standard descriptions of the encoding levels defined elsewhere in this document.

Formally:

< 3 Encoding Description > =

 
<!ELEMENT encodingDesc  - - (projectDesc+, samplingDecl+, 
                            editorialDecl+, tagsDecl?, refsDecl*, 
                            classDecl*, p*) >
<!ATTLIST encodingDesc      %a.global;
          TEIform           CDATA                   'encodingDesc' >
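
As an illustration of the content model, the following sketch shows the smallest encoding description it appears to allow: one each of the three required children, in order. The paragraph texts are placeholders; in practice they would presumably be replaced by references to the standard entities described in this document.

```sgml
<encodingDesc>
<projectDesc>
<p>Placeholder: purpose for which the text was encoded.</p>
</projectDesc>
<samplingDecl>
<p>Placeholder: whether the text is complete or sampled.</p>
</samplingDecl>
<editorialDecl>
<p>Placeholder: editorial principles followed in transcription.</p>
</editorialDecl>
</encodingDesc>
```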

[Need sections in this document, and definitions of levels if applicable, for

]

3.1.3 Profile Description

3.1.4 Revision Description

The contents of this section will invariably be a <list> containing one <item> for each change or set of changes. The items should be formatted as in the following example:

 
<revisionDesc>
<list>
<item>1996-02-08 : JPW : proofread and revise</item>
<item>1996-02-01 : CMSMcQ : resume drafting, add list of tags
in appendix</item>
<item>1996-01-18 : CMSMcQ : draft on basis of group conversation
of yesterday</item>
</list>
</revisionDesc>

That is:

3.2 Font Shifts

All font shifts will be recorded, either using the <hi> element or using analytic tags like <emph>, <foreign>, etc.

Emphasis, foreign words, etc. not marked by font shifts in the source are not normally marked in CICTEI texts. If, in an exceptional case or in a text received from external sources, such elements are marked, then all <emph>, <foreign>, <distinct>, <term>, <gloss>, or <mentioned> elements which are not marked in the conventional way (by font shift or by quotation marks) will bear the attribute specification rend=unmarked.
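
A brief sketch of the distinction may help. The sentences are invented, and the value italics for the rend attribute in the second case is an assumption used only for contrast; the value unmarked is the one specified above.

```sgml
<!-- Emphasis identified by an encoder although the source prints the
     word in the same type as its surroundings -->
<p>I <emph rend=unmarked>never</emph> said so.</p>

<!-- For contrast: emphasis signalled by a font shift in the source -->
<p>I <emph rend=italics>never</emph> said so.</p>
```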

At level hi-1, the following special types of rendition will be distinguished and identified [list needs checking]:

This list will be revised periodically on the basis of experience, but additions to the list will not be carried back into already-encoded materials: as a result, the list of distinctions current in any document should be described in that document's header.

At level hi-2, further styles may be identified on a document-by-document basis; they will be documented in the <rendition> elements in the document's TEI header.

(See also the section on paragraphs and paragraph shapes.)

Highlighting at various levels is declared in the TEI header using the entities hi-1 and hi-2, which are declared thus:

< 4 Highlighting levels > =

 
<!ENTITY hi-1 
"Font shifts have uniformly been tagged as highlighting,
not as emphasis, foreign words, etc." >

<!ENTITY hi-2 
"Font shifts have been tagged as highlighting only when it
is not possible to identify them reliably 
as emphasis, foreign words, etc." >

A TEI header for a CICTEI document at level hi-1 might therefore read, in part:

 
<tagUsage gi=hi render=italics>&hi-1;</tagUsage>
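
In the base TEI header, the render attribute is a reference to the identifier of a <rendition> element, so the fragment above presupposes a matching declaration. A sketch of the surrounding tag declarations, with placeholder wording for the rendition description:

```sgml
<tagsDecl>
<!-- The id value is what the render attribute below points to -->
<rendition id=italics>Placeholder: italic type in the source.</rendition>
<tagUsage gi=hi render=italics>&hi-1;</tagUsage>
</tagsDecl>
```

Because <tagsDecl>, <rendition>, and <tagUsage> are selected only when LEVELTWO is INCLUDE, a fragment like this would be available in the full DTD but not in the training version.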

3.3 Quotations

Quotations may or may not be identified.

Quotations not marked by punctuation (quotation marks, guillemets, paragraph-initial dashes, etc.) are not recorded in CICTEI texts. If an ambitious analyst has tagged them, their rend attribute will have the value unmarked.

3.4 Typography

As noted elsewhere, font shifts will be noted.

Indentation and paragraph shape may be ignored, partially recorded, or recorded in some detail:

3.5 Notes

[Notes are an open question; Mark said he would look at TEI Lite with an eye to the concerns Catherine expressed. I don't understand what those concerns were, so I don't know whether they pose a challenge or not. I'd propose the following:]

In CICTEI texts, all <note> elements bear a place attribute. Three encoding levels are distinguished:

N.B. These levels do not apply to inline block notes, which are always transcribed, at all levels, as <note place=inline>.
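
For illustration, an inline block note might be transcribed as in the following sketch; the wording is a placeholder.

```sgml
<p>The main text runs on
<note place=inline>Placeholder: a note set within the text block of the
source.</note>
and resumes after the note.</p>
```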

3.6 Cross References

Cross references may or may not be identified. Three levels:

3.7 Language Shifts and Foreign-Language Material

Language shifts may or may not be identified. Four encoding levels are distinguished:

3.8 Page Breaks

Page breaks in the source edition will invariably be marked. If none are present in material acquired from elsewhere, the source edition will be located and pagination will be added to the text on the basis of the source. Page breaks in other editions will not normally be marked; when in exceptional cases a CICTEI text records the pagination of multiple editions, each of those editions will be described in the <sourceDesc> element and given an SGML identifier; these SGML identifiers will be used as the values of the ed attribute of the <pb> element. The edition actually used as the source of the transcription must appear first.

When the <pb> element reflects a page break in the source, its ed attribute may be omitted -- i.e. the default ed value is the id of the first item in the source description.
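
The convention can be sketched as follows. The identifiers ed1 and ed2, the page numbers, and the citations are invented, and <bibl> is used instead of the usual <biblFull> only to keep the example short.

```sgml
<!-- In the header: each edition gets an identifier; the edition
     actually transcribed comes first -->
<sourceDesc>
<bibl id=ed1>Placeholder: citation of the edition transcribed.</bibl>
<bibl id=ed2>Placeholder: citation of a second edition.</bibl>
</sourceDesc>

<!-- In the text -->
<pb n="12">            <!-- no ed attribute: a page break in ed1 -->
<pb ed=ed2 n="15">     <!-- a page break in the second edition -->
```

A <pb> without an ed attribute is thus read as belonging to the first edition listed, which is why the order of entries in the source description matters.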

3.9 Canonical References

Canonical reference schemes may or may not be provided, depending on encoding level:

3.10 Text Profile

The TEI profile description will always be present to give the main language of the text, and the language of the header; it may or may not give further information:

3.11 Correction and Normalization

Correction and normalization of spelling will not be performed by CIC encoders. If we acquire and clean data from others, we will spot check their transcriptions and indicate in the header whether we found corrections and normalizations, whether they were marked as such, and how much of the text we spot checked.

3.12 Provision of SGML Identifiers

CIC e-texts may or may not systematically provide IDs for elements to make hypertext linking easier. Several levels are distinguished:

4 Quality Assurance

This section describes standard levels of quality assurance which may be performed on CIC electronic texts. They may include:

[More to be supplied.]

4.1 Validation

All CICTEI texts will be validated with SGML parsers before being made publicly accessible.

4.2 Proofreading

We will invariably spot-check all transcriptions (proofreading 0.5, 1.0, or 2.0 per cent of pages, and performing other checks to be specified), and the header will record the observed rate of typographic or other errors detected.

If we have other quality assurance checks, the header will also record the rate of errors found in spot checks. (Checks of the full text, which lead to correction of all errors, need not be recorded: the point is to provide some means of guessing the rate of errors still present in the text.)

5 Technical Details

The DTD fragments in this document are part of the extension files needed to modify the TEI main DTD in accordance with the policies described here.

There are two files we need to define. The file cicteix.ent includes modifications to the TEI's SGML entities:

< 5 CIC TEI Entities Modification File >(cicteix.ent) =

 
< Preliminaries for TEI.extensions.ent file 7 > 
< Linking attributes 17 > 
< Highlighting levels 4 > 
< Select TEI elements 9 > 

The file cicteix.dtd includes declarations for new and modified SGML elements:

< 6 CIC TEI Elements Modification File >(cicteix.dtd) =

 
< CIC TEI Header 1 > 
< Header Contents 2 > 
< Encoding Description 3 > 

The main task of the file cicteix.ent is to specify which TEI elements are to be suppressed.

< 7 Preliminaries for TEI.extensions.ent file > =

 
<!ENTITY % REDEFINE 'IGNORE' >
<!ENTITY % LEVELTWO 'INCLUDE' >
<!ENTITY % x.common 'text |' >

The entities file for the training version is the same; we'll edit it manually to suppress level-two items:

< 8 CIC TEI Entities file (simplified) >(cictei1x.ent) =

 
<!ENTITY % LEVELTWO 'IGNORE' >
< CIC TEI Entities Modification File 5 > 

The actual selections are the same in each case; the difference is handled by the differing declarations of LEVELTWO as INCLUDE or IGNORE.
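
The mechanism relies on two SGML rules: the first declaration of an entity is the one that counts, and a marked section whose keyword is IGNORE is skipped. A schematic sketch for a single element follows; the content model given for <emph> is a simplified stand-in, not the actual TEI declaration.

```sgml
<!-- In the extension file, which is read first (training version) -->
<!ENTITY % LEVELTWO 'IGNORE' >
<!ENTITY % emph     '%LEVELTWO;' >

<!-- Later, in the TEI DTD files (schematic): this default has no
     effect, because emph has already been declared above ... -->
<!ENTITY % emph 'INCLUDE' >

<!-- ... so the keyword here expands to IGNORE and the element
     declaration is skipped -->
<![ %emph; [
<!ELEMENT emph - - (#PCDATA) >
]]>
```

In the full version, LEVELTWO is declared as INCLUDE, so the same marked section is processed and the element becomes available.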

< 9 Select TEI elements > =

 
< Select tags from TEI driver file 10 >

<!-- ******************************************************** -->
<!-- I.  Core tag sets.                                       -->
<!-- ******************************************************** -->

<!-- Chapter 5:  TEI Header ********************************* -->
< Select tags from TEI header 11 >
<!-- Chapter 6:  Elements Available in All TEI Documents **** -->
< Select tags from TEI core tag set 12 >
<!-- Chapter 7:  Default Text Structure ********************* -->
< Select tags from default text structure 13 >

<!-- ******************************************************** -->
<!-- II.  Base tag sets.                                      -->
<!-- II.A.  DTD files                                         -->
<!-- ******************************************************** -->

<!-- Chapter 8:  Prose * (included) ************************* -->
<!-- File:  TEIPROS2.DTD (no tags) ************************** -->
<!-- Chapter 9:  Verse * (excluded) ************************* -->
<!-- Chapter 10:  Drama * (excluded) ************************ -->
<!-- Chapter 11:  Transcriptions of Speech * (excluded) ***** -->
<!-- Chapter 12:  Print Dictionaries * (excluded) *********** -->
<!-- Chapter 13:  Terminological Data * (excluded) ********** -->
<!-- * Mixed Bases * (excluded) ***************************** -->

<!-- ******************************************************** -->
<!-- III.  Additional tag sets.                               -->
<!-- ******************************************************** -->

<!-- Chapter 14:  Linking, Segmentation, and Alignment ****** -->
< Select tags from tag set for linking and alignment 14 >
<!-- Chapter 15:  Simple Analytic Mechanisms **************** -->
< Select tags from tag set for simple analysis 15 >
<!-- Chapter 16:  Feature Structures * (excluded) *********** -->
<!-- Chapter 17:  Certainty and Responsibility * (excluded) * -->
<!-- Chapter 18:  Transcription of Primary Sources * (excl) * -->
<!-- Chapter 19:  Critical Apparatus * (excluded) *********** -->
<!-- Chapter 20:  Names and Dates * (excluded) ************** -->
<!-- Chapter 21:  Graphs, Networks, and Trees * (excluded) ** -->
<!-- Chapter 22:  Tables, Formulae, and Graphics ************ -->
< Select tags from tag set for tables and figures 16 >
<!-- Chapter 23:  Language Corpora * (excluded) ************* -->
<!-- Chapter 27:  Tag Set Documentation ********************* -->

In the main TEI driver file, we select only the <tei.2> element, suppressing <teiCorpus.2>:

< 10 Select tags from TEI driver file > =

 
<!-- FILE:  TEI2.DTD -->
<!ENTITY % TEI.2        'INCLUDE' >
<!ENTITY % teiCorpus.2  'IGNORE' >

In the header, the selections are as follows:

< 11 Select tags from TEI header > =

 
<!-- File:  TEIHDR2.DTD -->
<!ENTITY % teiHeader    '%REDEFINE;' >
<!ENTITY % fileDesc     '%REDEFINE;' >
<!ENTITY % titleStmt    '%REDEFINE;' >
<!ENTITY % sponsor      'INCLUDE' -- ? -- >
<!ENTITY % funder       'INCLUDE' -- ? -- >
<!ENTITY % principal    'INCLUDE' -- ? -- >
<!ENTITY % editionStmt  '%REDEFINE;' -- ? -- >
<!ENTITY % edition      'INCLUDE' -- ? -- >
<!ENTITY % extent       'INCLUDE' -- ? -- >
<!ENTITY % publicationStmt '%REDEFINE;' >
<!ENTITY % distributor  'INCLUDE' >
<!ENTITY % authority    'INCLUDE' >
<!ENTITY % idno         'INCLUDE' >
<!ENTITY % availability 'INCLUDE' -- ? -- >
<!ENTITY % seriesStmt   '%REDEFINE;' >
<!ENTITY % notesStmt    'INCLUDE' >
<!ENTITY % sourceDesc   'INCLUDE' >
<!ENTITY % scriptStmt                  'IGNORE' >
<!ENTITY % recordingStmt               'IGNORE' >
<!ENTITY % recording                   'IGNORE' >
<!ENTITY % equipment                   'IGNORE' >
<!ENTITY % broadcast                   'IGNORE' >
<!ENTITY % encodingDesc  'INCLUDE' >
<!ENTITY % projectDesc   'INCLUDE' >
<!ENTITY % samplingDecl  'INCLUDE' >
<!ENTITY % editorialDecl 'INCLUDE' >
<!ENTITY % correction                  'IGNORE' -- ? -- >
<!ENTITY % normalization               'IGNORE' -- ? -- >
<!ENTITY % quotation                   'IGNORE' -- ? -- >
<!ENTITY % hyphenation                 'IGNORE' -- ? -- >
<!ENTITY % segmentation                'IGNORE' -- ? -- >
<!ENTITY % stdVals                     'IGNORE' -- ? -- >
<!ENTITY % interpretation              'IGNORE' -- ? -- >
<!ENTITY % tagsDecl      '%LEVELTWO;' >
<!ENTITY % tagUsage      '%LEVELTWO;' >
<!ENTITY % rendition     '%LEVELTWO;' >
<!ENTITY % refsDecl      '%LEVELTWO;' >
<!ENTITY % step                        'IGNORE' -- ? -- >
<!ENTITY % state                       'IGNORE' >
<!ENTITY % classDecl     '%LEVELTWO;' >
<!ENTITY % taxonomy      '%LEVELTWO;' >
<!ENTITY % category      '%LEVELTWO;' >
<!ENTITY % catDesc       '%LEVELTWO;' >
<!ENTITY % fsdDecl                     'IGNORE' >
<!ENTITY % metDecl                     'IGNORE' >
<!ENTITY % symbol                      'IGNORE' >
<!ENTITY % variantEncoding             'IGNORE' >
<!ENTITY % profileDesc  'INCLUDE' >
<!ENTITY % creation     '%LEVELTWO;' >
<!ENTITY % langUsage    'INCLUDE' >
<!ENTITY % language     'INCLUDE' >
<!ENTITY % textClass    '%LEVELTWO;' >
<!ENTITY % keywords     '%LEVELTWO;' >
<!ENTITY % classCode    '%LEVELTWO;' >
<!ENTITY % catRef       '%LEVELTWO;' >
<!ENTITY % revisionDesc 'INCLUDE' >
<!ENTITY % change       '%LEVELTWO;' >

In the TEI core, the selections are as follows:

< 12 Select tags from TEI core tag set > =

 
<!-- File:  TEICORE2.DTD -->
<!ENTITY % p            'INCLUDE' >
<!ENTITY % foreign      '%LEVELTWO;' >
<!ENTITY % emph         '%LEVELTWO;' >
<!ENTITY % hi           'INCLUDE' >
<!ENTITY % distinct     '%LEVELTWO;' >
<!ENTITY % q            'INCLUDE' >
<!ENTITY % quote        '%LEVELTWO;' >
<!ENTITY % cit          '%LEVELTWO;' >
<!ENTITY % soCalled     '%LEVELTWO;' >
<!ENTITY % term         '%LEVELTWO;' >
<!ENTITY % mentioned    '%LEVELTWO;' >
<!ENTITY % gloss        '%LEVELTWO;' >
<!ENTITY % name         'INCLUDE' >
<!ENTITY % rs           '%LEVELTWO;' >
<!ENTITY % num          '%LEVELTWO;' >
<!ENTITY % measure                     'IGNORE' >
<!ENTITY % date         'INCLUDE' >
<!ENTITY % dateRange                   'IGNORE' >
<!ENTITY % time         '%LEVELTWO;' >
<!ENTITY % timeRange                   'IGNORE' >
<!ENTITY % abbr         '%LEVELTWO;' >
<!ENTITY % expan                       'IGNORE' -- ? -- >
<!ENTITY % sic          '%LEVELTWO;' >
<!ENTITY % corr         '%LEVELTWO;' >
<!ENTITY % reg          '%LEVELTWO;' -- ? -- >
<!ENTITY % orig         '%LEVELTWO;' -- ? -- >
<!ENTITY % gap          'INCLUDE' >
<!ENTITY % add          '%LEVELTWO;' -- ? -- >
<!ENTITY % del          '%LEVELTWO;' -- ? -- >
<!ENTITY % unclear      '%LEVELTWO;' >
<!ENTITY % address      'INCLUDE' >
<!ENTITY % addrLine     'INCLUDE' >
<!ENTITY % street                      'IGNORE' >
<!ENTITY % postCode                    'IGNORE' >
<!ENTITY % postBox                     'IGNORE' >
<!ENTITY % ptr          'INCLUDE' >
<!ENTITY % ref          'INCLUDE' >
<!ENTITY % list         'INCLUDE' >
<!ENTITY % item         'INCLUDE' >
<!ENTITY % label        'INCLUDE' >
<!ENTITY % head         'INCLUDE' >
<!ENTITY % headLabel                   'IGNORE' >
<!ENTITY % headItem                    'IGNORE' >
<!ENTITY % note         'INCLUDE' >
<!ENTITY % index        '%LEVELTWO;' >
<!ENTITY % divGen       '%LEVELTWO;' >
<!ENTITY % milestone    '%LEVELTWO;' >
<!ENTITY % pb           'INCLUDE' >
<!ENTITY % lb           '%LEVELTWO;' >
<!ENTITY % cb                          'IGNORE' >
<!ENTITY % bibl         '%REDEFINE;' >
<!ENTITY % biblStruct                  'IGNORE' >
<!ENTITY % biblFull     'INCLUDE' >
<!ENTITY % listBibl     'INCLUDE' >
<!ENTITY % analytic                    'IGNORE' >
<!ENTITY % monogr                      'IGNORE' >
<!ENTITY % series                      'IGNORE' >
<!ENTITY % author       'INCLUDE' >
<!ENTITY % editor       'INCLUDE' >
<!ENTITY % respStmt     'INCLUDE' >
<!ENTITY % resp         'INCLUDE' >
<!ENTITY % title        'INCLUDE' >
<!ENTITY % meeting                     'IGNORE' -- ? -- >
<!ENTITY % imprint      'INCLUDE' >
<!ENTITY % publisher    'INCLUDE' >
<!ENTITY % biblScope    'INCLUDE' >
<!ENTITY % pubPlace     'INCLUDE' >
<!ENTITY % l            'INCLUDE' >
<!ENTITY % lg           'INCLUDE' >
<!ENTITY % sp           'INCLUDE' >
<!ENTITY % speaker      'INCLUDE' >
<!ENTITY % stage        'INCLUDE' >

< 13 Select tags from default text structure > =

 
<!-- File:  TEISTR2.DTD -->
<!ENTITY % text         'INCLUDE' >
<!ENTITY % body         'INCLUDE' >
<!ENTITY % group        '%LEVELTWO;' >
<!ENTITY % div          '%LEVELTWO;' >
<!ENTITY % div0         'INCLUDE' >
<!ENTITY % div1         'INCLUDE' >
<!ENTITY % div2         'INCLUDE' >
<!ENTITY % div3         'INCLUDE' >
<!ENTITY % div4         'INCLUDE' >
<!ENTITY % div5         'INCLUDE' >
<!ENTITY % div6         'INCLUDE' >
<!ENTITY % div7         'INCLUDE' >
<!ENTITY % trailer      'INCLUDE' >
<!ENTITY % byline       'INCLUDE' >
<!ENTITY % dateline     '%REDEFINE;' >
<!ENTITY % argument     '%LEVELTWO;' >
<!ENTITY % epigraph     '%LEVELTWO;' >
<!ENTITY % opener       'INCLUDE' >
<!ENTITY % closer       'INCLUDE' >
<!ENTITY % salute       'INCLUDE' >
<!ENTITY % signed       'INCLUDE' >

<!-- File:  TEIFRON2.DTD -->
<!ENTITY % front        'INCLUDE' >
<!ENTITY % titlePage    'INCLUDE' >
<!ENTITY % docTitle     'INCLUDE' >
<!ENTITY % titlePart    'INCLUDE' >
<!ENTITY % docAuthor    'INCLUDE' >
<!ENTITY % imprimatur                  'IGNORE' -- ? -- >
<!ENTITY % docEdition   'INCLUDE' >
<!ENTITY % docImprint   'INCLUDE' >
<!ENTITY % docDate      'INCLUDE' >

<!-- File:  TEIBACK2.DTD -->
<!ENTITY % back         'INCLUDE' >

< 14 Select tags from tag set for linking and alignment > =

 
<!-- File:  TEILINK2.ENT -->
<!-- File:  TEILINK2.DTD -->
<!ENTITY % link                        'IGNORE' -- ? -- >
<!ENTITY % linkGrp                     'IGNORE' -- ? -- >
<!ENTITY % xref         '%LEVELTWO;' >
<!ENTITY % xptr         '%LEVELTWO;' >
<!ENTITY % seg          'INCLUDE' >
<!ENTITY % anchor       'INCLUDE' >
<!ENTITY % when                        'IGNORE' >
<!ENTITY % timeline                    'IGNORE' >
<!ENTITY % join                        'IGNORE' >
<!ENTITY % joinGrp                     'IGNORE' >
<!ENTITY % alt                         'IGNORE' >
<!ENTITY % altGrp                      'IGNORE' >

< 15 Select tags from tag set for simple analysis > =

 
<!-- File:  TEIANA2.ENT -->
<!-- File:  TEIANA2.DTD -->
<!ENTITY % span                        'IGNORE' -- ? -- >
<!ENTITY % spanGrp                     'IGNORE' >
<!ENTITY % interp       '%LEVELTWO;' >
<!ENTITY % interpGrp    '%LEVELTWO;' >
<!ENTITY % s            '%LEVELTWO;' >
<!ENTITY % cl                          'IGNORE' >
<!ENTITY % phr                         'IGNORE' >
<!ENTITY % w                           'IGNORE' >
<!ENTITY % m                           'IGNORE' >
<!ENTITY % c                           'IGNORE' >

< 16 Select tags from tag set for tables and figures > =

 
<!-- File:  TEIFIG2.ENT -->
<!ENTITY % formulaNotations 'CDATA'                             >
<!ENTITY % formulaContent 'CDATA'                               >

<!-- File:  TEIFIG2.DTD -->
<!ENTITY % table        'INCLUDE' >
<!ENTITY % row          'INCLUDE' >
<!ENTITY % cell         'INCLUDE' >
<!ENTITY % formula      'INCLUDE' >
<!ENTITY % figure       'INCLUDE' >
<!ENTITY % figDesc      'INCLUDE' >

6 Other Documents

We need a full, formal specification of the rules just stated, and whatever else we agree on. It will be easiest for me if this is basically a set of rules for using TEI Lite -- i.e. if it doesn't repeat anything in the TEI Lite documentation. Working title: "Rules for Use of TEI Lite in CIC E-Text Projects".

We could use, but don't absolutely require, a modified version of the TEI Lite specification which incorporates these rules. To make such a document, we'll need copyright permission from the TEI, which I'll ask for if people are eager to have a single unified document describing CICTEI.

We may need SGML and CICTEI tutorials for training staff. I believe this is not part of our current task.


Summary of Tag Usage

This appendix specifies for each tag set of the TEI, and for each tag in selected tag sets, whether it is:

Tag Sets

The following tag sets are included in TEI Lite and therefore in the CICTEI tag set:

The following tag sets, therefore, are not selected:

Tag Sets and their Tags

Document elements

Used in all cases:

Not used (omitted from TEI Lite). [Restore for CIC?]

Header

Used in all CICTEI texts:

N.B. many of these will be standard elements which describe the markup practices in various levels of encoding. They will not all require much manual intervention.

Used when applicable [needs further specification for each element type]:

Not used (omitted from TEI Lite):

Core Tag Set

Used:[1]

Not used (omitted from TEI Lite):

Text Structure

Used:

Front and Back Matter

Not used (omitted from TEI Lite):

Linking and Alignment

Not used (omitted from TEI Lite):

Simple Analytic Mechanisms

Not used (omitted from TEI Lite):

Tables, Formulae, and Graphics

Tag Set Documentation

TEI Lite integrates several elements from this auxiliary tag set, so they can be used in prose:

Tags

Additional Elements

The CICTEI tag set extends the TEI element-class system as follows:

All of these changes are the same as in TEI Lite.

The global attribute class linking is modified; it uses the following declaration instead of the standard one:

< 17 Linking attributes > =

 
<!ENTITY % a.linking '
          corresp            IDREFS              #IMPLIED
          next               IDREF               #IMPLIED
          prev               IDREF               #IMPLIED'      >
 

Open Issues

The following issues need to be decided before this document is final:

Notes

[1] Recognition criteria and so on need to be specified for each of these. Probably the best approach is to group them into classes of elements always used, never used, used under certain (specified) conditions, ...