This document is a draft specification of the encoding scheme to be used in electronic texts created by cooperative projects of the Committee on Institutional Cooperation (CIC). It has not yet been approved by the CIC or anyone else. It has been drafted by the authors named on the title page for consideration by the appropriate bodies, and should not be taken as a final product until those bodies have revised and approved it, and this note is removed.
On some topics, specific proposals are made in this document. Like the document as a whole, these proposals have not yet been approved formally and are subject to discussion and change. They are not final. Other topics are identified explicitly as open issues, on which discussion and decisions are needed; no proposals are made for dealing with open issues. A list of open issues appears in the appendix for convenient consultation.
The current partial draft of this document was prepared by C. M. Sperberg-McQueen on the basis of discussions with the other named authors. In completing and finishing the document, the following steps are expected:
The following things remain to be done before this document is approved by the CIC working group on e-texts:
[Discuss background: CIC American Corpus project, CIC, TEI, audience to be served by the e-text collection, expected work processes...]
The CICTEI encoding scheme is based on the TEI Lite subset of TEI, documented in Lou Burnard and C. M. Sperberg-McQueen, "TEI Lite: An Introduction to Text Encoding for Interchange" (document TEI U5), which can be found at http://www-tei.uic.edu/orgs/tei/intros/teiu5.tei or http://www-tei.uic.edu/orgs/tei/intros/teiu5.html. Familiarity with the basics of SGML encoding and the TEI encoding scheme is assumed; readers who wish to learn the basics are referred to the TEI home page, especially to the list of tutorials and introductions.
The following TEI tag sets are selected:
This section describes CIC encoding practice in particular areas. In many areas, several different levels of practice are defined; unless otherwise stated, all electronic texts created by CIC projects will adhere at least to the lowest defined level. In some cases, different minimum standards may apply to electronic texts acquired by the CIC from other sources and edited into SGML form. Such differences are always noted; if nothing is said, no distinction is made between work created by the CIC and work acquired from other sources and mounted by the CIC on CIC-wide servers. Documents acquired from other sources in SGML form may or may not be modified to meet minimal CIC standards; if they are not so modified, they may be described as being at `encoding level 0'.
No global encoding levels are defined; a text may be at level 1 with respect to notes, and level 4 with respect to quotations. A full description of the encoding of a given text thus requires that its level be specified for each area defined here.
If documents need to be characterized in terms of single numbers, then the following overall characterizations may be used:
In general, higher levels either are more exhaustive in identifying occurrences of specific textual features (e.g. quotation), or more complete in describing them, or both. The definition of the different levels thus always specifies both the recognition criteria for the element in question and the analytic detail to be supplied at each level.
The CICTEI encoding scheme requires that the encoding practice of each electronic text be documented in its TEI header. This document describes CIC encoding practice in prose, and also gives examples of elements in the TEI header which are to be used to document CIC practice. Since the encoding practice will be consistent for most texts produced by the CIC, the <encodingDesc> elements of most CIC texts can be substantially the same; the same boilerplate text can be used for all texts at a particular level. This document defines SGML entities for use in headers of CIC documents, which will make it easier to create the TEI headers. See examples below.
This document describes two versions of the CICTEI DTD: a `level-1' version which includes only the elements required by level-1 tagging, and a full version which includes all the elements selected. The level-1 DTD is intended only for training purposes.
In CIC electronic documents, the TEI header will have somewhat less flexibility than in the full TEI scheme. The primary goal is to enforce a higher minimum standard of documentation, and to make it easier for unintelligent software to identify an author, title, date, and source edition for any text. Headers which conform to the CICTEI specification will always conform to the TEI Lite specification and the base TEI specification as well.
It would also be possible to rewrite the header and provide new tags with unique names for what we regard as the critical bits of information. This is not done here because it seems unnecessary to depart from the basic TEI DTD in this way.
At the top level, all four parts of the header will be required, rather than optional. (Since much of the content will be constant across CIC documents, this will not impose as much burden on encoders as might be supposed.) Formally:
<!ELEMENT teiHeader - - (fileDesc, encodingDesc+, profileDesc+, revisionDesc) >
<!ATTLIST teiHeader %a.global;
          type CDATA text
          creator CDATA #IMPLIED
          status (new | update) new
          date.created CDATA #IMPLIED
          date.updated CDATA #IMPLIED
          TEIform CDATA 'teiHeader' >
The <encodingDesc> and <profileDesc> elements will repeat only in the case of corpora and collections, and probably not often then; in normal cases, only one of each will appear.
The title statement, extent statement, publication statement, and source description are all required. For normal texts, a single source edition must be identified. CICTEI documents will not reflect collation of multiple sources; at least, not without revision of this document. An edition statement and series statement will be supplied whenever appropriate (reissues or revisions of CIC texts, CIC texts created as part of a series). A notes statement will be provided only in exceptional cases.
Within the title statement, CIC texts will invariably give at least one title and at least one author (unknown and anonymous authors will be given as "Unknown" and "Anonymous", respectively). Where multiple titles are given, the first <title> element will be suitable for use by software as a short title or main title for the work. Other statements of responsibility, for editors, for sponsors, funders, and principal investigators of the various projects which create CIC texts, and for other miscellaneous forms of responsibility, will be given in a fixed sequence. [Is this necessary?]
Edition statements will use the <edition> and <respStmt> elements, not the unstructured series of <p> elements; series statements will similarly use the structured, not the unstructured, encoding defined in TEI Lite.
Publication statements will give the publishing information in a rigid sequence; the place of publication, terms of availability, and the date of publication will always be given. CIC texts will normally be given identification numbers by the CIC, and the DTD requires at least one (if none applies, it should be left blank).
The source description will normally take the form of a <biblFull> element; in the case of texts created in electronic form and therefore without a non-electronic source, the source description will take the following form:
<sourceDesc>
  <p>Created in electronic form.</p>
</sourceDesc>
A standard CIC entity may be declared, to allow this to be abbreviated:
<sourceDesc> &newtext; </sourceDesc>
Formally:
<!ELEMENT fileDesc - - (titleStmt, editionStmt?, extent, publicationStmt, seriesStmt?, notesStmt?, sourceDesc) >
<!ATTLIST fileDesc %a.global;
          TEIform CDATA 'fileDesc' >
<!ELEMENT titleStmt - O (title+, author+, (editor | sponsor | funder | principal | respStmt)*) >
<!ATTLIST titleStmt %a.global;
          TEIform CDATA 'titleStmt' >
<!ELEMENT editionStmt - O (edition, respStmt*) >
<!ATTLIST editionStmt %a.global;
          TEIform CDATA 'editionStmt' >
<!ELEMENT publicationStmt - O ((publisher | distributor | authority), (pubPlace, address?, idno+, availability, date)+)+>
<!ATTLIST publicationStmt %a.global;
          TEIform CDATA 'publicationStmt' >
<!ELEMENT seriesStmt - O (title, (idno | respStmt)*)>
<!ATTLIST seriesStmt %a.global;
          TEIform CDATA 'seriesStmt' >
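The required parts of the file description can also be checked outside an SGML parser. The following is a minimal sketch in Python, not part of the CICTEI specification: it assumes the header has been converted to well-formed XML so that the standard library can read it, the function name file_desc_problems is hypothetical, and all sample values are invented placeholders. It checks only that required children are present, not their order.

```python
import xml.etree.ElementTree as ET

# Children of <fileDesc> that the content model above makes mandatory.
REQUIRED_IN_FILEDESC = ("titleStmt", "extent", "publicationStmt", "sourceDesc")


def file_desc_problems(file_desc):
    """Return a list of human-readable problems; an empty list means none found.

    Hypothetical helper: checks presence only, not sequence.
    """
    problems = []
    for name in REQUIRED_IN_FILEDESC:
        if file_desc.find(name) is None:
            problems.append("missing <%s>" % name)
    title_stmt = file_desc.find("titleStmt")
    if title_stmt is not None:
        # The title statement needs at least one title and one author.
        for name in ("title", "author"):
            if title_stmt.find(name) is None:
                problems.append("<titleStmt> has no <%s>" % name)
    return problems


# Placeholder header content, invented for illustration only.
GOOD = ET.fromstring(
    "<fileDesc>"
    "<titleStmt><title>Sample title</title><author>Anonymous</author></titleStmt>"
    "<extent>placeholder extent</extent>"
    "<publicationStmt><publisher>placeholder</publisher></publicationStmt>"
    "<sourceDesc><p>Created in electronic form.</p></sourceDesc>"
    "</fileDesc>"
)
BAD = ET.fromstring(
    "<fileDesc><titleStmt><title>Sample title</title></titleStmt></fileDesc>"
)

good_problems = file_desc_problems(GOOD)
bad_problems = file_desc_problems(BAD)
```

A check of this kind would supplement, not replace, validation against the DTD, since the DTD also constrains the order and repetition of elements.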
The encoding description will be as full as possible, to ensure that scholarly users of CIC texts can readily discover the editorial principles which have governed their creation. In practice, the encoding description in most texts will consist of a series of entity references which expand to standard descriptions of the encoding levels defined elsewhere in this document.
Formally:
<!ELEMENT encodingDesc - - (projectDesc+, samplingDecl+, editorialDecl+, tagsDecl?, refsDecl*, classDecl*, p*) >
<!ATTLIST encodingDesc %a.global;
          TEIform CDATA 'encodingDesc' >
[Need sections in this document, and definitions of levels if applicable, for
The contents of this section will invariably be a <list> containing one <item> for each change or set of changes. The items should be formatted as in the following example:
<revisiondesc>
  <list>
    <item>1996-02-08 : JPW : proofread and revise</item>
    <item>1996-02-01 : CMSMcQ : resume drafting, add list of tags in appendix</item>
    <item>1996-01-18 : CMSMcQ : draft on basis of group conversation of yesterday</item>
  </list>
</revisiondesc>
That is:
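Judging only from the example items, each item appears to hold three fields separated by " : ": a date in year-month-day form, the initials of the person responsible, and a description of the change. The following Python sketch parses one item on that assumption; the field interpretation and the function name parse_change_item are inferred for illustration, not stated by this document.

```python
from datetime import date


def parse_change_item(item_text):
    """Split one revision item into (date, initials, description).

    Assumes the 'YYYY-MM-DD : initials : description' layout seen in the
    example; the description itself may contain further colons.
    """
    parts = [part.strip() for part in item_text.split(" : ", 2)]
    if len(parts) != 3:
        raise ValueError("expected 'date : initials : description'")
    when, who, what = parts
    # date.fromisoformat raises ValueError for a malformed date.
    return date.fromisoformat(when), who, what


parsed = parse_change_item("1996-02-08 : JPW : proofread and revise")
```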
All font shifts will be recorded, either using the <hi> element or using analytic tags like <emph>, <foreign>, etc.
Emphasis, foreign words, etc. not marked by font shifts in the source are not normally marked in CICTEI texts. If such elements are marked, whether in an exceptional case or in a text received from external sources, then all <emph>, <foreign>, <distinct>, <term>, <gloss>, or <mentioned> elements which are not marked in the conventional way (by font shift or by quotation marks) will bear the attribute specification rend=unmarked.
At level hi-1, the following special types of rendition will be distinguished and identified [list needs checking]:
At level 2, further styles may be identified on a document-by-document basis; they will be documented in the <rendition> elements in the document's TEI header.
(See also the section on paragraphs and paragraph shapes.)
Highlighting at various levels is declared in the TEI header using the entities hi-1 and hi-2, which are declared thus:
<!ENTITY hi-1 "Font shifts have uniformly been tagged as highlighting, not as emphasis, foreign words, etc." >
<!ENTITY hi-2 "Font shifts have been tagged as highlighting only when it is not possible to identify them reliably as emphasis, foreign words, etc." >
A TEI header for a CICTEI document at level 1 might therefore read, in part:
<tagUsage gi=hi render=italics>&hi-1;</tagUsage>
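To see what a reader of the header would eventually get, the entity reference can be expanded by hand. The Python sketch below imitates that expansion with a regular expression; it is an illustration only (an SGML parser does this itself), and the function name expand_entities and the pattern ENTITY_REF are hypothetical.

```python
import re

# Replacement texts copied from the hi-1 and hi-2 entity declarations.
ENTITIES = {
    "hi-1": "Font shifts have uniformly been tagged as highlighting, "
            "not as emphasis, foreign words, etc.",
    "hi-2": "Font shifts have been tagged as highlighting only when it is "
            "not possible to identify them reliably as emphasis, foreign "
            "words, etc.",
}

# A general entity reference: '&', a name, then ';'.
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);")


def expand_entities(text, entities=ENTITIES):
    """Replace each known &name; reference; leave unknown ones untouched."""
    def substitute(match):
        return entities.get(match.group(1), match.group(0))
    return ENTITY_REF.sub(substitute, text)


expanded = expand_entities("<tagUsage gi=hi render=italics>&hi-1;</tagUsage>")
```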
Quotations may or may not be identified.
Quotations not marked by punctuation (quotation marks, guillemets, paragraph-initial dashes, etc.) are not recorded in CICTEI texts. If an ambitious analyst has tagged them, their rend attribute will have the value unmarked.
As noted elsewhere, font shifts will be noted.
Indentation and paragraph shape may be ignored, partially recorded, or recorded in some detail:
indented (first line indented, left and right justified)
block (first line not indented, left and right justified)
ragright (first line indented, left smooth, ragged right margin)
ragrightblock (first line not indented, left smooth, ragged right margin)
ragleft (first line indented, smooth right, ragged left margin)
ragleftblock (first line not indented, smooth right, ragged left margin)
centered (each line of the paragraph is centered)
other for anything too complex to be described with these keywords
display is added as a prefix if the element has indented left and right margins, as for a display quote.
other; these are described in the TEI header.
[Notes are an open question; Mark said he would look at TEI Lite with an eye to the concerns Catherine expressed. I don't understand what those concerns were, so I don't know whether they pose a challenge or not. I'd propose the following:]
In CICTEI texts, all <note> elements bear a place attribute. Three encoding levels are distinguished:
N.B. these levels do not apply to inline block notes; they are always transcribed, at all levels, as <note place=inline>
Cross references may or may not be identified. Three levels:
Language shifts may or may not be identified. Four encoding levels are distinguished:
lang=unknown is used if necessary
Page breaks in the source edition will invariably be marked. If none are present in material acquired from elsewhere, the source edition will be located and pagination will be added to the text on the basis of the source. Page breaks in other editions will not normally be marked; when in exceptional cases a CICTEI text records the pagination of multiple editions, each edition will be included in the <sourceDesc> element, with an SGML identifier; these SGML identifiers will be used as the values of the ed attribute of the <pb> element. The edition actually used as the source of the transcription must appear first.
When the <pb> element reflects a page break in the source, its ed attribute may be omitted -- i.e. the default ed value is the id of the first item in the source description.
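The defaulting rule for ed can be sketched in a few lines of Python. This is an illustration under stated assumptions, not CIC software: the sample is written as well-formed XML so the standard library can parse it (real CICTEI texts are SGML), the identifiers ed-a and ed-b and the function name resolve_page_breaks are invented, and each source edition is assumed to be an immediate <biblFull> child of <sourceDesc>.

```python
import xml.etree.ElementTree as ET

# Invented sample: two source editions, three page breaks.
SAMPLE = """<TEI.2>
  <teiHeader>
    <fileDesc>
      <sourceDesc>
        <biblFull id="ed-a"/>
        <biblFull id="ed-b"/>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <pb n="1"/>
      <pb n="1" ed="ed-b"/>
      <pb n="2"/>
    </body>
  </text>
</TEI.2>"""


def resolve_page_breaks(root):
    """Return (page number, edition id) for every <pb>, in document order.

    A <pb> without an ed attribute gets the id of the first item in the
    source description, or None if there is no such item.
    """
    first_source = root.find("./teiHeader/fileDesc/sourceDesc/biblFull")
    default_ed = first_source.get("id") if first_source is not None else None
    return [(pb.get("n"), pb.get("ed", default_ed)) for pb in root.iter("pb")]


page_breaks = resolve_page_breaks(ET.fromstring(SAMPLE))
```

Because the default is taken from the first item, the rule that the edition actually transcribed must appear first in the source description is what makes omitting ed safe.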
Canonical reference schemes may or may not be provided, depending on encoding level:
The TEI profile description will always be present to give the main language of the text, and the language of the header; it may or may not give further information:
Correction and normalization of spelling will not be performed by CIC encoders. If we acquire and clean data from others, we will spot-check their transcriptions and indicate in the header whether we found corrections and normalizations, whether they were marked as such, and how much of the text we spot-checked.
CIC e-texts may or may not systematically provide IDs for elements to make hypertext linking easier. Several levels are distinguished:
This section describes standard levels of quality assurance which may be performed on CIC electronic texts. They may include:
[More to be supplied.]
All CICTEI texts will be validated with SGML parsers before being made publicly accessible.
We will invariably spot-check all transcriptions (proofreading 0.5, 1.0, or 2.0 per cent of pages, and performing other checks to be specified) and the header will record the observed rate of typographic or other errors detected.
If we have other quality assurance checks, the header will also record the rate of errors found in spot checks. (Checks of the full text, which lead to correction of all errors, need not be recorded: the point is to provide some means of guessing the rate of errors still present in the text.)
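The spot-check arithmetic can be sketched as follows. This Python fragment is only an illustration: the document does not say how pages are chosen or in what unit the error rate is expressed, so random selection, rounding the sample up to at least one page, and "errors per page checked" are all assumptions, as are the function names.

```python
import math
import random

ALLOWED_PERCENTAGES = (0.5, 1.0, 2.0)


def choose_sample_pages(page_count, percent, seed=None):
    """Pick a sorted random sample of 1-based page numbers to proofread.

    Assumption: the sample is rounded up and is never smaller than one page.
    """
    if percent not in ALLOWED_PERCENTAGES:
        raise ValueError("percent must be 0.5, 1.0, or 2.0")
    sample_size = max(1, math.ceil(page_count * percent / 100))
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, page_count + 1), sample_size))


def observed_error_rate(errors_found, pages_checked):
    """Errors per page checked (an assumed unit for the header's figure)."""
    if pages_checked <= 0:
        raise ValueError("at least one page must be checked")
    return errors_found / pages_checked


pages = choose_sample_pages(400, 1.0, seed=1)
rate = observed_error_rate(3, len(pages))
```

For a 400-page text checked at 1.0 per cent, four pages are proofread; three errors found on them would be recorded as 0.75 errors per page.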
The DTD fragments in this document are part of the extension files needed to modify the TEI main DTD in accordance with the policies described here.
There are two files we need to define. The file cicteix.ent includes modifications to the TEI's SGML entities:
< 5 CIC TEI Entities Modification File >(cicteix.ent) =
< Preliminaries for TEI.extensions.ent file 7 >
< Linking attributes 17 >
< Highlighting levels 4 >
< Select TEI elements 9 >
The file cicteix.dtd includes declarations for new and modified SGML elements:
< 6 CIC TEI Elements Modification File >(cicteix.dtd) =
< CIC TEI Header 1 >
< Header Contents 2 >
< Encoding Description 3 >
The main task of the file cicteix.ent is to specify which TEI elements are to be suppressed.
< 7 Preliminaries for TEI.extensions.ent file > =
<!ENTITY % REDEFINE 'IGNORE' >
<!ENTITY % LEVELTWO 'INCLUDE' >
<!ENTITY % x.common 'text |' >
The entities file for the training version is the same; we'll edit it manually to suppress level-two items:
< 8 CIC TEI Entities file (simplified) >(cictei1x.ent) =
<!ENTITY % LEVELTWO 'IGNORE' >
< CIC TEI Entities Modification File 5 >
The actual selections are the same in each case; the difference is handled by the differing declarations of LEVELTWO as INCLUDE or IGNORE.
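The mechanism relies on the SGML rule that when an entity is declared more than once, the first declaration is the one that takes effect. The training file declares LEVELTWO as IGNORE before pulling in the full file, so the later INCLUDE declaration has no effect. The Python sketch below imitates that rule for simple declarations; it is an illustration only, the function name is hypothetical, and the pattern deliberately handles nothing beyond the plain quoted form used in the preliminaries.

```python
import re

# Matches simple declarations of the form <!ENTITY % name 'value' >.
# Declarations carrying embedded comments are not handled by this sketch.
ENTITY_DECL = re.compile(r"<!ENTITY\s+%\s+(\S+)\s+'([^']*)'\s*>")


def resolve_parameter_entities(dtd_text):
    """Map each parameter entity name to its FIRST declared value."""
    resolved = {}
    for name, value in ENTITY_DECL.findall(dtd_text):
        # setdefault keeps an earlier value and ignores later redeclarations.
        resolved.setdefault(name, value)
    return resolved


FULL_FILE = """
<!ENTITY % REDEFINE 'IGNORE' >
<!ENTITY % LEVELTWO 'INCLUDE' >
<!ENTITY % x.common 'text |' >
"""

# The training file puts its own LEVELTWO declaration ahead of the full file.
TRAINING_FILE = "<!ENTITY % LEVELTWO 'IGNORE' >\n" + FULL_FILE

full = resolve_parameter_entities(FULL_FILE)
training = resolve_parameter_entities(TRAINING_FILE)
```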
< Select tags from TEI driver file 10 >
<!-- ******************************************************** -->
<!-- I. Core tag sets. -->
<!-- ******************************************************** -->
<!-- Chapter 5: TEI Header ********************************* -->
< Select tags from TEI header 11 >
<!-- Chapter 6: Elements Available in All TEI Documents **** -->
< Select tags from TEI core tag set 12 >
<!-- Chapter 7: Default Text Structure ********************* -->
< Select tags from default text structure 13 >
<!-- ******************************************************** -->
<!-- II. Base tag sets. -->
<!-- II.A. DTD files -->
<!-- ******************************************************** -->
<!-- Chapter 8: Prose * (included) ************************* -->
<!-- File: TEIPROS2.DTD (no tags) ************************** -->
<!-- Chapter 9: Verse * (excluded) ************************* -->
<!-- Chapter 10: Drama * (excluded) ************************ -->
<!-- Chapter 11: Transcriptions of Speech * (excluded) ***** -->
<!-- Chapter 12: Print Dictionaries * (excluded) *********** -->
<!-- Chapter 13: Terminological Data * (excluded) ********** -->
<!-- * Mixed Bases * (excluded) ***************************** -->
<!-- ******************************************************** -->
<!-- III. Additional tag sets. -->
<!-- ******************************************************** -->
<!-- Chapter 14: Linking, Segmentation, and Alignment ****** -->
< Select tags from tag set for linking and alignment 14 >
<!-- Chapter 15: Simple Analytic Mechanisms **************** -->
< Select tags from tag set for simple analysis 15 >
<!-- Chapter 16: Feature Structures * (excluded) *********** -->
<!-- Chapter 17: Certainty and Responsibility * (excluded) * -->
<!-- Chapter 18: Transcription of Primary Sources * (excl) * -->
<!-- Chapter 19: Critical Apparatus * (excluded) *********** -->
<!-- Chapter 20: Names and Dates * (excluded) ************** -->
<!-- Chapter 21: Graphs, Networks, and Trees * (excluded) ** -->
<!-- Chapter 22: Tables, Formulae, and Graphics ************ -->
< Select tags from tag set for tables and figures 16 >
<!-- Chapter 23: Language Corpora * (excluded) ************* -->
<!-- Chapter 27: Tag Set Documentation ********************* -->
In the main TEI driver file, we select only the <tei.2> element, suppressing <teiCorpus.2>:
< 10 Select tags from TEI driver file > =
<!-- FILE: TEI2.DTD -->
<!ENTITY % TEI.2 'INCLUDE' >
<!ENTITY % teiCorpus.2 'IGNORE' >
In the header,
< 11 Select tags from TEI header > =
<!-- File: TEIHDR2.DTD -->
<!ENTITY % teiHeader '%REDEFINE;' >
<!ENTITY % fileDesc '%REDEFINE;' >
<!ENTITY % titleStmt '%REDEFINE;' >
<!ENTITY % sponsor 'INCLUDE' -- ? -- >
<!ENTITY % funder 'INCLUDE' -- ? -- >
<!ENTITY % principal 'INCLUDE' -- ? -- >
<!ENTITY % editionStmt '%REDEFINE;' -- ? -- >
<!ENTITY % edition 'INCLUDE' -- ? -- >
<!ENTITY % extent 'INCLUDE' -- ? -- >
<!ENTITY % publicationStmt '%REDEFINE;' >
<!ENTITY % distributor 'INCLUDE' >
<!ENTITY % authority 'INCLUDE' >
<!ENTITY % idno 'INCLUDE' >
<!ENTITY % availability 'INCLUDE' -- ? -- >
<!ENTITY % seriesStmt '%REDEFINE;' >
<!ENTITY % notesStmt 'INCLUDE' >
<!ENTITY % sourceDesc 'INCLUDE' >
<!ENTITY % scriptStmt 'IGNORE' >
<!ENTITY % recordingStmt 'IGNORE' >
<!ENTITY % recording 'IGNORE' >
<!ENTITY % equipment 'IGNORE' >
<!ENTITY % broadcast 'IGNORE' >
<!ENTITY % encodingDesc 'INCLUDE' >
<!ENTITY % projectDesc 'INCLUDE' >
<!ENTITY % samplingDecl 'INCLUDE' >
<!ENTITY % editorialDecl 'INCLUDE' >
<!ENTITY % correction 'IGNORE' -- ? -- >
<!ENTITY % normalization 'IGNORE' -- ? -- >
<!ENTITY % quotation 'IGNORE' -- ? -- >
<!ENTITY % hyphenation 'IGNORE' -- ? -- >
<!ENTITY % segmentation 'IGNORE' -- ? -- >
<!ENTITY % stdVals 'IGNORE' -- ? -- >
<!ENTITY % interpretation 'IGNORE' -- ? -- >
<!ENTITY % tagsDecl '%LEVELTWO;' >
<!ENTITY % tagUsage '%LEVELTWO;' >
<!ENTITY % rendition '%LEVELTWO;' >
<!ENTITY % refsDecl '%LEVELTWO;' >
<!ENTITY % step 'IGNORE' -- ? -- >
<!ENTITY % state 'IGNORE' >
<!ENTITY % classDecl '%LEVELTWO;' >
<!ENTITY % taxonomy '%LEVELTWO;' >
<!ENTITY % category '%LEVELTWO;' >
<!ENTITY % catDesc '%LEVELTWO;' >
<!ENTITY % fsdDecl 'IGNORE' >
<!ENTITY % metDecl 'IGNORE' >
<!ENTITY % symbol 'IGNORE' >
<!ENTITY % variantEncoding 'IGNORE' >
<!ENTITY % profileDesc 'INCLUDE' >
<!ENTITY % creation '%LEVELTWO;' >
<!ENTITY % langUsage 'INCLUDE' >
<!ENTITY % language 'INCLUDE' >
<!ENTITY % textClass '%LEVELTWO;' >
<!ENTITY % keywords '%LEVELTWO;' >
<!ENTITY % classCode '%LEVELTWO;' >
<!ENTITY % catRef '%LEVELTWO;' >
<!ENTITY % revisionDesc 'INCLUDE' >
<!ENTITY % change '%LEVELTWO;' >
In the TEI core,
< 12 Select tags from TEI core tag set > =
<!-- File: TEICORE2.DTD -->
<!ENTITY % p 'INCLUDE' >
<!ENTITY % foreign '%LEVELTWO;' >
<!ENTITY % emph '%LEVELTWO;' >
<!ENTITY % hi 'INCLUDE' >
<!ENTITY % distinct '%LEVELTWO;' >
<!ENTITY % q 'INCLUDE' >
<!ENTITY % quote '%LEVELTWO;' >
<!ENTITY % cit '%LEVELTWO;' >
<!ENTITY % soCalled '%LEVELTWO;' >
<!ENTITY % term '%LEVELTWO;' >
<!ENTITY % mentioned '%LEVELTWO;' >
<!ENTITY % gloss '%LEVELTWO;' >
<!ENTITY % name 'INCLUDE' >
<!ENTITY % rs '%LEVELTWO;' >
<!ENTITY % num '%LEVELTWO;' >
<!ENTITY % measure 'IGNORE' >
<!ENTITY % date 'INCLUDE' >
<!ENTITY % dateRange 'IGNORE' >
<!ENTITY % time '%LEVELTWO;' >
<!ENTITY % timeRange 'IGNORE' >
<!ENTITY % abbr '%LEVELTWO;' >
<!ENTITY % expan 'IGNORE' -- ? -- >
<!ENTITY % sic '%LEVELTWO;' >
<!ENTITY % corr '%LEVELTWO;' >
<!ENTITY % reg '%LEVELTWO;' -- ? -- >
<!ENTITY % orig '%LEVELTWO;' -- ? -- >
<!ENTITY % gap 'INCLUDE' >
<!ENTITY % add '%LEVELTWO;' -- ? -- >
<!ENTITY % del '%LEVELTWO;' -- ? -- >
<!ENTITY % unclear '%LEVELTWO;' >
<!ENTITY % address 'INCLUDE' >
<!ENTITY % addrLine 'INCLUDE' >
<!ENTITY % street 'IGNORE' >
<!ENTITY % postCode 'IGNORE' >
<!ENTITY % postBox 'IGNORE' >
<!ENTITY % ptr 'INCLUDE' >
<!ENTITY % ref 'INCLUDE' >
<!ENTITY % list 'INCLUDE' >
<!ENTITY % item 'INCLUDE' >
<!ENTITY % label 'INCLUDE' >
<!ENTITY % head 'INCLUDE' >
<!ENTITY % headLabel 'IGNORE' >
<!ENTITY % headItem 'IGNORE' >
<!ENTITY % note 'INCLUDE' >
<!ENTITY % index '%LEVELTWO;' >
<!ENTITY % divGen '%LEVELTWO;' >
<!ENTITY % milestone '%LEVELTWO;' >
<!ENTITY % pb 'INCLUDE' >
<!ENTITY % lb '%LEVELTWO;' >
<!ENTITY % cb 'IGNORE' >
<!ENTITY % bibl '%REDEFINE;' >
<!ENTITY % biblStruct 'IGNORE' >
<!ENTITY % biblFull 'INCLUDE' >
<!ENTITY % listBibl 'INCLUDE' >
<!ENTITY % analytic 'IGNORE' >
<!ENTITY % monogr 'IGNORE' >
<!ENTITY % series 'IGNORE' >
<!ENTITY % author 'INCLUDE' >
<!ENTITY % editor 'INCLUDE' >
<!ENTITY % respStmt 'INCLUDE' >
<!ENTITY % resp 'INCLUDE' >
<!ENTITY % title 'INCLUDE' >
<!ENTITY % meeting 'IGNORE' -- ? -- >
<!ENTITY % imprint 'INCLUDE' >
<!ENTITY % publisher 'INCLUDE' >
<!ENTITY % biblScope 'INCLUDE' >
<!ENTITY % pubPlace 'INCLUDE' >
<!ENTITY % l 'INCLUDE' >
<!ENTITY % lg 'INCLUDE' >
<!ENTITY % sp 'INCLUDE' >
<!ENTITY % speaker 'INCLUDE' >
<!ENTITY % stage 'INCLUDE' >
< 13 Select tags from default text structure > =
<!-- File: TEISTR2.DTD -->
<!ENTITY % text 'INCLUDE' >
<!ENTITY % body 'INCLUDE' >
<!ENTITY % group '%LEVELTWO;' >
<!ENTITY % div '%LEVELTWO;' >
<!ENTITY % div0 'INCLUDE' >
<!ENTITY % div1 'INCLUDE' >
<!ENTITY % div2 'INCLUDE' >
<!ENTITY % div3 'INCLUDE' >
<!ENTITY % div4 'INCLUDE' >
<!ENTITY % div5 'INCLUDE' >
<!ENTITY % div6 'INCLUDE' >
<!ENTITY % div7 'INCLUDE' >
<!ENTITY % trailer 'INCLUDE' >
<!ENTITY % byline 'INCLUDE' >
<!ENTITY % dateline '%REDEFINE;' >
<!ENTITY % argument '%LEVELTWO;' >
<!ENTITY % epigraph '%LEVELTWO;' >
<!ENTITY % opener 'INCLUDE' >
<!ENTITY % closer 'INCLUDE' >
<!ENTITY % salute 'INCLUDE' >
<!ENTITY % signed 'INCLUDE' >
<!-- File: TEIFRON2.DTD -->
<!ENTITY % front 'INCLUDE' >
<!ENTITY % titlePage 'INCLUDE' >
<!ENTITY % docTitle 'INCLUDE' >
<!ENTITY % titlePart 'INCLUDE' >
<!ENTITY % docAuthor 'INCLUDE' >
<!ENTITY % imprimatur 'IGNORE' -- ? -- >
<!ENTITY % docEdition 'INCLUDE' >
<!ENTITY % docImprint 'INCLUDE' >
<!ENTITY % docDate 'INCLUDE' >
<!-- File: TEIBACK2.DTD -->
<!ENTITY % back 'INCLUDE' >
< 14 Select tags from tag set for linking and alignment > =
<!-- File: TEILINK2.ENT -->
<!-- File: TEILINK2.DTD -->
<!ENTITY % link 'IGNORE' -- ? -- >
<!ENTITY % linkGrp 'IGNORE' -- ? -- >
<!ENTITY % xref '%LEVELTWO;' >
<!ENTITY % xptr '%LEVELTWO;' >
<!ENTITY % seg 'INCLUDE' >
<!ENTITY % anchor 'INCLUDE' >
<!ENTITY % when 'IGNORE' >
<!ENTITY % timeline 'IGNORE' >
<!ENTITY % join 'IGNORE' >
<!ENTITY % joinGrp 'IGNORE' >
<!ENTITY % alt 'IGNORE' >
<!ENTITY % altGrp 'IGNORE' >
< 15 Select tags from tag set for simple analysis > =
<!-- File: TEIANA2.ENT -->
<!-- File: TEIANA2.DTD -->
<!ENTITY % span 'IGNORE' -- ? -- >
<!ENTITY % spanGrp 'IGNORE' >
<!ENTITY % interp '%LEVELTWO;' >
<!ENTITY % interpGrp '%LEVELTWO;' >
<!ENTITY % s '%LEVELTWO;' >
<!ENTITY % cl 'IGNORE' >
<!ENTITY % phr 'IGNORE' >
<!ENTITY % w 'IGNORE' >
<!ENTITY % m 'IGNORE' >
<!ENTITY % c 'IGNORE' >
< 16 Select tags from tag set for tables and figures > =
<!-- File: TEIFIG2.ENT -->
<!ENTITY % formulaNotations 'CDATA' >
<!ENTITY % formulaContent 'CDATA' >
<!-- File: TEIFIG2.DTD -->
<!ENTITY % table 'INCLUDE' >
<!ENTITY % row 'INCLUDE' >
<!ENTITY % cell 'INCLUDE' >
<!ENTITY % formula 'INCLUDE' >
<!ENTITY % figure 'INCLUDE' >
<!ENTITY % figDesc 'INCLUDE' >
We need a full, formal specification of the rules just stated, and whatever else we agree on. It will be easiest for me if this is basically a set of rules for using TEI Lite -- i.e. if it doesn't repeat anything in the TEI Lite documentation. Working title: "Rules for Use of TEI Lite in CIC E-Text Projects".
We could use, but don't absolutely require, a modified version of the TEI Lite specification which incorporates these rules. To make such a document, we'll need copyright permission from the TEI, which I'll ask for if people are eager to have a single unified document describing CICTEI.
We may need SGML and CICTEI tutorials for training staff. I believe this is not part of our current task.
This appendix specifies for each tag set of the TEI, and for each tag in selected tag sets, whether it is:
The following tag sets are included in TEI Lite and therefore in the CICTEI tag set:
The following tag sets, therefore, are not selected:
Used in all cases:
Not used (omitted from TEI Lite). [Restore for CIC?]
Used in all CICTEI texts:
Used when applicable [needs further specification for each element type]:
Not used (omitted from TEI Lite):
Used:[1]
Not used (omitted from TEI Lite):
Used:
Not used (omitted from TEI Lite):
Not used (omitted from TEI Lite):
Not used (omitted from TEI Lite):
TEI Lite integrates several elements from this auxiliary tag set, so they can be used in prose:
The CICTEI tag set extends the TEI element-class system as follows:
All of these changes are the same as in TEI Lite.
The global attribute class linking is modified; it uses the following declaration instead of the standard one:
<!ENTITY % a.linking '
          corresp IDREFS #IMPLIED
          next IDREF #IMPLIED
          prev IDREF #IMPLIED' >
The following issues need to be decided before this document is final:
[1] Recognition criteria and so on need to be specified for each of these. Probably the best approach is to group them into classes of elements always used, never used, used under certain (specified) conditions, ...