TEI META Task Force: Notes from Meeting at AFNor, 16 Oct 03 [MEW05]
Lou, Sebastian, and Laurent met at AFNor in Paris on 16 October. We discussed some specific details of the proposed ODD-NG, the likely outputs from the joint TEI-ISO activity on feature structures, what should be done about the terminology chapter, and Laurent's plans as leader of current ISO/TC 37/SC 4 activities, all of which turned out to have some interesting synergies.
ISO has decided that there should be a single repository for linguistic applications, which would collate and register data categories of various kinds, by analogy with existing standards for registering country and language names. Current work in ISO/TC 37/SC 4 concerns the model to be used for representing information within this repository. That work is therefore terminological, in the same way that ODD and FS are: it is concerned with identifiers, definitions, notes, etc., together with names and their mappings into different languages. The repository would, however, be concept-based rather than name-based; representation and realization were secondary issues.
Laurent suggested that where a data category identified by some tagdoc (e.g. an element, a class, or a data value) also existed in the ISO registry, there should be an explicit link between the two. We agreed that this was highly desirable, and that the old <equiv> element should be repurposed for it. The new <equiv> would be used to specify equivalences between the categories defined in the ODD and those defined in other standards, e.g. the data category registry for linguistic data, ISO 11179 for metadata, etc.
The new data category registry would include metadata vocabulary from OLAC and IMDI as well as the full morphosyntactic categorization inherited from EAGLES and Multext. Laurent's view was that, although it was unrealistic to attempt to define a full linguistic ontology, parts of one might be built up incrementally in this way. He agreed to make available a copy of the current draft data category specification.
We discussed the TEI terminology recommendations and agreed that, although they were now outdated, terminology as such was an important part of the TEI intellectual landscape which should not be abandoned. We thought that we should find a way of embedding TBX-conformant terminological descriptions in a TEI <termEntry>, since TBX conforms to the model developed in ISO 16642, which is derived from TMF, the Terminological Markup Framework, which in turn developed from the original TEI work. The TEI model would specify where such descriptions fitted within a TEI document, but not what their contents were; in this respect it was comparable to the ways in which objects from other vocabularies, e.g. SVG, are embedded. We agreed that someone should check the ISO recommendations against the current TEI model and produce a revised draft of the chapter, but did not decide who should do this or by when.
Reviewing the likely timetable for the joint ISO/TEI work on feature structures, we thought it would be worth trying to use the FS specification as a testbed for ODD-NG, with a view to presenting a preliminary version at the FS meeting in Nancy next month and a more stable one in time for the FS meeting in Jeju in Feb 2004. Laurent felt that there was some potential for using TEI to draft technical documentation in the ISO context.
The specific ODD-NG details we discussed were as follows:
- rather than use CDATA marked sections for examples, we should use the existing <exemplum> tag with a content model of ANY: this implies that all GIs used must be unique within the TEI namespace, and that ODD-NG documents can only be validated against a full TEI DTD
- simplify the content model for <valList> to contain only <valItem> elements whose n attribute carries the value being described, and whose content is the gloss, rather than the current series of <val> and <desc> pairs
- remove the <part> element, if possible: membership of an element in a particular module should not be hard-wired
- remove the <dataDesc> element, if possible: it is redundant
- consider removing the first <ptr> if there is more than one at the end of a doc element: the canonical reference for the object should be generated from the point at which it is declared, not held in its documentation
- there should be a single wrapper element for the RNG specifications defining the content model (<content>? <model>?)
- replace the specific naming elements <gi>, <attName>, <entName>, and <class> with the generic <ident> element, since the thing being named is implicit in the parent element.
- rename the <name> element within <entDoc>, <classDoc>, and <tagDoc> as <gloss> and remove <rs> from its content model: the name of the thing is given by its <ident>; the content of this element simply expands that name where its meaning is not obvious.
- rename <dtdRef> and <dtdFrag> to <chunkRef> and <chunk> respectively; similarly, rename <entDoc> and <entDecl> as <patternDoc> and <patternDecl>: the old names are too SGML-DTD-specific
- disallow <classes> element with null name attribute
- disallow an empty <attList> element (i.e. it must have <attDef> children)
- ideally, content models should refer only to element classes, not to individual elements
- references to objects documented in ODD (tags, classes, etc.) should all be made by pointing to the <ident> element rather than by using the ID/IDREF mechanism
- add an (optional, repeatable) <equiv> element wherever an <ident> can occur: this can reference a concept in the ISO data repository or elsewhere, or it might contain an XPath expression to locate some equivalent markup construct elsewhere.
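The proposed <valList> simplification can be illustrated with a minimal Python sketch using the standard library's xml.etree.ElementTree. It assumes the old markup is a flat sequence in which each <val> is followed by at most one <desc>, keeps only the plain text of each <desc>, and drops any other children, since the proposed content model allows only <valItem>. The function name simplify_val_list is hypothetical; this is an illustration of the idea, not an agreed ODD-NG conversion tool.

```python
import xml.etree.ElementTree as ET


def simplify_val_list(val_list: ET.Element) -> ET.Element:
    """Return a new <valList> in which each <val>/<desc> pair becomes
    <valItem n="value">gloss</valItem>.

    Sketch assumptions: each <val> is followed by at most one <desc>;
    only the plain text of <desc> is kept (any child markup is lost);
    children other than <val> and <desc> are dropped.
    """
    new_list = ET.Element(val_list.tag, val_list.attrib)
    current = None  # the <valItem> created for the most recent <val>
    for child in val_list:
        if child.tag == "val":
            # The value being described moves into the n attribute.
            value = (child.text or "").strip()
            current = ET.SubElement(new_list, "valItem", n=value)
        elif child.tag == "desc" and current is not None:
            # The gloss becomes the content of the preceding <valItem>.
            current.text = (child.text or "").strip()
    return new_list


old = ET.fromstring(
    "<valList><val>yes</val><desc>the default</desc><val>no</val></valList>"
)
print(ET.tostring(simplify_val_list(old), encoding="unicode"))
```

In this example the "yes" value gets its gloss as content, while the "no" value, which had no <desc>, becomes an empty <valItem n="no"/>.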
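The renaming proposals can likewise be sketched as an in-place tree walk in Python. The rename tables come from the list above, but which parent each naming element (<gi>, <attName>, <entName>, <class>) sits under is an assumption of this sketch; it renames them only in those documentation contexts, so that a <gi> used in running prose is left alone. The function rename_odd and the table names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical tables derived from the proposals above (not a published schema).
# (parent, child) pairs whose specific naming element becomes generic <ident>;
# the parent chosen for each naming element is an assumption of this sketch.
IDENT_RENAMES = {("tagDoc", "gi"), ("attDef", "attName"),
                 ("entDoc", "entName"), ("classDoc", "class")}
# Elements renamed wherever they occur, because their old names are DTD-specific.
GLOBAL_RENAMES = {"dtdRef": "chunkRef", "dtdFrag": "chunk",
                  "entDoc": "patternDoc", "entDecl": "patternDecl"}
# <name> becomes <gloss> only inside these documentation elements.
GLOSS_PARENTS = {"entDoc", "classDoc", "tagDoc"}


def rename_odd(elem: ET.Element, parent: Optional[str] = None) -> None:
    """Rename elements in place, top-down, matching on each element's
    *original* parent name."""
    old = elem.tag
    if (parent, old) in IDENT_RENAMES:
        elem.tag = "ident"
    elif old == "name" and parent in GLOSS_PARENTS:
        elem.tag = "gloss"
    else:
        elem.tag = GLOBAL_RENAMES.get(old, old)
    for child in elem:
        # Passing the old tag keeps the rules above matching even after
        # <entDoc> itself has been renamed to <patternDoc>.
        rename_odd(child, old)
```

Matching on the original parent name rather than the renamed one is the key design choice: because the walk is top-down, <entDoc> has already become <patternDoc> by the time its <entName> and <name> children are visited.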
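The two proposed constraints on <classes> and <attList> could be enforced by a schema, but a quick Python check makes their intent concrete. This sketch treats a missing name attribute the same as an empty one, which is an assumption; the attribute name is taken from the notes above, and the function check_odd is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List


def check_odd(root: ET.Element) -> List[str]:
    """Report violations of two proposed ODD-NG constraints (hypothetical checker)."""
    problems = []
    for classes in root.iter("classes"):
        # Proposal: a <classes> element must not have a null name attribute.
        # Treating an absent or whitespace-only attribute as null is an assumption.
        if not (classes.get("name") or "").strip():
            problems.append("<classes> with null name attribute")
    for att_list in root.iter("attList"):
        # Proposal: <attList> must contain at least one <attDef> child.
        # Compare with None: an Element with no children is falsy.
        if att_list.find("attDef") is None:
            problems.append("empty <attList> (no <attDef> children)")
    return problems
```

A document that satisfies both proposals yields an empty list; each violating element adds one message.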