Minutes
Of the Metalanguage and Syntax Committee Meeting
Kingston, 4-5 March 1991
Lou Burnard
Document Number: TEI ML M45
31 March 1991
Draft April 26, 1991 (16:48:22)
Present: David Barnard (DTB), chair; David Durand (DD); Frank Tompa
(FWT); Lou Burnard (LB); C. M. Sperberg-McQueen (MSM); Doug Hamilton
(DH).
Apologies for absence: Nancy Ide and Lynne Price.
1
AGENDA
The agenda provided by DTB was accepted, save that item 3 (the Liter-
ary Work Group Critique) was moved to the end.
2
MINUTES OF OXFORD MEETING (ML M33)
Approved as a true record.
3
UPDATE ON OVERALL STATUS OF THE PROJECT
MSM briefly recapped the current status of the various work groups
which had been set up since the last meeting, as summarised by a recent
posting on TEI-L. Most, though not all, work groups were still quies-
cent. DTB asked what further input to the TEI Guidelines was anticipated
from the committee. MSM replied that it was clear that a substantial
reorganisation of the current P1 would be needed, probably into several
publications. DTB suggested that papers on transduction methods, already
a work item for the committee, might be helpful. MSM agreed and reminded
the committee that the current chapter 8 also needed to be recast, per-
haps as a "cookbook" on DTD manipulation.
On software, LB reported that Yard Software had offered a special
deal for MarkIt, their PC-based SGML parser. Price for an individual
licence was £100, discounted 50% if bought in bulk.
Note: The TEI Steering Committee has since agreed to release funds
to pursue the bulk purchase option.
Martin Bryan of Yard was interested in developing a TEI-specific
version of WriteIt, Yard's low-end data entry system, and LB had
supplied him with copies of the P1 DTDs to pursue this possibility.
FWT asked how such customization would cope with the possibility of
extensions to the DTD: LB replied that the extension capability was
not yet implemented in the DTDs.
DH reported that some problems had been experienced in installing the
beta test version of Electronic Book Technology's DynaText product,
though these were believed to have been fixed in the first official
release. This release was available to TEI participants at a discounted
price of $2500.
It was also noted that Software Exoterica's parser was available
through GCA for under $100.
4
ACTION ITEMS FROM PREVIOUS MEETING
Parenthesized numbers in this list refer to the numbered points in
the minutes of the previous meeting.
Parser Pitfalls Document (3): LAP had sent the required information to
Wendy Plotkin.
ACTION:
MSM to follow up availability of LAP's notes outside Hewlett
Packard
Due: 13 March
Document Review (5): All documents available in electronic form had now
been placed on the UICVM server, including W12. Actions on ODA and on
liaison with ANSI had been carried out. Discussion of the action on
DTB to propose a package of documents to go with W32 was deferred.
SGML declaration revision (8): Deferred, pending further discussion of
the implications of disabling both SHORTTAG and OMITTAG.
DTD Manipulator (10): The editors had not yet formulated their
requirements in this area: DTB would report on some experiments with
indirect DTDs later in the meeting.
Software Assessment (11): LB apologised for having made no progress on
this item.
Referral of user comments (12): Only one set of comments (Joan Smith's)
had been referred to the committee so far.
5
CURRENT WORKING PAPERS
An updated document register was circulated.
W14 : SGML Bibliography
DTB circulated advance copies of this document, now published as a
Queen's University Technical Report, and briefly described how it had
been produced. A smaller version was to be published in Literary
and Linguistic Computing. There was some discussion as to the
usefulness of continuing to add to the document in its current form. FWT
noted that it might be helpful to distinguish items of relevance to TEI
from items of general SGML interest. While it was generally felt that
several items were of only ephemeral interest, the committee expressed
its appreciation for the hard work done by the editors of the bibliogra-
phy, in particular Robin Cover. It was agreed to review the status of
the document as a means of providing information about current SGML pub-
lications at the next meeting.
W17 : House style for DTDs
There was some debate as to the need for perpetuating the use of ver-
sion numbers. The consensus was that such numbers should be used only
for internal preliminary drafts: the 1992 TEI publication would not have
a version number. DTB noted the absence of discussion of parameter enti-
ties in the document and asked what other topics were missing. LB men-
tioned an article on "The well dressed DTD" in a recent issue of
<TAG> and offered to circulate a copy. It was agreed that
the current document should be retired.
W18 : SGML Technical Questions
MSM reviewed his revisions to this discussion document. FWT felt it
had a potentially wider audience than working groups in need of recom-
mendations for dealing with well-recognised thorny problems such as
overlapping segments. DD said that it was a toolbox document not suit-
able for end users. A number of minor revisions were requested (all
attribute values to be lower-cased and all ID values to be quoted; re-
use the M1 action "Nino sits at table" and comment on this possibility;
refer to HyTime work and any papers from TR3; rename x and y to
something less co-ordinate-like such as start and end; check that all
examples parse); the document should be closed once these had been
carried out.
ACTION:
MSM To revise W18
Due: 30 April
W22 : Notes on minimisation
No progress had been reported. The action on JPG continued, with a
revised due date of 30 May.
W25 : Parser pitfalls
No progress reported. Once MSM had confirmed the availability of the
existing draft from Hewlett Packard, LAP should be requested to produce
a revised draft by 30 June.
W26 : Naming conventions
All names (other than entity names) should use lowercase only, with
the possible exception of phrasal names such as "partOfSpeech". By a
narrow majority (3:2) the committee found this a less appalling prospect
than hyphenating such names: wherever possible however phrasal names
should be avoided. Last sentence of para 1 on page 3 should be removed.
The mention of "cartesian products" needed expansion or removal. It was
noted that the current indirection method privileged the English lan-
guage by allowing for only one set of parameter entity names and the
editors were requested to seek guidance from the SC as to the accept-
ability of this, given the project's commitment to language indepen-
dence.
ACTION:
LB,MSM To request guidance from SC as to acceptability of monocul-
tural parameter entities
Due: not specified
The general rule for use of abbreviation should be to avoid it wherever
possible and to use only abbreviations recognised in the field where it
was not. The detailed suggestions in the document should be curtailed.
With these minor revisions it was agreed that the document was complete.
ACTION:
MSM,DD to revise W26 as specified above
Due: 30 April
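The naming rule agreed for W26 can be illustrated with a minimal sketch
in Python. The two patterns and the function name classify_name are
assumptions made for illustration only; W26 itself is not quoted here,
and the allowance for digits and full stops is inferred solely from
names such as tei.1 and tei.header mentioned elsewhere in these minutes.

```python
import re

# Assumed reading of the rule: a name is either entirely lowercase, or a
# "phrasal" name whose interior words are capitalised (partOfSpeech).
# Hyphens are never accepted. Both patterns are hypothetical.
LOWER = re.compile(r"[a-z][a-z0-9.]*\Z")
PHRASAL = re.compile(r"[a-z][a-z0-9.]*(?:[A-Z][a-z0-9]+)+\Z")


def classify_name(name):
    """Return 'lowercase', 'phrasal' or 'rejected' for a proposed name."""
    if LOWER.match(name):
        return "lowercase"
    if PHRASAL.match(name):
        return "phrasal"  # tolerated, but to be avoided where possible
    return "rejected"


if __name__ == "__main__":
    for candidate in ("text", "tei.header", "partOfSpeech", "part-of-speech"):
        print(candidate, classify_name(candidate))
```

Entity names are outside the scope of this check, since the rule as
minuted excludes them.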
W30 : Transduction examples
No progress on a general document had been made. One example (con-
verting LOB to TEI, using SED scripts) existed and had been circulated
but not fully documented by Nick Duncan. There was some debate as to
the feasibility of producing a general theoretical framework for trans-
duction within the timescale of the project. It was agreed that produc-
ing a number of illustrative examples would be a more realistic goal.
These should document a specific transduction in prose, demonstrating
the adequacy of TEI to represent a wide variety of schemes for inter-
change and giving examples of useful general purpose techniques. FWT
noted that such examples might not always be easy to produce. A toy
(partial translation of a specific text) was easy; a sketch of notes
towards a more general solution was more difficult; a generic clean
solution was probably close to magic. LB reminded the meeting of the
list of target encoding schemes already proposed in ML W12.
After some discussion, it was agreed that nine specific transductions
should be documented in nine separate working papers, as listed below.
ACTION:
DTB To draft W34 documenting the LOB transduction
Due: 30 April
ACTION:
JPG To draft W35 documenting transduction of LaTeX and RTF
Due: 30 April
ACTION:
NI To draft W36 documenting transduction of dictionary examples
Due: 30 April
ACTION:
FWT To draft W37 documenting transduction of Wire service stories
from the Ottawa Citizen
Due: 30 April
ACTION:
LB To draft W38 documenting simple conversion of SGML to LaTeX
with a Spitbol program
Due: 30 April
ACTION:
LB To draft W39 documenting conversion program used for the
Dictionary of Old English corpus
Due: 30 April
ACTION:
LB, Harry Weitenberg To draft W40 documenting progress in trans-
lating COCOA formatted dramatic texts using XTRAN
Due: 30 April
ACTION:
DD To draft W41 documenting transduction of TLG format texts
Due: 30 April
ACTION:
DTB To draft W42 documenting transduction used by Gary Simons for
his "Bear goes fishing" example
Due: 30 April
It was agreed that these working examples would provide a more useful
basis for a general solution than the proposed topics of W30, which was
retired.
W31 : ODA
No progress reported.
ACTION:
DD To draft W31 on relevance of ODA to TEI
Due: 30 June
W32 : Revision of ISO 8879
FWT noted that illustrative examples should be included in support of
the requirements identified. MSM had more comments to make on the docu-
ment.
ACTION:
MSM To circulate further proposals for inclusion in W32
Due: asap
ACTION:
DTB To revise W32
Due: 31 March
6
CONFORMANCE
The issue of what exactly "TEI conformance" might entail had been
raised in several quarters and was discussed at some length. A document
might use its own simple scheme for representation of elements (a local
storage or capture format); this could be mapped to an SGML conformant
scheme using the corresponding TEI elements. For interchange a distinct
character representation scheme was currently recommended by the Guide-
lines (using the ISO 646 subset). For transmission a further packing or
encoding might be employed. Such discussion of TEI conformance as exist-
ed in the current draft did not clearly distinguish between character
set conformance and DTD conformance. The character set work group would
probably wish to make independent recommendations as to TEI character-
set conformance.
FWT asserted, and it was agreed, that the notion of "TEI-conformant
software" was meaningless: TEI conformance related only to data.
Software suppliers could be expected only to specify whether their
software accepted or generated TEI-conformant documents of a specified
type.
LB asked what characterized a TEI-conformant element structure: was
it necessary only that it should be parsable with a TEI dtd? DTB said
that there was a required minimal element set (tei.1, tei.header, text
etc), and also noted that use of some extension mechanisms might be pre-
cluded. LB suggested that documentation of the level of tagging used in
a text should also be required. FWT felt that it was unreasonable to
require documentation of what had not been done in a text, and that it
might prove difficult to specify what had. There was some inconclusive
discussion of how the presence of non-SGML markup affected conformance.
It was
felt that whatever could be simply converted to a TEI/SGML form (e.g. by
using string substitution without look-ahead) should be converted. If a
TEI tag existed for some marked-up feature of a text, then that tag
should be used. DD proposed as a requirement that any non-SGML markup be
documented in the header. FWT and MSM felt that identifying what consti-
tuted "non-SGML markup" was not a simple issue.
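As an illustration of what "string substitution without look-ahead"
might mean in practice, the following Python sketch replaces each local
markup string by a fixed tag, consulting nothing beyond the matched
string itself. The brace-delimited local codes, the target tags and the
function name convert are all hypothetical; none is drawn from the
Guidelines or from any scheme discussed at the meeting.

```python
import re

# Hypothetical table: local capture codes on the left, illustrative
# SGML-style tags on the right. Every entry is context-free.
SUBSTITUTIONS = {
    "{p}": "<p>",
    "{/p}": "</p>",
    "{i}": "<emph>",
    "{/i}": "</emph>",
}


def convert(text, table=SUBSTITUTIONS):
    """Replace every local code by its tag in one left-to-right pass."""
    # Longer codes are listed first so that no code is shadowed by a
    # shorter one sharing the same prefix.
    codes = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(code) for code in codes))
    # The replacement depends only on the matched code, never on the
    # text that follows it.
    return pattern.sub(lambda match: table[match.group(0)], text)


if __name__ == "__main__":
    print(convert("{p}A {i}fine{/i} day{/p}"))
```

Markup whose meaning depends on what follows it (for example a code
that closes one element and implicitly opens another) could not be
handled by such a table, which is presumably where the difficulty of
identifying "non-SGML markup" begins.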
No consensus was reached on the topic, except that conformance relat-
ed only to data (rather than to software) and that a more detailed and
clearer presentation of the issues involved was necessary. A working
paper (W43) was assigned to MSM, who proposed to address the following
points in it:
* character set conformance
* changes in the SGML declaration (e.g. to delimiters, or in SGML fea-
tures or quantities)
* variant DTDs using (or not using) the extension mechanism
* documentation of tag usage and of features present in text
* treatment of non-SGML markup
The editors were also asked to convey to the Steering Committee some
sense of the issues involved and to canvass their views.
ACTION:
MSM To draft W43 on Conformance
Due: 30 April
ACTION:
Editors To ask SC views on what TEI-conformance should entail
Due: asap
7
LITERARY WORKGROUP
"Critique"
The content of this document was discussed. It was noted that defini-
tions of technical terms were missing from the current draft Guidelines
and should be supplied. DTB felt that the current draft was less biassed
in favour of descriptive over presentational markup and was also less
prescriptive than the Critique suggested. MSM noted that such bias as
existed had arisen out of the TR committee's requirements. FWT suggested
that the ability to reconstruct the appearance of an encoded original
was a reasonable requirement which should not be overlooked. The commit-
tee felt that the usability of e.g. the alignment mechanisms for multi-
ple literary analyses had been overlooked, and agreed that this and oth-
er general purpose mechanisms discussed in chapter 6 of P1 needed to be
more clearly and accessibly documented.
8
DTD MANIPULATOR
DH presented the work he and DTB had so far done on implementing the
indirection mechanisms for DTD manipulation. Rather than a DTD for DTDs
they had simply generated an indirect DTD from the existing text. The
following supportable facilities and their interactions were discussed:
* rename elements and attributes
* add new elements
* clone an element
* add new attributes to an element
* change content model of an element
* change declared attribute value
* add new global attributes
* change contents of %soup and %broth
DH asked whether comments in the DTD should be preserved. It was agreed
that the present chapter eight would need elaboration. The indirected
DTDs would not be published in the text, but might be distributed elec-
tronically, with a number of tables to create language specific DTDs.
The indirection algorithms would be documented and C source code provid-
ed.
ACTION:
DTB To provide DTD manipulator as outlined above in electronic
form
Due: 30 April
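The idea of an indirect DTD combined with a table of names can be
sketched as follows. This is a minimal illustration in Python under
stated assumptions, not the program written by DH and DTB (which was to
be supplied in C): the entity naming scheme (n.text, n.p), the two name
tables and the functions entity_declarations and resolve are all
hypothetical.

```python
import re

# A toy indirect DTD: element names appear only as parameter entity
# references, so that a table of names can supply them.
INDIRECT_DTD = """\
<!ELEMENT %n.text; - - (%n.p;)+>
<!ELEMENT %n.p; - - (#PCDATA)>
"""

# Hypothetical tables, one per language.
ENGLISH = {"n.text": "text", "n.p": "p"}
FRENCH = {"n.text": "texte", "n.p": "para"}


def entity_declarations(table):
    """Emit the parameter entity declarations that a table stands for."""
    return "".join(
        '<!ENTITY % {} "{}">\n'.format(entity, name)
        for entity, name in sorted(table.items())
    )


def resolve(dtd, table):
    """Expand each %name; reference found in the table; keep the rest."""
    return re.sub(
        r"%([A-Za-z][A-Za-z0-9.]*);",
        lambda match: table.get(match.group(1), match.group(0)),
        dtd,
    )


if __name__ == "__main__":
    print(entity_declarations(FRENCH))
    print(resolve(INDIRECT_DTD, FRENCH))
```

Renaming is the simplest of the facilities listed; adding or cloning
elements and changing content models would need the content models
themselves to be held in entities, which this sketch does not attempt.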
9
DATE OF NEXT MEETING
It was agreed that no date should be set for the next meeting until
some of the assigned working papers were completed. A meeting in late
summer or early autumn was likely.