For many of the first users of SGML, the appropriate answer was ``One'': the whole purpose of the exercise being to define a template against which all texts could be checked rigorously and consistently. This approach, which might be characterized by the phrase ``we know what's best for you'', has an obvious place in applications such as technical documentation, but is equally obviously inappropriate where the object of the exercise is to describe texts produced before the blessings of structured document design were revealed to the world.
At the opposite extreme are those whose answer would be ``none'', for whom no DTD can ever be adequate to the full complexity of the texts to be described: this attitude might be caricatured as ``No-one will ever understand my problem''. Again, it is not impossible to imagine applications for which a DTD consisting only of elements with the content model ANY would be entirely appropriate (the first electronic edition of the Oxford English Dictionary provides one obvious example), although its usefulness in the general case is less clear.
Perhaps most numerous are those who shrug their shoulders and say ``as many as it takes'': the world will always need new DTDs (in the boundary case, one per document). In the name of pragmatism, this attitude risks crowding the fledgeling possibility of information interchange out of the nest entirely; nevertheless, its popularity reminds us that sometimes the document must drive the DTD, rather than the reverse.
The approach taken by the TEI attempts to combine the virtues of all three of these approaches. It defines not one, but many possible DTDs, which may be tailored to the needs of a particular application in a way difficult or impossible with most other general-purpose DTDs so far developed. The user of the TEI scheme is offered the opportunity of building a DTD which matches his or her requirements, but is constrained to do so in a way that facilitates interchange.
We refer to this somewhat jocularly as the Chicago Pizza model. All pizzas have some ingredients in common (cheese and tomato sauce); in Chicago, at least, they may have entirely different forms of pastry base, to which (universally) the consumer is expected to add his or her own selection of toppings. Using SGML syntax this might be summarized as follows:
    <!ENTITY % base    "(deepDish | thinCrust | stuffed)">
    <!ENTITY % topping "(sausage | mushroom | pepper | anchovy ...)">
    <!ELEMENT pizza - - (%base;, (cheese & tomato), (%topping;)*)>

In the same way, the user of the TEI scheme constructs a view of the TEI DTD by combining the core tag sets (which are always present), exactly one `base' tag set, and his or her own selection of `additional' tag sets or toppings.
We use the term tag set to denote simply a collection of definitions for SGML elements and their attributes. These tag sets are the basic organizing principles of the TEI scheme, and are divided into four groups:
This modularization is achieved by the use of parameter entities in the TEI DTD, which is further discussed below. To illustrate the basic mechanism we present here the start of a minimal TEI-conformant document in which the base tag set for prose has been selected together with the additional tag set for linking:
    <!DOCTYPE tei.2 [
      <!ENTITY % TEI.prose   "INCLUDE">
      <!ENTITY % TEI.linking "INCLUDE">
    ]>
    <tei.2>
      <!-- content of document here -->
    </tei.2>

Because this selection of tag sets is effected explicitly by declarations within the DTD subset, as shown above, any recipient of the document can tell which TEI tag sets are required to process it. Any deviations from or modifications of the TEI definitions (for example, the renaming of elements, or the addition of new ones) may be made in a similar declarative manner. Once a given view of the TEI DTD has been defined in this way, it can be fixed or `compiled' to preclude further modification and also to remove the complexity necessarily introduced by the extensive use of indirection in the TEI DTD.
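The way a parameter entity set to "INCLUDE" can switch a tag set on may be easier to see in a small sketch of what a main DTD file could contain. This is an illustration under stated assumptions, not the actual text of the TEI DTD: only the entity name TEI.prose comes from the example above, while the entity name base.decls and the file name prose.dtd are hypothetical. The sketch relies on two general properties of SGML: the DTD subset in the document is read before the external DTD, and when an entity is declared more than once the first declaration is the one that counts.

```sgml
<!-- Hypothetical fragment of a main DTD file. The names base.decls and
     prose.dtd are illustrative assumptions, not taken from the TEI DTD. -->

<!-- Default setting: the prose base is switched off. If the document's
     DTD subset has already declared TEI.prose as "INCLUDE", that earlier
     declaration takes precedence and this one has no effect. -->
<!ENTITY % TEI.prose "IGNORE">

<!-- Marked section: the declaration inside is processed only when the
     keyword entity resolves to INCLUDE. -->
<![ %TEI.prose; [
  <!ENTITY % base.decls SYSTEM "prose.dtd">
]]>

<!-- Embed the declarations of whichever base tag set was selected. -->
%base.decls;
```

In a design of this kind, a document that selects no base leaves base.decls undeclared, so the final reference fails; that is one way the requirement of choosing a base can be enforced by the parser itself.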