3 Bare-Bones SGML

SGML, the Standard Generalized Markup Language, is a formal language for representing text in electronic form. The TEI tag set is defined in terms of SGML, and all TEI-conformant documents must also conform to SGML.

In SGML-based encoding schemes, a document is represented by a combination of content (roughly speaking, the characters of the text, what you see on a printed page when the text is printed out) and markup (roughly speaking, information about the structure of the text, or features important for proper processing of the text, such as its division into chapters and sections, or the fact that a given phrase is a technical term and must be italicized). Non-SGML software, such as proprietary word processors, uses a similar division into content and markup. In sophisticated software, markup is usually invisible to the user unless a reveal-codes function or the like is used to make it visible. SGML differs from proprietary markup systems in several ways:

There are other differences, but these will do for now.

SGML markup takes three forms: declarations, entity references and tags. [I cannot tell a lie: actually, there are four forms of markup. The fourth, processing instructions, won't concern us here.]

Declarations are used to define the tags and entity references which are legal in a document type. Since the tags and entities we are concerned with here have all already been defined by the TEI, there is no need to discuss declarations further in this document. You will need to learn about them if you want to customize the TEI tag sets, but that won't be covered here. The only form of declaration you need to know about, to follow the examples below, is the comment, which is preceded by <!-- and followed by -->:

 
<!-- this is a comment. -->
<!-- this is a second comment. -->
<!-- Comments are ignored by the SGML parser,
and usually ignored by SGML software
of all types.  As this comment shows,
comments can go on for several lines. -->
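
To make the "ignored by software" point concrete, here is a minimal Python sketch of how a program might drop comments before processing a document. It assumes comments have only the simple form shown above (<!-- followed by text and then -->); the function name strip_comments and the sample string are made up for illustration, and a real SGML parser does considerably more.

```python
import re

# Assumption: comments have only the simple form <!-- ... --> shown above.
# DOTALL lets a single comment run over several lines.
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)

def strip_comments(text):
    """Return text with every <!-- ... --> comment removed."""
    return COMMENT.sub("", text)

# Hypothetical sample input containing one two-line comment.
sample = "Some text. <!-- a comment\nover two lines -->More text."
stripped = strip_comments(sample)
```

The non-greedy pattern stops at the first --> it finds, so two separate comments are removed separately and the text between them is kept.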

Entities are named portions of documents, which may be stored separately; entity references show where each entity goes. Among other things, entity references are used to embed special characters in the text when, as often happens, the characters in question are not available on the keyboard. Some entities for special characters are defined in international standards. For example, the entity eacute names the character "e with an acute accent" (é). When the standard entity sets are in use, the following two examples are identical in meaning:

 
L'&eacute;tat, c'est moi.
 
L'état, c'est moi.

(In case this has been corrupted in transmission, or is being rendered on a device without accented characters, that second one is the same as the first, except that the reference to the entity eacute in characters 3-10 of the first example has been replaced with a real e with an acute accent in the system's native character set in character 3 of the second example.)
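
The replacement of an entity reference by the character it names can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table ENTITIES is a tiny hand-written stand-in for a standard entity set, the function name expand_entities is invented here, and unknown references are simply left untouched, whereas a real SGML system reads its entities from declarations.

```python
import re

# Hypothetical, hand-written stand-in for a standard entity set.
ENTITIES = {
    "eacute": "\u00e9",  # e with an acute accent
    "egrave": "\u00e8",  # e with a grave accent
}

# An entity reference is taken to be: ampersand, name, semicolon.
REFERENCE = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);")

def expand_entities(text, entities=ENTITIES):
    """Replace each known &name; reference; leave unknown ones as they are."""
    def replace(match):
        return entities.get(match.group(1), match.group(0))
    return REFERENCE.sub(replace, text)

expanded = expand_entities("L'&eacute;tat, c'est moi.")
```

After expansion, the reference that occupied characters 3-10 of the input has become the single accented character in position 3, as described above.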

Entities are also used to handle graphics and other material in non-SGML notations, and to divide a document up into sections stored in separate files for purposes of simpler maintenance, but we won't discuss such uses here.

Tags mark the beginning and ending of parts of the document; the parts themselves are called elements. Normally, tags are marked in the document by angle brackets; end-tags have a slash after the opening angle bracket. In the following example, the sentence is marked as a quotation by the start-tag and end-tag which surround it; quote is an element type defined by the TEI.

 
<quote>L'&eacute;tat, c'est moi.</quote>

Elements always have a basic type (in the example above, it is quote); they may also have other attributes, which are indicated by special notations inside the start-tag for the element. For example, the TEI defines the attribute lang as applying to every type of element; its value indicates the language of the element's content, using standard two- or three-letter abbreviations (e.g. fra for "French").

 
<quote lang='fra'>L'&eacute;tat, c'est
moi.</quote>
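
As a rough illustration of how software reads a start-tag, the following Python sketch splits one into its element type and its attributes. It assumes attribute values are always quoted, as in the examples here; the name parse_start_tag is invented for this sketch, and it is in no way a full SGML parser.

```python
import re

# Assumption: a start-tag is '<', an element type, optional attributes, '>'.
START_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9.-]*)((?:\s+[^<>]*)?)>")
# Assumption: every attribute value is enclosed in single or double quotes.
ATTRIBUTE = re.compile(r"([A-Za-z][A-Za-z0-9.-]*)\s*=\s*(?:'([^']*)'|\"([^\"]*)\")")

def parse_start_tag(tag):
    """Return (element type, attribute dictionary) for a single start-tag."""
    match = START_TAG.fullmatch(tag)
    if match is None:
        raise ValueError("not a start-tag: %r" % tag)
    name, rest = match.group(1), match.group(2)
    attributes = {}
    for attr in ATTRIBUTE.finditer(rest):
        # Group 2 holds a single-quoted value, group 3 a double-quoted one.
        value = attr.group(2) if attr.group(2) is not None else attr.group(3)
        attributes[attr.group(1)] = value
    return name, attributes

name, attributes = parse_start_tag("<quote lang='fra'>")
```

An end-tag such as </quote> is rejected by this function, since the slash after the opening angle bracket is what distinguishes it from a start-tag.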

Some attributes may be restricted to certain types of values. Attributes of type id, for example, must provide a unique name or identifier for the element on which they appear; this identifier can be referred to by other attributes, of type idref (id reference). The TEI defines a global attribute named id, of type id, for use in cross-references and other kinds of hypertext links. (TEI attributes are called global when they apply to every type of element.)
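
The two constraints just described (each id value is unique, and each idref value points at an existing id) can be checked mechanically. The Python sketch below does so with simple pattern matching. Everything specific in it is an assumption made for illustration: the sample text, the use of an attribute named target as the idref attribute, and the function name check_ids. A validating SGML parser performs this check properly from the declarations.

```python
import re
from collections import Counter

# Hypothetical sample: 'target' is assumed to be an attribute of type idref.
SAMPLE = (
    "<p id='p1'>First paragraph.</p>\n"
    "<p id='p2'>See <ref target='p1'>above</ref> "
    "and <ref target='p9'>elsewhere</ref>.</p>"
)

def check_ids(text, id_attr="id", idref_attr="target"):
    """Return (duplicated id values, idref values that match no id)."""
    # Crude matching on single-quoted values; good enough for this sketch.
    ids = re.findall(r"\s%s='([^']*)'" % re.escape(id_attr), text)
    refs = re.findall(r"\s%s='([^']*)'" % re.escape(idref_attr), text)
    duplicates = sorted(v for v, n in Counter(ids).items() if n > 1)
    dangling = sorted(set(refs) - set(ids))
    return duplicates, dangling

duplicates, dangling = check_ids(SAMPLE)
```

For the sample, no identifier is duplicated, and the reference to p9 is reported because no element carries that identifier.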

Finally, it should be noted that SGML allows some tags to be omitted from documents, in cases when they are logically redundant and their location can be inferred from that of other tags; in the examples given here, we will not exploit this facility, but always give all tags explicitly. Tag omission is generally of interest only to those working without an SGML editor.

In sum: in SGML, everything is delimited.

That's all there is to it. If you understand the rules just described, you should have no trouble understanding all the SGML examples in this document.