Physical Bibliography - Draft for P5
Contents
- The collation element
- The collationFormula element
- The codexStructure element
- "Milestone" tags for book-structure
- A stand-off markup strategy using milestone tags
- Physical structure as the primary hierarchy
This module defines elements for encoding the physical structure of books and manuscripts. They serve two purposes: to provide a higher level of bibliographic detail, or a more structured encoding of bibliographic facts, than the TEI Header or the Manuscript Description module allows; and to associate transcribed text or images of pages with an encoding of the physical structure of the book from which the transcription or images are taken. Two kinds of tags supplement the standard provisions of the <sourceDesc> section of the TEI header: those that encode bibliographic formulae, that is, standard or project-specific systems of notation for the physical facts of books or manuscripts, such as the "collation formula" refined by Fredson Bowers; and those that encode the physical facts themselves directly. In addition, tags are provided that allow book structure to serve as the primary hierarchy governing the encoding of the text itself. Finally, milestone tags and a stand-off markup strategy are provided for users who must choose another kind of TEI hierarchy as their primary one in order to capture the textual features that interest them, but who also wish to encode the physical structure of the source as an aspect of their text encoding.
The collation element
The <collation> element will appear within the <msDescription> or <bookDescription> element in the <sourceDesc> section of the TEI header. It can contain a paragraph-form description of the structure of a book or manuscript; a collation formula, together with the other elements that form a full bibliographic description following the Bowers notation (or some other standard or project-specific notation); or a full formal representation of the structure of the book itself, using <codexStructure> and the other tags defined below.
- <collation> contains a prose description of the collation of a single book or manuscript, a collation formula expressed in some standard or project-specific notation, or a full formal representation of the structure of the book or manuscript using tags defined in this module.
The <collation> element has one or more of the following components:
- A prose description of the collation of the book or manuscript.
- <collationFormula> contains the elements that make up a full bibliographic formula description of the book or manuscript: a listing of gatherings in <gatherings>, an indication of the total number of leaves in <totalLeaves>, and a <pagination> statement.
- <codexStructure> contains a complex of elements that directly describe the full physical form of a printed or handwritten book, such as <gathering>, <leaf>, and <page>.
The collationFormula element
- <gatherings>: contains a listing of the gatherings that make up the book or manuscript, divided into instances of the element <gatheringRange>, plus (optionally) a record of the alphabet used to mark signatures (in <signatureAlphabet>) and (also optionally) a record of the leaves on which signature marks appear (in <signatureLeaves> and <anomSignature>).
- <totalLeaves>: contains a number specifying the total number of leaves in the book that is being described in the formula.
- <pagination>: for books with printed or hand-written page numbers, contains a list of the ranges of consecutive numbering, appearing as separate instances of <pageRange>.
- <gatheringRange>: allows the encoder to specify the start and end gatherings of a range of gatherings that have the same physical construction (using <start> and <end> to enclose the signature letter or the sequential number of the first and last gathering of the sequence, and <added>, <cancelled>, or <missing> to identify anomalies within the sequence).
- <signatureAlphabet>: used for specifying or describing a particular alphabet used for signatures, such as the 23-letter printers' alphabet used to sign British and some American letterpress-printed books.
- <signatureLeaves>: used to specify the first (<start>) and last (<end>) leaves normally signed in each gathering.
- <anomSignature>: specifies a signature that is additional to the norm specified in <signatureLeaves>, or one that would be expected but is not present, using the attribute values type="added" and type="missing" respectively.
- <pageRange>: allows the encoder to specify the start and end pages of a sequence of pages that are continuously paginated in regular and uninterrupted succession (using <start> and <end> to enclose the page numbers of the first and last pages of the sequence). In the case of interrupted or repeated sequences of page numbers, multiple instances of <pageRange> may be used to show all continuous sequences or stray individual page numbers.
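As an illustration of how these elements might fit together, the following minimal sketch encodes a hypothetical book whose gatherings A to C have eight leaves each and whose gathering D has four, and then checks that the ranges add up to the declared <totalLeaves>. The draft does not say how the number of leaves per gathering is recorded, so the leaves attribute on <gatheringRange> is an assumption made only for this example, as are the sample values and the Python helper names.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample. The "leaves" attribute is an assumption of this
# sketch; the module text does not define where the leaf count goes.
SAMPLE = """
<collation>
  <collationFormula>
    <gatherings>
      <gatheringRange leaves="8"><start>A</start><end>C</end></gatheringRange>
      <gatheringRange leaves="4"><start>D</start><end>D</end></gatheringRange>
      <signatureAlphabet>23-letter printers' alphabet</signatureAlphabet>
      <signatureLeaves><start>1</start><end>4</end></signatureLeaves>
    </gatherings>
    <totalLeaves>28</totalLeaves>
    <pagination>
      <pageRange><start>1</start><end>56</end></pageRange>
    </pagination>
  </collationFormula>
</collation>
"""

# The 23-letter printers' alphabet: the Latin alphabet without J, U and W.
ALPHABET = "ABCDEFGHIKLMNOPQRSTVXYZ"


def leaves_in_formula(collation_xml):
    """Sum the leaves implied by every <gatheringRange> in the formula."""
    root = ET.fromstring(collation_xml)
    total = 0
    for rng in root.iter("gatheringRange"):
        first = ALPHABET.index(rng.findtext("start"))
        last = ALPHABET.index(rng.findtext("end"))
        gatherings_in_range = last - first + 1
        total += gatherings_in_range * int(rng.get("leaves"))
    return total


def declared_leaves(collation_xml):
    """Read the number recorded in <totalLeaves>."""
    return int(ET.fromstring(collation_xml).findtext(".//totalLeaves"))


if __name__ == "__main__":
    print(leaves_in_formula(SAMPLE), declared_leaves(SAMPLE))
```

A check of this kind is one possible use of a structured formula: a mismatch between the two numbers would point to an unrecorded <added>, <cancelled>, or <missing> leaf, or to a transcription slip in the formula.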
The codexStructure element
The <codexStructure> element encloses a complex of elements that together describe the full physical form of a printed or handwritten book, such as <gathering>, <leaf>, and <page>. In the case of multi-volume works, <codexStructure> may be repeated for each volume.
- <gathering>: represents a quire or gathering in the source material, that is, a unit consisting of pages constructed by folding and attached (or formerly attached) to the spine of the book. Sub-elements and the relations between them formed by pointer-type attributes (and also by sequence in the XML file) may indicate the many physical and spatial relationships that exist between leaves and pages in the gathering and in the sheet(s) as printed before folding and assembly into the completed gathering.
- <leaf>: represents an individual physical leaf in the source material. The attribute "conjunct" may be used to refer to the xml:id of the <leaf> element representing the leaf that is conjoined to this leaf at the spine of the book. The attribute "signature" may be used to record the signature letter/number printed or written on this leaf. The attribute "sheet" may be used to assign the leaf to one of the sheets folded together, if more than one sheet is involved in the construction of a single gathering.
- <page>: represents an individual physical page (one side of a leaf) in the source material. The attribute "no" may be used to record the page number printed or written on this page. The attribute "cutFromN" may be used to refer to the xml:id of the page that, before folding into the gathering and cutting of leaves, was attached to the top (North) end of the current page; similarly "cutFromS", "cutFromE", and "cutFromW". The attribute "W" may be used to refer to the xml:id of the page to which the current page is attached by its leftward edge; similarly "E", and for books with the spine at the top or bottom of the pages, "N" or "S". The attribute "sheetSide" may be used to assign the current page to one or other of the surfaces of the printed sheet before folding into a gathering.
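The following sketch shows one way a four-leaf gathering might be recorded, with the "conjunct" attribute linking each leaf to its partner across the fold, and a small check that every such link is reciprocated. The nesting of <page> inside <leaf>, the "#" prefix on pointer values, the sample identifiers, and the helper name conjugate_pairs are all assumptions of this example rather than requirements stated in the draft.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the reserved XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical gathering of four leaves: 1 is conjugate with 4, 2 with 3.
SAMPLE = """
<codexStructure>
  <gathering xml:id="gA">
    <leaf xml:id="A1" conjunct="#A4" signature="A">
      <page xml:id="A1r" no="1"/><page xml:id="A1v" no="2"/>
    </leaf>
    <leaf xml:id="A2" conjunct="#A3" signature="A2">
      <page xml:id="A2r" no="3"/><page xml:id="A2v" no="4"/>
    </leaf>
    <leaf xml:id="A3" conjunct="#A2">
      <page xml:id="A3r" no="5"/><page xml:id="A3v" no="6"/>
    </leaf>
    <leaf xml:id="A4" conjunct="#A1">
      <page xml:id="A4r" no="7"/><page xml:id="A4v" no="8"/>
    </leaf>
  </gathering>
</codexStructure>
"""


def conjugate_pairs(xml_text):
    """Return sorted (leaf, leaf) pairs, checking that links are mutual."""
    root = ET.fromstring(xml_text)
    # Map each leaf id to the id its "conjunct" attribute points at.
    target = {
        leaf.get(XML_ID): leaf.get("conjunct", "").lstrip("#")
        for leaf in root.iter("leaf")
    }
    pairs = set()
    for leaf_id, other in target.items():
        if not other:
            continue  # a singleton leaf has no conjugate
        if target.get(other) != leaf_id:
            raise ValueError(f"{leaf_id} points at {other}, but not vice versa")
        pairs.add(tuple(sorted((leaf_id, other))))
    return sorted(pairs)


if __name__ == "__main__":
    print(conjugate_pairs(SAMPLE))
```

The same pattern could be extended to the page-level pointers ("W", "E", "cutFromN", and so on), since they too are described as references to xml:id values.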
"Milestone" tags for book-structure
Note: these tags replace the <pb/>, <cb/>, and <lb/> tags included in previous editions of these Guidelines.
- <newGathering/> marks the place in the text where a gathering or quire begins in the source document.
- <newLeaf/> marks the place in the text where a leaf begins in the source document.
- <newPage/> marks the place in the text where a page (that is, one of the sides of a leaf) begins in the source document.
- <newColumn/> marks the place in the text where a written or printed column begins in the source document. Depending on the arrangement of the source document and the wishes of the encoder, <newColumn/> may be used once to mark the beginning of the transcription of each column in the source document, or may be invoked multiple times in order to signal the changes of column within each line of a text arranged in tabular form.
- <newLine/> marks the place in the text where a physical line of text (as distinct from a conceptual verse line, to be marked with <l>) begins in the source document.
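To show how these milestones sit inside an otherwise conceptual hierarchy, the sketch below places <newPage/> and <newLine/> within a paragraph and regroups the transcribed text by physical page and line. The sample text, the use of <hi> as an inline element, and the helper names are illustrative assumptions; namespaces are omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical transcription: a paragraph that runs across a page break.
SAMPLE = """<p><newPage/><newLine/>It was on a bright morning
<newLine/>that the <hi>travellers</hi> set out, and they
<newPage/><newLine/>did not return until evening.</p>"""


def walk(elem):
    """Yield ('start', tag) and ('text', string) events in document order."""
    yield ("start", elem.tag)
    if elem.text:
        yield ("text", elem.text)
    for child in elem:
        yield from walk(child)
        if child.tail:
            yield ("text", child.tail)


def lines_by_page(xml_text):
    """Group transcribed text into pages, each a list of physical lines."""
    pages = []
    for kind, value in walk(ET.fromstring(xml_text)):
        if kind == "start" and value == "newPage":
            pages.append([])
        elif kind == "start" and value == "newLine":
            if not pages:
                pages.append([])  # text before any <newPage/>
            pages[-1].append("")
        elif kind == "text" and value.strip():
            if not pages:
                pages.append([])
            if not pages[-1]:
                pages[-1].append("")
            pages[-1][-1] += value
    # Collapse the whitespace left over from the XML layout.
    return [[" ".join(line.split()) for line in page] for page in pages]


if __name__ == "__main__":
    for number, page in enumerate(lines_by_page(SAMPLE), start=1):
        print(number, page)
```

Because the milestones are empty elements, the paragraph remains a single well-formed <p> even though it spans two pages, which is the point of using milestones rather than container elements here.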
A stand-off markup strategy using milestone tags
The physical structure of a book can be conceptualized as a series of hierarchically organized objects, such as gatherings which contain leaves, and pages which contain lines of text. For some encoders, especially those with strong bibliographic interests and those preparing electronic transcriptions of manuscript or print materials, the physical structure hierarchy will be the primary one, and tags are provided elsewhere in this chapter to facilitate such a choice of primary hierarchy. For many other encoders, the rich resources of these Guidelines for encoding conceptual textual hierarchies such as chapters, sections and paragraphs are important, and a primary hierarchy other than physical book structure must be chosen. Researchers who use another TEI hierarchy as their primary one so often wish to encode the book structure hierarchy in the same file that special provision is made for it here, in addition to the resources offered in Chapter 31, Multiple Hierarchies.
The mechanism described here creates a kind of within-file "stand-off markup" in which information about the book structure hierarchy is kept separate from the encoded text but is linked to the book-structure milestone tags within the encoded text. The encoded text refers to the elaboration of book structure in the <codexStructure> section of <sourceDesc> by means of pointer-like references: the pageID attribute of the empty milestone element <newPage/> holds a reference to the xml:id attribute of the corresponding <page> element in <sourceDesc>.
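The linkage just described might look like the following minimal sketch, in which each <newPage/> milestone in the text is resolved to the <page> it points at and to the signature of the leaf that carries that page. The trimmed document wrapper, the "#" prefix on pageID values, the absence of namespaces, and the helper name resolve_page_breaks are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the reserved XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical, heavily trimmed document: header elements that a real
# file would need between <teiHeader> and <codexStructure> are omitted.
SAMPLE = """
<TEI>
  <teiHeader>
    <sourceDesc>
      <codexStructure>
        <gathering xml:id="gA">
          <leaf xml:id="A1" signature="A">
            <page xml:id="A1r" no="1"/>
            <page xml:id="A1v" no="2"/>
          </leaf>
        </gathering>
      </codexStructure>
    </sourceDesc>
  </teiHeader>
  <text>
    <p><newPage pageID="#A1r"/>Text transcribed from the recto
    <newPage pageID="#A1v"/>continues on the verso.</p>
  </text>
</TEI>
"""


def resolve_page_breaks(xml_text):
    """Return (page id, leaf signature, page number) for each <newPage/>."""
    root = ET.fromstring(xml_text)
    # Index every described page by its xml:id, remembering its leaf.
    pages = {}
    for leaf in root.iter("leaf"):
        for page in leaf.iter("page"):
            pages[page.get(XML_ID)] = (leaf.get("signature"), page.get("no"))
    resolved = []
    for milestone in root.iter("newPage"):
        target = milestone.get("pageID", "").lstrip("#")
        if target not in pages:
            raise KeyError(f"newPage points at undescribed page {target!r}")
        resolved.append((target, *pages[target]))
    return resolved


if __name__ == "__main__":
    print(resolve_page_breaks(SAMPLE))
```

Keeping the physical description in the header and only a pointer in the text means that details such as conjugacy or sheet side are recorded once and can be reached from any milestone that refers to the page.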
Physical structure as the primary hierarchy
- <physGathering> contains the text that occupies a physical gathering or quire in the source material.
- <physLeaf> contains the text that occupies a physical leaf in the source material.
- <physPage> contains the text that occupies a physical page (that is, one side of a leaf) in the source material.
- <physColumn> contains the text that occupies a physical column in the source material.
- <physLine> contains the text that occupies a physical line in the source material.
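When physical structure is the primary hierarchy, these container elements would presumably nest in the order listed. The sketch below assumes that nesting (gathering, leaf, page, line, with <physColumn> omitted) and that the two <physPage> children of a leaf are its recto and verso in that order; neither assumption is stated in the draft, and the sample text and helper name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical transcription of a single leaf using container elements.
SAMPLE = """
<physGathering>
  <physLeaf>
    <physPage>
      <physLine>First line of the recto,</physLine>
      <physLine>second line of the recto.</physLine>
    </physPage>
    <physPage>
      <physLine>Only line of the verso.</physLine>
    </physPage>
  </physLeaf>
</physGathering>
"""


def page_line_counts(xml_text):
    """Return (leaf number, side, number of physical lines) per page."""
    root = ET.fromstring(xml_text)
    counts = []
    for leaf_no, leaf in enumerate(root.iter("physLeaf"), start=1):
        # Assumption: the first <physPage> is the recto, the second the verso.
        for side, page in zip(("recto", "verso"), leaf.findall("physPage")):
            # ".//" also finds lines wrapped in an intermediate <physColumn>.
            counts.append((leaf_no, side, len(page.findall(".//physLine"))))
    return counts


if __name__ == "__main__":
    print(page_line_counts(SAMPLE))
```

In contrast to the milestone approach, a paragraph that crosses a page boundary cannot be held in a single element under this hierarchy, so conceptual units would have to be split or marked by other means.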