When you run into a situation that you don't know how to tag,
first consult the guidelines below. If the situation is not
covered here, look in the TEI Guidelines (particularly volume 2).
It is also helpful to look at some of the verses that are already
online, using Panorama with the tags on (ctrl T). If you do so,
be aware that different texts handle the same situation
differently, so you may need to make some choices. If you still
can't resolve your question after trying the above, talk to one
of the HTI staff.
title page | <titlepage> | dedication | <div type="dedication">
title | <doctitle> | preface | <div type="preface">
author | <docAuthor> | acknowledgments | <div type="acknowledgment">
part title | <titlepart> | epigraph | <epigraph>
Bylines:
The <byline> tag refers to the primary statement of responsibility
given on the title page or at the end of a work. When the author's
name is given, use <docAuthor> within the byline. For instance:
<byline>By <docAuthor>Louise Chandler Moulton</docAuthor></byline>.
Note: If dedications or acknowledgements are part of the titlepage (or another div) they should not be tagged as separate divs.
Title pages are not always scanned, so double-check when you are
encoding front matter that all title page information is present.
Some information such as title and author may have to be typed
in by hand.
Wherever the page numbers appear in the printed text (top, bottom or even side of page), the page break element appears at the top of the text for that page.
If you can determine the page number of un-numbered pages by counting backwards from the first page number given in the text, you should do so and assign page numbers. Place them in square brackets to differentiate them from the page numbers actually present in the text. For example, if the introduction begins on page vi, you can assign the numbers i-v to the pages that precede the introduction.
The titlepage and verso are never assigned page numbers. The word verso should appear in place of a page number on the verso page.
Sometimes the pagination is wrong in the original text. Leave in the page numbers of the original text and insert a note in the header that the error was noted and the original pagination was preserved.
Page numbers should appear inside a div. For example, do not put a page number between two div1s -- put the page number inside the div that begins where the page break occurs.
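The pagination rules above can be illustrated with a short fragment. This is only a sketch: it assumes the page break is encoded with the TEI <pb> element and its n attribute (the guidelines above do not name the element), and the div types, text and page numbers are invented for the example.

```sgml
<!-- Unnumbered page: number inferred by counting back from page vi,
     so it goes in square brackets. The break sits inside the div. -->
<div1 type="dedication">
<pb n="[v]">
<p>To my mother</p>
</div1>

<!-- First page that carries a printed number. The break is placed
     at the top of the page's text, wherever the number is printed. -->
<div1 type="introduction">
<pb n="vi">
<head>Introduction</head>
<p>...</p>
</div1>
```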
Epigraphs (quotations that precede entire texts or individual poems) often carry with them bibliographic information, such as an author's name and/or a title. Enclose these in a bibl tag. If the parts of the bibliographic information are differentiated by typeface (for example the title in bold), you should label the parts separately within the bibl tag.
For example, for the epigraph "Is it better to suffer the slings and arrows of outrageous fortune", undifferentiated bibliographic information is tagged:
<bibl> Shakespeare, Hamlet </bibl>
BUT, if the title is set in a different typeface from the author's name:
<bibl> <author> Shakespeare </author> <title> Hamlet</title> </bibl>.
An opener groups together phrases appearing as preliminary matter in prose, poems, and especially letters. Use <opener> when a date and/or place is given at the beginning of a poem, introduction, or prefatory material like a foreword or preface.
Don't break down <opener> into the subelements dateline, byline, and salutation, even where those would be applicable. In Middle English texts, the openers often take the form of a sentence-long summary of the text that follows, and they may be in Latin.
A <closer> groups together the same elements, but at the
end of a division. Again, don't break them down
into the various subelements.
The <trailer> element contains a footer for a division. In the Middle English texts, a prayer (generally in Latin) at the end of a div in a religious text or "Here endeth cap. XXI" in a secular text would be trailers.
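A division that uses all three elements might look like the sketch below. The chapter type, opener and closer wording are invented for illustration; only the trailer text comes from the example above. Note that the opener and closer are left whole rather than split into dateline, salutation and similar subelements.

```sgml
<div1 type="chapter">
<head>Cap. XXI</head>
<!-- preliminary matter kept as a single opener -->
<opener>Written at London, the first day of May</opener>
<p>...</p>
<!-- closing matter kept as a single closer -->
<closer>Your loving friend, J. S.</closer>
<!-- footer for the division -->
<trailer>Here endeth cap. XXI</trailer>
</div1>
```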
Main title with subtitle:
<head>The Red Wheelbarrow</head>
<head type="sub">A meditation on chickens </head>
Main title with a roman numeral: place both within the same head tag, with a line break <lb> after the numeral.
<head>IV<lb>Pine Trees</head>
Main title with prefatory material:
<head>Mending Wall</head> <opener>To my lovely wife</opener>
<head>Howl</head> <opener>written while crossing Lake Champlain</opener>
<head>The Walls Do Not Fall</head> <opener>April, 1943</opener>
In the Middle English texts, there may not be heads for each obvious division. If the editor has supplied a head (seen, for example, as running text at the top of the page or as a marginal note summarizing the action), or if there is a table of contents for the manuscript where heads have been indicated for each chapter but are inconsistently included, add a type attribute of "supplied" to the head element. For example,
<head type="supplied">Cap.ix.<lb>In which King Ban slays the white hart.</head>
DIV elements are numbered to reflect their level in the hierarchy (e.g., the DIV0 might be a section which contains a group of DIV1s, each a poem) but are named on the TYPE attribute to reflect their class or function. For example, a poem may be at any of several different levels (DIV0, DIV1, DIV2, etc.) depending on its place in a hierarchy, but will always be TYPE="poem".
The TYPE attribute for a DIVn should, if possible, reflect its named value. This is frequently something like "chapter," "dedication," or "introduction." At other times, it may be more generic and unnamed by the author or printer. For the American Verse Project, we have chosen to name higher level
Tag all poems as <div(insert appropriate number) type="poem">.
Line group <lg>
verses <lg type="verse">: the default type for complete line groups is verse.
If partial verses appear in prose, do not tag them
as <lg> unless you are sure you have a complete line group. Tag them as a quote <q>
with lines <l>.
Note: If a poem has only one verse, mark up only the
lines and do not tag it as a line group.
Each line <l>
Delete any line breaks created in the text solely because of
the width of the page.
Preserve indentation at the level of the line as it appears in the text, except where it seems to be supplied by the printer solely to indicate continuity in the line (when a line of poetry takes up more than one line on the page). Indicate the indentation in the element's attributes, i.e. rend=indent.
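Put together, the rules for line groups and lines give markup along the lines of the sketch below. The poem, its head and its wording are placeholders invented for the example, and the rend value is quoted here although the guidelines write it unquoted.

```sgml
<div1 type="poem">
<head>Example Poem</head>
<!-- each complete line group is a verse -->
<lg type="verse">
<l>First line of the first verse,</l>
<!-- indentation present in the printed text is recorded on the line -->
<l rend="indent">second line, indented in the original,</l>
<!-- a long line that wrapped only because of the page width
     is rejoined and gets no rend attribute for the wrap -->
<l>third line, which ran over onto a second printed line.</l>
</lg>
<lg type="verse">
<l>First line of the second verse,</l>
<l rend="indent">second line, indented as well.</l>
</lg>
</div1>
```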
Sometimes it is clear that a line of poetry is deliberately incomplete. You will notice this particularly with poetry that has a clear metrical scheme. For instance, in dramatic verse there are often iambic pentameter lines that are started by one speaker and finished by another. Such lines should be tagged as lines but given the attribute part. In most cases the part will be initial [i] (the beginning of a line), medial [m] (the middle of a line), or final [f] (the end of a line).
For example:
Ribera We are the fools of habit
is tagged as
Ribera <l part=i>We are the fools of habit</l>
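To show how the parts pair up, the sketch below continues the example with a second speaker who finishes the metrical line. The second speaker and the completing words are placeholders, not taken from the original play.

```sgml
<!-- one metrical line shared by two speakers -->
Ribera <l part=i>We are the fools of habit</l>
<!-- hypothetical second speaker completes the pentameter -->
Second speaker <l part=f>words that complete the line</l>
```

A line split three ways would use part=i, part=m and part=f in that order.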
Block quotes, letters in prose sections, and poems in prose sections
are tagged as quotes <q> and can then include lines, paragraphs,
etc.
Note: you do not need to mark up all indirect quotes.
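As a rough illustration of the rule for quotes, a fragment of verse quoted inside a prose paragraph could be tagged as follows. The sentence and the quoted lines are invented placeholders.

```sgml
<p>She paused, and then recited from memory:
<!-- partial verse in prose: a quote containing lines, not an lg -->
<q><l>a first line quoted within the prose,</l>
<l>and the line that follows it</l></q>
before returning to her letter.</p>
```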
The most common instance for using the tag <milestone> would be for the occurrence of a line of asterisks denoting an ambiguous section break or a possible missing line. Delete the asterisks and insert the tag <milestone unit=typographic n="******">. For an exact representation, use the same number of asterisks, with single spaces between them if spacing occurs, that are in the original.
Do not use the milestone element for typographic elements that indicate the end of a natural division. For example, many texts have a centered hairline between poems, or between the head of a poem and the beginning of the first linegroup. These are not typographically or intellectually significant and are not encoded.
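The sketch below shows a row of six spaced asterisks between two line groups replaced by a milestone. The verse lines are placeholders; the tag follows the form given above, with the n attribute reproducing the original spacing.

```sgml
<lg type="verse">
<l>Last line before the row of asterisks</l>
</lg>
<!-- original read: *  *  *  *  *  *  (six asterisks, spaced) -->
<milestone unit=typographic n="* * * * * *">
<lg type="verse">
<l>First line after the row of asterisks</l>
</lg>
```

A hairline rule between two poems, by contrast, would simply be left out.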
Not infrequently, you will find the location of the milestone is ambiguous because the editor or printer has forgotten to insert the asterisk or number locating the milestone. Sometimes this is simply because the milestone goes at the beginning of the line on which it falls but sometimes not. When it is ambiguous, insert it where you believe it goes and make a note of it in the editorial declaration.
When words are broken by milestones, join the word and place it before the milestone and note these instances in the editorial declaration with a slash designating the break.
In American Verse, brief footnotes and endnotes, especially when there aren't too many of them, can be simply moved into the line and tagged with a note tag. So if the original is something like:
<head>RIP</head>
<opener>For my sister*</opener>
and the note is:
*died Sept 18, 1812
You can simply tag it as <opener>For my sister</opener><note>died Sept 18, 1812</note>.
If the notes are contained in a section that does not come at the end of the text, but appear in a place where removing them and placing them inline will leave a gap in the numbering sequence, encode the section as data and do not worry about linking the notes to the appropriate text.
In the Middle English texts, notes that indicate changes made to the manuscript, either by the scribe, by a later scribe, or by the editor, are not encoded as notes. We use the elements in the TEI that identify these items specifically. Common elements used are supplied, add, del, sic, and corr. Supplied is used when the editor has supplied text, generally from another manuscript, generally due to damage and missing text in the manuscript used for the edition. The add element is used for situations such as "h is added in a later hand"; del is used similarly for mentions of deletions. Scribal errors in the manuscript, often noted in the form "Ms. reads hir hir stede", are encoded as <corr sic="hir hir">hir</corr> stede.
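A few lines using these elements might look like the sketch below. Apart from the corr example given above, the Middle English wording is invented for illustration and does not come from any particular manuscript; attributes such as the hand responsible are omitted.

```sgml
<!-- editor supplies a word lost to damage in the manuscript -->
<l>and rode <supplied>forth</supplied> upon his stede</l>

<!-- apparatus says: "h is added in a later hand" -->
<l>w<add>h</add>an that he com to toune</l>

<!-- apparatus mentions a word deleted in the manuscript -->
<l>the <del>the</del> kyng was glad</l>

<!-- scribal error: "Ms. reads hir hir stede" -->
<l>upon <corr sic="hir hir">hir</corr> stede</l>
```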
White spaces around punctuation such as em dashes, colons, and
periods should be deleted. (Postprocessing routines will take care of spaces around punctuation automatically.) Note: if you think the space
should be reinstated in some instances, do so on a case by case basis.
You don't need to worry about two space characters together, as
the representational tools dealing with SGML will automatically
collapse them into one space.
When you find an error in the original text, suggest a correction, while leaving the original intact, as follows: <sic
corr="write">rite</sic>.
If the error is due to HTI scanning and is not in the original
text, then simply correct the error.
Do the same for Middle English texts where the breaks occur at the end of folios (that is, a milestone tag occurs in the middle of a word). However, note that you have done this and list the words joined and their locations -- chast/itie (folio 49r) -- in the editorial declaration in the TEIheader.
Sometimes Middle English words have hyphens in them as normal practice, with-outen, for example. In these cases the hyphens should be retained.
If there are 2 sets of copyright information; 2 sets of publication
information; information in the front regarding the printer of
the edition; or 2 title pages, these can be handled with the tag
<imprint> or <docimprint>
If you run across other oddities and have developed a solution, please pass the information on so that it may be added to the style guide. If you can't resolve them but think they are important enough to be covered in our guidelines, please find or mail Chris Powell (sooty@umich.edu).
Many of the texts the Humanities Text Initiative works with include images. For example, a book of American poetry might include a portrait of the author in the front of the book, and it may also contain other images related to specific poems. When the electronic text is completed, it will include the appropriate images from the original text. One person at HTI manages all of the tasks associated with images, but these guidelines should give anyone at HTI the basics about our imaging process.
Title page and verso of the title page. Every electronic text created by HTI includes an image of the original title page and its verso. This is useful for readers who may want to have access to this information, but HTI's primary reason for including these images is so that the library's catalogers will have access to this information for the creation of AACR2 conformant headers for the electronic texts.
Any other images. HTI electronic texts include all of the images found in the original text. These may include portraits of the author, handwriting facsimiles, engravings, or any other kind of graphic element (See Example 1).
Entities. Each image in an electronic text may be considered an SGML entity, just as special characters are entities. Entities are declared at the beginning of the text in the SGML header, and they include the image's file name. (See example 2). This entity declaration alerts the SGML parser to the presence of an image that will accompany the electronic text, and it also tells the parser where the image is located for retrieval.
Entity references. The body of the electronic text does not contain the image file itself; instead, it contains a pointer to the image file. Pointers are inserted into the appropriate place in the SGML text as entity references. (See example 3). When the parser encounters an entity reference in the text, it refers to the matching entity declaration in the text's header and then retrieves the image and displays it at the appropriate place in the text.
<figure page number>
where page number is the actual page number in the text where the
figure is located (See example 4). It may be helpful to make a note to
yourself of the pages on which figures are found in the text. This is
helpful later if a four-hundred-page text has only two images, and
you need to remember where they are.
hti-imaging@umich.edu
Include the title and author of your text. Find Andrew Midkiff, or send email to amidkiff@dnsumich.edu
[Image: bell.gif] This image will be scanned and named pierp-bell.gif.
<!DOCTYPE TEI.2 PUBLIC "-//TEI//DTD TEI Lite 1.0//EN" [
<?STYLESPEC "UM HTI American Verse" "http://www.hti.umich.edu/sgml/panorama/amverse.ssh">
<?NAVIGATOR "UM HTI American Verse" "http://www.hti.umich.edu/sgml/panorama/amverse.nav">
<!ENTITY % ISOlat1 PUBLIC "ISO 8879:1986//ENTITIES Added Latin 1//EN">
<!ENTITY % ISOlat2 PUBLIC "ISO 8879:1986//ENTITIES Added Latin 2//EN">
<!ENTITY % ISOnum PUBLIC "ISO 8879:1986//ENTITIES Numeric and Special Graphic//EN">
<!ENTITY % ISOpub PUBLIC "ISO 8879:1986//ENTITIES Publishing//EN">
%ISOlat1 %ISOlat2 %ISOpub %ISOnum
<!NOTATION gif PUBLIC "+//ISBN 0-7923-9432-1::Graphic Notation//NOTATION CompuServe Graphic Interchange Format//EN" "GIF">
<!ENTITY pierp-bell SYSTEM "images/pierp-bell.gif" NDATA GIF>
]>
<DIV0 TYPE="poem" ID="DIV0.11">
<HEAD>THE LIBERTY BELL</HEAD>
<P><FIGURE ENTITY="pierp-bell"></FIGURE></P>
<LG TYPE="stanza" ID="LG94">
<L ID="L551">THE LIBERTY BELL—the Liberty Bell—</L>
<L ID="L552">The Tocsin of Freedom and Slavery's knell</L>
As new works are completed (i.e., marked up, proofed, markup reviewed,
and associated images created), they are added to the collection.
This process includes two principal steps: adding links from
the Browse page,
and re-indexing the collection to facilitate searching the new
texts and displaying them in HTML.
Most files will be in Netware and will need to be exported from
Author/Editor. Some will occasionally be on the HTI UNIX servers
(possibly saltmine) as normalized SGML. In the case of files
in Netware,
Created by Jason P. Williams for the Humanities Text Initiative on April 17, 1996. Last revised by Chris Powell January 29, 1998.
Put the files on the HTI Web server
Edit the index.html file in /web/english/amverse/texts
Re-indexing the corpus