The Colonial and State Records of North Carolina project (CSR), based at the University of North Carolina at Chapel Hill, will enter its fourth and final year this summer. The project's goal has been to digitize and publish online the 30-volume print collection (26 volumes of documents and a 4-volume index). The print version of the CSR has been a vital resource for historians of the American South, genealogists, and other researchers since it was first published, so its emergence as a freely available, searchable online collection is significant. The CSR has five logical divisions:

  1. Volumes 1-10, the Colonial Records (1622-1776; bulk 1662-1776)
  2. Volumes 11-22, the State Records (1776-1790, with additional colonial-era documents added as appendices)
  3. Volumes 23-25, Colonial and State Laws, including an index to those laws in volume 25 (1670-1790)
  4. Volume 26, the 1790 Census, including an index to the census
  5. Volumes 27-30, a cumulative index to volumes 1-25, including a "Historical Review" in volume 30

The Colonial Records were begun and edited by William Saunders and published between 1886 and 1890; the State Records were edited by Walter Clark and published between 1895 and 1907; and the indices were compiled by Stephen B. Weeks.

This paper will discuss the project's implementation, which uses several innovative techniques, along with a number of the issues the team confronted and solved. The volumes have been digitized as P4 TEI Lite and the indices converted to P5 TEI Lite; metadata for each document has been encoded in METS, and the authority lists for the volumes in MADS. The volumes, indices, and metadata are stored in an eXist XML database. A PHP 5 front end allows browsing and querying of the XML and communicates with eXist via its REST API.
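
As a rough illustration of how a front end can talk to eXist over REST, the sketch below builds the URL for an XQuery request. It is not the project's code: the project's front end is written in PHP, and Python is used here only for brevity. The server address, the collection path `/db/csr/volumes`, and the helper names are assumptions made for the example; `_query`, `_start`, and `_howmany` are standard eXist REST parameters. The query uses un-namespaced element names, as would suit P4 TEI; P5 TEI would need a namespace declaration.

```python
from urllib.parse import urlencode

# Hypothetical values: the abstract gives neither the server address
# nor the collection layout used by the project.
EXIST_BASE = "http://localhost:8080/exist/rest"
COLLECTION = "/db/csr/volumes"


def title_search_xquery(term):
    """Return an XQuery that finds <head> elements containing `term`.

    Inside an XQuery string literal, '&' must be written as '&amp;' and a
    double quote is escaped by doubling it.
    """
    escaped = term.replace("&", "&amp;").replace('"', '""')
    return f'//head[contains(., "{escaped}")]'


def build_query_url(xquery, start=1, how_many=10,
                    base=EXIST_BASE, collection=COLLECTION):
    """Build a GET URL that asks eXist to run `xquery` against a collection."""
    params = {
        "_query": xquery,      # the XQuery expression to evaluate
        "_start": start,       # 1-based index of the first result to return
        "_howmany": how_many,  # maximum number of results to return
    }
    return f"{base}{collection}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_query_url(title_search_xquery("Tryon"), how_many=5))
```

A front end would issue a GET request to the resulting URL and transform the XML that eXist returns into a results page; paging through a long result set only requires changing `start`.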

One of the most basic questions that the team had to answer was: “What is a document?” In this collection, a document is defined as anything written or printed that existed independently of The Colonial and State Records. Essays and notes written by the editors are also considered documents, even though they may not have existed outside the CSR. A document such as a personal letter, for instance, might have been created in one day by one person; a legislative journal, on the other hand, might have been created over several days by a group of people. Some documents run to more than a hundred printed pages, while others are so short that five or six were printed on the same page. Each document is described and presented as a distinct item, regardless of its length.

These huge differences in document scope were only one of the many information design challenges the CSR presented. Other goals of the project include developing teaching resources (Learning Objects, etc.) that use the collection, and releasing the documentation and code developed during the process to help projects that may face similar challenges.