At the ENS Lyon, various teams have chosen TEI to represent their data (corpora, critical editions, etc.), with the well-known benefits in terms of shared expertise, the strength of XML technologies, and long-term preservation. However, using the standard is not sufficient to avoid a certain complexity as projects grow: alongside the issue of file versioning (which can of course always be solved by ad hoc software) come issues of dependence between these versions. For instance, when an XSL file has been modified, which data files have to be reprocessed? What other operations have to be launched? This complexity is increased by the problems of authentication and authorization, and by the great variety of software tools involved in the production, exploitation, visualization and/or publication of these data. As part of a sharing policy within the ENS, we are developing ‘amalia, a solution based on the eSciDoc platform, to address this problem of complexity. eSciDoc is a data management system which provides functionality to organise data, associate metadata with data, manage versioning, and control access. ‘amalia is a web application built on eSciDoc which allows the definition of data workflows, i.e. chains of transformations and/or actions operating on data. The actual execution of the actions orchestrated by ‘amalia is not constrained and may rely on web services, local scripts, etc. ‘amalia can thus be conceived as glue between various subsystems whose data is ultimately managed by eSciDoc. We will demonstrate ‘amalia and report on the challenges and benefits of this solution in our own experience. We will first show the first workflow we defined, for the project Hyperdonat: it transforms ODT to TEI XML and TEI XML to HTML, and indexes the TEI XML in the XML database BaseX.
We will also show other workflows at various stages of development, and involving other third-party systems such as Dinah.
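The dependency problem described above (which artefacts to rebuild when one input changes) can be sketched as a small graph traversal: given a record of which files are produced from which inputs, a change to one file yields the set of downstream artefacts to reprocess. All file names below are hypothetical illustrations, not ‘amalia's actual data model.

```python
from collections import deque

# edges: input file -> artefacts produced from it (hypothetical names)
deps = {
    "odt2tei.xsl": ["letters.tei.xml"],
    "tei2html.xsl": ["letters.html"],
    "letters.odt": ["letters.tei.xml"],
    "letters.tei.xml": ["letters.html", "basex-index"],
}

def stale(changed):
    """Return every artefact directly or transitively built from a changed file."""
    out, queue = set(), deque([changed])
    while queue:
        for target in deps.get(queue.popleft(), []):
            if target not in out:
                out.add(target)
                queue.append(target)
    return out

# changing the ODT-to-TEI stylesheet invalidates the TEI file,
# and therefore also the HTML and the BaseX index built from it
print(sorted(stale("odt2tei.xsl")))
```

A real workflow engine layers actions (web services, local scripts) on top of such a graph, but the reprocessing question itself reduces to this reachability computation.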
Bibliography:
The project we would like to present originally evolved out of three different types of challenge:
The project was carried out with students of Romance Linguistics, in collaboration with a student of Business Information Systems and an external designer. The portal created in the framework of this project is meant not only to present the project results, but also to function as a teaching tool and as a test bed for further developments. Although the project is ongoing, the portal, available at http://www.culingtec.uni-leipzig.de/JuLeipzigRo/, can be used as it stands. Through its four levels, the portal tries to reflect the process of scientific research and the writing of an academic paper. As all four levels, as well as the data they contain, are interlinked, the portal also mirrors the circularity which is characteristic of academic paper writing in the digital age. The poster will explain the four levels which make up the portal (sources, extracts from sources, ontology, texts). Particular attention will be given to the Digital Humanities components, i.e. the TEI markup of primary texts and the bibliographical database, as their integration into traditional modules of Romance linguistics fosters the conceptual change which is needed if computer technologies are to be exploited meaningfully in research and academic writing.
Bibliography:
This poster will describe and demonstrate work done in creating an online archive for research into the Wandering Jew’s Chronicle (WJC). The WJC is a printed ballad published between 1634 and circa 1820, which survives in 22 known copies of 15 editions. These are held in ten libraries in Britain and the USA. The ballad itself outlines the succession to the throne of England from William I to a variable contemporary monarch depending on its date of publication; more specifically, from the reign of Charles I until that of George IV, taking in seven monarchs in continuations from a core text. The succession of these monarchs is narrated by the supposedly immortal Wandering Jew of European legend. There is immense scholarly interest not only in the subject matter, but also, textually, in the pattern of variations and in the length and breadth of its publication and distribution. From a digital humanities perspective, the textual history and relationships pose interesting problems for collation and textual analysis. Each one of the editions inherits a basic core text: some of these editions incorporate common continuations or variations, while others are textually idiosyncratic. The editions of the WJC are not only textually but also graphically interesting, as most of the editions are illustrated with woodcuts of the monarchs described; while some editions share woodcuts, others employ copies or individual illustrations. The poster and demonstration will introduce the benefits of having gathered all the material relating to the WJC in a single place while demonstrating the technologies used to create the research archive. Surviving copies of the WJC are scattered, variously held in the Bodleian Library; the British Library; Cambridge University Library; the Pepys Library, Magdalene College, Cambridge; the Harry Ransom Center at the University of Texas at Austin; and the Brown University Library.
The WJC research archive unites the surviving editions under a single authoritative citation, represented by:
- archival-quality images
- transcriptions marked up in TEI P5 XML
- tools for comparing variations between texts
- bibliographical metadata
- scholarly commentary
It is hoped that by providing all of this in one location, research into the WJC can flourish in new and interesting ways. This resource helps to foster digital humanities research through tracing and expressing bibliographical, textual and iconological relations across a corpus of copies, variant editions, and versions of ballad texts, including their images and tunes. It is a valuable resource for those researching textual genealogy in the early modern period. It will impact the research of scholars of folklore, balladry, historiography, book history and textual studies.
Bibliography:
This poster presentation aims to introduce our digitization project »Dingler-Online« to the TEI community. The project is located at the Institut für Kulturwissenschaft, a department of the Humboldt-Universität zu Berlin, and sets out to digitize »Dingler’s Polytechnisches Journal«, which was originally published between 1820 and 1931. Aside from digitizing the journal’s images, we encode the OCRed text according to the TEI P5 guidelines. At the moment (April 2011), 219 books – around 220 million characters – are freely available on the web via www.polytechnischesjournal.de/. Due to its technical background, a very important element of the journal is its patent applications; at this time (April 2011) our patent lists cover more than 24,000 entries. We are thus dealing with a large collection of data sets which are both compact and expansive, in the sense that each single entry is rather short, but still contains persons, titles, places and dates that may in turn point to other sources. Apart from the patent lists’ importance for research activities, these lists are therefore a great challenge for thorough TEI tagging. This concerns both the conventions for our service provider and our semi-automatic encoding, in which we make extensive use of XPath queries and regular expressions. Furthermore, our encoding of the patent lists is the basis for different visualization approaches, which aim to make our edition a user-friendly platform inspiring broad use, not at all restricted to researchers but open to the interested public in general.
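As a simplified illustration of such semi-automatic encoding – the element names and the entry pattern below are hypothetical, not the project’s actual schema or code – one can locate candidate entries with an XPath query and then promote recurring plain-text patterns to TEI-style child elements with a regular expression:

```python
import re
import xml.etree.ElementTree as ET

# invented sample entry in the style of a 19th-century German patent list
doc = ET.fromstring(
    "<list type='patents'>"
    "<item>Dem Mechaniker J. Schmidt in Stuttgart, am 3. Mai 1845.</item>"
    "</list>"
)

# hypothetical pattern: "Dem <role> <name> in <place>, am <date>."
entry_re = re.compile(
    r"Dem (?P<role>\w+) (?P<name>[\w. ]+?) in (?P<place>\w+), am (?P<date>[\w. ]+)\."
)

for item in doc.findall(".//item"):  # XPath subset supported by ElementTree
    m = entry_re.search(item.text)
    if m:
        # replace the plain text with persName/placeName/date children
        item.text = None
        ET.SubElement(item, "persName").text = m.group("name")
        ET.SubElement(item, "placeName").text = m.group("place")
        ET.SubElement(item, "date").text = m.group("date")

print(ET.tostring(doc, encoding="unicode"))
```

On a corpus of thousands of short, formulaic entries, this two-step approach (query, then pattern-match) does the bulk of the tagging mechanically, leaving only irregular entries for manual correction.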
Bibliography:
Opera editing has so far been undertaken either from a musicological point of view – focussing on the musical score – or from the literary scholar’s standpoint – considering the libretto as a literary text genre. This ‘splitting’ is the result of the specific structural organization and historical transmission of each of the two textual systems. An eighteenth-century libretto, for instance, is structured by scenes and verse lines, while the musical score of the same piece is organized by musical numbers and bars. In addition, the ‘literary’ and ‘musical’ text traditions of one and the same work of the lyric stage are often anything but congruent, which is due to performance and adaptation practices, but also to the fundamental differences between the two text types. On the other hand, it is evident that an item of music theater – a vast generic field which includes not only ‘normal’ opera composed from one end to the other, but also more ‘complicated’ genres such as comic opera, Singspiel, operetta, drama with incidental music, melodrama, or ballet – comes down to the editor not through musical sources alone, but through a (mostly large) bundle of literary and musical ones. OPERA aims to handle this diversity of source types in a way that, on the one hand, includes all relevant source types and considers them of equal value, and, on the other, avoids mutual contamination. OPERA editions are hybrid editions: the scores are published in printed volumes, while the text editions and critical reports appear in electronic form, visualizing the sources and establishing a complete interlinking of the editions with the sources and critical reports. For this purpose, OPERA uses the music edition software Edirom, which has been developed by the Edirom project at Paderborn University and was recently substantially expanded with regard to the incorporation of text editions.
The poster and tool presentation gives an insight into two editions OPERA is currently working on: the Italian opera Prima la musica e poi le parole (Vienna 1786) by Antonio Salieri (music) and Giambattista Casti (text), edited by Thomas Betzwieser and Adrian La Salvia, respectively, and the French opéra comique Annette et Lubin (Paris 1762) by Justine Favart (text) and Adolphe Blaise (music), with the text and music edition being prepared by Andreas Münzmay. These examples highlight the editorial, programming and encoding strategies ensuring the integration and interlinking of the TEI-based text edition, the score edition, the source images, and the critical apparatus. Among the features developed to this end is a navigation tool which allows the user to call up any bar of the score or any verse line of the libretto and to visualize synchronously any corresponding textual and musical object. Likewise, one common critical apparatus comments on both the text and the music edition and is accessible from either of the two directions. For this purpose, it is in turn necessary to incorporate musical structure information, such as bar numbers, as well as links to the source images and critical apparatus entries, into the TEI-based text edition section of an edition’s XML source code.
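The bar/verse-line synchronization described above can be illustrated with a minimal concordance structure, navigable from either direction; the data here is invented for illustration and is not OPERA’s actual encoding.

```python
# hypothetical concordance: which verse line is sung in which bar
bar_to_line = {1: "v1", 2: "v1", 3: "v2", 4: "v3"}

# invert the mapping so that a verse line also yields all of its bars
line_to_bars = {}
for bar, line in bar_to_line.items():
    line_to_bars.setdefault(line, []).append(bar)

print(bar_to_line[2])      # verse line corresponding to bar 2
print(line_to_bars["v1"])  # all bars in which verse line v1 is sung
```

The same bidirectional lookup principle underlies linking a shared critical apparatus to both the score and the libretto: each apparatus entry only needs to store bar numbers and verse-line identifiers, and either edition can resolve them.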
Bibliography:
TXSTEP offers an interactive XML-based interface to the proven and powerful routines of TUSTEP, the Tübingen System of Text Processing programs. For more than 35 years, TUSTEP has been developed and maintained at Tübingen University's computing centre. TUSTEP is a scripting language as well as a publishing system for the humanities, to this day unmatched in its overall performance and flexibility. It primarily addresses users in the text-processing humanities, such as computational linguists, philologists and editors; for more information, see www.tustep.org. But since its native syntax is proprietary, unintuitive and reputedly difficult to learn, users tend to fall back on other – often less effective – tools or on less specific programming languages. TXSTEP answers this situation by providing a user-friendly XML syntax, allowing beginners and advanced programmers alike to use the whole scope of TUSTEP's services in a modern, established programming environment. The benefits are obvious: support of an open standard, widespread dissemination, programming in any XML editor with syntax highlighting and code completion, and intelligible APIs. Moreover, TXSTEP benefits from the fact that there is no need to change the program's actual core. TUSTEP itself is open source, as TXSTEP is going to be as well. The purpose of TXSTEP, as of TUSTEP, is not to provide ready-made solutions for predefined problems. It "only" provides program modules for the basic functions of text analysis and processing; it is the user who combines them to solve the problem at hand. This is the prerequisite for the user taking responsibility for every detail of the results obtained by computer. One of TXSTEP's features is its capability to process almost any form of textual data, whether XML data or plain text files.
Wherever textual data first has to be processed in order to obtain TEI data, or to enhance the markup of insufficiently tagged XML data, TXSTEP is in its element. The proposed demo is based on a prototype and shows the current state of our work in progress. It will demonstrate TXSTEP's functionality on tasks which cannot easily be performed by existing XML tools, including problems recently presented on the TEI list.
Bibliography:
The project Virtual Scriptorium St. Matthias intends to electronically reunite the worldwide scattered codices from the library of the Benedictine abbey of St. Eucharius / St. Matthias in Trier. The project has been carried out at the Stadtbibliothek and Stadtarchiv Trier as well as at the Center for Digital Humanities at the University of Trier since summer 2010. About 450 codices from the period between the eighth and the fifteenth century will be digitized within three years. These codices cover a wide range of topics from various traditions. Beyond theological and religious writings, one finds a large number of Latin classics such as Cicero, Priscian, Sallust or Martianus Capella. A prestigious example of the inculturation of ancient and pagan thought is an illustrated edition of Aesop's fables. No other abbey possessed as many manuscripts of Hildegard of Bingen as St. Matthias. There are also three important copies of the Decretum Gratiani, one of which includes 60% of all glosses ever written on this work. But the richly illustrated Trierer Apokalypse from Carolingian times may be the most famous of all these codices. The project Virtual Scriptorium St. Matthias will present an electronic catalogue that sums up the knowledge from older descriptions and combines it with a presentation of the digitized codices. In this context, TEI is used as the standard for the XML description of the manuscripts. The number of objects requires synchronizing these descriptions with a dynamic database in order to correlate them with other digitized catalogues, editions and databases such as the PND. The results will be integrated into Manuscripta Mediaevalia and TextGrid. In this way the project will not only provide images and metadata but will also be included in a virtual working space in which further research and exploration will be possible, e.g. with concurrent TEI transcriptions of selected works. The project homepage www.stmatthias.uni-trier.de will be released on the first of August 2011 on a trial basis.
The project will be presented in a short talk and a poster. The poster will cover the project thoroughly, while the short talk will sketch the advantages and some practical limits of TEI in such an enterprise.
Bibliography:
The DFG-funded project Deutsches Textarchiv (DTA) started in 2007 and is located at the Berlin-Brandenburgische Akademie der Wissenschaften (BBAW). Its goal is to digitize a large cross-section of German texts from 1650 to 1900. The DTA presents almost exclusively the first editions of the respective works. Currently more than 700 texts have been transcribed, most of them by non-native speakers using the double-keying method. Even though our corpus of historical texts exhibits very good quality, errors still occur in the transcription, in the markup, or even at the level of presentation. Due to the heterogeneous nature of the corpus – in terms of text types: novels, prose, scientific essays, linguistic reference works, cookbooks, etc. – there is a strong demand for a collaborative, easy-to-use quality assurance environment. Our poster and tool demonstration will provide an insight into the DTA QA workflow.
The TEI Consortium's Special Interest Group (SIG) on Libraries ( www.tei-c.org/Activities/SIG/Libraries/ ) has recently completed a major revision of Best Practices for TEI in Libraries ( purl.oclc.org/NET/teiinlibraries ). The revised Best Practices are stored in ODD files and contain updated versions of the widely adopted encoding "levels", which span from fully automated reformatting of print content to rich encoding supporting content analysis and scholarly use. They also contain a substantially revised section on the TEI header to support greater interoperability between text collections and MARC records. Schemas for each encoding level, derived from the appropriate ODD, provide a mechanism to better ensure conformance and interoperability of digital texts. Principal contributors to this document will present a poster summarizing the encoding levels in the Best Practices, how the schemas relate to the TEI Guidelines, and what digitization workflows are envisioned for use with them.
The project "Berlin intellectuals 1800-1830" is a DFG-funded 5-year project based on unedited manuscripts (mainly, but not exclusively, letters). It aims at gaining an in-depth insight into intellectual networks in Berlin at the beginning of the 19th century. The corpus we focus on allows us to identify intellectual affinities and their evolution according to the scientific or political context of the moment; to that extent, the history of ideas plays a central role in the project. Yet its main focus lies in literary history. We aim at describing communication strategies and mechanisms as exactly as possible, working on the very borderline between private and public texts. Letters play a central role, as they are themselves literary objects that, especially in the context of Germany in the early 19th century, have an ambiguous position in the literary field. Many of them have been preserved in order to be published; some of them have indeed been partially published, often even partially re-written. By having access to the original manuscripts, we are now given the chance to re-write literary history: to demonstrate the biases of earlier, faulty editions and to give a larger public access to first-hand documents. The workflow in the project consists of defining relevant areas of research, searching for material, selecting the most interesting items, digitizing and transcribing them – and then what?
Confronted with this question, we moved away from a traditional paper edition and, in a second step, from a basic online edition, to develop a TEI-based concept that fulfills the varying demands of our text corpus. After engaging closely with the TEI Guidelines and discussing what data and metadata we want to encode, we designed our own XSL stylesheet as well as a TEI schema that concentrates on the requirements of encoding correspondence and includes a letter-specific manuscript description. Using the Oxygen XML Editor and the TextGridLab, and integrating databases, proved convenient to our needs. Although we are now experienced in encoding manuscripts and letters according to the TEI, some aspects still remain problematic.
During the course of the project, not only project members but also a large number of graduate philology students have been involved. During the development of the TEI schema and the conception of a frontend for the intended online edition, several documents relevant to the project were offered as assignments to students of German and European Philology at the Humboldt University. The students reflected on the hermeneutical dimension of the editorial process with remarkable maturity, showing their ability to transfer the theoretical discourse familiar from their Humanities studies into the digital medium. This experience suggests that even traditional curricula are now in a position to include a digital component.
In this poster, we present the whole editorial process, from the archival discovery to the publication of the material, and from defining a TEI schema that meets the specific and sometimes complex requirements of encoding correspondence to the final creation of the frontend. We also give an account of the pedagogical choices made to render the students' input fruitful. The tensions in the discussions, between technicalities and hermeneutics, are analyzed. We finally make some suggestions regarding the sustainability of digital humanities in European humanities curricula.
Bibliography:
William Godwin (1756-1836) wrote a diary that consists of 32 octavo
notebooks. The first entry is for 6 April 1788 and the final entry is
for 26 March 1836, shortly before he died. The diary is a resource of
immense importance to researchers of history, politics, literature, and
women’s studies. It maps the radical intellectual and political life of
the late eighteenth and early nineteenth centuries, as well as providing
extensive evidence on publishing relations, conversational coteries,
artistic circles and theatrical production over the same period. One can
also trace the developing relationships of one of the most important
families in British literature, Godwin’s own, which included his wife
Mary Wollstonecraft (1759-1797), their daughter Mary Shelley (1797-1851)
and his son-in-law Percy Bysshe Shelley (1792-1822). Many of the most
important figures in British cultural history feature in its pages,
including Anna Barbauld, Samuel Taylor Coleridge, Charles James Fox,
William Hazlitt, Thomas Holcroft, Elizabeth Inchbald, Charles and Mary
Lamb, Mary Robinson, Richard Brinsley Sheridan, William Wordsworth, and
many others.
The William Godwin's Diary website
<http://godwindiary.bodleian.ox.ac.uk/> presents richly marked up TEI P5
XML texts via Apache Cocoon and the eXist XML database, with custom
XQuery scripts and XSLT. In addition to the transcribed texts and
high-resolution zoomable images, the site includes many extracted data tables
based on the underlying markup. All the materials on the website,
including the underlying TEI P5 XML, are available under a Creative
Commons Attribution Non-Commercial license. This poster will present the
underlying architecture of the project, the compromises made and
difficulties encountered. The poster will be accompanied by a
demonstration of the website, showing how the site works under the
hood for those interested in such aspects.
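As an illustration of how data tables can be derived from underlying markup, the following sketch counts person references across diary entries; the sample is invented and namespace-free, not the site's actual schema.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# hypothetical TEI-like fragment with two dated diary entries
tei = ET.fromstring(
    "<div>"
    "<div type='entry' n='1788-04-06'>Call on <persName>Holcroft</persName>.</div>"
    "<div type='entry' n='1788-04-07'>Tea with <persName>Holcroft</persName> "
    "and <persName>Inchbald</persName>.</div>"
    "</div>"
)

# tally how often each tagged person appears across all entries
counts = Counter(p.text for p in tei.findall(".//persName"))
for name, n in counts.most_common():
    print(f"{name}\t{n}")
```

Because the markup, not a separate database, carries the structured information, such tables can be regenerated automatically whenever the transcriptions are corrected.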
Bibliography: