Palaeography Today. Old questions and new technologies

Eef Overgaauw, Berlin

Although Palaeography has made substantial progress during the 20th century, many of the major questions palaeographers used to ask themselves and their colleagues from the 19th century onwards have remained without lasting answers. Concurrently, young scholars from various professions are discovering medieval manuscripts as objects for academic research. New technologies, such as the use of statistical methods, the application of digital techniques for making reproductions and the use of databases, will certainly contribute to finding new answers to old questions. The introduction of statistical methods in codicological research has already initiated a new view on manuscripts from the 1980s onwards. As Palaeography is based on scholarship as well as on connoisseurship, answers to palaeographical questions are often provisional. This conclusion does not furnish evidence for the weakness of Palaeography as a science; it rather shows its vitality.

Digital Resource for Palaeography, Manuscripts and Diplomatic

Stewart Brookes, London

This paper outlines the key objectives of the Digital Resource for Palaeography (DigiPal), a new project that brings digital technology to bear on scholarly discussion. Taking advantage of recent advancements in digital research, as well as developing new technologies, DigiPal will offer innovative ways of interrogating and interacting with manuscripts and data. It is our intention that DigiPal will showcase the benefits of digitally-assisted palaeography, opening up new possibilities for the study of scripts, scribes, and manuscripts.

One of the main outcomes of DigiPal will be a web-based, extensible framework for the study of script in its manuscript or diplomatic context. In order to develop this framework, a substantial test case will be produced which takes as its focus the English vernacular minuscule of the eleventh century. The project will catalogue, describe and, where possible, source digital images of about 1200 scribal hands. In addition, it will integrate material from other projects and produce a substantial body of new material. The intention is to provide a resource which offers material for a complete corpus, rather than selected highlights; which allows one to search and manipulate this content in new ways; which provides evidence to be used in palaeographical argument; and, crucially, which is designed first and foremost to address the research questions and needs of palaeographers.

New Methodologies for Effective Exploitation of Digital Manuscript Corpora

Wendy Scase, Birmingham

This presentation identifies and explores how palaeographers might best exploit digital manuscript resources.

Today some hundreds of thousands of digital images of medieval manuscripts are available and there is a large quantity of metadata of various kinds, calendars, catalogues, transcriptions, and descriptions being among the most important. And this is only the beginning. In future we can expect vast quantities of manuscript information to be available in digital form. This revolution in access is parallel in scale and importance to the great editing projects of the nineteenth century that first brought medieval primary sources into print.

Only a minority of existing images and metadata were created specifically to support palaeography teaching and research. But, I argue, other digitised materials can be of immense value to palaeography if we can address two challenges. The first challenge is to enable researchers and students to discover and federate datasets that were produced according to different standards for different purposes and are held in different locations, and to use them interoperably; in other words, to produce new digital manuscript corpora for new purposes. The second challenge is to develop research methodologies that can address palaeographical problems using this new ‘big data’.

The presentation draws on the examples of the Manuscripts of the West Midlands online manuscript catalogue (http://www.hrionline.ac.uk/mwm/) and the Vernon Manuscript Digital Edition (http://www.birmingham.ac.uk/schools/edacs/departments/english/research/projects/vernon-manuscript.aspx). These are not primarily palaeographical projects. But both projects capture a great deal of palaeographical information (the Vernon manuscript is the largest surviving Middle English manuscript – a medieval ‘data deluge’ in itself). The files can also be used to address palaeographical problems, and are particularly valuable when different kinds of data are correlated across a large corpus. For example, correlation of scribal graphemic profiles with scribal graphetic profiles using the transcription, description, and image files of the Vernon manuscript edition yields new insights into the provenance of the manuscript’s scribes and decorators. Further work on this and enlarged corpora will examine aspects of scribal behaviours and practices.

Parchment and Scribes in the Malatestian Scriptorium

Paola Errani, Cesena

The library built in the mid-15th century by the Lord of Cesena, Malatesta Novello, still preserves 343 manuscripts, 128 of which were copied and illuminated for him. The activity of the Malatestian scriptorium covers a period of about twenty years, from 1446 to 1465, when Novello died. This corpus is an excellent field for palaeographical and codicological research. Our project aimed at identifying one of the features of the parchment used in the scriptorium by measuring the thickness of the leaves composing the quires of each manuscript. The collected data were subjected to a basic statistical treatment and compared with the (unfortunately rare) similar data drawn from earlier or contemporary manuscripts. According to our results, the thickness of Malatestian parchment does not vary significantly during the Malatestian era and is in line with the average for Western medieval manuscripts (between 150 and 200 μm, irrespective of date). There is a very small difference among the folios used by the eleven identified scribes, while the parchment of four manuscripts written by three of these copyists for other commissions proved to be thinner and more homogeneous than that which they used for Malatesta. So on the one hand the quality of the parchment used for Malatesta differs from that used for other commissions; on the other hand there is no significant difference within the Malatestian scriptorium attributable to date or scribe. This suggests that the sources of supply and methods of treatment of parchment were unique or at least standard and relatively invariant throughout the life of the Malatestian scriptorium.
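The abstract does not say which statistics were computed. As a minimal sketch of what a "basic statistical treatment" of thickness readings might look like, the following groups measurements by scribe and reports the mean, the sample standard deviation and the coefficient of variation (a simple proxy for homogeneity). The scribe labels and all readings are hypothetical placeholders, not data from the project.

```python
from statistics import mean, stdev

# Hypothetical thickness readings in micrometres, keyed by an invented scribe label.
# These numbers are illustrative only and are NOT taken from the Malatestian project.
readings_um = {
    "scribe_A": [162, 171, 158, 180, 167],
    "scribe_B": [169, 175, 160, 172, 166],
}


def summarize(values):
    """Return (mean, sample standard deviation, coefficient of variation)."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation; needs at least two readings
    return m, s, s / m


def summarize_groups(groups):
    """Apply summarize() to every group of readings."""
    return {label: summarize(vals) for label, vals in groups.items()}


if __name__ == "__main__":
    for label, (m, s, cv) in summarize_groups(readings_um).items():
        print(f"{label}: mean={m:.1f} um, sd={s:.1f} um, cv={cv:.3f}")
```

A lower coefficient of variation for one group than for another would correspond to what the abstract calls "more homogeneous" parchment; whether a difference is significant would need a proper test, which this sketch does not attempt.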

DNA Analysis and the Study of Medieval Parchment Books

Timothy Stinson, Raleigh

This presentation summarizes and follows up on several preliminary tests that have shown that DNA survives in medieval parchment manuscript leaves and may be extracted and analyzed, and offers suggestions for defining and implementing future genetic studies of parchment. Potential applications of the genetic analysis of parchment, including not only codicological studies, but also the mapping of trade routes and the study of medieval animals and animal husbandry, are discussed, as well as parchment's nonpareil value as archaeological evidence. I also articulate the need to consider genetic data in conjunction with other types of evidence – such as historical texts and archaeological data – both in planning tests of parchment and in interpreting the results of such tests.

OCR for manuscripts and early prints

Torsten Schaßan, Wolfenbüttel

The presentation will report on experience gained with OCR technologies in German libraries. The most important pieces of software in use will be compared. Although this experience was gained mainly with printed materials, the presentation will examine where and how it can be transferred to handwritten materials. Additionally, the local Wolfenbüttel practice and experience with the use and processing of OCR results will be presented. The presentation will examine which factors are crucial for OCR quality and discuss how error ratios can be measured. Finally, since there is no agreement on what counts as an error in the first place, this question will also be addressed.
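One common way to express an OCR error ratio is the character error rate: the edit distance between a reference transcription and the OCR output, divided by the length of the reference. The sketch below is an illustration of that general measure, not the procedure used at Wolfenbüttel. The optional `normalize` argument is an assumption added here to show how the answer depends on what one agrees to count as an error, for example whether a long s (ſ) read as a round s is a mistake.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion from a
                            curr[j - 1] + 1,      # insertion into a
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str, normalize=None) -> float:
    """Edit distance divided by reference length; can exceed 1.0 for very bad output."""
    if normalize is not None:
        reference, hypothesis = normalize(reference), normalize(hypothesis)
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    strict = character_error_rate("Waſſer", "Wasser")
    lenient = character_error_rate("Waſſer", "Wasser",
                                   normalize=lambda s: s.replace("ſ", "s"))
    print(strict, lenient)
```

The same page therefore yields different error ratios under a strict, diplomatic convention and under a normalizing one, which is why the definition of an error has to be fixed before figures from different projects are compared.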

Identifying Join Candidates in the Cairo Genizah

Lior Wolf, Tel Aviv

A join is a set of manuscript fragments that are known to originate from the same original work. The Cairo Genizah is a collection containing approximately 350,000 fragments of mainly Jewish texts discovered in the late 19th century. The fragments are today spread out in libraries and private collections worldwide, and there is an ongoing effort to document and catalogue all extant fragments. The task of finding joins is currently conducted manually by experts, and presumably only a small fraction of the existing joins have been discovered. In this work, we study the problem of automatically finding candidate joins, so as to streamline the task. The proposed method is based on a combination of local descriptors and learning techniques. To evaluate the performance of various join-finding methods, without relying on the availability of human experts, we construct a benchmark dataset that is modeled on the Labeled Faces in the Wild benchmark for face recognition. Using this benchmark, we evaluate several alternative image representations and learning techniques. In addition, a set of newly discovered join candidates has been identified using our method and validated by a human expert. Finally, we discuss the benefits of employing additional paleographic data in order to obtain better performance.
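The abstract does not specify the descriptors or the learning method. As a deliberately simplified stand-in, the sketch below assumes each fragment has already been reduced to a fixed-length feature vector and ranks fragment pairs by cosine similarity, so that the most similar pairs can be handed to an expert first. The fragment identifiers and vectors are hypothetical, and plain cosine ranking replaces the learned similarity the authors describe.

```python
from itertools import combinations
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_join_candidates(features, top_k=3):
    """Return the top_k (score, id_a, id_b) tuples, most similar pair first."""
    scored = [
        (cosine_similarity(features[a], features[b]), a, b)
        for a, b in combinations(sorted(features), 2)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical feature vectors, e.g. histograms of quantized local descriptors.
    fragments = {
        "frag_1": [0.8, 0.1, 0.1],
        "frag_2": [0.7, 0.2, 0.1],
        "frag_3": [0.0, 0.1, 0.9],
    }
    for score, a, b in rank_join_candidates(fragments):
        print(f"{a} ~ {b}: {score:.3f}")
```

With some 350,000 fragments the number of pairs is far too large for exhaustive expert review, which is why a ranked shortlist of candidates, rather than a final verdict, is the useful output.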

A few art historical reflections

Nataša Golob, Ljubljana

The web offers a nearly endless list of digitized databases and projects relevant to researchers of mediaeval manuscripts. Some of them preserve the structure of the classical catalogue, while many portals are changing quickly. A serious problem arises for users when complex and expensive equipment is required, which puts many researchers at a disadvantage. At present, the art historian can consult various digitized collections of illuminated manuscripts, and where live zooming is provided, one can profit from details that give information on iconography, technology and style, school, and author. Images are usually available in colour and thus convey the final aesthetic result, but a black-and-white image is also important for the researcher, because it tells more about the kinetics of the hand of a scribe, writer or illuminator (Ljubljana, FS 6772). Live zooming also allows the average researcher to see layers of paint and ground (Maribor R 115), as well as details relevant for distinguishing the hand of a master from the hand of an assistant (Maribor, Ms 12). Digital technology is a reliable tool for differentiating between “absolutely identical elements” in lettering, initials etc. (Novo mesto, A-a 2, initial E), and it clarifies technical details (Novo mesto, A-a 2, pricked outline for a dragon). Essential for preservation, conservation and non-destructive methods is the possibility of creating a virtual reconstruction of the original colouristic result (for instance with faded colours or blackened silver paint, Ljubljana, NUK 2), or of using several techniques in “virtual restoration” and thus preserving the original remains as they are (Ljubljana, ARS, blueprint of F. Munda palace).

Spatial Exploration Tools in the Graphem Project

Matthieu Exbrayat

The Graphem Project was a three-and-a-half-year project funded by the French National Research Agency, which ended in June 2011. Graphem aimed to study various pattern recognition techniques applied to digital paleography. A large part of the project focused on the automated study of writing styles, trying to extract digital features that would help to discriminate between styles, and potentially to reorganize them as objectively as possible, without the bias induced by expert human observation. In this talk we give an overview of the techniques proposed by the three computer science teams involved in Graphem: the Lipade (Univ. Paris Descartes), the Liris (INSA Lyon) and the Lifo (Univ. Orléans). These techniques cover both feature extraction and visual exploration of the results.

Interpreting Ancient Documents: Of Avatars, Uncertainty, and Knowledge Creation

Ségolène Tarte, Oxford

This talk briefly presents the cognitive processes papyrologists tap into when deciphering, transcribing and interpreting ancient documentary artefacts. Based on those observations, I further show how the ontology of the digitized versions of a text-bearing artefact deviates from the traditional mimetic model. Digitized artefacts share three ontological characteristics with Mesopotamian salmus (images, representations): they are encoded, embedded into the real and they influence the real; they are avatars of the artefact, expressing a specific form of presence of the artefact conditioned by the act of digitization. I then move on to explore how this new ontology influences scholarly practice: the impact it has on the resolution and introduction of uncertainties, and on the act of knowledge creation itself. I conclude by stating that not only the provenance of the data should be documented but also the process of interpretation itself, at all its stages.

Investigation of Historic Documents with Focus on Automatic Layout and Character Analysis

Melanie Gau and Robert Sablatnig, Vienna

Digital imaging of ancient documents has attracted significant interest in recent years. It opens new possibilities for preserving, analyzing and presenting the content of cultural heritage. Using multispectral imaging techniques in combination with digital image processing allows, on the one hand, enhancing the readability of palimpsests and of text that has vanished or been damaged through environmental effects such as mold, humidity or fading of ink; and, on the other, the automated investigation of the structure and content of manuscripts. This paper reports on two interdisciplinary projects of philologists and computer scientists devoted to the recording, investigation and editing of three medieval Slavonic manuscripts of extraordinary importance. First of all, the projects deal with the development of techniques for the recording, registration and combination of multivariate image data to increase the readability of the written text. The enhanced images are then used in subsequent computer-aided procedures, e.g. the segmentation of the ruling (line structure), computer-aided script description and stroke analysis, as well as the deciphering and reconstruction of the script. The algorithms developed aim to help philologists perform these tasks faster and more precisely.
