Text Encoding Initiative Workshops

ENS de Lyon, site Buisson

The Pre-conference Workshops will take place at the ENS (École normale supérieure) of Lyon in the building of the French Institute for Education (Institut français de l’Éducation – IFÉ). See: maps and directions.

To participate, all attendees will be required to bring their own laptops.

To register, please use this form.

Monday, October 26 – Day 1

VENUE	TIME / DETAILS
ENS de Lyon IFÉ building	9:00-12:30 – Workshops
Meeting room 2	TEI Simple HackAThon: Pushing the limits of Simple Processing Model – Part 1/2 Turska, Magdalena (University of Oxford, United Kingdom)
Meeting room 1	Explorative data analysis and visualisation in nodegoat – Part 1/2 Kessels, Geert – van Bree, Pim (LAB1100)
Meeting room 3	Technical Council Meeting – Part 1/3
Salon	12:30-13:30 Lunch break
ENS de Lyon IFÉ building	13:30-17:00 – Workshops
Meeting room 2	TEI Simple HackAThon: Pushing the limits of Simple Processing Model – Part 2/2
Meeting room 1	Explorative data analysis and visualisation in nodegoat – Part 2/2
Meeting room 3	Technical Council Meeting – Part 2/3

Tuesday, October 27 – Day 2

VENUE	TIME / DETAILS
ENS de Lyon IFÉ building	9:00-12:30 – Workshops
Meeting room 2	Introduction to EpiDoc: TEI for Ancient Documents – Part 1/2 Bodard, Gabriel (University of London) – Mylonas, Elli (Brown University, to be confirmed) – Stoyanova, Simona (Leipzig).
Meeting room 3	Introduction to the TXM Content Analysis Software – Part 1/2 Heiden, Serge – Lavrentiev, Alexei (ICAR Research Lab, Lyon University and CNRS, France)
Meeting room 1	Encoding correspondence meta data with correspDesc – Part 1/2 Stadler, Peter (University of Paderborn, Germany) – Dumont, Stefan (Berlin-Brandenburg Academy of Sciences and Humanities) – Seifert, Sabine (Humboldt-University Berlin) – Illetschko, Marcel (Austrian National Library)
Salons	12:30-13:30 Lunch break
ENS de Lyon IFÉ building	13:30-17:00 – Workshops
Meeting room 3	Introduction to EpiDoc: TEI for Ancient Documents – Part 2/2
Meeting room 1	Introduction to the TXM Content Analysis Software – Part 2/2
Meeting room 2	Encoding correspondence meta data with correspDesc – Part 2/2

Wednesday, October 28 – Day 3

VENUE	TIME / DETAILS
ENS de Lyon IFÉ building	9:00-12:30 – Workshops
Meeting room 1	SynopsX, a lightweight publication framework for XML Corpora Ingarao, Maud (Institut d’histoire de la pensée classique – UMR 5037) – Magué, Jean-Philippe (Interaction, Corpus, Apprentissages, Représentations – UMR 5191)
Meeting room 2	Extending the TEI support and new features in Oxygen 17.0 Jitianu, Alexandru – Bina, George (Syncro Soft, Romania)
Meeting room 3	Technical Council Meeting – Part 3/3

List of workshops (by title)

Encoding correspondence meta data with correspDesc (1 full day)

Workshop description

The Correspondence SIG proposes to organize a training workshop on encoding letters with the new correspDesc element. It will also present the derived CMI format[1] for interconnecting letter collections based on TEI and correspSearch[2]. The aim of this workshop is twofold: 1) to disseminate the new encoding possibilities and 2) to gather feedback that may help to improve the current Guidelines (with regard to correspDesc) and may result in best practice models of letter encoding.

Details

With the latest release 2.8.0 of the TEI Guidelines, several new elements were introduced especially for the encoding of correspondence (letters, postcards, emails, etc.). The workshop will discuss those elements and how they can be applied to the participants’ own material, e.g. letter collections. The focus will thus be on the meta data of correspondence material, separating the communicative, the material and the textual aspects, which are to be encoded within correspDesc, sourceDesc and profileDesc, respectively.
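As a rough illustration of the communicative meta data described above, the following Python sketch builds a minimal correspDesc with one "sent" and one "received" correspAction. The function name, the sample names, place and date are hypothetical and chosen only for this example; real projects will usually record more detail.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def build_corresp_desc(sender, sent_place, sent_date, recipient):
    """Return a <correspDesc> element describing one letter (hypothetical helper)."""
    ET.register_namespace("", TEI_NS)
    desc = ET.Element(f"{{{TEI_NS}}}correspDesc")

    # The sending event: who wrote the letter, where, and when.
    sent = ET.SubElement(desc, f"{{{TEI_NS}}}correspAction", {"type": "sent"})
    ET.SubElement(sent, f"{{{TEI_NS}}}persName").text = sender
    ET.SubElement(sent, f"{{{TEI_NS}}}placeName").text = sent_place
    ET.SubElement(sent, f"{{{TEI_NS}}}date", {"when": sent_date})

    # The receiving event: in this sample only the addressee is known.
    received = ET.SubElement(desc, f"{{{TEI_NS}}}correspAction", {"type": "received"})
    ET.SubElement(received, f"{{{TEI_NS}}}persName").text = recipient
    return desc


# Invented sample values, not taken from any real collection.
letter = build_corresp_desc("A. Writer", "Berlin", "1810-05-01", "B. Reader")
print(ET.tostring(letter, encoding="unicode"))
```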

Moving on from here the Correspondence Metadata Interchange (CMI) Format will be presented, which is a constrained TEI customization provided by the Correspondence SIG for facilitating interchange of correspondence meta data.

The format builds on authority-controlled IDs (e.g. VIAF and GeoNames) for identifying entities, and the W3C format for dates. This allows, for example, the web service “correspSearch” to aggregate index listings of various letter collections and to enable searching across those collections. The workshop will enable the participants to transform their correspondence meta data into the CMI format and will also introduce best practices for looking up authority-controlled IDs.
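To make the role of authority IDs and normalised dates more concrete, here is a small Python sketch that reads a CMI-like correspDesc, collects the ref attributes and checks the when value against a simplified date pattern. The sample record, the placeholder VIAF and GeoNames URIs, and the summarize function are assumptions made for illustration; the regular expression covers only YYYY, YYYY-MM and YYYY-MM-DD, not the full W3C date formats.

```python
import re
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Simplified check: accepts YYYY, YYYY-MM or YYYY-MM-DD only.
W3C_DATE = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")

# Invented sample record; the identifiers are placeholders, not real IDs.
SAMPLE = """
<correspDesc xmlns="http://www.tei-c.org/ns/1.0">
  <correspAction type="sent">
    <persName ref="http://viaf.org/viaf/000000000">A. Writer</persName>
    <placeName ref="http://www.geonames.org/0000000">Berlin</placeName>
    <date when="1810-05-01"/>
  </correspAction>
  <correspAction type="received">
    <persName ref="http://viaf.org/viaf/111111111">B. Reader</persName>
  </correspAction>
</correspDesc>
"""


def summarize(xml_text):
    """Return one dict per correspAction with its IDs and date (hypothetical helper)."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for action in root.findall("tei:correspAction", NS):
        person = action.find("tei:persName", NS)
        place = action.find("tei:placeName", NS)
        date = action.find("tei:date", NS)
        when = date.get("when") if date is not None else None
        rows.append({
            "type": action.get("type"),
            "person_id": person.get("ref") if person is not None else None,
            "place_id": place.get("ref") if place is not None else None,
            "when": when,
            # A missing date is tolerated; a present one must match the pattern.
            "date_ok": when is None or bool(W3C_DATE.match(when)),
        })
    return rows


for row in summarize(SAMPLE):
    print(row)
```

Because every person and place is identified by a URI rather than by a name string, records from different collections can be matched without comparing spellings.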

Participants will be asked to provide some sample correspondence materials which are to be discussed and encoded during the workshop. The workshop organizers will take care of minutes of those discussions which will be made publicly available. These minutes can hopefully help to improve the current TEI Guidelines (with regard to correspDesc) and to serve as a starting point for best practice models of letter encoding.

Duration

1 day

Maximum number of participants

Requirements

Projector, wireless internet access, participants will need to bring their own laptops

Workshop organizers

Stefan Dumont
Stefan is a research assistant at the TELOTA initiative of the Berlin-Brandenburg Academy of Sciences and Humanities. He is involved in several digital editions and developed the correspSearch web service.

Marcel Illetschko
Marcel works on the edition of the letters of the two late 19th/early 20th century scholars August Sauer and Bernhard Seuffert (Austrian National Library, main output: print edition) and has been Co-Convener of the TEI Correspondence SIG since 2014.

Sabine Seifert
Sabine is part of the team editing “Letters and Texts. Berlin Intellectuals around 1800” at Humboldt-University and works at the Theodor-Fontane-Archive at Potsdam University. She has been Co-Convener of the TEI Correspondence SIG since 2014.

Peter Stadler
Peter is in charge of the digital edition of Carl Maria von Weber’s letters, diaries and writings. He has been Co-Initiator and Co-Convener of the TEI Correspondence SIG since 2008 and is an elected member of the TEI Technical Council for the term 2014 through 2015.

[1] https://github.com/TEI-Correspondence-SIG/correspDesc

[2] http://correspsearch.bbaw.de/

Explorative data analysis and visualisation in nodegoat (1 full day)

Brief outline

In this workshop we will support participants to employ explorative visualisations based on their own TEI data by means of the web-based research environment nodegoat (http://nodegoat.net/). A good example of how nodegoat can be used to create, manage, visualise, analyse and present structured data is the project on romantic nationalism by Joep Leerssen of the University of Amsterdam. The public interface of this collaborative research project can be consulted via http://romanticnationalism.net, or read more about it in the brochure: http://spinnet.humanities.uva.nl/images/2015-03/ernie_brochure_2015_lores_2.pdf.

Data that is presented by participants in this workshop will be loaded into nodegoat. Here, we will collectively produce interactive timelines, diachronic social graphs and diachronic geographical visualisations. Examples of existing datasets that could be worked with include: http://letters.mozartways.com/ and http://tei.ibi.hu-berlin.de/berliner-intellektuelle/, or data that includes the <correspDesc> tags and that has been harvested by http://correspsearch.bbaw.de/. An example that we created in nodegoat based on data that is available through the correspSearch portal can be seen here: http://correspsearch-test.nodegoat.net/viewer.p/4/136/scenario/1/geo/fullscreen. Apart from letters, any other data set that has relational, geographical or temporal attributes can be used.

Workshop program

Introduce nodegoat (see: https://www.youtube.com/watch?v=eLDRNiJrRUc&list=PLXc6y7l7xxxIwd64QppyAA0G2ECsNGJCx), discuss the different functionalities of nodegoat and go into the opportunities of explorative data visualisations.
Discuss the prerequisites of the data and show how a TEI dataset can be loaded into nodegoat.
Discuss datasets that are introduced by the participants of the workshop, or datasets that have been identified in the <correspDesc>/correspSearch workshop.
Load one of these datasets into nodegoat and explore the possibilities of analysis and visualisation.
Provide participants with their own nodegoat account and support them to load their own data into nodegoat.

This workshop is useful for everyone who has TEI data that contains relational, geographical, and/or temporal attributes and is looking for means to explore this data. Once the data has been loaded into nodegoat, it can be used for in-depth exploration and analysis (centrality metrics, geographical patterns, diachronic developments) or used in dynamic and interactive visualisations that can be published online.
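The kind of exploration mentioned above (centrality metrics, diachronic developments) can be sketched independently of nodegoat. The Python example below computes a normalised degree centrality and a per-year letter count from a list of sender, recipient and date triples. The sample data and function names are invented for illustration, and this is not a description of how nodegoat itself computes its metrics.

```python
from collections import Counter

# Hypothetical (sender, recipient, ISO date) triples extracted from TEI letters.
LETTERS = [
    ("Alice", "Bob", "1810-05-01"),
    ("Alice", "Carol", "1810-06-12"),
    ("Bob", "Alice", "1811-01-03"),
    ("Carol", "Alice", "1811-02-20"),
]


def degree_centrality(letters):
    """Share of the other people in the network each person corresponds with."""
    neighbours = {}
    for sender, recipient, _date in letters:
        # Treat correspondence as an undirected relation between two people.
        neighbours.setdefault(sender, set()).add(recipient)
        neighbours.setdefault(recipient, set()).add(sender)
    n = len(neighbours)
    if n < 2:
        return {person: 0.0 for person in neighbours}
    return {person: len(others) / (n - 1) for person, others in neighbours.items()}


def letters_per_year(letters):
    """Count letters by the year part of their ISO date."""
    return Counter(date[:4] for _sender, _recipient, date in letters)


print(degree_centrality(LETTERS))
print(letters_per_year(LETTERS))
```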

Requirements

Each participant should bring a laptop.

Workshop leaders

Pim van Bree, MA
Pim van Bree received his MA in New Media at the University of Amsterdam and his BA in Digital Imagineering at the NHTV University of Applied Sciences in Breda. He graduated with a thesis on the actor network of transnational online dating, investigating the crossroads between the local, national, global, and the online assemblage. His work experience in the field of new media: digital strategist at Tribal DDB Amsterdam and software developer at KIWA.

Geert Kessels, MA
Geert Kessels received his BA in History from Radboud University Nijmegen and completed the research master program in History at the University of Amsterdam. He graduated with a thesis on the influences of German Idealism on the Slovak romantic intellectual Ľudovit Štúr. During his studies he completed an internship at the Study Platform on Interlocking Nationalisms and worked as a project manager for EUROCLIO – The European Association of History Educators.

Extending the TEI support and new features in Oxygen 17.0 (1 half day)

The main directions of this presentation are:

how to extend the TEI support in Oxygen;
the new features from Oxygen 17.0 and how they can help those working with TEI documents.

Below you will find a more detailed presentation of the topics.

New features in Oxygen 17.0

The new version of Oxygen introduces a great number of features that are worth discovering. We will talk about:

The XML Refactoring tool helps you change the structure of your XML documents. It offers a wide variety of operations, such as renaming, deleting, and moving elements and attributes. If the provided operations are not enough, you can create custom refactoring operations and share them with other team members. All of them are available through a friendly user interface and can be applied across a set of files. This presentation is structured in two parts.

The user perspective.
In this part we will use the predefined operations to solve a couple of use cases.
The developer perspective.
In this part we will present how to create a complex custom operation and how to share it with your team.

XML Quick Fixes

It is important to have documents without errors, but not all users know how to fix the errors from the documents. And even if they do know, ideally they should use a quick and automated solution. oXygen does not only report errors, it also helps you correct them automatically through the Quick Fix support, which provides automatic fixes for XML documents validated against XSD, Relax NG, and Schematron schemas. In the case of Schematron, the schema developer takes full control over the Quick Fix actions, being able to offer custom solutions to any detected issue using the Schematron Quick Fix language.

This presentation includes:

Generic Quick Fix support in oXygen
Quick fix support for XML validated against Relax NG
Schematron Quick Fix language
Developing custom quick fixes in Schematron

Extending the oXygen TEI Support

Oxygen already offers advanced support for TEI, but there are certain aspects and use cases that can be fully understood only by those highly involved with the language. For these situations, Oxygen offers a great number of extension points which can be used to enhance the existing support. This presentation will introduce such use cases as well as the extension points that can help and how to use them. The use cases are: controlled values, that is, how to offer proposals for attribute values from an external data source, like an external file or a database query; and a plugin that helps with digital facsimiles, more specifically with marking zones on an image.

Introduction to EpiDoc: TEI for Ancient Documents (1 full day)

EpiDoc (epidoc.sourceforge.net) is a set of guidelines for encoding ancient source texts in TEI (originally developed for Greek and Roman epigraphy, but now much more diverse), including a recommended schema and ODD, a lively community of practice, and an ecosystem of projects, tools and stylesheets for the interchange and exploitation of such texts. This tutorial will introduce participants to the principles and practices of EpiDoc encoding, which are largely based on the practice of encoding single-source documents and the ancient objects on which they are written, as well as some of the tools and other methods made available by the community for transforming, publishing, querying, exchanging, and linking encoded materials.

The proposed workshop will last for one day, divided into morning and afternoon sessions. The morning workshop will introduce users in a relatively gentle way (although assuming basic familiarity with XML and TEI) to the practice of EpiDoc, in particular how the community guidelines recommend (a) mapping Leiden Conventions for text transcription to TEI transcription and critical apparatus elements, and (b) mapping the descriptive and historical metadata features (from projects such as EAGLE and APIS) to TEI MS Desc elements. The afternoon session will assume attendees with good familiarity with TEI, and introduce at a more advanced level some of the EpiDoc tools and practices, including the SoSOL and Perseids tags-free editors and workflow management interfaces, the EAGLE string-to-xml transformation to EpiDoc, EAGLE Wikibase and WikiMedia Commons workflows, etc. Users interested in a basic introduction to EpiDoc may attend only the morning session; users with a thorough background in TEI wanting to learn about the EpiDoc toolsets and practices in particular may attend only the afternoon session; but it is expected that most applicants will want to attend the whole day workshop.
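The mapping from Leiden Conventions to TEI elements mentioned above can be illustrated with a deliberately tiny Python sketch. It handles only two conventions, an expanded abbreviation such as Imp(erator) and restored text such as [Caesar], and it is not one of the community converters named in this description. The function name is hypothetical, and the input is assumed to be plain text without characters that need XML escaping.

```python
import re


def leiden_to_epidoc(text):
    """Convert two Leiden sigla in plain text to EpiDoc-style TEI markup."""
    # Abbreviation expanded by the editor, e.g. Imp(erator).
    text = re.sub(
        r"(\w+)\((\w+)\)",
        r"<expan><abbr>\1</abbr><ex>\2</ex></expan>",
        text,
    )
    # Lost text restored by the editor, e.g. [Caesar].
    text = re.sub(
        r"\[([^\[\]]+)\]",
        r'<supplied reason="lost">\1</supplied>',
        text,
    )
    return text


print(leiden_to_epidoc("Imp(erator) [Caesar]"))
```

Other sigla, for example lacunae of unknown length, need their own rules and are left untouched or handled incorrectly by this sketch.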

Participants are strongly encouraged to bring their own laptops.

Brief bios

Introduction to the TXM Content Analysis Software (1 full day)

Objective

The objective of the “Introduction To The TXM Content Analysis Software” tutorial is to introduce the participants to the methodology of textometric content analysis (http://textometrie.ens-lyon.fr/?lang=en) applied to corpora of TEI encoded texts through the use of the free and open-source TXM software (http://sourceforge.net/projects/txm/files/documentation/TXM%20Leaftlet%20EN.pdf/download) directly on their own laptop computers.

At the end of the tutorial, the participants will be able to load their own textual corpora (Unicode encoded raw texts or XML-TEI encoded texts) into TXM and to analyze them with the panel of content analysis tools available: word pattern frequency lists, KWIC concordances and text browsing, a rich full-text search engine syntax (allowing users to express various sequences of word forms, part of speech and lemma combinations constrained by XML structures), statistically specific sub-corpus vocabulary analysis, statistical collocation analysis, etc. During the tutorial, each participant will use TXM (from http://sourceforge.net/projects/txm) and the TreeTagger lemmatizer (http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger) on their Windows, Mac or Linux laptop and will leave the tutorial with a ready-to-use environment. The tutorial will also introduce the participants to the TXM community ecosystem (users mailing list and wiki, bug reports, etc.) and to the TXM portal server software version (see for example http://portal.textometrie.org/demo) for online corpus distribution and analysis.
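For readers unfamiliar with KWIC (keyword in context) concordances, the following Python sketch shows the basic idea on a whitespace-tokenised sentence. It is a minimal illustration under simple assumptions (no lemmatisation, no XML structure), not a description of how TXM implements its concordances; the function name and sample sentence are invented.

```python
def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for each match."""
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            # Take up to `width` tokens on each side of the match.
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


# Invented sample sentence.
text = "the letter was sent to Berlin and the reply came by letter"
for left, key, right in kwic(text.split(), "letter"):
    print(f"{left:>25} | {key} | {right}")
```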

Target group

Any TEI conference participant interested in text analysis.

Requirements

Participants must prepare their laptop before coming to the workshop according to the following procedure:

(1) Install TXM

(1.1) download it for your platform at http://perso.ens-lyon.fr/serge.heiden/txm/files/software/TXM/0.7.7
(1.2) launch the installer and follow the instructions [if you read French, detailed instructions can be found in the TXM manual:
http://sourceforge.net/projects/txm/files/documentation/Manuel%20de%20TXM%200.7%20FR.pdf/download]

(2) Install the TreeTagger software and the English parameter file, and configure access to them in TXM. For this, follow the instructions at
http://txm.sourceforge.net/installtreetagger_en.html (the TreeTagger installation for TXM is not a full installation of TreeTagger)

Tutorial leaders

Serge Heiden, ICAR Research Lab – Lyon University and CNRS, France. S. Heiden develops the textometry textual corpora analysis methodology (http://textometrie.ens-lyon.fr/?lang=en) and leads the development of the TXM software. He regularly teaches TXM tutorials in France and abroad (European seminars or summer schools, international conferences, etc.)
Alexei Lavrentiev, ICAR Research Lab – Lyon University and CNRS, France. A. Lavrentiev develops the Base de Français Médiéval (BFM) Old French corpus in TEI (http://bfm.ens-lyon.fr/rubrique.php3?id_rubrique=123). He also participates in the development of the TXM software which is used to give access to the BFM corpus online. He regularly teaches TEI tutorials in France and abroad (European seminars or summer schools, international conferences, etc.)

SynopsX, a lightweight publication framework for XML Corpora (1 half day)

Outline

Initiated by the Digital humanities workshop of ENS Lyon (Atelier des Humanités Numériques), SynopsX is a lightweight framework whose aim is to make it easy to publish and expose XML corpora. It is a full XQuery web application built with the native XML database BaseX. The sources of the project are published under a GNU license on GitHub (https://github.com/ahn-ens-lyon/synopsx).

This workshop aims to present the benefits of XML databases and to invite participants to start using SynopsX to publish their own corpora.

Duration

One half day.

Requirements

The organisers will need a projector, and participants are expected to bring their own laptops.

Appeal to the TEI Community

Various approaches have been proposed to publish TEI corpora online, but no standard software solution has emerged yet. As a possible next step after the markup of an edition, publication is still a difficult issue for many projects in Digital humanities.

We target digital humanists seeking solutions to publish their TEI corpora. A very basic knowledge of XQuery and XPath will be helpful.

Workshop Leaders

This workshop is organized by the Digital humanities workshop of ENS Lyon.
Leaders will be:

Jean-Philippe Magué
Associate professor, Digital humanities, Chair of the Digital humanities workshop, ENS Lyon

Maud Ingarao
Research assistant, ENS Lyon

TEI Simple HackAThon: Pushing the limits of Simple Processing Model (1 full day)

Presentation

The Guidelines of the Text Encoding Initiative Consortium (TEI) have been used for many years throughout numerous disciplines of the Digital Humanities to mark up digital texts. Doing so has produced huge numbers of TEI collections underlying leading digital editions, projects, and other resources. These digital texts are most often transformed for display as websites, camera-ready copy, or for import into other systems for processing, analysis, or visualization. While the TEI Consortium provides XSLT stylesheets for transformation to and from many formats, and both commercial and open source software is TEI-aware, there is little standardisation and no prescriptive approach towards processing TEI documents. The TEI Simple project aims to close that gap with its Processing Model, which provides default rules for processing TEI into various publication formats while offering the possibility of building customized processing models within the TEI Simple infrastructure. Editors and developers unfamiliar with the TEI often approach the development of TEI processing systems with trepidation and, quite often, ignorance of potential complications; TEI Simple provides a layer of abstraction to separate high-level editorial decisions about processing from final rendition choices and low-level, output-format-specific intricacies.

This unconference-style HackAThon is open to developers with little or no TEI/TEI Simple experience, to editors, curators and archivists dealing with TEI documents, and to people who are both. The aim is to test the limits of the TEI Simple processing model and to try to enhance its functionality by implementing solutions to identified deficiencies. It is hoped that developers and editors, with the help of TEI Simple experts, will be able to share their expertise in a knowledge exchange hacking event. It is intended to be both fun and fruitful.
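The separation between editorial decisions and output-specific rendition described above can be sketched in a few lines of Python. This is only an analogy under loud assumptions: the MODEL table, the behaviour names and the HTML renderers are invented for this example and do not reproduce the actual TEI Simple Processing Model syntax or its default rules, and the sample document omits the TEI namespace for brevity.

```python
import xml.etree.ElementTree as ET
from html import escape

# Editorial layer: which abstract behaviour each element gets (hypothetical mapping).
MODEL = {"head": "heading", "p": "paragraph", "hi": "inline"}

# Output layer: how each behaviour is realised in one output format (HTML here).
HTML_RENDERERS = {
    "heading": lambda content: f"<h2>{content}</h2>",
    "paragraph": lambda content: f"<p>{content}</p>",
    "inline": lambda content: f"<em>{content}</em>",
}


def render(element, model, renderers):
    """Recursively render an element according to the model and renderers."""
    parts = [escape(element.text or "")]
    for child in element:
        parts.append(render(child, model, renderers))
        parts.append(escape(child.tail or ""))
    content = "".join(parts)
    behaviour = model.get(element.tag)
    # Elements without a rule simply pass their content through.
    return renderers[behaviour](content) if behaviour else content


# Namespace-free sample document, invented for the example.
doc = ET.fromstring("<div><head>Title</head><p>Some <hi>marked</hi> text.</p></div>")
print(render(doc, MODEL, HTML_RENDERERS))
```

In this sketch an editor would only touch MODEL, while a developer targeting another format would supply a different renderer table and escaping step.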

Participants and Organisation

The HackAThon will be organised as an unconference-style event. If the workshop is accepted, a call for participation will be circulated inviting people to notify the organisers that they wish to participate. Applicants will be asked for basic details of their experience and possible contribution to the HackAThon. Decisions will be made by an international programme committee based on the criteria of getting both a sufficient number and variety of expertise (technical and editorial), interest in challenges similar to those proposed, as well as geographical, cultural, disciplinary and gender balance. Participants will be notified by 15 September 2015. Late applications will, of course, be considered swiftly and, if acceptable, included if space remains. After 15 September 2015 a mailing list for HackAThon participants will be created to discuss in advance the possible challenges that might be undertaken. This will allow some pre-seeding of ideas and preparation of material, and will potentially allow people to start work or familiarise themselves with particular aspects of the TEI Simple framework.

Possible challenges that this HackAThon will focus on include:

Preparation of camera-ready editions of complex historical early-printed sources
A server platform which provides access to TEI Simple processing engine through a RESTful interface
Integration of TEI Simple with existing popular software (e.g. editors, processing infrastructures, etc.)

These are intended as a set of potential interlinking challenges suitable for a range of skill levels and familiarity with the TEI. The potential challenges and possible implementations of them will be openly discussed on the HackAThon’s mailing list. All outputs will be made freely available under open licenses.

HackAThon Outline

The HackAThon will be organised with an unconference style pre-conference mailing list discussion, followed by finalising on the day the precise groups of participants and challenges they want to work on. Most of the rest of the day is spent working in these groups, with a review mid-way through of what the groups are working on. The day concludes with reporting back, demonstrating the work they have done, and discussing next steps (potentially with TEI-C support).

Morning:
09:30-10:30 — Introduction and coffee, finalise groups and challenges.
10:30-12:30 — Groups start work (break out sessions)
12:30-13:30 — Lunch
13:30-14:00 — Groups briefly report on work done so far
14:00-16:00 — Group work continues (break out sessions)
16:00-17:00 — Regroup, report back and show work, plan for next steps

Organisers and TEI Simple Experts

The organisers of this proposal all have extensive experience in leading and coordinating hands-on workshops focused on TEI or related theories and technologies. They will be available during the workshop together with other TEI and DH experts who will be attending TEI Conference 2015 and who have confirmed their interest in taking part in the HackAThon should it be accepted by the programme committee. These TEI experts (and developers in their own right) include:

Magdalena Turska (magdalena.turska@it.ox.ac.uk) is Researcher for Digital Scholarly Editions in Academic IT at University of Oxford’s IT Services, TEI Simple project member and DiXiT project’s ER fellow. Magdalena is the main organiser and liaison of the TEI Simple HackAThon.
Sebastian Rahtz (sebastian.rahtz@it.ox.ac.uk) is Chief Data Architect in Academic IT at University of Oxford’s IT Services and TEI Simple project PI.
Brian Pytlik-Zillig (bzillig1@unl.edu) is Professor and Digital Initiatives Librarian at the Center for Digital Research in the Humanities and TEI Simple project PI.
James Cummings (james.cummings@it.ox.ac.uk) is Senior Digital Research Specialist in Academic IT at University of Oxford’s IT Services. He is an elected member of the TEI Technical Council.

Technical Council Meeting

Meeting of the Technical Council of the TEI Consortium. For more information, see the TEI wiki.
