TEI Council Meeting, 7-9 November 2011

Present in person: Laurent Romary (Chair, LR), Piotr Banski (PB), Brett Barney (BB), Lou Burnard (LB), James Cummings (JC), Kevin Hawkins (KH), Martin Holmes (MH), Elena Pierazzo (EP)

Guest (day 2 only): Brian L. Pytlik Zillig (BZ)

Present remotely: Gabriel Bodard (GB), Sebastian Rahtz (SR), Stuart Yeates (SY)

Location: INRIA, Paris, France

Day 1: November 7 2011

1.1 Morning

1.1.1 Green Tickets

Review of green feature requests and bugs previously discussed by email.

1.1.1.1 Feature Requests

certainty/ precision/ respons from model.glossLike. Straightforward ticket: LB will do it and close ticket.

gap reason="cancelled". So "deleted" needs to be added to the list as well. GB will be asked to implement the proposed changes.

textLang usable in bibliographies. The proposal now is to move textLang to core, and to move all the corresponding discussion of the element. LB objects a bit because core is too big, but that's a different issue. JC will actually implement it.

type-like attributes to provide a pointer to an existing ontology or taxonomy. Current practice is that syntactic-sugar type values are taken from the names of the corresponding elements; we should express this clearly in the explanation of type. There should also be a comment on the general definition of type to the effect that we recommend the use of existing taxonomies where suitable. We should also ensure that all our examples in the Guidelines are consistent with this practice, so one example in 3.5.1 needs to be changed ("organization" becomes "org"). However, rs defines its own type, so the same text needs to be added there. The reference to the BBN taxonomy belongs in rs/ type, not the general type. KH will implement this.

f to contain PCDATA. LR will implement this, getting examples from documents created by the group requesting the change.

person can be sent out to TEI-L. LB reminds us about privacy concerns.

1.1.1.2 Bugs

label used to mark topic transition in prose text.

xml:lang values. EP and PB: We must use two-letter codes where they exist, and where they don't, three-letter codes can be used; locations (region subtags) should be in upper case. PB will recheck Syd Bauman's list (on the ticket) and fix all bad examples.
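The rule agreed here is easy to state mechanically. The following is a minimal Python sketch of a checker for xml:lang values, offered only as an illustration and not as a TEI tool. The function name check_lang_tag and the small THREE_TO_TWO table are hypothetical. A real checker would consult the full IANA Language Subtag Registry rather than a hard-coded excerpt.

```python
import re

# Hypothetical excerpt of ISO 639-2 codes that have ISO 639-1 (two-letter)
# equivalents; a real checker would load the full IANA subtag registry.
THREE_TO_TWO = {"eng": "en", "fra": "fr", "fre": "fr", "deu": "de", "ger": "de", "lat": "la"}


def check_lang_tag(tag: str) -> list[str]:
    """Return the problems found in an xml:lang value (empty list if none)."""
    problems = []
    subtags = tag.split("-")
    primary = subtags[0].lower()
    # Rule 1: use the two-letter code where one exists.
    if primary in THREE_TO_TWO:
        problems.append(f"use two-letter code '{THREE_TO_TWO[primary]}' instead of '{subtags[0]}'")
    for sub in subtags[1:]:
        # A singleton (e.g. 'x' for private use) starts extensions; stop checking there.
        if len(sub) == 1:
            break
        # Rule 2: two-letter region (location) subtags should be upper case.
        if re.fullmatch(r"[A-Za-z]{2}", sub) and sub != sub.upper():
            problems.append(f"region subtag '{sub}' should be upper case")
    return problems


print(check_lang_tag("en-gb"))  # ["region subtag 'gb' should be upper case"]
print(check_lang_tag("grc"))    # [] (Ancient Greek has no two-letter code)
```

In BCP 47, a two-letter alphabetic subtag after the primary language subtag can only be a region, so the sketch limits the upper-case check to that shape.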

head and p within figure. Solution: remove "Figure 1: " from the second example, to reduce confusion. LB has implemented and closed the ticket.

1.1.2 General Discussion

LR commented that he would like to see many people in the community implement tickets in future, once Council has approved them.

Who should assign colours to tickets? At the moment it's LB who assigns colours, but others could/should. The submitter shouldn't, but the first commenter perfectly well could. For clarification:

1.2 Afternoon

1.2.1 Amber Tickets

description alongside traits/ states. LR, PB: The proposal for a new description element is rejected. However, there is still an issue to deal with. trait and state are essentially the same, the only difference being ontological (essential vs incidental). The content model is the same, and they are both timed. In the long run, we would want to keep trait only, and when you need to differentiate essential and incidental, you rely on the type attribute, and the possible temporal attributes you may use to limit applicability. Should we deprecate state in favour of trait, or use the Guidelines text to explain that if you're confused about the difference, use trait, and that state is syntactic sugar for trait type="state"? We recommend that trait is the default element to represent semantics of properties etc.; this will require an adjustment to the definition of trait to make it broader, so that it can be used for temporally-bounded characteristics. After some discussion, we decided that state makes a more obvious default, because it can handle both temporal and non-temporal properties. So: in immediate response, we will suggest state as the default choice, purify the trait examples in the Guidelines to remove the temporal attributes, check all the trait examples to see which need to be changed to state, and raise a new ticket to consider the merging of model.traitLike into model.stateLike.

listPerson and listOrg members of model.personLike. We should add listPerson, listPlace and listOrg to org, and add listOrg to particDesc. LB suggested creating a new model.orgPart, which would make the changes to org simpler, and adding listOrg to the content model of particDesc. LB will implement it.

role for publisher. To deal with this ticket, we should allow respStmt within imprint. We would then have ways to handle more than just publisher-like roles within imprint. This requires broadening the definition of resp and respStmt to allow them to apply to organizations. We should also consult with the workgroup who looked at physical bibliography for more guidance on requirements in this area. LB objected that there was little distinction in the past between the functions we now distinguish (printer, bookseller etc.); EP objected that this is certainly not the case for early printed books, and gave examples. The Council decided that KH & EP's solution should be implemented, and MH will implement it.

start. Point 1 should be rejected, because we can think of good use-cases where both may be required. Point 3: we can't really make much sense of the comment, except that we do need more examples of the use of points. MH will look for (or create) more examples of start and points.

availability. Assigned to MH to finish implementation.

datingMethod did not exist, so these two attributes may be confused; our response would be that, now that datingMethod will be available, there is no need to abuse calendar, and existing abuses can be fixed, or not, at the discretion of their perpetrators. Assigned to GB to implement.

person, place and org, and which would have a content model of model.stateLike (assuming that's the one we recommend in the other ticket). Assigned to PB for implementation.

subst (model.pPart.transcriptional); create a new content model that allows only add, del and milestone elements (starting with adding the milestones, removing model.pPart.transcriptional and replacing it with the individual elements, and then removing the others at the expiry of the deprecation period); and create substJoin to allow the grouping of a range of elements. The Council gradually came round to the view that we should actually remove model.pPart.transcriptional elements now, and avoid the deprecation period, because the original inclusion of those elements was wrong, and therefore this is a bug and should be fixed. GB suggests we should actually use this as an exercise to work out how we should do deprecation. In the end we came down in favour of implementing it right now. GB will implement it.

Day 2: November 8 2011

2.1 Morning

2.1.1 Genetic Transcription

The transition between the discussion of facsimile and the explanation of sourceDoc, and when you would use the latter, needs to be expanded (EP, MH and LB). The content models of surface, zone etc. are identical whether they appear within facsimile or sourceDoc. This could lead to confusion. EP: The idea is to discourage facsimile. MH: But the genetic workgroup believes that facsimile and sourceDoc are different; if we remove the former, those who need it will surely complain. JC: facsimile and sourceDoc are different, and have different use cases. LB: We should provide Schematron rules to say that if you use e.g. line, it should have ancestor::sourceDoc. LR: Three points: we have ended up with a double mechanism, when a single mechanism would be simpler; some people like facsimile and want to keep it; some people would pursue the goal of coherence and want to remove one of them. MH: Can we make facsimile an actual alias for sourceDoc through a technical mechanism? LR: Yes, through equiv in the description of facsimile, in the ODD. We could provide two ODDs, one for sourceDoc and one for facsimile, and the latter could be defined as a subset of the former. This is not yet possible in ODD. MH, summarizing: For now, we're stuck with using both elements, explaining the difference between them, and explaining the long-term strategy.
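LB's proposed constraint can be illustrated outside Schematron. The following minimal Python sketch uses only the standard library. It counts line elements that have no sourceDoc ancestor, which is the condition a Schematron assertion on ancestor::sourceDoc would test. The function name is hypothetical, and the sketch is not an existing TEI rule.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def lines_outside_sourcedoc(xml_text: str) -> int:
    """Count TEI <line> elements that have no <sourceDoc> ancestor."""
    root = ET.fromstring(xml_text)
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parent = {child: p for p in root.iter() for child in p}
    count = 0
    for line in root.iter(TEI + "line"):
        node = line
        while node in parent:
            node = parent[node]
            if node.tag == TEI + "sourceDoc":
                break
        else:
            # Walked up to the root without meeting sourceDoc.
            count += 1
    return count


SAMPLE = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0">'
    "<sourceDoc><surface><zone><line>kept</line></zone></surface></sourceDoc>"
    "<facsimile><surface><zone><line>flagged</line></zone></surface></facsimile>"
    "</TEI>"
)
print(lines_outside_sourcedoc(SAMPLE))  # 1
```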

EP: There are some major problems with the chapter as it stands:

surface is a physical object, or exists prior to interpretation, but we need to modify that description so that it allows for, or even specifies, the fact that a surface is a psychological construct of the encoder, presumably based on the assumed perception of the original author who chose to use it that way. JC: Why shouldn't this situation be handled by nesting surface? EP: In this case, you could represent the page surfaces side-by-side, rather than nesting them. There is no need for nesting.

patch. If the patch is written on both sides, then it's flippable, but how do we distinguish that from two separate patches? JC, LR: We need nested surfaces to handle this. LB: patch is just a special kind of surface. Should we collapse it into the definition of surface? LB suggests we kill patch, and we introduce something called surfaceGrp, of which the prototypical case is a "leaf"; another example is a pile of post-it notes on top of each other. A surfaceGrp is not a matter of interpretation; it's a physical object which includes multiple surfaces. EP objects to the idea of physical objects, because all determination of surface is by the encoder; we can't really express physical reality. MH, JC: A surfaceGrp could not have a single coordinate space because it is not a single two-dimensional space; a surfaceGrp can have a location within the coordinate space of its parent surface, and its child surfaces have their own coordinate spaces. The group consensus now is that what we were thinking of as a patch could be represented either as a nested surface (in the case of a single pasted object with only one significant face) or a surfaceGrp (in the case of a flippable two-sided object, or a multi-page booklet pasted onto the parent surface). Both surface and surfaceGrp therefore need to be able to express their dimensions within the coordinate space of their parent surface; in the case of a nested surface, it may also define its own coordinate space, which acts as the context for zone elements defined within it. surfaceGrp may be a child of facsimile or sourceDoc.

SUMMARY: LB will rework the draft, and do it as fast as possible; he'll then ask all Council members to read the whole thing again. We should allow three working days, including if possible a weekend, to do the appropriate level of proofing.

2.1.2 Amber Tickets

Three amber FRs were assigned to LB and set to Pending because they relate to the genetic proposal already in process:

document patch line elements for genetic view.

Two tickets were postponed till the afternoon EEBO session. The following tickets were addressed:

witStart et al. to model.milestoneLike. This topic is being addressed by a working group right now, so the ticket is assigned to EP with a note to that effect.

altIdentifier in msPart. Assigned to EP, who will nudge Torsten to provide the required examples.

The following tickets were dealt with:

colloc. Marked as green, accepted and pending, and assigned to LR.

egXML content (validation and presentation). JC, PB: There are two separate issues. First, does the prose need to be tidied up with respect to the last two comments? We agree, and this should be done. Second, should the teix:show attribute be handled? LB commented on the ticket to say that tei:rend could be used both for showing and for specialized rendering to e.g. highlight part of some code for teaching purposes. Therefore JC and PB propose using the tei:@rend attribute inside teix:egXML to specify rendering requirements etc. So: (a) correct the prose according to the last two comments on the ticket, and (b) add tei:@rend to all elements within the teix namespace. The implication of tei:@rend in this context is that a processor will act on the attribute as a rendering instruction, rather than show it as part of the example. Set to green and assigned to JC.

spGrp. EP, LB: The proposal is that there are sub-div-level structural groups of speeches etc., as in e.g. shared arias or musical numbers, or play-within-a-play situations, and that a new spGrp element should be created to handle this. The content model would require a minimum of two sps, along with anything else that can appear between sps. But is this too specific? Should we instead introduce a new floatingDiv element that would handle other cases too? However, this is a simple case with a simple proposed solution which everyone understands, so we accept it. Marked as accepted and green, and assigned to LB.
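The agreed cardinality constraint, at least two sp children in every spGrp, can be checked mechanically. The following Python sketch assumes that spGrp and sp live in the TEI namespace, as proposed here. The function name is hypothetical, and the sketch models the proposal only. It is not the schema that eventually implements it.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def undersized_spgrps(xml_text: str) -> int:
    """Count spGrp elements with fewer than two direct sp children."""
    root = ET.fromstring(xml_text)
    # findall with a bare tag matches direct children only; stage directions
    # and other material allowed between speeches are simply not counted.
    return sum(1 for grp in root.iter(TEI + "spGrp") if len(grp.findall(TEI + "sp")) < 2)


SAMPLE = (
    '<div xmlns="http://www.tei-c.org/ns/1.0">'
    "<spGrp><sp/><stage/><sp/></spGrp>"  # valid: two speeches
    "<spGrp><sp/></spGrp>"               # too small: one speech
    "</div>"
)
print(undersized_spgrps(SAMPLE))  # 1
```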

idno, not for ident. For instance, he uses URLs and filenames, and both of these should be idno. However, idno also does not permit internal structure, and there are other use-cases where that would be a good idea (for instance ISBNs and ISSNs). At the same time, ident does have a specific purpose, which is tagging formal identifiers in e.g. programming languages, and these do not typically have internal structure. Finally, we are sympathetic to Sebastian's objections related to processing. Therefore we propose that idno should be made recursive, allowing internal structure, examples of recursive idno should be supplied, and LB should be encouraged to use idno instead of ident for his purposes. EP strongly objected that the subdivisions of idno are not necessarily idnos. LB says that the difference between ident and idno definitely needs some clarification. Assigned to MH to clarify the Guidelines on the difference between idno and ident, and close the ticket, and to LR to raise a new ticket for the nesting of idno so the Council can address it at length.

graphic available within table and formula. BB, KH: The current content model of formula allows text or graphics. It turns out that table has similar requirements: it may (need to) be represented by a graphic element. So the proposal is to extend the model of table to include graphic. The prose should be revised not only for table but also for formula to explain that this is allowed. LB pointed out that you could use facs, but MH replied that you should be able to choose between including graphic directly, as is done with formula, or using facs throughout if that is your encoding practice. MH also pointed out that there is a use-case for more than one graphic (tables printed over several pages), so it should be one or more graphics. Set to pending, accepted and green. Assigned to KH (with help from LB for the content model change) to implement.

floatingText specifies "complete". The Council had discussed and agreed on the second ticket in Chicago, but implementation was held up by another ticket, which has now been resolved, so we can go ahead with minor adjustments to the proposed wording to remove the example of a musical number, which we now recommend handling through spGrp (see above). Council created a new, improved formulation of the change to the text. Set to agreed, green and pending, and assigned to KH for implementation, although he will assign it to BB when SourceForge settings have been changed to permit this.

2.2 Afternoon: ECCO/EEBO Discussion

Brian L. Pytlik Zillig (BZ) from EEBO joined the meeting all day, and in the afternoon Council worked with him on a number of issues regarding the possible convergence of TCP and TEI. The EEBO corpus will contain several billion words, and will be freely available in the future, so it's in our interests to make sure that interoperability between TCP and TEI is maximized.

There are a number of areas in which moving from TCP to TEI P5 is complicated. Three groups of issues were identified by Martin Mueller (MM):

2.2.1 First Group of Issues

figure is likely to benefit many other projects too (BB). SR proposes that we change the content model of figure to match that of the prototype of floatingDiv. LB suggests that figure should have the same content model as div. The Council agrees that this should be enacted and JC has created ticket 3434973 for it.

elementSpec) and overly restrictive (no opener or closer). SR proposed a more sophisticated content model which has been added to a comment on the ticket. SR volunteers to implement this.

ls; still others thought that no line-group could possibly constitute a head. Council agrees that both lg and l should be allowed inside head, because elsewhere both lg and l are allowed alongside each other. How to implement this is not yet clear. LB has raised ticket 3434992 for this, and it is not yet assigned to anyone.

sp (which would also align sp with said, which does allow those elements). Council agreed that, since a letter can be read out by a speaker, and a letter is a floatingText, sp should allow floatingText; the case for table is harder to make, especially since it can be avoided simply by wrapping the table in p. SR could not find any examples of table in sp in ECCO, but found one of a sp/ p/ table, which looks quite convincing, and Council came to the view that table should be allowed. The Council's decision is that we want to allow both floatingText and table within sp, and a ticket should be raised and someone assigned to discover the best approach to this, since it's not immediately obvious. This has not yet been assigned to anyone.

2.2.2 Second Group of Issues (EEBO's responsibility)

Council looked at the following five items, and concluded that all seem reasonable, and are the responsibility of whoever is transforming the EEBO texts:

notes with type attributes.

2.2.3 Third Group of Issues (unaddressed)

There was no time to consider the third type of issue identified by MM, exemplified in the following items:

Council noted that it would also like to request access to the current state of EEBO in its "lossless XML" form, as well as the facsimiles, so that we can generate a list of issues that we believe EEBO might wish to address, and made that request to BZ.

Day 3: November 9 2011

3.1 Morning

The council started with a brief review of the last ticket discussed at the end of the preceding day, to bring LR up to date on its resolution.

3.1.1 Amber Tickets

dcr:valueDatcat attribute to global attributes, but instead we will create an attribute class called att.datcat for it, and add that class to elements only in response to specific requests. Initially, all the members of model.gramPart and gram will get the attribute. Set to Pending and assigned to PB.

lg requires that there be more than one l in a lg. Others (LR, MH) believe that there are good use-cases for a single l inside lg. Assigned to LR to gather and express arguments on both sides of the issue.

sense, but gram is allowed only as a child of form.) (Submitter would like it added to sense as well in keeping with feature request 3266021 and in keeping with section 9.3.2, which implies that you can encode grammatical information either way.) Assigned to KH to implement; he will then open a deprecation ticket, on which we'll eventually act.

catRefs, because multiple targets can be cumbersome. KH was asked to propose a wording for this, and the ticket was assigned to him.

placeName/ geogName. JC, LB: There is some rationale behind the absence of these attributes from geogName, but it doesn't really hold up. The content model for placeName is a bit suspect (e.g. dimensions), but that's a different issue. The Council agrees to give geogName the same attribute set as placeName, and then raise a second ticket to deal with the problems in that set of attributes. Assigned to LB.

listObject and objectName, and accompanying details and Guidelines text, and submit them to Council for consideration. The ticket is assigned to SR and left open.

ref and key on it. SY objects on the basis that RDF can be embedded in TEI anyway, but the submitter and the SIG are strongly in favour. The submitter strongly recommends renaming the active and passive attributes to subject and object, and removing mutual, but Council feels it's not worth breaking backward compatibility just to rename attributes. The SIG will be asked to raise a new ticket if they feel very strongly about the naming issue. relationGrp will also be added to model.biblLike. Assigned to SR to implement, and set to Pending and green.

3.1.2 Other Issues

EP asked that we create a link on the TEI site to a filtered list of SF tickets including only the deprecation tickets, so people can easily see what is currently deprecated. MH suggested there be two lists, one showing open deprecation tickets and one showing closed (because enacted) deprecation tickets. LB opened a ticket for this, and assigned it to JC.

3.2 Afternoon

3.2.1 Discussion Topics

EP: We could try to use equiv to make element aliasing work, and then rationalize the names; we could push this problem forward to P6, and develop an explicit strategy for naming these things; or we could throw up our hands and do nothing. PB: We should fix transposeGrp while we can. We should develop a clear set of naming principles for these things. LB: There is a general section in the Guidelines on naming of elements. A description should go there. LR: It should go in tagDocs because people use that when writing ODDs. LB: All but a handful of the *Grp elements are syntactically correct as groups; the exceptions are transposeGrp, which we've agreed to change, forestGrp, and relationGrp. LB will raise a ticket for this.

type issue (enable the overriding of an attribute description inherited from a class). We should have a traditional class inheritance structure for e.g. content models. Finally, we should derive modules from this class inheritance.

3.2.2 SIG Report from Linguistics SIG (PB)

PB has sent to the Council a report on the TEI/linguistic bibliography list, created and maintained in the context of LingSIG. Action for JC: Before Council meetings, email the SIG lists to ask for reports or issues to discuss.

3.2.3 Thanks to Laurent Romary

All members of the Council expressed their sincere gratitude and admiration for the leadership and vision Laurent has provided over many years as Council chair. He was presented with a card and suitable token of our appreciation in liquid form.

3.2.4 Priorities for the Coming Year

We should have regular telcos every two months, and deal with tickets at all of them. Telcos should not be more than an hour.

The next ftf will take place mid-to-late April 2012. The date will be finalized before the end of the year.

The strategy for Roma is a priority (see above). We might be able to prod someone with the right skill-set to apply for a grant from the Board to do it; calls for applications will go out from the Board soon.

We should create a SIG for the ideas around the future of ODD, and invite likely experts and interested community members to join it.

Solve all outstanding problems regarding conversion from EEBO/ECCO.

Investigate methods of allowing element and attribute name aliasing, so that we could rename elements without breaking backward compatibility.