Automatic Grading of XML-TEI Markup Training Assignments (paper)
Alejandro Bia*
Alejandro Bia has a PhD in Computer Science from the University of Alicante, an MSc and a BS in Computer Science from ORT University, a Diploma in Computing and Information Systems from Oxford University, and a diploma of Expert in Technological Innovation in Education from the Miguel Hernández University. Currently he is a full-time lecturer and researcher at the Miguel Hernández University. He has taught XML-TEI encoding in several places, for instance: the European Summer School (“Culture & Technology”) at the University of Leipzig (2009–2017), the Cultural Heritage Digitization Course at FUNED (2013–2017) and the Master in Digital Humanities (2005–2011) at the University of Castilla La Mancha. From 1999 to 2004, he was Head of Research and Development of the Miguel de Cervantes Digital Library at the University of Alicante, the biggest digital library of Spanish literary works and one of the first projects to use TEI in XML format. His lecture topics are: text markup using XML and TEI, software engineering, project management, computer forensics, information security and web application design.
1The present work aims at showing that, in a restricted and well-defined domain, the automatic grading of practical XML-TEI markup assignments can be reliably performed.
2In the cases presented, the assignment deliverables are XML files. These XML files can be processed with XSLT/XPath or with text-search utilities like FIND or GREP to search for and count desired or undesired patterns. They can also be compared to an ideal solution to obtain an edit-distance value that can be used as a similarity score.
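As an illustration of the pattern-counting idea, the following minimal sketch counts selected TEI elements in a submission. It uses Python's standard `xml.etree.ElementTree` rather than XSLT, and the function name `count_elements`, the sample document and the list of element names are assumptions made for this example, not the scripts used in the courses.

```python
import xml.etree.ElementTree as ET

# The TEI namespace; the prefix "tei" is only a local alias for the queries below.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def count_elements(xml_text, names):
    """Return how many times each named TEI element occurs below the root."""
    root = ET.fromstring(xml_text)
    return {name: len(root.findall(f".//tei:{name}", TEI_NS)) for name in names}


# Hypothetical submission: one stanza with two verse lines and one paragraph.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<lg><l>First line</l><l>Second line</l></lg>
<p>A paragraph.</p>
</body></text></TEI>"""

counts = count_elements(SAMPLE, ["lg", "l", "p", "sp"])
print(counts)
```

Comparing such counts with those of the ideal solution gives a simple indication of missing or superfluous markup, for example a drama exercise in which no `sp` element was used.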
3These methods have been applied to two types of task: encoding digital documents using XML-TEI, and project planning using GanttProject (see http://www.ganttproject.biz/). The latter included the development of a Gantt chart, a PERT diagram and a human resources allocation diagram. In both cases, the deliverable is just an XML file (GanttProject .gan files are XML), which allows similar grading and feedback methods to be applied.
4Every now and then, there are cases of plagiarism, so similarity checks serve to automatically detect XML submissions that are suspiciously similar (sometimes almost identical) and that could pass unnoticed by a human grader. In markup assignments, maintaining the integrity of the original text is essential. Similarity checks also serve to detect modified or mutilated texts, or the delivery of a completely different text borrowed from somewhere or somebody else.
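A similarity check of this kind can be sketched as follows. The code computes a plain character-level Levenshtein distance, turns it into a score between 0 and 1, and lists pairs of submissions above a threshold. The names `edit_distance`, `similarity` and `suspicious_pairs`, the normalization by the longer length, the threshold of 0.9 and the sample data are all assumptions made for this example.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Score in [0, 1]; 1.0 means the two strings are identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def suspicious_pairs(submissions, threshold=0.9):
    """Return (name, name, score) for every pair scoring at or above the threshold."""
    pairs = []
    for x, y in combinations(sorted(submissions), 2):
        score = similarity(submissions[x], submissions[y])
        if score >= threshold:
            pairs.append((x, y, round(score, 3)))
    return pairs


# Hypothetical submissions keyed by student name.
submissions = {
    "ana": "<p>Hello world</p>",
    "ben": "<p>Hello world.</p>",
    "eva": "<lg><l>Other</l></lg>",
}
pairs = suspicious_pairs(submissions)
print(pairs)
```

The same `similarity` function can be applied between a submission and the ideal solution, where a low score suggests that the original text was altered or replaced. The quadratic cost of this implementation is acceptable for short exercises; for long files a line-based or token-based comparison would be cheaper.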
5In some of the techniques used, it was necessary to normalize the format of XML texts before applying the scripts, so that irrelevant differences, like line-breaks and repeated blank spaces, could be eliminated. There are many pretty-print formatters for XML that can do this job.
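A very simple form of such normalization can be sketched with two regular expressions. This is an assumption-laden shortcut rather than a real pretty-printer: the function name `normalize_xml_text` is hypothetical, and dropping the whitespace between adjacent tags also removes spaces that may be significant in mixed content, which is tolerable only because the same transformation is applied to every file being compared.

```python
import re


def normalize_xml_text(xml_text):
    """Collapse whitespace runs and drop whitespace between adjacent tags."""
    collapsed = re.sub(r"\s+", " ", xml_text).strip()
    return re.sub(r">\s+<", "><", collapsed)


# Two hypothetical encodings of the same stanza that differ only in layout.
indented = "<lg>\n  <l>First   line</l>\n  <l>Second line</l>\n</lg>"
compact = "<lg><l>First line</l> <l>Second\nline</l></lg>"

print(normalize_xml_text(indented))
print(normalize_xml_text(indented) == normalize_xml_text(compact))
```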
6Over the last five years, hundreds of homework submissions from three different courses have been graded and reviewed using these methods, and different variants of the methods were tested, allowing the techniques to be refined. To verify reliability, both automatic and manual grading were performed and the results compared. We found that the results of machine grading are equal to or, in some cases, better than those of human grading. Some unusual cases (borderline scores, unexpected features detected, etc.) still require human intervention, but these cases are highlighted by the machine grading itself.
7With these techniques, student work can be sorted from worst to best in a fairly precise manner, avoiding the subjective judgement and rounding errors that can affect human grading. In addition to the numerical scores obtained, some scripts detect errors in the submitted files and generate automatic reports that can be used as feedback to students. These feedback messages may not be as accurate, verbose and complete as those written by a human reviewer, but they give good clues to the origin of the errors at no extra effort.
8The main disadvantages of these machine grading methods are the cost (time and effort) and the complexity of assembling and testing the grading scripts, particularly those that have to be customized for each specific exercise. These methods require programming knowledge in parsing languages and familiarity with comparison tools and search patterns, as the scripts have to be tailored to each particular assignment. This preparation effort makes the approach unsuitable for a small number of assignments. Some tests, however, like plagiarism detection by computing the edit distance as a similarity measure, can be reused without additional tuning.
9On the other hand, the advantages are: the speed of grading, the possibility of exporting data to a spreadsheet and adjusting the calculation of the final grades by using weights, the rapid machine generation of feedback reports with suggestions for the students, and the reuse of the scripts for the correction of other similar exercises (in some cases requiring minor changes, parameter settings or partial reprogramming).
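The weighting and export step mentioned above can be sketched as follows. The criteria names, the weights, the 0 to 10 scale and the student data are hypothetical, and CSV is used here only as one spreadsheet-readable format.

```python
import csv
import io


def final_grade(scores, weights):
    """Weighted average of partial scores; weights need not sum to 1."""
    total_weight = sum(weights.values())
    return sum(scores[name] * w for name, w in weights.items()) / total_weight


# Hypothetical criteria and weights (percentages), with partial scores on a 0-10 scale.
weights = {"structure": 50, "integrity": 30, "validity": 20}
results = {
    "ana": {"structure": 8.0, "integrity": 10.0, "validity": 6.0},
    "ben": {"structure": 5.0, "integrity": 10.0, "validity": 10.0},
}

# Write one row per student into an in-memory CSV that a spreadsheet can open.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["student", *weights, "final"])
for student, scores in sorted(results.items()):
    row = [student, *(scores[name] for name in weights)]
    row.append(round(final_grade(scores, weights), 2))
    writer.writerow(row)

csv_text = buffer.getvalue()
print(csv_text)
```

Keeping the partial scores in separate columns means the weights can still be adjusted in the spreadsheet after grading, without rerunning the scripts.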
10This approach is suitable for quickly grading a large number of simple, introductory assignments for which there is a predictable “ideal” solution that admits only minor valid variations, as in the case of clearly structured texts containing prose, verse and drama. It is not suitable for cases in which there may be legitimate disagreement about the way some parts of the text should be encoded. For us, it proved to be a very helpful tool for massive introductory XML-TEI courses.
11As future work, we plan to explore the automatic generation of assignment grading scripts from a given sample solution, in order to tackle the main disadvantage of the method: the cost of preparing the grading scripts.