This special issue of the Journal of Data Mining and Digital Humanities (JDMDH) aims to bring together in a single volume experiments, projects and reflections related to automatic text recognition on historical documents.
Many projects now include automatic text acquisition in their data processing chain. Integrating this technology into increasingly powerful processing chains has automated tasks in ways that affect the researcher's role in the textual production process. This new data-intensive practice makes it urgent to collect and harmonize the corpora needed to build training sets, and to make them available for use. This issue invites articles that combine philological and technical questions in order to assess, on scientific grounds, the use of automatic text recognition for ancient documents: its results, its contributions and the new practices it induces in the editing and exploration of texts. It is also an opportunity to examine the practical aspects of the technology while raising the methodological challenges it poses and its impact on research data.
This special issue is the outcome of an event held at the Ecole Nationale des Chartes in Paris on June 23 and 24, 2022, which brought together scholars from various backgrounds to discuss the use of handwritten text recognition (HTR) and optical character recognition (OCR) in their research. Over the two days, problems of engineering, machine learning and infrastructure were raised, and many technical subjects linked to philological questions, such as segmentation and model development, were discussed. The talks covered a wide range of documents: manuscripts, archives, epigraphic materials and other documents, sometimes in languages with their own specificities such as Hebrew, languages of Vietnam such as Cham, or ancient Greek, from the 11th to the 20th century.
This call is open not only to participants in this event, but to anyone working with HTR or OCR.
To address these issues, the following three axes are suggested:
– Axis 1: Sources, constitution and sharing of training data
– Axis 2: Machine learning
– Axis 3: Feedback and data exploitation
The Journal of Data Mining and Digital Humanities is an open-access, peer-reviewed journal: a first draft is deposited as a preprint on arXiv or HAL, and peer review takes place post-publication.
Submission and deadlines
Papers are expected to be between 6 and 8 pages for short papers or between 12 and 15 pages for long papers.
The articles must present original and previously unpublished work.
All submissions must be in English.
All submitted articles are subject to blind peer review in accordance with the journal’s editorial policies.
Submission deadline: 1 November 2022, extended to 1 December
For more details on the submission process, click here.
Contact: firstname.lastname@example.org