Quantitative approaches to historical texts: should you care about OCR?

This DH Research Seminar from EPFL’s Digital Humanities Institute, to be held online, will be given by Simon Hengchen of the University of Gothenburg.

The DH Research Seminar is a series of talks organized by the Digital Humanities Institute in EPFL’s College of Humanities. The seminars are given by researchers from a wide range of backgrounds, and present the vast array of subjects covered by the field of digital humanities.

Quantitative methods for historical text analysis offer exciting opportunities for researchers interested in gaining new insights into long-studied texts. However, the methodological underpinnings of these methods remain under-explored. In the first part of the talk, Hengchen will use a case study to show and discuss the effect the OCR process has on a range of quantitative text analyses. In the second part, he will present a novel, fully unsupervised OCR post-correction method applied to the same dataset, as well as its most recent evolution on a highly inflected language.

Participants are invited to listen to the seminar, and to join in the Q&A session at the end of the presentation, via the following link:

About the speaker

Simon Hengchen is a researcher in NLP at the University of Gothenburg, where he works within the Language Change project. His main research focus is lexical semantic change in multilingual, unstructured, OCRed, historical textual data, and he is also interested in NLP for DH. In addition, Simon is a part-time lecturer in DH at the University of Geneva.

From: 18 Nov, 2020
To: 18 Nov, 2020
Organizer: EPFL Digital Humanities Institute
Speaker: Simon Hengchen
Languages: English