Towards Computational Historiographical Modeling: Corpora and Concepts

So far, digital humanities has largely contented itself with borrowing methods from other fields and has developed little methodology of its own. In our Spark pilot project “An Agile Approach Towards Computational Modeling of Historiographical Uncertainty,” we showed that this almost exclusive focus on methods and tools is a major obstacle to constructing computational models that could yield new insights into humanities research questions (which are ultimately qualitative “why?” questions) rather than merely automating primarily quantitative processing.

In the proposed project we therefore want to focus on two issues we have identified as particularly pressing, and which together constitute a critical research gap:

  1. Regardless of the application domain, digital humanities research tends to rely heavily on corpora, i.e., curated collections of texts, images, music, or other types of data. However, the epistemological implications of this reliance have so far been largely ignored. We propose to consider corpora as phenomenotechnical devices (Bachelard 1968), like scientific instruments: on the one hand, corpora are models of the phenomenon under study; on the other hand, the phenomenon is constructed through the corpus. We therefore want to study corpora as models to answer questions such as: How do corpora model and produce phenomena? What are the commonalities and differences between different types of corpora? How can corpora-as-models be formally described so that research using them can take their properties into account?
  2. Models of complex phenomena generally rely heavily on numerous concepts, e.g., (in history) textuality, feudalism, state, class, etc. Such concepts are effectively references to “submodels,” which serve as building blocks for larger models. Traditionally, these submodels have remained largely implicit and unformalized. This becomes a serious epistemological problem in digital humanities, because these concepts are the foundation for selecting data and building corpora. For example, a corpus of letters is based on the concept of “letter” (as distinct from other writings), and a data set for comparing some aspect of preliterate and literate societies is based on the concept of “literacy” (as distinct from “illiteracy”). The lack of formalization of these concepts is currently a major weakness of computational research in the humanities: while the quantitative computational analyses are highly formalized, their qualitative foundations are shaky. In a case study, we will investigate concepts as models: How do they function and how are they used? Are there structural similarities that would allow us to create a metamodel for formalizing concepts?


The project will examine these issues in a historical context, but they are general issues in digital humanities, and we anticipate that the results will be transferable to other contexts. We expect the project to make an important contribution to theory formation and to help advance the digital humanities from project-specific, often ad hoc, solutions for particular problems to a more general understanding of the issues at stake.

Michael Piotrowski (PI)
Floor Koeleman (Postdoc)
Lampros Ntoumas (PhD student)

Project duration

This project is funded by the SNSF (grant no. 105211_204305).

The UNIL-EPFL dhCenter ceased its activities on December 31, 2022. The contents of this site, with the exception of our members' pages, are no longer updated. Thanks to all of you for having kept this space alive!