Motivation

Over the last decades, Digital Humanities has evolved dramatically, from simple database applications to complex systems that draw on the most recent state of the art in Computer Science. Language Technology in particular plays a major role, whether for processing the metadata of recorded objects or for analyzing and interpreting content.

Applying language technology methods to objects from the humanities is a challenge for NLP research: the data is heterogeneous (image/text), often incomplete (e.g. because of OCR errors), multilingual within one document (historical documents with Latin and/or classical Greek paragraphs), and difficult to structure (paragraphs, titles, and pages are somewhat different in historical texts).

Corpus-based methods, now standard in NLP research, often cannot be applied because the necessary large training data is missing. Moreover, the requirements for digital humanities tools, especially those dedicated to cultural heritage objects, differ from the requirements for tools applied to modern texts.

Thus, research in Digital Humanities also involves adapting existing NLP tools to historical variants of languages, developing tools for new languages, making tools robust to syntactic deviation, and adapting semantic resources. Central and Eastern Europe has always been characterized by a high concentration of languages and cultures. Unfortunately, many historical documents in this region are in poor condition; many languages or dialects have become extinct over time, and their written evidence is rare.
