Language Documentation and Archiving

The Research Theme Language Documentation and Archiving aims to further develop expertise in state-of-the-art documentation of small languages and in theory-driven, cross-linguistic research based on existing corpora at ZAS. In this context, we are developing a strategy for the long-term archiving of linguistic corpora on small and often endangered languages, in close collaboration with the IDS Mannheim and other corpus initiatives in Germany and internationally. We also work together with the ELAR archive (SOAS London; director: Dr. Mandana Seyfeddinipur). This Research Theme collaborates closely with other Research Themes at ZAS in order to, on the one hand, extend state-of-the-art archiving to other types of linguistic research data and, on the other hand, optimize the accessibility of language documentation corpora on small languages for a variety of theoretical research questions.

This Research Theme is supported by several third-party-funded projects, for example the DFG-Heisenberg Grant "New empirical linguistics through integration of language documentation, comparative corpus linguistics, typology and language contact research" (Seifart). Furthermore, the German/French DFG-ANR project "DoReCo - Cross-linguistic phonetics and morphology using a time-aligned multilingual reference corpus built from documentations of 50 languages: Big data on small languages" provides a proof of concept for the development of reference corpora in small languages (Seifart, Krifka, Fuchs, Paschen). The BMBF project "QUEST Quality - Established: Testing and application of curation criteria and quality standards for audiovisual annotated language data" (Krifka, Seifart, Seyfeddinipur, von Prince, Nordhoff) develops criteria for corpora that are easily usable in linguistics and beyond, e.g. by museums and speech communities. It is carried out with the Universities of Hamburg and Cologne and the IDS Mannheim, in cooperation with MPI-TLA Nijmegen and ELAR-SOAS London. Finally, within the CRC 'Register', the project "Speaker's choices in creole contexts: Bislama and Morisien" (Krifka and Veenstra) involves the documentation of two creole languages, Morisien in Mauritius and Bislama in Vanuatu.