Speech rate and pauses provide a window into the neural-cognitive and physiological-articulatory bases of the human language production system, but cross-linguistic variation in this domain has received little study. This project addresses that gap with a comparative study of spontaneous spoken language in a diverse sample of 50 languages. For this purpose, we created a multilingual reference corpus of language documentation data (DoReCo), consisting of annotations and associated audio recordings archived at repositories such as The Language Archive (TLA), especially in the DOBES collection. DoReCo will be built from data that are transcribed, translated into a major language, and time-aligned with the audio files at the level of discourse units.
Within the current project, these data will be time-aligned at the phoneme level. We have identified at least 50 languages for which corpora of at least 10,000 words can be included in DoReCo, and a subset of at least 30 of these that have already been annotated for morpheme breaks and morpheme glosses. In DoReCo, subcorpora and annotations are treated as citable publications, each provided with a permanent identifier and released under a CC BY 4.0 license. Beyond the project's specific research goals, DoReCo will have a lasting effect as a platform for cross-linguistic research on spoken language, providing easy access to over one million words of annotated corpus data from over 50 languages. This represents an unprecedented contribution to open, reproducible science addressing global linguistic diversity and cultural heritage. Both of DoReCo’s specific research goals address the universality of constraints on human language that arise from species-wide articulatory and cognitive properties.
The project will be carried out by an interdisciplinary team that brings together expertise in documentary linguistics, phonetics, typology, and quantitative linguistics, with strong institutional support from two leading research centres in Germany and France.
External Webpage: http://doreco.info