Insights that emerge from research at ZAS are transferred not only to non-academic society but also to other research communities. Several projects have resulted in tools and databases that external researchers can use to advance their own projects.
The Multilingual Assessment Instrument for Narratives (MAIN) was designed under the leadership of Prof Natalia Gagarina, head of the Research Area Language Development & Multilingualism, to assess the narrative skills of children who are growing up with one or more languages. MAIN is suitable for children from 3 to 10 years of age and evaluates both the comprehension and the production of narratives. MAIN is currently available in about 90 languages and is used in more than 60 countries, not only by speech and language therapists, physicians and teachers, but also by researchers worldwide as a basis for their studies. MAIN is also used for empirical studies at ZAS and is continuously being developed further.
In recent years, a worldwide network of about 2300 members in 65 countries has emerged around MAIN. Its activities and public relations measures are coordinated by the Research Area Language Development & Multilingualism at ZAS.
The database of clause-embedding predicates documents the clausal complementation patterns of lexical predicates in several languages, including multiple historical stages of German. It centers on collections of selected corpus examples that show the different embedding types for each predicate and are coded for relevant grammatical properties. The contemporary German part of the database is available as a public beta on the OWIDplus platform, in cooperation with the Leibniz Institute for the German Language in Mannheim. It currently contains 1793 clause-embedding predicates with 16804 examples (as of February 2022).
Within the French-German collaborative research project DoReCo, a multilingual reference corpus of language documentation data (also called DoReCo) was created and published in July 2022. It consists of audio recordings and annotations of mostly narrative texts, with around 10,000 words for each language. All data have been transcribed, translated into a major language, and time-aligned at the level of discourse units by experts on the languages; for a subset of 36 languages, the data are additionally annotated for morpheme breaks and morpheme glosses. Within DoReCo, transcriptions have also been time-aligned at the phoneme level. DoReCo subcorpora and annotations are treated as citable publications, each provided with a permanent identifier and a CC BY 4.0 license. DoReCo will be a valuable platform for easy access to data from over 50 languages for cross-linguistic research on spoken language, contributing to open, reproducible science on global linguistic diversity and cultural heritage.