TLA: The Language Archive

Contact details:

Paul Trilsbeek
Wundtlaan 1, Nijmegen

The Language Archive (TLA) contains audio and video recordings, texts, and related materials on more than 200 languages spoken around the world. A large part of the archive consists of unique materials on endangered languages that were collected within the DOBES (Dokumentation bedrohter Sprachen) programme funded by the Volkswagen Foundation. Sixty-four collections within TLA, containing materials from 102 different languages, are recognized by UNESCO as Memory of the World. TLA’s collections are stored in a state-of-the-art repository system that conforms to current best practices and has acquired the Data Seal of Approval.

The Language Archive (TLA) contains unique materials on more than 200 different languages spoken around the world. The technical infrastructure of the archive and the data collections it contains have been built up over the past 15 years with significant investments from the MPI for Psycholinguistics and numerous research funding bodies such as the Volkswagen Foundation, the European Commission, the Dutch KNAW and the German BBAW, BMBF and DFG. The technical infrastructure of TLA consists of a repository solution and various data exploitation tools built around it. The repository conforms to current best practices for long-term preservation of research data, as demonstrated by its Data Seal of Approval and its World Data System membership. In addition, TLA is one of the certified centres of the CLARIN European research infrastructure.
The wide variety of data types contained in TLA allows a wide range of research questions to be addressed using the data. Language corpora containing audio-visual and textual materials can, for example, be used to investigate language use within a specific language or to compare certain aspects across a range of languages. Video recordings of spoken language can be used to investigate the multi-modal nature of language, including gesticulation, facial expression, and body movement. Audio-visual recordings of a particular culture can be used for anthropological research. Longitudinal language corpora, with recordings of children acquiring a language or adults learning a second language at different stages of the process, can be used for language acquisition studies. Beyond these obvious use cases, the corpora of endangered languages in particular are very rich sources of information about the culture and the environment in which those languages are spoken.
In addition to audio-visual recordings of languages and textual language materials, TLA also contains data sets from studies that use brain imaging techniques to investigate what happens in the brain when humans produce or perceive language, as well as data sets from studies of human or animal genomes in relation to certain language-related impairments.

Connection to strategic developments
Topsectors:
ESFRI:
Facility
Social and cultural innovation
NWA-Routes: