Language Documentation

What Is Language Documentation and How Does It Work?

Data Orientation

The core of language documentation at the CELD consists of annotated audio and video recordings of communicative events covering a broad range of interactive domains. All the major types of communicative events which occur in a speech community are tabulated as completely as possible. These typically include everyday conversation, ritual speech, procedural texts on major activities, oral history, autobiographical knowledge, and interactions with speakers from neighboring speech communities. As far as possible, specimens of each type of communicative event are recorded for the documentation.

In addition to this core of annotated recordings, language documentations at the CELD ideally contain the following components:

  1. fieldwork details (team members, outline of general procedures, the methods used in gathering data, reliability)
  2. a description of the linguistic setting, including information on the genetic affiliation of the language
  3. a detailed discussion and explanation of the orthographic representation, interlinear glosses, and translation conventions, which will be presented partly in a grammatical sketch
  4. major typological characteristics and a detailed account of the sociolinguistic setting (dialects, language contact)
  5. general information on the speech community (e.g. social organization, geography, history), including cross-references and links to more detailed work in the areas of anthropology, history, and economics
  6. a lexical database


CELD language documentations are multifunctional. Firstly, the content of the documentations is of interest and use to a number of different groups. Secondly, the documents are presented in a format accessible to these different groups. For both features, it is essential that the potential interest groups have a say in the make-up and format of the documentation. Most notably, all decision making, consultation, and experimentation are carried out in a participatory fashion with the community members.

Archiving & Accessibility

The fully processed data is submitted to digital archives for endangered languages, for example The Language Archive (formerly the DoBeS archive) at the Max Planck Institute in Nijmegen, the Netherlands, or PARADISEC in Australia. Sensitive items will be blocked from general usage in accordance with the wishes of the contributors and the speech community. In addition to being submitted to a digital archive, the documentation will also be available at the CELD in Manokwari and, if desired, locally in the speech community in a format determined by the contributors.