Volume 3 (2009)

Online Dictionary and Ontology Building for Austronesian Languages in Taiwan

D. Victoria Rau, Meng-Chien Yang, Hui-Huan Ann Chang, and Maa-Neu Dong

This paper provides a model of language documentation and conservation in Taiwan to illustrate how online dictionaries have been produced by a collaborative team, and how technology has been used in the process to create a formalized model of existing indigenous knowledge. Our interactions with the Yami community over the past decade have led us to believe that a cooperation framework involving three groups of experts provides necessary “scaffolding” before an “egalitarian” wiki style of online dictionary or ontology building can be attempted. In addition, ontology building requires triangulation of various sources of human interpretation. It is not possible to build an ontology based only on sophisticated machine reasoning. We hope this model of collaboration can serve as a feasible model for other projects in language revitalization and capacity building in the future.

Documentation and Language Learning: Separate Agendas or Complementary Tasks?

Norbert Francis and Pablo Rogelio Navarrete Gómez

In the indigenous communities of the Malintzin volcano highlands in Mexico, in the border region of the states of Puebla and Tlaxcala, speakers of Nahuatl have responded variously to the displacement of their language. In a few localities, evidence of significant erosion appears to have sparked increased interest in both documentation (e.g., preserving a record of extant traditional narrative) and second language learning of the indigenous language by first language speakers of Spanish, and by speakers of Spanish who were once fluent speakers of Nahuatl. Modest interest has been expressed in bilingual instructional models for public schooling for children who are first language speakers of Nahuatl. Even though a small number of towns in this region have maintained high levels of Nahuatl language proficiency across the population (approaching ninety percent in two cases), continued and most likely accelerated erosion in the coming years appears to be inevitable. All demographic and sociolinguistic indicators point in this direction. We report on advances that have been made in a project that seeks to combine the tasks of Documentation and Language Learning. The following argument is presented for wider discussion: that in fact there are no inherent conflicts of interest between scientists (internal and external to the speech community) and indigenous communities as a whole regarding the goals of language maintenance, language use, and research projects related to recording and preserving an archive of the language and its various discourse forms.

Relatively Ethical: A Comparison of Linguistic Research Paradigms in Alaska and Indonesia

Gary Holton

Just as there is no single model for community-based research, ethical standards for community engagement are not universal. Drawing from personal experiences with language documentation among threatened communities in two very different parts of the world, this paper examines the challenges of applying universal ethical guidelines for linguistic fieldwork.

Five Dimensions of Collaboration: Toward a Critical Theory of Coordination and Interoperability in Language Documentation

Akiemi Glenn

In the literature on best practices of language documentation, “collaboration” has emerged as an important concept. While collaboration between scholars is not usually the norm in linguistics, a theory of language documentation must grapple with its theoretical orientation to collaboration. By reviewing the practices of researchers in other disciplines, this paper identifies five aspects of academic collaboration (coordination, distribution of labor, standards for interoperation, authorship and authority, and feedback) that have special bearing on the enterprise of language documentation. I investigate these as a starting point for linguists and our collaborators to consider critically what collaboration means for the documentation project and for the discipline of linguistics.

Phoenix or Relic? Documentation of Languages with Revitalization in Mind

Rob Amery

The description of Indigenous languages has typically focussed on structural properties of languages (phonology, morphology, and syntax). Comparatively little attention has been given to the documentation of language functions or the most commonly occurring speech formulas. Speech formulas are often culturally-specific and idiomatic and cannot be reliably reconstituted from a knowledge of grammar and lexicon alone. Many linguists and lexicographers seem to have an implicit relic view of language, as if they have been trying to capture the “pure” language uncontaminated by language and culture contact. Accordingly, borrowed terms and neologisms are typically omitted or underrepresented in dictionaries. Recorded texts have tended to be myths or texts about traditional culture. Conversations and texts about everyday life, especially in non-traditional contexts, are ignored. How can we ensure that language descriptions are maximally useful, not only to linguists, but to the people most closely associated with the languages, who may wish to revive them? Considerable time is needed to produce a maximally useful description of a language and its uses. Suggestions made here emerge from first-hand experience working with Yolngu and Pintupi people in non-traditional domains, as well as from attempts to re-introduce Kaurna on the basis of nineteenth-century documentation.

A Psycholinguistic Tool for the Assessment of Language Loss: The HALA Project

William O’Grady, Amy J. Schafer, Jawee Perla, On-Soon Lee, and Julia Wieting

A major obstacle to the early diagnosis of language loss and to the assessment of language maintenance efforts is the absence of an easy-to-use psycholinguistic measure of language strength. In this paper, we describe and discuss a body-part naming task being developed as part of the Hawai‘i Assessment of Language Access (HALA) project. This task, like the others in the HALA inventory, exploits the fact that the speed with which bilingual speakers access lexical items and structure-building operations in their two languages offers a sensitive measure of relative language strength. In a pilot study conducted with Korean-English bilinguals, we were able to establish a strong correlation between language strength and naming times even in highly fluent bilingual speakers, in support of the central assumption underlying the HALA tests. We discuss the implications of this finding for the broader study of language strength as well as for the practical problems associated with work on language loss, maintenance, and revitalization.
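The reasoning behind the naming task can be illustrated with a small sketch: if a speaker names the same items faster in one language than in the other, the difference in mean naming time serves as a rough index of relative language strength. The function name, the response times, and the use of a simple difference of means are all assumptions made for illustration; they are not the HALA project's data or scoring procedure.

```python
from statistics import mean

def naming_time_gap(times_a_ms, times_b_ms):
    """Mean naming time in language B minus mean naming time in language A.

    A positive value means items were named faster in language A,
    which this sketch reads as language A being the stronger one.
    """
    return mean(times_b_ms) - mean(times_a_ms)

# Invented placeholder response times (milliseconds) for one speaker.
english_ms = [700, 720, 710]
korean_ms = [820, 790, 860]

gap_ms = naming_time_gap(english_ms, korean_ms)
print(f"Korean minus English naming time: {gap_ms:.1f} ms")
```

A real analysis would involve many items and speakers and an appropriate statistical model; the point here is only that the measure is relative, comparing a speaker's two languages on the same task.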

Data Processing and its Impact on Linguistic Analysis

Anna Margetts

The Saliba-Logea documentation project has been working toward a web-based text database with text-audio linkage and searchable annotations. In this article, I discuss the impact that the nature of data processing can have on linguistic analysis, and I demonstrate this on the basis of two research topics: the positioning of Postpositional Phrases and the distribution of plural markers. Saliba-Logea PPs can be ambiguous as to whether they belong to the preceding or following clause. To investigate whether there is a correlation between a PP’s position and its semantic role, text-only transcriptions turn out to be insufficient. The second question relates to the Saliba-Logea plural suffix, which originally occurred only on nouns with human referents. However, some speakers use it in novel contexts, and in order to investigate these extended uses and who drives them, access to metadata about the speakers is required. I show that text-audio linkage can be a prerequisite for analyzing syntactic constructions and that access to metadata can have a direct effect on the linguistic analysis.

Using Toolbox with Media Files

Andrew Margetts

This article focuses on our documentation project’s use of Toolbox with media files, i.e., the source audio/video material that our transcripts are based on: why we set things up the way we do, and how. The process begins with an appropriate media file. This is marked up in Transcriber to produce a series of time-aligned annotations containing transcripts and speaker names, which correspond to intonation units in the recording. The resulting file is converted to a text format that can be used natively in Toolbox and easily imported into ELAN. The article also covers techniques for managing and querying the resulting data, both within Toolbox and with spreadsheets and relational databases. Further, it discusses some other language-oriented programs (especially Transcriber and ELAN) insofar as they affect our use of Toolbox. When Toolbox is used in close conjunction with source media files, it becomes particularly powerful. Some common tasks become easier, and new types of enquiry are possible. This is largely the result of Toolbox’s ability to play discrete segments from a sound file. There is no single established methodology for creating such a conjunction, and there are a multitude of possibilities for using the results. This paper offers one account.
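The conversion step described above can be sketched roughly as follows: time-aligned annotations (start time, end time, speaker, transcript) are written out as backslash-marked records of the kind Toolbox reads. This is a minimal sketch, not the project's actual converter. The `Annotation` class, the function name, and the marker names (`\ref`, `\start`, `\end`, `\spk`, `\tx`) are hypothetical choices for illustration, and the input is assumed to be already extracted from the Transcriber file.

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    """One time-aligned unit: times in seconds, speaker name, transcript."""
    start: float
    end: float
    speaker: str
    text: str

def to_standard_format(annotations, prefix="TEXT"):
    """Render annotations as backslash-marked records, one per unit.

    Marker names are illustrative assumptions, not a fixed Toolbox schema.
    """
    records = []
    for number, unit in enumerate(annotations, start=1):
        records.append("\n".join([
            f"\\ref {prefix}.{number:03d}",   # record identifier
            f"\\start {unit.start:.3f}",      # segment start, for media playback
            f"\\end {unit.end:.3f}",          # segment end
            f"\\spk {unit.speaker}",          # speaker name
            f"\\tx {unit.text}",              # transcript line
        ]))
    # Blank line between records, trailing newline at end of file.
    return "\n\n".join(records) + "\n"

# Invented sample units standing in for two intonation units.
sample = [
    Annotation(0.0, 2.5, "SpeakerA", "first intonation unit"),
    Annotation(2.5, 4.75, "SpeakerB", "second intonation unit"),
]
sf_text = to_standard_format(sample)
print(sf_text)
```

Keeping start and end times in each record is what would allow a program to play back the matching segment of the sound file, and the same flat records can be loaded into a spreadsheet or relational database for querying.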

Research Models, Community Engagement, and Linguistic Fieldwork: Reflections on Working within Canadian Indigenous Communities

Ewa Czaykowska-Higgins

This paper reflects on different research models in linguistic fieldwork and on different levels of engagement in and with language-speaking communities, focusing on the Canadian context. I begin by examining a linguist-focused model of research: this is language research conducted by linguists, for linguists; the language-speaking community’s participation is limited mostly to being the source of fluent speakers, and the level of engagement in the community by a linguist is relatively small. I then consider models that involve more engaged and collaborative research, and define the Community-Based Language Research (CBLR) model, which allows for the production of knowledge on a language that is constructed for, with, and by community members, and that is therefore not primarily for or by linguists. In CBLR, linguists are actively engaged partners working collaboratively with language communities. Collaborative models of research seem to be closest in spirit to models advocated by Indigenous groups in Canada and elsewhere. I reflect here on (1) why one might choose to work within a collaborative research model, and (2) what some of the challenges are that linguists face when they conduct research collaboratively. In a broad sense, the purpose of this paper is to think through some questions that an “outsider” linguist might face when undertaking linguistic research in an Indigenous community today.

Kaipuleohone, the University of Hawai‘i’s Digital Ethnographic Archive

Emily E. Albarillo and Nick Thieberger

The University of Hawai‘i’s Kaipuleohone Digital Ethnographic Archive was created in 2008 as part of the ongoing language documentation initiative of the Department of Linguistics. The archive is a repository for linguistic and ethnographic data gathered by linguists, anthropologists, ethnomusicologists, and others. Over the past year, the archive has grown from idea to reality, due to the hard work of faculty and students, as well as support from inside and outside the Department. This paper will outline the context for digital archiving and provide an overview of the development of Kaipuleohone, examining both concrete and theoretical issues that have been addressed along the way. The creation of the archive has not been problem-free and the archive itself is an ongoing process rather than a finished product. We hope that this paper will be useful to scholars and language workers in other areas who are considering setting up their own digital archive.