The endangered state of Negidal: A field report
Brigitte Pakendorf & Natalia Aralova, pp. 1–14
Negidal is a Northern Tungusic language closely related to Evenki with two recognized dialects, Upper and Lower Negidal. This nearly extinct language used to be spoken in the Lower Amur region of the Russian Far East by people whose traditional way of life was based on fishing and hunting. While the number of remaining active speakers of Upper Negidal was more or less known, the current state of Lower Negidal was still uncertain. We here report on a trip to ascertain the state of Lower Negidal and give a precise assessment of the linguistic situation of both dialects. While the Upper dialect is still represented by seven elderly female speakers, varying in proficiency from fully fluent to barely able to produce a narrative, not a single active speaker of Lower Negidal is left. The language will therefore probably be extinct in the next decade or two.
Orthography development for Darma (The case that wasn’t)
Christina Willis Oko, pp. 15–46
As the discipline of language documentation and description evolves, so do the expectations placed on researchers. Current trends emphasize collaborative efforts that prioritize tangible contributions to the community, such as a pedagogical grammar, dictionary, or collection of texts. Some argue that for unwritten languages orthography development is imperative so that materials prepared by the researcher (perhaps in collaboration with the community) are accessible to speakers. In light of current discussions of methodology and of the ethical issues involved in documenting and describing the world’s languages, this paper explores the challenges faced by a single researcher (the author) working on a single language (Darma) within a multilingual setting (in India). The project emphasizes ethnographic and discourse-centered research methodologies, which reveal language ideologies; these are discussed here to demonstrate that, while orthography development is a reasonable objective in many cases, one must be sensitive to a variety of interconnected issues, including history, social relationships, language ideology, and the local politics associated with writing and education. Although orthography development has not been a viable option in the Darma Documentation and Description Project, it nevertheless needs to be addressed, both for the benefit of the community and for ongoing discussions of methodology and best practices in linguistic and anthropological research.
Review of Tone in Yongning Na: Lexical tones and morphotonology (Studies in Diversity Linguistics 13)
Maria Konoshenko, pp. 47–52
Contact languages around the world and their levels of endangerment
Nala H. Lee, pp. 53–79
This paper provides an up-to-date report on the vitality or endangerment status of contact languages around the world, including pidgins, creoles, and mixed languages. Using information featured in the Endangered Languages Project and the Atlas of Pidgin and Creole Languages online portals, 96 contact languages are assessed on the Language Endangerment Index, a method of assessment based on four factors: intergenerational transmission, absolute number of speakers, speaker number trends, and domains of use. Results show that the contact languages are most at risk with respect to intergenerational transmission and domains of use. This is explained by the social and historical nature of contact languages. Overall results further raise the concern that the proportion of pidgins, creoles, and mixed languages at some level of risk is extremely high. Reasons are provided for why linguists should be concerned about the endangerment of these languages.
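The abstract names the Language Endangerment Index without spelling out its arithmetic. As a rough sketch of how a four-factor score of this kind can be computed, the Python below assumes that each factor is scored from 0 (safe) to 5 (most endangered), that intergenerational transmission counts double (our reading of the published index, to be checked against it), and that factors without data are left out of both the weighted total and the maximum possible score. The factor keys and the function name `endangerment_index` are hypothetical.

```python
from typing import Optional

# Hypothetical factor keys; each score is assumed to run from 0 (safe)
# to 5 (most endangered).
FACTORS = ("transmission", "speaker_number", "speaker_trend", "domains")
MAX_SCORE = 5


def endangerment_index(scores: dict[str, Optional[int]],
                       transmission_weight: int = 2) -> float:
    """Return a 0-100 endangerment percentage from the factors that have data.

    Factors that are missing (absent or None) are skipped, and the maximum
    possible score shrinks accordingly, so sparse records are scored only
    on what is known. The double weight on transmission is an assumption.
    """
    total = 0
    possible = 0
    for factor in FACTORS:
        score = scores.get(factor)
        if score is None:
            continue
        if not 0 <= score <= MAX_SCORE:
            raise ValueError(f"{factor} score out of range: {score}")
        weight = transmission_weight if factor == "transmission" else 1
        total += weight * score
        possible += weight * MAX_SCORE
    if possible == 0:
        raise ValueError("no factor scores available")
    return 100 * total / possible
```

For a language scored 4, 3, 4, and 3 on the four factors, the weighted total under these assumptions is 18 out of a possible 25, i.e. 72%.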
Forced Alignment for Understudied Language Varieties: Testing Prosodylab-Aligner with Tongan Data
Lisa M. Johnson, Marianna Di Paolo & Adrian Bell, pp. 80–123
Automated alignment of transcriptions to audio files expedites the process of preparing data for acoustic analysis. Unfortunately, the benefits of auto-alignment have generally been available only to researchers studying majority languages, for which large corpora exist and for which acoustic models have been created by large-scale research projects. Prosodylab-Aligner (PL-A), from McGill University, facilitates automated alignment and segmentation for understudied languages. It allows researchers to train acoustic models using the same audio files for which alignments will be created. Those models can then be used to create time-aligned Praat TextGrids with word and phone boundaries marked.
For the benefit of others who wish to use PL-A for research projects, this paper reports on our use of PL-A on Tongan field recordings, reviewing the software, outlining the required steps, and providing tips. Since field recordings often contain more background noise than the laboratory recordings for which PL-A was designed, the paper also discusses the relative benefits of removing background noise for both training and alignment purposes. Finally, it compares acoustic measures derived from the various alignments and compares automatic boundary placements with those of human aligners, demonstrating that automated alignment is both feasible and less time-consuming than manual alignment.
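One simple way to quantify agreement between automatic and human boundary placements is to pair corresponding boundaries and measure how far apart they fall. The sketch below assumes the boundary times (in seconds) have already been extracted from the two TextGrids and paired in order; the 20 ms tolerance and the name `boundary_agreement` are illustrative assumptions, not the paper's procedure.

```python
def boundary_agreement(auto: list[float], manual: list[float],
                       tolerance: float = 0.020) -> tuple[float, float]:
    """Compare paired boundary times (seconds) from two alignments.

    Returns (mean absolute displacement, proportion of boundaries whose
    displacement is within `tolerance`). Assumes both lists mark the same
    boundaries in the same order.
    """
    if len(auto) != len(manual) or not auto:
        raise ValueError("need two non-empty lists of equal length")
    diffs = [abs(a - m) for a, m in zip(auto, manual)]
    mean_diff = sum(diffs) / len(diffs)
    within = sum(d <= tolerance for d in diffs) / len(diffs)
    return mean_diff, within


# Invented boundary times: the third automatic boundary is 50 ms off.
mean_diff, within = boundary_agreement([0.10, 0.25, 0.40], [0.11, 0.25, 0.45])
print(f"mean displacement {mean_diff:.3f} s, {within:.0%} within 20 ms")
```

Reporting both numbers is useful because a mean can hide a few large misplacements, while the tolerance proportion shows how many boundaries a human would not need to move.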
Kratylos: A tool for sharing interlinearized and lexical data in diverse formats
Daniel Kaufman & Raphael Finkel, pp. 124–146
In this paper we present Kratylos, at www.kratylos.org, a web application that creates searchable multimedia corpora from data collections in diverse formats, including collections of interlinearized glossed text (IGT) and dictionaries. There exists a crucial lacuna in the electronic ecology that supports language documentation and linguistic research. Vast amounts of IGT are produced in stand-alone programs without an easy way to share them publicly as dynamic databases. Solving this problem will not only unlock an enormous amount of linguistic information that can be shared easily across the web, it will also improve accountability by allowing us to verify analyses across collections of primary data. We argue for a two-pronged approach to sharing language documentation, which involves a popular interface and a specialist interface. Finally, we briefly introduce the potential of regular expression queries for syntactic research.
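To illustrate the kind of regular expression query the abstract mentions, the sketch below searches the gloss tier of interlinear examples. The record structure, the sample morphemes and glosses, and the function `search_glosses` are all hypothetical and do not reflect Kratylos's actual data model or query interface; the glosses merely follow Leipzig-style conventions (`-` for affix boundaries, `=` for clitics).

```python
import re
from dataclasses import dataclass


@dataclass
class IGTRecord:
    """One hypothetical interlinear example: segmented line, glosses, translation."""
    morphemes: str
    glosses: str
    translation: str


def search_glosses(records: list[IGTRecord], pattern: str) -> list[IGTRecord]:
    """Return the records whose gloss tier contains a match for `pattern`."""
    regex = re.compile(pattern)
    return [record for record in records if regex.search(record.glosses)]


# Invented toy data, for illustration only.
corpus = [
    IGTRecord("ta-kan=na ama", "PFV-eat=3SG.ERG father", "Father ate it."),
    IGTRecord("kan ama", "eat father", "Father eats."),
]

# Perfective verbs hosting an ergative clitic: a PFV- prefix, then '=' ... ERG.
hits = search_glosses(corpus, r"\bPFV-\S+=\S*ERG\b")
for hit in hits:
    print(hit.morphemes, "|", hit.translation)
```

Because the pattern targets glosses rather than surface forms, the same query can find a construction across allomorphs and across differently spelled collections.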
Single-event Rapid Word Collection workshops: Efficient, effective, empowering
Brenda H. Boerger & Verna Stutzman, pp. 147–193
In this paper we describe the results of single-event Rapid Word Collection (RWC) workshops in 12 languages and compare these results to fieldwork lexicons collected by other means. We show that this methodology, collecting words by semantic domain through community engagement, yields more words in less time than conventional collection methods. Factors contributing to high and low numbers of net word senses are summarized and addressed, and suggestions are given for increasing the effectiveness of RWC procedures. Relevant points are illustrated in detail using a 2015 Natügu [ntu] RWC workshop in the Solomon Islands. We conclude that the advantages of the single-event RWC workshop strategy warrant recommending it as best practice in lexicographic fieldwork for minority languages.
A Guide to the Syuba (Kagate) Language Documentation Corpus
Lauren Gawne, pp. 204–234
This article provides an overview of the collection “Kagate (Syuba)”, archived with both the Pacific and Regional Archive for Digital Sources in Endangered Cultures (PARADISEC) and the Endangered Languages Archive (ELAR). It describes the materials that have been archived, as well as the workflow, conventions used, and structure of the collection. It also provides context for the content of the collection, including an overview of the language situation and some of the motivations behind the documentation project. This article thus serves as an entry point to the collection. The future plans for the collection, from the perspectives of both the researcher and Syuba speakers, are also outlined; since the overwhelming majority of items in the collection are available to others, it is hoped that this article will encourage other researchers to use the materials.
Within most subfields of linguistics, the term “speaker” is often used in a shorthand, nonspecific way. In referring simply to “speakers” of endangered languages, the nuances of proficiency, language use, self-identification, and local language ideologies are collapsed into a binary: speaker vs. non-speaker. Despite the central role of local language ideologies in shaping patterns of language shift and maintenance, insiders’ perceptions of speaker status are not often investigated as part of language documentation projects. This paper approaches the issue of speaker status in Iyasa, a threatened Coastal Bantu language of Cameroon and Equatorial Guinea, through the firsthand accounts of self-identified Iyasa speakers. Using a discourse-analytic approach and the framework of identity and interaction (Bucholtz & Hall 2005), this paper examines the ways Iyasa speakers construct “speakerhood” in discourse, respond to researchers’ language ideologies, and position their own and others’ proficiency in Iyasa. Local language ideologies which equate ruralness, elderliness, and authenticity are discussed, as well as their links to similar ideologies in linguistics. Finally, the implications for language documentation and maintenance work in the Iyasa community are discussed.
The Blackfoot Language Resources and Digital Dictionary project: Creating integrated web resources for language documentation and revitalization
Inge Genee & Marie-Odile Junker, pp. 274–314
This paper describes ongoing work to create a suite of integrated web resources in support of Blackfoot language documentation, maintenance, and revitalization efforts. Built around a digital dictionary, the website also contains grammar sketches, a library of other language-related resources, and a story archive. The project began its life as advocacy research (i.e., a digital repatriation project) but developed into empowerment research through community participation. The first phase consisted of back-digitization of an existing print dictionary. The second phase, which is ongoing, works toward making the dictionary user-friendly for speakers, learners, and teachers, and embedding it in a website that contains supporting content. Key features are developed collaboratively with Blackfoot community members. To create an environment in which all participants are equally empowered to help shape the project, a Participatory Action Research approach was adopted for the second phase of teamwork. This resulted in important new priorities for presentation, content, and enhancement of features. It has also had an impact on the participants themselves, who developed new awareness and relationships and acquired new skills and knowledge, which for some led to new jobs and academic directions. Finally, the project is producing new material to address existing research questions and generating new questions for future research projects.
Seeing Speech: Ultrasound-based Multimedia Resources for Pronunciation Learning in Indigenous Languages
Heather Bliss, Sonya Bird, PEPAḴIYE Ashley Cooper, Strang Burton & Bryan Gick, pp. 315–338
Pronunciation is an important aspect of Indigenous language learning, and one which requires creative community-oriented solutions. Towards this end, we have developed a pronunciation learning tool that incorporates ultrasound technology to give learners a visual aid to help them articulate unfamiliar and/or challenging sounds. Ultrasound is used to create videos of a model speaker’s tongue movements during speech, which are then overlaid on videos of an external profile view of the model’s head to create ultrasound-enhanced pronunciation videos for individual words or sounds. A key advantage of these videos is that learners are able to see how speech is produced rather than just hear and try to mimic it. Although ultrasound-enhanced videos were originally developed for commonly taught languages such as Japanese and French, there has been widespread interest from Indigenous communities in Western Canada in developing their own customized videos. This paper reports on three collaborations between linguists and communities in British Columbia to develop ultrasound-enhanced videos for the SENĆOŦEN, Secwepemc, and Halq’emeylem languages. These videos can give learners a new way to learn pronunciation that focuses on seeing speech, and can create new documentation of understudied sound systems for future generations.
The main aim of language documentation is to create a long-lasting, multipurpose record that captures the wealth of linguistic practices of a speech community. The purpose is to reflect traditions, customs, culture, civilization, and the like. This article defines, navigates, and provides insights into the contents of one particular language documentation project, namely the “Documentation of the Beth Qustan Dialect of the Central Neo-Aramaic Language Turoyo”. The documentation of Turoyo was funded by the Endangered Languages Documentation Programme (ELDP) at the School of Oriental and African Studies (SOAS), University of London. All the materials collected have been archived with the Endangered Languages Archive (ELAR) at SOAS, University of London. The materials are digital and are freely available to all users of ELAR at https://elar.soas.ac.uk/Collection/MPI1035085.
Simultaneous Visualization of Language Endangerment and Language Description
Harald Hammarström, Thom Castermans, Robert Forkel, Kevin Verbeek, Michel A. Westenberg & Bettina Speckmann, pp. 359–392
The world harbors a diversity of some 6,500 mutually unintelligible languages. As linguists have increasingly observed, many minority languages are becoming endangered and will be lost forever if not documented. Many efforts are therefore urgently being launched to document and describe languages, and this undertaking naturally prioritizes the most endangered and least described languages. For the first time, we combine worldwide databases on language description (Glottolog) and language endangerment (ElCat, Ethnologue, UNESCO) and provide two online interfaces, GlottoScope and GlottoVis, to visualize these together. The interfaces support browsing, filtering, zooming, basic statistics, and different ways of combining the two measures on a world map background. GlottoVis provides advanced techniques for combining cluttered dots on a map. With the tools and databases described, we seek to increase the overall knowledge of the actual state of language endangerment and description worldwide.
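At its core, combining a description database with an endangerment database is a join on a shared language identifier (Glottolog uses Glottocodes). The sketch below assumes each source has already been reduced to one label per language and maps those labels onto simplified ordinal scales of our own; the scales, thresholds, sample codes, and the function `priority_languages` are hypothetical, and the actual interfaces combine the two measures in richer, visual ways.

```python
# Simplified ordinal scales, invented for illustration (higher = more
# endangered or better described); not the exact categories of the sources.
ENDANGERMENT_SCALE = {"not endangered": 0, "threatened": 1, "shifting": 2,
                      "moribund": 3, "nearly extinct": 4}
DESCRIPTION_SCALE = {"wordlist": 0, "dictionary": 1, "grammar sketch": 2,
                     "grammar": 3}


def priority_languages(endangerment: dict[str, str], description: dict[str, str],
                       min_endangerment: int = 3,
                       max_description: int = 1) -> list[str]:
    """Return codes present in both sources that are highly endangered but
    poorly described, sorted alphabetically."""
    shared = endangerment.keys() & description.keys()  # inner join on the code
    return sorted(
        code for code in shared
        if ENDANGERMENT_SCALE[endangerment[code]] >= min_endangerment
        and DESCRIPTION_SCALE[description[code]] <= max_description
    )


# Hypothetical codes and labels.
endangerment = {"aaaa0001": "moribund", "aaaa0002": "threatened",
                "aaaa0003": "nearly extinct"}
description = {"aaaa0001": "wordlist", "aaaa0002": "wordlist",
               "aaaa0003": "grammar", "aaaa0004": "dictionary"}
print(priority_languages(endangerment, description))
```

Languages missing from either source (here `aaaa0004`) drop out of the join, which is one reason coverage gaps in the underlying databases matter for any combined view.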
Integrating Automatic Transcription into the Language Documentation Workflow: Experiments with Na Data and the Persephone Toolkit
Alexis Michaud, Oliver Adams, Trevor Anthony Cohn, Graham Neubig & Séverine Guillaume, pp. 393–429
Automatic speech recognition tools have potential for facilitating language documentation, but in practice these tools remain little-used by linguists for a variety of reasons: the technology is still new (and evolving rapidly), user-friendly interfaces are still under development, and case studies demonstrating the practical usefulness of automatic recognition in a low-resource setting remain few. This article reports on a success story in integrating automatic transcription into the language documentation workflow, specifically for Yongning Na, a language of Southwest China. Using Persephone, an open-source toolkit, a single-speaker speech transcription tool was trained on five hours of manually transcribed speech. The experiments found that this method can achieve a remarkably low error rate (on the order of 17%) and that the automatic transcriptions were useful as a canvas for the linguist. The present report is intended for linguists with little or no knowledge of speech processing. It aims to provide insights into (i) the way the tool operates and (ii) the process of collaborating with natural language processing specialists. Practical recommendations are offered on how to anticipate the requirements of this type of technology from the early stages of data collection in the field.
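Error rates for automatic transcription are conventionally computed from the edit distance (minimum number of substitutions, insertions, and deletions) between the reference (manual) and hypothesis (automatic) label sequences, divided by the length of the reference. The sketch below shows that calculation for generic labels; it is not Persephone's own code, and the sample labels are invented.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance between two label sequences (dynamic programming)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion of r
                            curr[j - 1] + 1,          # insertion of h
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def label_error_rate(ref: list[str], hyp: list[str]) -> float:
    """Edit distance normalized by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# Invented labels: one substitution (H -> M) and one deletion (i).
ref = ["p", "a", "H", "t", "i", "L"]
hyp = ["p", "a", "M", "t", "L"]
print(f"{label_error_rate(ref, hyp):.0%}")
```

Here two errors against a six-label reference give an error rate of about 33%; a rate on the order of 17% corresponds to roughly one error per six reference labels.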
Documenting a language with phonemic and phonetic variation: the case of Enets
Olesya Khanina, pp. 430–460
This paper describes phonemic and phonetic variation attested in Enets, a highly endangered Uralic language of Northern Siberia. This variation is worth describing for three reasons. First, describing it is part of documenting the phonology of this disappearing language. Second, it is extremely frequent and widespread, affecting most words of the lexicon, yet it does not visibly correlate with any social parameters, which makes it one more case study in the vein of the sociolinguistic agenda set by Dorian (2001; 2010). Third, the Enets variation presents a challenge for consistent transcription, let alone for orthography design. These three reasons structure the paper: after an introductory section on the Enets community, the languages used in the community past and present, the methodology of this study, and the phonological profile of Enets, I proceed to a phonological description of the variation (section 2), to sociolinguistic details of this variation (section 3), and finally to issues of representing the Enets data in a vain search for a perfect orthography for the language (section 4).
Crucially, the last reason was the driving force for this research in the first place, as “[c]reating a phonemic orthography implies at least a basic phonological analysis preceding its design” (Jany 2010:234) and “faulty phonological analyses give rise to faulty orthographies” (Rehg 2004:506). Being neither a phonetician nor a phonologist, I had initially aimed only for a basic description of sound patterns for the sake of an orthography; however, it quickly became evident that the puzzle of variation in Enets was not to be taken lightly, and more specific research was conducted. Even so, despite all the work done, I still see the results as a grounding for a consistent transcription/orthography rather than as a full phonological description. For the latter, Enets still awaits a talented phonologist, and our documentation project made every effort to preserve exemplars of Enets sounds for this purpose (see Khanina 2017 for details).
Working with ‘Women Only’: Gendered protocols in the digitization and archiving process
Jodie Kell & Lauren Booker, pp. 461–480
Gender is a significant social category that needs to be taken into consideration when working with Australian Aboriginal communities. Whilst archives hold knowledge systems that encode cultural practices of huge importance to current Australian Indigenous language revitalization projects, women have often been marginalized and excluded due to culturally inappropriate practices of collection, storage, and access. As women working in an archive, the authors provide a gendered perspective on the development of workflow processes that have the potential to re-orientate the relationship with endangered language communities and contribute to the negotiation of agency for Aboriginal women in the archival space.
This paper draws on the experience of an Australian archiving service involved in a partnership with an Aboriginal organization to digitize resources and facilitate their return to the originating communities. As part of the partnership, tapes of women’s songs from central Australia were digitized using the skills of a female audio engineer. The paper argues that utilizing a female chain of linguist, anthropologist, musicologist, data administrator, and audio engineer in a participatory loop empowered the women in the community to make choices, knowing that their cultural property was being handled with respect and in a culturally appropriate manner.
Developing an Audio-visual Corpus of Scottish Gaelic
Ian Clayton, Colleen Patton, Andrew Carnie, Michael Hammond & Muriel Fisher, pp. 481–513
Scottish Gaelic, a Celtic language spoken primarily in the western regions of Scotland, is experiencing sustained contraction in its geographical extent and domains of use. Native speakers of the language are mostly over 40, and relatively few children are acquiring the language in the home. In the media, Gaelic is typically represented by a standardized form, and children learning the language through Gaelic-medium education – currently the only demographic where Gaelic is expanding – tend to acquire a standardized form of the language as well. Consequently, the rich regional diversity Gaelic once displayed has been considerably reduced in recent decades, and is likely to suffer further significant losses within the next generation. There is an imperative, therefore, to create a record of the surviving diversity within the language, focusing most urgently on remaining speakers of dialects most at risk. In this paper, we describe our ongoing efforts to develop an audio-video corpus of Gaelic which represents as diverse a range of Gaelic dialects as possible, with particular attention to those varieties most immediately at risk of loss. The corpus contains material collected over the past four years through extensive fieldwork among historically Gaelic-speaking communities in the Scottish Highlands and Islands.
Review of The Traditional Ecological Knowledge of the Solega: A Linguistic Perspective
Andrea L. Berez-Kroeker & Lucia Miller, pp. 514–522