Note that the embedded audio is best played in Adobe Reader.
Notes from the Field: Inagta Alabat: A moribund Philippine language, with supporting audio
Jason William Lobel, Amy Jugueta Alpay, Rosie Susutin Barreno, & Emelinda Jugueta Barreno, pp. 1-57
Arguably the most critically endangered language in the Philippines, Inagta Alabat (also known as Inagta Lopez and Inagta Villa Espina) is spoken by fewer than ten members of the small Agta community on the island of Alabat off the northern coast of Quezon Province on the large northern Philippine island of Luzon, and by an even smaller number of Agta further east in the province. This short sketch provides some brief sociolinguistic notes on the group, followed by an overview of its phoneme system, grammatical subsystems, and verb system. Over 800 audio recordings accompany the article, including 100 sentences, three short narratives, and a list of over 200 basic vocabulary items.
Review of Activating the heart: Storytelling, knowledge sharing and relationship
Alexis Michaud, pp. 58-68
Multidirectional leveraging for computational morphology and language documentation and revitalization
Sylvia L. R. Schreiner, Lane Schwartz, Benjamin Hunt, & Emily Chen, pp. 69-86
St. Lawrence Island Yupik is an endangered language of the Bering Strait region. In this paper, we describe our work on Yupik jointly leveraging computational morphology and linguistic fieldwork, outlining the multilayer virtuous cycle that we continue to refine in our work to document and build tools for the language. After developing a preliminary morphological analyzer from an existing pedagogical grammar of Yupik, we used it to help analyze new word forms gathered through fieldwork. While in the field, we augmented the analyzer to include insights into the lexicon, phonology, and morphology of the language as they were gained during elicitation sessions and subsequent data analysis. The analyzer and other tools we have developed are improved by a corpus that continues to grow through our digitization and documentation efforts, and the computational tools in turn allow us to improve and speed those same efforts. Through this process, we have successfully identified previously undescribed lexical, morphological, and phonological processes in Yupik while simultaneously increasing the coverage of the morphological analyzer. Given the polysynthetic nature of Yupik, a high-coverage morphological analyzer is a necessary prerequisite for the development of other high-level computational tools that have been requested by the Yupik community.
LingView: A Web Interface for Viewing FLEx and ELAN Files
Kalinda Pride, Nicholas Tomlin, & Scott AnderBois, pp. 87-107
This article presents LingView (https://github.com/BrownCLPS/LingView), a web interface for viewing FLEx and ELAN files, optionally time-synced with corresponding audio or video files. While FLEx and ELAN are useful tools for many linguists, the resulting annotated files are often inaccessible to the general public. Here, we describe a data pipeline for combining FLEx and ELAN files into a single JSON format which can be displayed on the web. While this software was originally built as part of the A’ingae Language Documentation Project to display a corpus of materials in A’ingae, the software was designed to be a flexible resource for a variety of different communities, researchers, and materials.
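As a rough illustration of the kind of conversion the abstract describes, the sketch below flattens the time-aligned annotations of an ELAN (.eaf) document into JSON records. It is not LingView's actual pipeline or JSON schema: the function name `eaf_to_records`, the output keys, and the inline sample document are assumptions made for the example, and FLEx input is not handled.

```python
import json
import xml.etree.ElementTree as ET

# Tiny hand-written EAF fragment, used only so the sketch is runnable.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
  </TIME_ORDER>
  <TIER TIER_ID="transcription">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1"
                            TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>example sentence</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def eaf_to_records(eaf_xml):
    """Return one dict per time-aligned annotation (hypothetical schema)."""
    root = ET.fromstring(eaf_xml)
    # Map time-slot IDs to millisecond offsets; unaligned slots are skipped.
    slots = {
        slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE"))
        for slot in root.iter("TIME_SLOT")
        if slot.get("TIME_VALUE") is not None
    }
    records = []
    for tier in root.iter("TIER"):
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            records.append({
                "tier": tier.get("TIER_ID"),
                "start_ms": slots.get(ann.get("TIME_SLOT_REF1")),
                "end_ms": slots.get(ann.get("TIME_SLOT_REF2")),
                "text": ann.findtext("ANNOTATION_VALUE", default=""),
            })
    # Order by start time; annotations without a start time go last.
    records.sort(key=lambda r: (r["start_ms"] is None, r["start_ms"] or 0))
    return records


print(json.dumps(eaf_to_records(SAMPLE_EAF), indent=2))
```

The millisecond offsets are what a web player would need in order to highlight an annotation in step with the audio or video.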
Quantifying written ambiguities in tone languages: A comparative study of Elip, Mbelime, and Eastern Dan
David Roberts, Ginger Boyd, Johannes Merz, & Valentin Vydrin, pp. 108-138
Whether tone should be represented in writing, and if so how much, is one of the most formidable challenges facing those developing orthographies for tone languages. Various researchers have attempted to quantify the level of written ambiguity in a language if tone is not marked, but these contributions are not easily comparable because they use different measurement criteria. This article presents a first attempt to develop a standardized instrument and evaluate its potential. The method is exemplified using four narrative texts translated into Elip, Mbelime, and Eastern Dan. It lists all distinct written word forms that are homographs if tone is not marked, discarding repeated words, homophony, and polysemy, as well as pairs that never share the same syntactic slot. It treats lexical and grammatical tone separately, while acknowledging that these two functions often coincide. The results show that the level of written ambiguity in Elip is weighted towards the grammar, while in Mbelime many ambiguities occur at the point where lexical and grammatical tone coincide. As for Eastern Dan, with its profusion of nominal and verbal minimal pairs, not to mention pronouns, case markers, predicative markers, and other parts of speech, the level of written ambiguity if tone is not marked is by far the highest of the three languages. The article ends with some suggestions of how the methodology might be refined, by reporting some experimental data that provide only limited proof of the need to mark tone fully, and by describing how full tone marking has survived recent spelling reforms in all three languages.
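The first step of such a count can be sketched as follows: group distinct written word forms that become identical once tone marks are removed. This is only an approximation of the idea, not the authors' instrument. It assumes tone is written with the combining diacritics listed in `TONE_MARKS`, the word forms are invented toy strings rather than Elip, Mbelime, or Eastern Dan data, and the later filtering described in the abstract (homophony, polysemy, syntactic slot) is not attempted.

```python
import unicodedata
from collections import defaultdict

# Assumed tone diacritics: grave, acute, circumflex, macron, caron.
TONE_MARKS = {"\u0300", "\u0301", "\u0302", "\u0304", "\u030C"}


def strip_tone(word):
    """Remove the assumed tone diacritics, keeping all other characters."""
    decomposed = unicodedata.normalize("NFD", word)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


def homograph_sets(word_forms):
    """Map each toneless spelling to the distinct tone-marked forms behind it.

    Repeated tokens are collapsed, and only spellings shared by two or
    more distinct forms are returned.
    """
    groups = defaultdict(set)
    for form in word_forms:
        groups[strip_tone(form)].add(unicodedata.normalize("NFC", form))
    return {bare: forms for bare, forms in groups.items() if len(forms) > 1}


# Invented toy forms, for illustration only.
toy_forms = ["b\u00e1", "b\u00e0", "b\u00e1", "k\u00f3", "mi", "m\u00ed"]
for bare, forms in sorted(homograph_sets(toy_forms).items()):
    print(bare, sorted(forms))
```

Each returned set is a candidate ambiguity; whether it counts in practice depends on the filtering steps the sketch leaves out.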
A method comparison analysis examining the relationship between linguistic tone, melodic tune, and sung performances of children’s songs in Chicahuaxtla Triqui: Findings and implications for documentary linguistics and indigenous language communities
A. Raymond Elliott, pp. 139-187
Linguistic tones play an important role in expressing lexical and grammatical meaning in tone languages. A small change in the pitch of a word can result in an entirely different meaning. A logical question for those who document tone languages is whether or not singers preserve linguistic tone when singing and if so, to what degree? I begin by providing an overview of research in documentary linguistics that examines the interrelationship between linguistic tone and melody in tone languages. While the majority of articles have focused on Asian and African languages, there is only one investigation by Pike (1939) that examined the relationship between tone and tune in an unspecified variety of Mixtec, an Otomanguean language. In order to further our understanding of the tone-tune relationship, especially with regard to Otomanguean languages, I use three separate procedures for looking at the interrelationship between tone and tune in spoken, sung, and played performances of two popular children’s songs in Chicahuaxtla Triqui. While the first experiment yielded a non-significant relationship between linguistic tone and note transitions in the musical scores, the second and third experiments showed that the pitch traces of the spoken and played performances of the songs both relate to and perhaps influence pitch transitions and pitch transition differentials in the sung performances. The overall finding is that the song melody appears to exert a greater influence on the pitch tracings of the sung performances than does linguistic tone as measured in the spoken performances of the songs. With regard to experimental studies examining tone and tune, this study suggests that a set of well-defined prosodic features, such as overall pitch range, average F_0, F_0 for individual tones, and the difference between adjacent tones as measured in Hz, need to be considered when comparing tone to melodic tune. 
Simply correlating the correspondence or directionality of linguistic tones to that of the note transitions in musical scores appears to be neither promising nor sensitive enough to reveal the true interrelationship between tone and tune. This article ends with a discussion of the benefits of documenting songs in tone languages for linguists, in addition to the advantages of teaching music to children of indigenous language communities.
Child-directed language – and how it informs the documentation and description of the adult language
Birgit Hellwig & Dagmar Jung, pp. 188-214
Language documentation efforts are most often concerned with the adult language and usually do not include the language used by and with children. Essential parts of the natural linguistic behaviour of communities thus remain undocumented, and a growing body of literature explores what language documentation, language maintenance, and language revitalization have to gain by including child language and child-directed language.
This paper adds a methodological perspective to the discussion, arguing that child language and child-directed language constitute data types that can inform our understanding of the adult language. For reasons of feasibility, the paper focuses on child-directed language only. Presenting data from two on-going language acquisition projects (Qaqet from Papua New Guinea and Dëne Sųłıné from Canada), we illustrate how this data type provides insights into the metalinguistic knowledge of adult speakers. After an introduction to child-directed language, three case studies on the topics of variation sets, clarification processes, and discourse context are exemplified from both languages and related to our understanding of the adult language. Focusing on the potential of this data type, this paper argues in favour of extending our documentation efforts to events involving children.
Documentation of Lakurumau: Making the case for one more language in Papua New Guinea
Lidia Federica Mazzitelli, pp. 215-237
This paper provides an introduction to Lakurumau, a previously undescribed and undocumented Oceanic language of Papua New Guinea. The first part of the paper is a guide to the Lakurumau documentation corpus, deposited in the ELAR archive. The participants and the content of the deposit, the technology used for recording, and the ethical protocols followed in the construction of the corpus are discussed. In the second part, a brief grammatical description of Lakurumau is presented, providing morpho-syntactic and sociolinguistic evidence in support of the classification of Lakurumau as an independent language, and some directions for future work are outlined.
What is “natural” speech? Comparing free narratives and Frog stories in Indonesia
Marian Klamer & Francesca R. Moro, pp. 238-313
While there is overall consensus that narratives obtained by means of visual stimuli contain less natural language than free narratives, it has been less clear how the naturalness of a narrative can be measured in a crosslinguistically meaningful way. Here this question is addressed by studying the differences between free narratives and narratives elicited using the Frog story in two languages of eastern Indonesia, Alorese (Austronesian) and Teiwa (Papuan). Neither language is commonly written, and the two belong to typologically distinct families. We compare eight speakers telling free narratives and Frog stories, investigating lexical density (noun-pronoun ratio, noun-clause ratio, noun-verb ratio), narrative style (the use of direct speech reports and tail-head linkage), and speech rate. We find significant differences between free and prompted narratives along these three dimensions, and suggest that they can be used to measure the naturalness of speech in oral narratives more generally.
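The three lexical density ratios named in the abstract can be computed from a clause-segmented, part-of-speech-tagged text roughly as below. This is a minimal sketch and not the authors' procedure: the tag labels `NOUN`, `PRON`, and `VERB`, the function name `lexical_density`, and the English placeholder tokens are all assumptions made for the example.

```python
from collections import Counter


def lexical_density(clauses):
    """Compute noun-pronoun, noun-clause, and noun-verb ratios.

    `clauses` is a list of clauses, each a list of (token, tag) pairs.
    A ratio is None when its denominator is zero.
    """
    counts = Counter(tag for clause in clauses for _, tag in clause)

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    nouns = counts["NOUN"]
    return {
        "noun_pronoun": ratio(nouns, counts["PRON"]),
        "noun_clause": ratio(nouns, len(clauses)),
        "noun_verb": ratio(nouns, counts["VERB"]),
    }


# Placeholder clauses, not Alorese or Teiwa data.
toy_narrative = [
    [("dog", "NOUN"), ("chase", "VERB"), ("frog", "NOUN")],
    [("it", "PRON"), ("jump", "VERB")],
]
print(lexical_density(toy_narrative))
```

Computing the same ratios separately for a speaker's free narrative and Frog story would give the per-speaker pairs that a comparison of the two genres needs.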
Is it possible to track the revitalization of the Māori language statistically? Different large-scale statistical collections (censuses and surveys) in New Zealand effectively have different definitions of speaker because they ask different questions. This paper compares trends in numbers of Māori speakers as estimated from responses to questions about conversational ability, first language, and level of speaking proficiency, with particular reference to the 2013 Census and Te Kupenga (Māori social survey) 2013. One might expect estimates based on these responses to align closely, but they do not. This paper explores the relationships between the different estimates for different birth cohorts. Data on first language from at least four surveys provide strong evidence of a resurgence in intergenerational language transmission, which is not clearly apparent from the other indicators. Patterns of response to conversational ability and speaking proficiency questions are found to vary according to first language and birth cohort. It is argued that the apparent inconsistencies between the indicators reflect the real complexity of revitalization processes, as well as varying interpretations of the language questions, and that the New Zealand census language question on conversational ability is of questionable value as an indicator for tracking Māori language revitalization.
Finding Hawu: Legacy data, finding aids and the Alan T. Walker Digital Language Collection
Anthony R. Vaughan, pp. 357-422
Digital language data provide accessible and enduring records for world languages. While legacy data collections may offer new insights into small or endangered languages, their digitization can raise practical challenges in terms of navigating vast databases of files with limited metadata. This paper demonstrates the practical benefits of creating a finding aid and inventory for a large collection of legacy data which has been converted to digital format. In so doing, it also provides a guide to the Alan T. Walker Collection for Lii Hawu, ‘the Hawu language’ of Eastern Indonesia (also known variously as Havu, Sawu, Savu, and Sabu). A guide to the Walker Collection was needed in order to more easily navigate its digital contents in PARADISEC (Pacific and Regional Archive for Digital Sources in Endangered Cultures). The Collection includes approximately 13 hours of digitized audio-cassette recordings and 7,425 digitized images from 43 scanned handwritten notebooks. The paper concludes with a brief consideration of the process of working with digitized legacy data and the benefits derived from creating a finding aid and inventory for such data.
Determinants of phonetic word duration in ten language documentation corpora: Word frequency, complexity, position, and part of speech
Jan Strunk, Frank Seifart, Swintha Danielsen, Iren Hartmann, Brigitte Pakendorf, Søren Wichmann, Alena Witzlack-Makarevich, & Balthasar Bickel, pp. 423-461
This paper explores the application of quantitative methods to study the effect of various factors on phonetic word duration in ten languages. Data on most of these languages were collected in fieldwork aiming at documenting spontaneous speech in mostly endangered languages, to be used for multiple purposes, including the preservation of cultural heritage and community work. Here we show the feasibility of studying processes of online acceleration and deceleration of speech across languages using such data, which have not been considered for this purpose before. Our results show that it is possible to detect a consistent effect of higher frequency of words leading to faster articulation even in the relatively small language documentation corpora used here. We also show that nouns tend to be pronounced more slowly than verbs when controlling for other factors. Comparison of the effects of these and other factors shows that some of them are difficult to capture with the current data and methods, including potential effects of cross-linguistic differences in morphological complexity. In general, this paper argues for widening the cross-linguistic scope of phonetic and psycholinguistic research by including the wealth of language documentation data that has recently become available.
A collaborative development of workshops for teachers of Great Basin languages using principles of decolonization and language reclamation
Ignacio L. Montoya, Debra Harry, & Jennie Burns, pp. 462-487
The project described in this paper adopts a decolonization-oriented, reclamation-based approach to language maintenance and revitalization. Designed and implemented collaboratively with members of the local university and tribal communities, the project involves a series of five two-hour professional development workshops for teachers of Great Basin Indigenous languages spoken in and around Northern Nevada: Numu (Northern Paiute), Wašiw (Washo), and Newe (Western Shoshone). The primary goal of the project was building capacity to support language teachers by facilitating presentations, discussions, and activities that contribute to the sharing of ideas and best practices for the promotion of local languages. These workshops were preceded by an information-gathering session to determine the interests and needs of language teachers, which resulted in the selection of workshop topics: decolonization, teaching techniques, linguistics, Great Basin history and culture, and media/recording. A diverse set of facilitators and participants were involved with the project, most of whom were members of local tribal communities. Throughout the project, the organizers remained mindfully focused on the notions of decolonization, capacity-building, and respect for Indigenous knowledge.
SLEXIL: User-centred software for community language documentation
David Beck & Paul Shannon, pp. 488-502
SLEXIL (Software Linking ELAN XML to Illuminated Language) is a web application designed to allow users to create animated HTML files from time-aligned transcriptions made in ELAN. Unlike earlier projects with similar goals, SLEXIL is a zero-installation web app developed strictly on user-centred principles, designed with the goal of transferring as much of the technical expertise needed for the process away from the user and onto the maintainers and developers of the software. While SLEXIL itself is rather modest and built for a very specific purpose, we feel that its design is proof of concept for the next generation of user-centred software applications developed for linguists, community language activists, teachers, and others involved in Indigenous and Minority Language Sustainability.
Keeping it real: Video data in language documentation and language archiving
Mandana Seyfeddinipur & Felix Rau, pp. 503-519
Working with video data is on its way to becoming standard practice in language documentation. However, documenters looking on the web for guidance on standards and best practices for archiving audio-visual data encounter a vast and potentially confusing diversity of information. Unfortunately, a lot of information on archiving video is concerned with digitized film stock and not with the type of video data produced in language documentation. This paper presents relevant standards and established community best practices in a short and realistic manner, pledging to keep things real.
Archival description for language documentation collections
Ryan Sullivant, pp. 520-578
Users of digital language archives face a number of barriers when trying to discover and reuse the materials preserved in the digital collections created by current language documentation projects. These barriers include sparse descriptive metadata throughout many collections and the prevalence of audio-video materials that are impervious to text-based search. Users could more easily evaluate, navigate, and use such a collection if it contained a guide that contextualized it, summarized its contents, and helped users identify and locate items within it. This article will discuss the importance of thorough collection descriptions and finding aids by synthesizing guidelines and best practices for archival description created for traditional archives and adapting these to the structure and makeup of today’s digital language documentation collections. To facilitate the iterative description of growing collections, the checklist of information to include is presented in three groups of descending priority.