Volume 14 (2020)
Review of LexiRumah 3.0.0
Jeroen Willemsen & Yonatan Goldshtein
Linguistics and Political Science: A Strategy for Interdisciplinary and Ethical Research Methodology on Language Endangerment and Political Conflict
Shobhana Chelliah, James Meernik, & Kimi King
We propose that linguists and political scientists develop an interdisciplinary and ethical research strategy for studying the relationships between language endangerment and political conflict. A leading cause of language endangerment is political violence driven by outside actors who expropriate land, extract resources, and displace individuals, many of whom reside in communities that speak endangered languages. Most language documentation projects, however, do not address the political landscape that causes the conflict, whether it is history, language policy, conflict over natural resources and ethno-religious identities, or absent and co-opted governmental institutions experienced by the communities in question. At the same time, political scientists have developed models to explain and predict the political conflict and violence that threaten entire communities and can also explain why indigenous communities are particularly at risk of being harmed by this type of violence. We suggest that an interdisciplinary strategy that combines some of the large-N data analysis strengths of political science with the qualitative, community-driven research of linguists can best help scholars understand the determinants of language loss, conduct such research ethically, and use the fruits of this research to support and empower endangered language communities.
Supporting small languages together: The history and impact of the International Conference on Language Documentation & Conservation series
Andrea L. Berez-Kroeker, Noella Handley, Bradley Rentz, Jim Yoshioka, Victoria Anderson, & Bradley McDonnell
The International Conference on Language Documentation & Conservation series, or ICLDC, has, since its inception in 2009, become the flagship conference for the field of language documentation. Every two years, conference attendees gather at the University of Hawai‘i at Mānoa to share their experiences working on diverse topics related to the preservation of underrepresented languages worldwide. Attendees come from a range of backgrounds: Indigenous language communities, language activism organizations, K–12 school systems, as well as students and faculty from colleges and universities. They represent dozens of countries and hundreds of languages, and they have one goal in mind: supporting small languages together. In this paper, we trace the history of the ICLDC series since the first iteration and discuss the scope of its impact on the field of language documentation and conservation according to conference attendees. We also look ahead to the changes that the COVID-19 pandemic will bring to the structure of the conference in 2021 and beyond.
Talking about strings: The language of string figure-making in a Sepik society in Papua New Guinea
Darja Hoenigman
The practice of making string figures, often called cat’s cradle, can be found all over the world and is particularly widespread in Melanesia. It has been studied by anthropologists, linguists and mathematicians. For the latter, the ordered series of moves and the resultant string figures represent cognitive processes that form part of a practice of recreational mathematics. Modern anthropology is interested in the social and cultural aspects of string figures, including their associations with other cultural practices, with the local mythology and songs. Despite this clear link to language, few linguists have studied string figures, and those who have, have mainly focused on the songs and formulaic texts that accompany them. Based on a systematic study of string figures among the Awiakay, the inhabitants of Kanjimei village in the Sepik region of Papua New Guinea, with six hours of transcribed video recordings of the practice, this paper argues that studying string figure-making can be an important aspect of language documentation – not just through the recording and analysis of the accompanying oral literature, but also as a tool for documenting other speech genres through recordings of the naturalistic speech that surrounds string figure-making performances. In turn, analysing the language associated with string figure-making offers valuable insights into the meaning of string figures as understood by their makers.
Pre-Revitalization Language Assessment
Sejung Yang
Testing is increasingly recognized as a vital part of language revitalization. I demonstrate here that assessment of linguistic knowledge should also be part of the planning process that precedes the creation of a revitalization program. I take as an example Jejueo, the language of Korea’s Jeju Island. Whereas previously published work contradicted UNESCO’s conclusion that the language is critically endangered, a test that I designed to elicit basic vocabulary and verbal patterns from 224 participants (from elementary school students to senior citizens) revealed otherwise. Alarming deficits in basic knowledge of the language were uncovered that both confirmed UNESCO’s classification of the language and identified the particular areas in which remediation is required.
Archival description for language documentation collections
Ryan Sullivant
Users of digital language archives face a number of barriers when trying to discover and reuse the materials preserved in the digital collections created by current language documentation projects. These barriers include sparse descriptive metadata throughout many collections and the prevalence of audio-video materials that are impervious to text-based search. Users could more easily evaluate, navigate, and use such a collection if it contained a guide that contextualized it, summarized its contents, and helped users identify and locate items within it. This article will discuss the importance of thorough collection descriptions and finding aids by synthesizing guidelines and best practices for archival description created for traditional archives and adapting these to the structure and makeup of today’s digital language documentation collections. To facilitate the iterative description of growing collections, the checklist of information to include is presented in three groups of descending priority.
Keeping it real: Video data in language documentation and language archiving
Mandana Seyfeddinipur & Felix Rau
Working with video data is on its way to becoming standard practice in language documentation. However, documenters looking on the web for guidance on standards and best practices for archiving audio-visual data encounter a vast and potentially confusing diversity of information. Unfortunately, a lot of information on archiving video is concerned with digitized film stock and not with the type of video data produced in language documentation. This paper presents relevant standards and established community best practices in a short and realistic manner, pledging to keep things real.
SLEXIL: User-centred software for community language documentation
David Beck & Paul Shannon
SLEXIL (Software Linking ELAN XML to Illuminated Language) is a web application designed to allow users to create animated HTML files from time-aligned transcriptions made in ELAN. Unlike earlier projects with similar goals, SLEXIL is a zero-installation web app developed strictly on user-centred principles, designed with the goal of transferring as much of the technical expertise needed for the process away from the user and onto the maintainers and developers of the software. While SLEXIL itself is rather modest and built for a very specific purpose, we feel that its design is proof of concept for the next generation of user-centred software applications developed for linguists, community language activists, teachers, and others involved in Indigenous and Minority Language Sustainability.
A collaborative development of workshops for teachers of Great Basin languages using principles of decolonization and language reclamation
Ignacio L. Montoya, Debra Harry, & Jennie Burns
The project described in this paper adopts a decolonization-oriented, reclamation-based approach to language maintenance and revitalization. Designed and implemented collaboratively with members of the local university and tribal communities, the project involves a series of five two-hour professional development workshops for teachers of Great Basin Indigenous languages spoken in and around Northern Nevada: Numu (Northern Paiute), Wašiw (Washo), and Newe (Western Shoshone). The primary goal of the project was building capacity to support language teachers by facilitating presentations, discussions, and activities that contribute to the sharing of ideas and best practices for the promotion of local languages. These workshops were preceded by an information-gathering session to determine the interests and needs of language teachers, which resulted in the selection of workshop topics: decolonization, teaching techniques, linguistics, Great Basin history and culture, and media/recording. A diverse set of facilitators and participants were involved with the project, most of whom were members of local tribal communities. Throughout the project, the organizers remained mindfully focused on the notions of decolonization, capacity-building, and respect for Indigenous knowledge.
Determinants of phonetic word duration in ten language documentation corpora: Word frequency, complexity, position, and part of speech
Jan Strunk, Frank Seifart, Swintha Danielsen, Iren Hartmann, Brigitte Pakendorf, Søren Wichmann, Alena Witzlack-Makarevich, & Balthasar Bickel
This paper explores the application of quantitative methods to study the effect of various factors on phonetic word duration in ten languages. Data on most of these languages were collected in fieldwork aiming at documenting spontaneous speech in mostly endangered languages, to be used for multiple purposes, including the preservation of cultural heritage and community work. Here we show the feasibility of studying processes of online acceleration and deceleration of speech across languages using such data, which have not been considered for this purpose before. Our results show that it is possible to detect a consistent effect of higher frequency of words leading to faster articulation even in the relatively small language documentation corpora used here. We also show that nouns tend to be pronounced more slowly than verbs when controlling for other factors. Comparison of the effects of these and other factors shows that some of them are difficult to capture with the current data and methods, including potential effects of cross-linguistic differences in morphological complexity. In general, this paper argues for widening the cross-linguistic scope of phonetic and psycholinguistic research by including the wealth of language documentation data that has recently become available.
Finding Hawu: Legacy data, finding aids and the Alan T. Walker Digital Language Collection
Anthony R. Vaughan
Digital language data provide accessible and enduring records for world languages. While legacy data collections may offer new insights into small or endangered languages, their digitization can raise practical challenges in terms of navigating vast databases of files with limited metadata. This paper demonstrates the practical benefits of creating a finding aid and inventory for a large collection of legacy data which has been converted to digital format. In so doing, it also provides a guide to the Alan T. Walker Collection for Lii Hawu, ‘the Hawu language’ of eastern Indonesia (also known variously as the Havu, Sawu, Savu, or Sabu language). A guide to the Walker Collection was needed in order to more easily navigate its digital contents in PARADISEC (Pacific and Regional Archive for Digital Sources in Endangered Cultures). The Collection includes approximately 13 hours of digitized audio-cassette recordings and 7,425 digitized images from 43 scanned handwritten notebooks. The paper concludes with a brief consideration of the process of working with digitized legacy data and the benefits derived from creating a finding aid and inventory for such data.
Contrasting statistical indicators of Māori language revitalization: Conversational ability, speaking proficiency, and first language
Chris Lane
Is it possible to track the revitalization of the Māori language statistically? Different large-scale statistical collections (censuses and surveys) in New Zealand effectively have different definitions of speaker because they ask different questions. This paper compares trends in numbers of Māori speakers as estimated from responses to questions about conversational ability, first language, and level of speaking proficiency, with particular reference to the 2013 Census and Te Kupenga (Māori social survey) 2013. One might expect estimates based on these responses to align closely, but they do not. This paper explores the relationships between the different estimates for different birth cohorts. Data on first language from at least four surveys provide strong evidence of a resurgence in intergenerational language transmission, which is not clearly apparent from the other indicators. Patterns of response to conversational ability and speaking proficiency questions are found to vary according to first language and birth cohort. It is argued that the apparent inconsistencies between the indicators reflect the real complexity of revitalization processes, as well as varying interpretations of the language questions, and that the New Zealand census language question on conversational ability is of questionable value as an indicator for tracking Māori language revitalization.
What is “natural” speech? Comparing free narratives and Frog stories in Indonesia
Marian Klamer & Francesca R. Moro
While there is overall consensus that narratives obtained by means of visual stimuli contain less natural language than free narratives, it has been less clear how the naturalness of a narrative can be measured in a crosslinguistically meaningful way. Here this question is addressed by studying the differences between free narratives and narratives elicited using the Frog story in two languages of eastern Indonesia, Alorese (Austronesian) and Teiwa (Papuan). Neither language is commonly written, and the two belong to families that are typologically distinct. We compare eight speakers telling free narratives and Frog stories, investigating the lexical density (noun-pronoun ratio, noun-clause ratio, noun-verb ratio), narrative style (the use of direct speech reports and tail-head linkage), as well as speech rate. We find significant differences between free and prompted narratives along these three dimensions, and suggest that they can be used to measure the naturalness of speech in oral narratives more generally.
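The three lexical density ratios named in this abstract can be illustrated with a minimal sketch. It assumes each narrative is already available as a list of (word, part-of-speech) pairs plus a clause count; the tag labels `NOUN`, `PRON`, and `VERB`, the function name, and the toy data are all hypothetical, and the authors' actual counting procedure may differ.

```python
from collections import Counter

def lexical_density(tagged_tokens, clause_count):
    """Return noun-pronoun, noun-clause and noun-verb ratios for one narrative.

    tagged_tokens: list of (word, tag) pairs; the tag labels are assumed.
    clause_count:  number of clauses, counted separately by the annotator.
    """
    counts = Counter(tag for _, tag in tagged_tokens)

    def ratio(numerator, denominator):
        # Avoid division by zero when a category is absent from the text.
        return numerator / denominator if denominator else None

    return {
        "noun_pronoun": ratio(counts["NOUN"], counts["PRON"]),
        "noun_clause": ratio(counts["NOUN"], clause_count),
        "noun_verb": ratio(counts["NOUN"], counts["VERB"]),
    }

# Invented toy narrative: 4 nouns, 1 pronoun, 2 verbs, 2 clauses.
tokens = [
    ("boy", "NOUN"), ("see", "VERB"), ("frog", "NOUN"), ("jar", "NOUN"),
    ("he", "PRON"), ("call", "VERB"), ("dog", "NOUN"),
]
result = lexical_density(tokens, clause_count=2)
print(result)
```

Computing the same three ratios for a free narrative and a Frog story told by the same speaker would give directly comparable figures for the two genres.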
Documentation of Lakurumau: Making the case for one more language in Papua New Guinea
Lidia Federica Mazzitelli
This paper provides an introduction to Lakurumau, a previously undescribed and undocumented Oceanic language of Papua New Guinea. The first part of the paper is a guide to the Lakurumau documentation corpus, deposited in the ELAR archive. The participants and the content of the deposit, the technology used for recording, and the ethical protocols followed in the construction of the corpus are discussed. In the second part, a brief grammatical description of Lakurumau is presented, providing morpho-syntactic and sociolinguistic evidence in support of the classification of Lakurumau as an independent language, and some directions for future work are outlined.
Child-directed language – and how it informs the documentation and description of the adult language
Birgit Hellwig & Dagmar Jung
Language documentation efforts are most often concerned with the adult language and usually do not include the language used by and with children. Essential parts of the natural linguistic behaviour of communities thus remain undocumented, and a growing body of literature explores what language documentation, language maintenance, and language revitalization have to gain by including child language and child-directed language.
This paper adds a methodological perspective to the discussion, arguing that child language and child-directed language constitute data types that can inform our understanding of the adult language. For reasons of feasibility, the paper focuses on child-directed language only. Presenting data from two on-going language acquisition projects (Qaqet from Papua New Guinea and Dëne Sųłıné from Canada), we illustrate how this data type provides insights into the metalinguistic knowledge of adult speakers. After an introduction to child-directed language, three case studies on the topics of variation sets, clarification processes, and discourse context are exemplified from both languages and related to our understanding of the adult language. Focusing on the potential of this data type, this paper argues in favour of extending our documentation efforts to events involving children.
A method comparison analysis examining the relationship between linguistic tone, melodic tune, and sung performances of children’s songs in Chicahuaxtla Triqui: Findings and implications for documentary linguistics and indigenous language communities
A. Raymond Elliott
Linguistic tones play an important role in expressing lexical and grammatical meaning in tone languages. A small change in the pitch of a word can result in an entirely different meaning. A logical question for those who document tone languages is whether or not singers preserve linguistic tone when singing and if so, to what degree? I begin by providing an overview of research in documentary linguistics that examines the interrelationship between linguistic tone and melody in tone languages. While the majority of articles have focused on Asian and African languages, there is only one investigation by Pike (1939) that examined the relationship between tone and tune in an unspecified variety of Mixtec, an Otomanguean language. In order to further our understanding of the tone-tune relationship, especially with regard to Otomanguean languages, I use three separate procedures for looking at the interrelationship between tone and tune in spoken, sung, and played performances of two popular children’s songs in Chicahuaxtla Triqui. While the first experiment yielded a non-significant relationship between linguistic tone and note transitions in the musical scores, the second and third experiments showed that the pitch traces of the spoken and played performances of the songs both relate to and perhaps influence pitch transitions and pitch transition differentials in the sung performances. The overall finding is that the song melody appears to exert a greater influence on the pitch tracings of the sung performances than does linguistic tone as measured in the spoken performances of the songs. With regard to experimental studies examining tone and tune, this study suggests that a set of well-defined prosodic features, such as overall pitch range, average F0, F0 for individual tones, and the difference between adjacent tones as measured in Hz, need to be considered when comparing tone to melodic tune.
Simply correlating the correspondence or directionality of linguistic tones to that of the note transitions in musical scores does not appear to be promising nor sensitive enough to reveal the true interrelationship between tone and tune. This article ends with a discussion of the benefits of documenting songs in tone languages for linguists in addition to the advantages of teaching music to children of indigenous language communities.
Quantifying written ambiguities in tone languages: A comparative study of Elip, Mbelime, and Eastern Dan
David Roberts, Ginger Boyd, Johannes Merz, & Valentin Vydrin
Whether tone should be represented in writing, and if so how much, is one of the most formidable challenges facing those developing orthographies for tone languages. Various researchers have attempted to quantify the level of written ambiguity in a language if tone is not marked, but these contributions are not easily comparable because they use different measurement criteria. This article presents a first attempt to develop a standardized instrument and evaluate its potential. The method is exemplified using four narrative texts translated into Elip, Mbelime, and Eastern Dan. It lists all distinct written word forms that are homographs if tone is not marked, discarding repeated words, homophony, and polysemy, as well as pairs that never share the same syntactic slot. It treats lexical and grammatical tone separately, while acknowledging that these two functions often coincide. The results show that the level of written ambiguity in Elip is weighted towards the grammar, while in Mbelime many ambiguities occur at the point where lexical and grammatical tone coincide. As for Eastern Dan, with its profusion of nominal and verbal minimal pairs, not to mention pronouns, case markers, predicative markers, and other parts of speech, the level of written ambiguity if tone is not marked is by far the highest of the three languages. The article ends with some suggestions of how the methodology might be refined, by reporting some experimental data that provide only limited proof of the need to mark tone fully, and by describing how full tone marking has survived recent spelling reforms in all three languages.
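The core listing step described in this abstract, finding distinct written word forms that collapse into homographs once tone is left unmarked, can be sketched as follows. This is an illustration under stated assumptions, not the authors' instrument: it assumes tone is written with combining diacritics, it strips every combining mark (so it would also remove non-tonal diacritics such as nasalization), and it does not implement the further filtering for homophony, polysemy, or shared syntactic slots. The function names and the toy word list are invented.

```python
import unicodedata
from collections import defaultdict

def strip_tone(word):
    """Remove combining marks (Unicode category Mn) from a word.

    Assumption: tone is the only information carried by combining marks.
    """
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

def toneless_homographs(words):
    """Group distinct word forms that become identical without tone marks."""
    groups = defaultdict(set)  # a set discards repeated words automatically
    for word in words:
        form = unicodedata.normalize("NFC", word.lower())
        groups[strip_tone(form)].add(form)
    # Keep only toneless spellings shared by two or more distinct forms.
    return {base: sorted(forms) for base, forms in groups.items() if len(forms) > 1}

# Invented toy word list, not data from Elip, Mbelime, or Eastern Dan.
words = ["bá", "bà", "bá", "ko", "kó", "mi"]
ambiguous = toneless_homographs(words)
print(ambiguous)
```

Counting the resulting groups per text, separately for lexical and grammatical contrasts, would be one way to arrive at comparable ambiguity figures across languages.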
LingView: A Web Interface for Viewing FLEx and ELAN Files
Kalinda Pride, Nicholas Tomlin, & Scott AnderBois
This article presents LingView (https://github.com/BrownCLPS/LingView), a web interface for viewing FLEx and ELAN files, optionally time-synced with corresponding audio or video files. While FLEx and ELAN are useful tools for many linguists, the resulting annotated files are often inaccessible to the general public. Here, we describe a data pipeline for combining FLEx and ELAN files into a single JSON format which can be displayed on the web. While this software was originally built as part of the A’ingae Language Documentation Project to display a corpus of materials in A’ingae, the software was designed to be a flexible resource for a variety of different communities, researchers, and materials.
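As a rough illustration of the ELAN half of such a pipeline, the sketch below reads time-aligned annotations from ELAN's XML format (`.eaf`) and emits JSON. It is not LingView's code and does not reproduce its JSON schema: the output field names (`tier`, `start_ms`, `end_ms`, `text`) and the inline sample document are assumptions made for the example, and merging with FLEx data is not shown.

```python
import json
import xml.etree.ElementTree as ET

# Minimal hand-written sample in the shape of an ELAN .eaf file.
EAF_SAMPLE = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
  </TIME_ORDER>
  <TIER TIER_ID="transcription">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1"
          TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>example utterance</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""

def eaf_to_records(eaf_xml):
    """Convert time-aligned ELAN annotations into a list of plain dicts."""
    root = ET.fromstring(eaf_xml)
    # Map each time slot ID to its offset in milliseconds.
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }
    records = []
    for tier in root.iter("TIER"):
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            records.append({
                "tier": tier.get("TIER_ID"),
                "start_ms": slots.get(ann.get("TIME_SLOT_REF1")),
                "end_ms": slots.get(ann.get("TIME_SLOT_REF2")),
                "text": ann.findtext("ANNOTATION_VALUE", default=""),
            })
    return records

records = eaf_to_records(EAF_SAMPLE)
json_text = json.dumps(records, ensure_ascii=False, indent=2)
print(json_text)
```

A web front end could then use the start and end offsets to highlight each annotation while the corresponding audio or video plays.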
Multidirectional leveraging for computational morphology and language documentation and revitalization
Sylvia L. R. Schreiner, Lane Schwartz, Benjamin Hunt, & Emily Chen
St. Lawrence Island Yupik is an endangered language of the Bering Strait region. In this paper, we describe our work on Yupik jointly leveraging computational morphology and linguistic fieldwork, outlining the multilayer virtuous cycle that we continue to refine in our work to document and build tools for the language. After developing a preliminary morphological analyzer from an existing pedagogical grammar of Yupik, we used it to help analyze new word forms gathered through fieldwork. While in the field, we augmented the analyzer to include insights into the lexicon, phonology, and morphology of the language as they were gained during elicitation sessions and subsequent data analysis. The analyzer and other tools we have developed are improved by a corpus that continues to grow through our digitization and documentation efforts, and the computational tools in turn allow us to improve and speed those same efforts. Through this process, we have successfully identified previously undescribed lexical, morphological, and phonological processes in Yupik while simultaneously increasing the coverage of the morphological analyzer. Given the polysynthetic nature of Yupik, a high-coverage morphological analyzer is a necessary prerequisite for the development of other high-level computational tools that have been requested by the Yupik community.
Review of Activating the heart: Storytelling, knowledge sharing and relationship
Alexis Michaud
Notes from the Field: Inagta Alabat: A moribund Philippine language, with supporting audio
Jason William Lobel, Amy Jugueta Alpay, Rosie Susutin Barreno, & Emelinda Jugueta Barreno
Arguably the most critically-endangered language in the Philippines, Inagta Alabat (also known as Inagta Lopez and Inagta Villa Espina) is spoken by fewer than ten members of the small Agta community on the island of Alabat off the northern coast of Quezon Province on the large northern Philippine island of Luzon, and by an even smaller number of Agta further east in the province. This short sketch provides some brief sociolinguistic notes on the group, followed by an overview of its phoneme system, grammatical subsystems, and verb system. Over 800 audio recordings accompany the article, including 100 sentences, three short narratives, and a list of over 200 basic vocabulary items.
Nearly half a century has passed since Philippine educator Teodoro Llamzon discovered the Remontado language, which would be introduced to the world in a master’s thesis written by his student Pilar Santos. Although data from the wordlists they collected have been included in subsequent publications by several other authors, no one had revisited the language community, let alone collected any additional data on this highly-endangered language, prior to the current authors. This article presents updated information on the language community, the current state of the language, and a revised description of the various grammatical subsystems of the language, including its verbal morphology. Also included are over 400 audio recordings illustrating basic aspects of the phonology as well as the various functor sets and verb forms, and a short text for comparison with other similar language sketches.