Special Publications

LD&C publishes occasional Special Publications, usually on specific themes. The contents of each are listed below.

SP13: Documenting Variation in Endangered Languages

  He nui nā ala e hiki aku ai: Factors influencing phonetic variation in the Hawaiian word kēia
    Abstract: Apart from a handful of studies (e.g., Kinney 1956), linguists know little about what variation exists in Hawaiian and what factors constrain the variation. In this paper, we present an analysis of phonetic variation in the word kēia, meaning ‘this’, examining the social, linguistic, and probabilistic factors that constrain the variation. The word kēia can be pronounced with a constricted glottis (e.g., as creak or a glottal stop) or without one (Pukui & Elbert 1986: 142) and, like many words in Hawaiian, it can undergo phonetic reduction. The analysis was conducted on interviews with eight native-speaking kūpuna (elders) who were recorded in the 1970s. We find that the likelihood of the word being realized with a constricted glottis decreases if the word immediately following kēia begins with an oral stop or if the speaker is a man. Additionally, we observe a higher likelihood of phonetic reduction as word sequences (kēia + the following word(s)) are repeated during the interaction. The results contribute to current models of speech production and planning, and they inform work aimed at supporting the ongoing efforts to conserve and revitalize the Hawaiian language.
  Perspectives on linguistic documentation from sociolinguistic research on dialects
    Abstract: The goal of the paper is to demonstrate how sociolinguistic research can be applied to endangered language documentation and field linguistics. It first provides an overview of the techniques and practices of sociolinguistic fieldwork and the ensuing corpus compilation methods. The discussion is framed with examples from research projects focused on European-heritage English-speaking communities in the UK and Canada that have documented and analyzed English dialects from the far reaches of Scotland to the wilds of Northern Ontario, Canada. The main focus lies on morpho-syntactic and discourse-pragmatic variation; however, the same techniques could be applied to other types of variation. The discussion includes examples from a broad range of research studies in order to illustrate how sociolinguistic analyses are conducted and what they offer for understanding language variation and change.
  Areal analysis of language attitudes and practices: A case study from Nepal
    Abstract: This paper has two aims. One aim is to treat non-structural variables (language attitudes and use) as valid in dialect and linguistic geography, in an inner Himalayan valley of Nepal where four languages, which show different degrees of vitality vs. endangerment, have traditionally coexisted asymmetrically. The other aim is to apply a modified notion of spatiality as it aligns with speaker attitudes and practices amid recent and ongoing socio-economic and population changes. We demonstrate that variation in self-reported attitudes and practices across languages in this region can be explained as much by adjusted spatial factors (labeled ‘social space’) as by traditional social factors (e.g., gender, age, formal education, occupation). As such, our study contributes to a discourse on the role and potential of spatiality in sociolinguistic analyses of smaller language communities.
  Three speakers, four dialects: Documenting variation in an endangered Amazonian language
    Abstract: This paper offers a case study on dialect contact in Maihɨki (Tukanoan, Peru), with the goal of illustrating how documentation of variation can contribute to a general language documentation project. First, I describe the facts of variation in one dialectally diverse Maihɨki-speaking community. I then argue that the outcomes of dialect mixing in this speech community can be understood only through a fine-grained analysis centering the dialectal composition of the communities of practice to which speakers belonged in early life. The coarse-grained identity categories used in most variationist analyses, such as age and gender, are less informative. After proposing a network theory interpretation of this finding, I discuss its implications for the role of (a) ethnography and (b) the European dialect mixing literature in research on variation in endangered languages. Second, I describe some surprising similarities between this speech community and those described in classic variationist literature. Like urban English speakers, Maihɨki speakers attach less indexical value to morphosyntactic than to phonological variation, and – although their language lacks a standard – engage in indexically motivated style-shifting. I discuss ways to adapt variationist methods to endangered language settings to capture these phenomena, then close with comments on the importance of documenting variation for conservation.
  Language shift and linguistic insecurity
    Abstract: Variation in language is constant and inevitable. In a vital speech community some variation disappears as speakers age, and some results in long-term change, but all change will be preceded by a period of variation. Speakers of endangered languages may perceive variation in an especially negative light when it is thought to be due to contact with the dominant language. This contributes to negative evaluations of young people’s speech by older speakers, and in turn contributes to the linguistic insecurity of young speakers, which may result in even further shift toward the dominant language. In this paper we discuss language variation in the context of shift with respect to the notion of linguistic insecurity and what we identify as three distinct types of linguistic insecurity, particularly in cases of indigenous language loss in the Americas. We conclude with some observations on the positive results of directly addressing linguistic insecurity in language maintenance/revitalization programs.
  Documenting variation in (endangered) heritage languages: how and why?
    Abstract: This paper contributes to recently expanded interest in documenting variable as well as categorical patterns of endangered languages. It describes approaches, tools and curricular developments that have benefitted from involving students who are heritage language community members, key to expanding variationist focus to a wider range of languages. I describe aspects of the Heritage Language Variation and Change Project in Toronto, contrasting a “truly” endangered language to a less clearly endangered language. Faetar, with
  Documenting sociolinguistic variation in lesser-studied indigenous communities: Challenges and practical solutions
    Abstract: Documenting sociolinguistic variation in lesser-studied languages presents methodological challenges, but also offers important research opportunities. In this paper we examine three key methodological challenges commonly faced by researchers who are outsiders to the community. We then present practical solutions for successful variationist research on indigenous languages and meaningful partnerships with local communities. In particular, we draw insights from our research with Australian languages and indigenous languages of rural China. We also highlight reasons why such lesser-studied languages are crucial to the further advancement of sociolinguistic theory, arguing that the value of the research justifies the effort needed to overcome the methodological difficulty. We find that the challenges of sociolinguistics in these communities sometimes make standard variationist methods untenable, but the methodological solutions we propose can lead to valuable results and community relationships.
  Mutsun-English English-Mutsun Dictionary
    Abstract: Mutsun is a Costanoan language (part of the Utian language family) from California in the area around the modern towns of San Juan Bautista, Hollister, and Gilroy. The last fluent speaker of Mutsun, Mrs. Ascension Solarsano, died in 1930. Because of her work and the work of earlier native Mutsun speakers with early linguists, there is a large written corpus of Mutsun. This dictionary was compiled by analyzing that documentation. The dictionary is written to be useful both for language revitalization and for linguistic research.

SP10: African language documentation: new data, methods and approaches

  Multilingualism, affiliation and spiritual insecurity. From phenomena to process in language documentation
    Abstract: Documentary linguists have often been urged to integrate into their research language ideologies and other topics more closely related to ethnography than to linguistics, but in the literature these recommendations have seldom been accompanied by practical directions for their implementation. This paper aims to contribute to filling this gap. After reconsidering current documentary approaches, a case study from a documentation project in NW Cameroon is presented to show how an ethnographically informed sociolinguistic survey on multilingualism can lead to progressively deeper insights into the local language ideology. The methodological implications that this research perspective brings to both documentary linguistics and language support and revitalization projects are discussed. Finally, a number of practical suggestions are proposed, illustrating the importance of language documentation projects being carried out by multidisciplinary teams.
  Linguistic variation and the dynamics of language documentation: Editing in 'pure' Kagulu
    Abstract: The Tanzanian ethnic community language Kagulu is in extended language contact with the national language Swahili and other neighbouring community languages. The effects of contact are seen in vocabulary and structure, leading to a high degree of linguistic variation and to the development of distinct varieties of ‘pure’ and ‘mixed’ Kagulu. A comprehensive documentation of the language needs to take this variation into account and to provide a description of the different varieties and their interaction. The paper illustrates this point by charting the development of a specific text within a language documentation project. A comparison of three versions of the text – a recorded oral story, a transcribed version of it and a further, edited version in which features of pure Kagulu are edited in – shows the dynamics of how the different versions of the text interact and provides a detailed picture of linguistic variation and of speakers’ use and exploitation of it. We show that all versions of the text are valid, ‘authentic’ representations of their own linguistic reality, and how all three of them, and the processes of their genesis, are an integral part of a comprehensive documentation of Kagulu and its linguistic ecology.
  Language documentation in Africa: turning tables
  Pure fiction – the interplay of indexical and essentialist language ideologies and heterogeneous practices. A view from Agnack
    Abstract: This paper investigates the complex interplay between different sets of language ideologies and multilingual practice in a village in Lower Casamance (Senegal). In this heterogeneous linguistic environment, which is typical of many African settings, individuals have large and adaptive linguistic repertoires. The local language ideologies focus on different aspects of identity which languages serve to index, but enable individuals to focus on different facets of identity according to context. National language ideologies are essentialist and have as their goal to put constructed homogeneous communities on the polyglossic map of Senegalese languages. In contrast to similarly essentialist Western ideologies, however, these national ideologies operating in Senegal are not linked to actual standard language practices. Using the example of individuals in two households and by presenting rich ethnographic information on them, the paper explores the relationship between language use and language ideologies before describing a sampling method for documenting language use in these contexts. It is argued that the documentation of these contexts cannot be achieved independently of an understanding of the language ideologies at work, as they influence what is presented as linguistic practice, and that arriving at a holistic description and documentation of the multilingual settings of Africa and beyond is central for advancing linguistic theory in sociolinguistics, psycholinguistics and contact linguistics.
  Why are they named after death? Name giving, name changing and death prevention names in Gújjolaay Eegimaa (Banjal)
    Abstract: This paper advocates the integration of ethnographic information such as anthroponymy in language documentation, by discussing the results of the documentation of personal names among speakers of Gújjolaay Eegimaa. Our study shows that Eegimaa proper names include names that may be termed ‘meaningless names’, because their meanings are virtually impossible to identify, and meaningful names, i.e. names whose meanings are semantically transparent. Two main types of meaningful proper names are identified: those that describe aspects of an individual’s physique or character, and ritual names which are termed death prevention names. Death prevention names include names given to women who undergo the Gaññalen ‘birth ritual’ to help them with pregnancy and childbirth, and those given to children to fight infant mortality. We provide an analysis of the morphological structures and the meanings of proper names and investigate name changing practices among Eegimaa speakers. Our study shows that, in addition to revealing aspects of individuals’ lives, proper names also reveal important aspects of speakers’ social organisation. As a result, anthroponymy is an area of possible collaborative research with other disciplines including anthropology and philosophy.
  African language documentation: new data, methods and approaches

SP09: Language Documentation and Conservation in Europe

  Authenticity and linguistic variety among new speakers of Basque
    Abstract: This paper argues that the type of variety learned and used by Basque language learners is a key element in their self-perception as “true” or authentic speakers of Basque. Drawing on focus groups and individual interviews, we find that new speakers are for the most part strongly oriented towards the value of authenticity epitomized by local varieties. While new speakers report the utility of their mastery over the new standard Basque variety, they are not inclined to view this mastery as granting themselves greater authority or ownership over Basque. Rather they strongly valorize the informal and vernacular speech forms indexing colloquial speech and local dialect most identified with native speakers. The new speaker’s sociolinguistic context and motivations for learning Basque seem to be predictive of the strength of this orientation. The findings of this study point to the necessity of further study and documentation of local vernacular as well as the urgency for language educators to find ways of incorporating the acquisition of local and dialectal features into language instruction.
  Brief considerations about language policy: An European assessment
    Abstract: The rise of language policy worldwide is a consequence of a globalized world and the opening of borders. Even countries with relative cultural homogeneity now face new challenges arising from massive migration flows and from growing awareness of endangered languages and cultures, notably in Europe. This is being noticed across the Old Continent, where diversity has always been a distinctive value. In this paper we reflect on the scope of cultural identity and multilingualism to shed new light on language policy and consequently refresh our understanding of a key policy, which is already a decisive public policy for the European peoples.
  Identity and language shift among Vlashki/Zheyanski speakers in Croatia
    Abstract: The language Vlashki/Zheyanski, spoken in two areas – the Susnjevica area and Zejane – of the multilingual, multiethnic Istrian peninsula of Croatia, evinces strong loyalty on the part of its elderly speakers, yet in both areas a language shift to Croatian is well underway. Vlashki/Zheyanski is a severely endangered Eastern Romance language known in the linguistic literature as Istro-Romanian. In order to study the domains and frequency of use of the language and equally to examine speaker attitudes about language and identity, we administered a questionnaire to speakers in both locations. Our sample included responses from individuals in four age groups. Our discussion here focuses on 16 men and women from the two older groups, 51–70 and 71 and older. In Zejane, speakers saw knowledge of the language and family lineage as defining components of being a “real” member of the community. The name for the language, Zheyanski, comes from the village name. Hence, someone who speaks the language asserts that village belonging and village affiliation are at the core of speakers’ identity. In terms of national identification, whether Croatian, Italian, and/or Istrian, Zheyanski speakers by and large showed little enthusiasm for any of the three choices. In terms of language use, all respondents continue to use the language on a daily basis but report that they speak mostly Croatian to their grandchildren. In the Susnjevica area, people used the same criteria, language knowledge and family lineage, to define group membership and feel close affiliation to their home village. Unlike in Zejane, the name of the language, “Vlashki”, does not correspond to a unitary group name accepted and liked by all. In terms of larger identity, villagers embraced identities that they share with their Croatian-speaking neighbors: Most felt “extremely Istrian”, and at least “fairly Croatian”.
    The language shift to Croatian is also more advanced here: All the speakers report speaking mostly Croatian to their children. While speakers in both Zejane and the Susnjevica area endowed their language with a critical role in their identity, this attitude toward Vlashki/Zheyanski does not manifest itself in their communication with younger generations, where other social forces have caused the shift to the use of Croatian.
  Bridging divides: A proposal for integrating the teaching, research and revitalization of Nahuatl
    Abstract: This paper discusses major historical, cultural, linguistic, social and institutional factors contributing to the shift and endangerment of the Nahuatl language in Mexico. As a practical proposal, we discuss our strategy for its revitalization, as well as a series of projects and activities we have been carrying out for the last several years. Crucial to this approach are several complementary elements: interdisciplinary research, including documentary work, as well as investigation of both the historical and the present state of Nahuatl language and culture; integration of both Western and native-speaking indigenous researchers as equal partners and the provision of space for indigenous methodologies; creation of teaching programs for native and non-native speakers oriented toward the preparation of language materials; and close collaboration with indigenous communities in developing community-based programs. The operability of this strategy will depend greatly on our ability to foster collaboration across academic, social, and ideological boundaries, to integrate theory, methodology and program implementation, and to efficiently combine grassroots and top-down approaches. An important aim is to restore the culture of literacy in Nahuatl through our monolingual Totlahtol series, publishing works from all variants of the language and encompassing all genres of writing. We also strive to strengthen the historical and cultural identity of native speakers by facilitating their access to the alphabetical texts written by their ancestors during the colonial era.
  The first Mirandese text-to-speech system
    Abstract: This paper describes the creation of base NLP resources and tools for an under-resourced minority language spoken in Portugal, Mirandese, in the context of the generation of a text-to-speech system, a collaborative citizenship project between Microsoft, ILTEC, and ALM – Associacon de la Lhengua Mirandesa. Development efforts encompassed the compilation of a large textual corpus, definition of a complete phone-set, development of a tokenizer, inflector, TN and GTP modules, and creation of a large phonetic lexicon with syllable segmentation, stress mark-up, and POS. The TTS system will provide an open access web interface freely available to the community, along with the other resources. We took advantage of mature tools, resources, and processes already available for phylogenetically-close languages, allowing us to cut development time and resources to a great extent, a solution that can be viable for other lesser-spoken languages which enjoy a similar situation.
  New speakers of Minderico: Dynamics and tensions in the revitalization process
    Abstract: From the sixteenth century on, the blankets of Minde, a small village in the center of Portugal, became famous all over the country. The wool combers, blanket producers, and traders of Minde began to use Minderico in order to protect their business from “intruders”. Later, this secret language extended to all social and professional groups and became the main means of communication in the village. During this process, Minderico turned into a full-fledged language with a very characteristic intonation and a complex morphosyntax, differentiating itself from Portuguese. However, the number of speakers declined drastically during the last 50 years. Minderico is now actively spoken by 150 speakers, but only 23 of them are fluent speakers. More than half of the fluent speakers are new speakers of the language. New speakerness is a relatively new phenomenon in the Minderico speaking community and a direct result of the revitalization process which was initiated in 2009. This paper examines the role of the new speakers in the revitalization of Minderico, considering issues of authenticity and socio-linguistic legitimacy.
  BaTelÒc: A text base for the Occitan language
    Abstract: Language Documentation, as defined by Himmelmann (2006), aims at compiling and preserving linguistic data for studies in linguistics, literature, history, ethnology, sociology. This initiative is vital for endangered languages such as Occitan, a Romance language spoken in southern France and in several valleys of Spain and Italy. The documentation of a language concerns all its modalities, covering spoken and written language, various registers and so on. Nowadays, Occitan documentation mostly consists of data from linguistic atlases, virtual libraries from the modern to the contemporary period, and text bases for the Middle Ages. BaTelÒc is a text base for modern and contemporary periods. With the aim of creating a wide coverage of text collections, BaTelÒc gathers not only written literary texts (prose, drama and poetry) but also other genres such as technical texts and newspapers. Enough material is already available to foresee a text base of hundreds of millions of words. BaTelÒc not only aims at documenting Occitan, it is also designed to provide tools to explore texts (different criteria for corpus selection, concordance tools and more complex enquiries with regular expressions). As for linguistic analysis, the second step is to enrich the corpora with annotations. Natural Language Processing of endangered languages such as Occitan is very challenging. It is not possible to transpose existing models for resource-rich languages directly, partly because of the spelling, dialectal variations, and lack of standardization. With BaTelÒc we aim at providing corpora and lexicons for the development of basic natural language processing tools, namely OCR and a Part-of-Speech tagger based on tools initially designed for machine translation and which take variation into account.
  Language Landscape: Supporting community-led language documentation
    Abstract: Different groups have differing motivations for participating in language documentation projects. Linguists want to increase our knowledge of languages and linguistic theory, but constraints on their work may lead to issues with their documentation projects, including their representations of the languages they study. Native speakers participate to maintain and develop their language, and may choose to represent it in a way which showcases their culture and attitudes. In order to encourage more native speakers to take part in documentation projects, a simple integrated system is required which will enable them to record, annotate and publish recordings. Language Landscape, our web-based application, enables native speakers to publish their recordings, and Aikuma, a mobile application for documentation, enables them to record and orally translate recordings, in both cases with minimal cost and training required. Language Landscape benefits communities by allowing them to document their language as they see fit, as demonstrated by our outreach program, through which some London school children created their own projects to document their own languages and those spoken around them.
  Reflections of an observant linguist regarding the orthography of A Fala de Us Tres Lugaris
    Abstract: A Fala has never had a standardized orthography, as it is a language of oral tradition and almost all written documents have been produced only in Spanish. The few documents that exist in A Fala use orthographies that vary considerably, especially when indicating the phonemes that are absent in standard Spanish. However, in recent decades there have been signs of increasing interest in the language and cultural identity of the three villages, and there have also been attempts to establish organizations to promote the language, such as A Fala y Cultura, U Lagartu Verdi, and A Nosa Fala. This increase in language awareness inevitably leads to situations in which speakers want to express their linguistic identity in written form, and the lack of a written standard makes this task rather difficult. The objective of this paper is to analyze the public inscriptions, direction signs and street names written in A Fala. The appearance of these signs expresses the willingness of the speakers of A Fala to claim their linguistic identity. At the same time, their inconsistent orthography reveals the problems that arise in the course of writing their language. There are two main causes of these difficulties: the influence of Spanish, as all the speakers are bilingual in Spanish, and variation within the language itself. Regarding the first cause, the main issues include uncertainty about how to write the phonemes that do not exist in standard Spanish, and whether the phonemes that do exist in Spanish should be written in the same way. Regarding the second cause, the signposts and street names reflect the three main varieties: Valverdenu, Lagarteiru and Manegu. They also partially reflect the ideas of those who created them and testify to a certain evolution over time.
In general, the linguistic data in the form of street names and direction signs provide relevant information about the options for writing those phonemes which do not have an equivalent in Spanish, as well as geographical (diatopic) variation, and the changes of ideas regarding the orthography. This paper will use this valuable linguistic material to reflect on the issues that are involved in the establishment of an orthographical standard.
  Kormakiti Arabic: A study of language decay and language death
    Abstract: Kormakiti Arabic (also called Cypriot Maronite Arabic) is a language with approximately 150–200 speakers in Kormakitis, a village in north-western Cyprus. Kormakiti Arabic is highly endangered, not only due to its low number of speakers but more importantly because younger Maronites with their roots in Kormakitis no longer acquire Kormakiti Arabic naturally. Kormakitis itself is inhabited almost exclusively by elderly Maronites who lived there before the division of Cyprus in 1974. This paper deals with language death and language decay in Kormakiti Arabic. Several historical sources are used to illustrate the historical and sociolinguistic environment in which this language has survived until today. The linguistic evidence is then compared with Sasse’s (1992a) Gaelic-Arvanitika model in order to show parallels as well as differences between Arvanitika and Kormakiti Arabic.
  Multilingualism and structural borrowing in Arbanasi Albanian
    Abstract: In this paper we present a brief overview of the history of linguistic contacts of Arbanasi Albanian, a Gheg Albanian dialect spoken in Croatia, with Croatian and Italian. Then we discuss a number of contact-induced changes in that language. We show that Arbanasi Albanian was subject to strong influences from Croatian (and, to a lesser extent, from Italian) on all levels of linguistic structure. Using the data from our own fieldwork, we were able to show that there were also influences on the level of syntax, including the borrowing of certain constructions, such as analytic causative and imperative constructions, as well as the extension of the use of the infinitive in subordinate clauses.
  Lemko linguistic identity: Contested pluralities
    Abstract: In their efforts to organize as a recognized minority within the Polish state, the Lemkos have faced a number of obstacles, both internal and external to the community. This article explores three aspects of self-representation of the Lemko community – group membership, victimhood and “speakerhood” – and examines how these representations are contested on a number of levels.
  El árabe ceutí, una lengua minorizada. Propuestas para su enseñanza en la escuela [Ceuta Arabic, a minoritized language: Proposals for teaching it in schools]
    Abstract: The Arabic of Ceuta is the native language of 40% of the Spanish population of Ceuta, which also speaks Spanish. The remaining 60% is mostly monolingual, with Spanish as its native language. A further 1% are bilingual citizens whose native tongue is Sindhi. The Arabic of Ceuta is Moroccan Arabic, the native language of 60% of the population of the neighboring country; specifically, it shares common features with the northern dialect area (the Yebala region and the Atlantic coast down to the city of Larache). But its use on Spanish territory since the second half of the 19th century has given rise to two phenomena: Spanish borrowings and, among bilingual speakers, code-switching. The Arabic of Ceuta is an oral language, like Moroccan Arabic, which has never been standardized by the political sphere, in contrast with Literary Arabic (also called cultivated, standard, modern or classical Arabic), which is not the native language of any Arab in the world and has emerged as the only means of educational, political, and cultural expression due to political and religious power. Despite this, there is a whole literary tradition, oral and written, in Moroccan Arabic, especially from the 20th century. Currently, a group of Moroccan professors and intellectuals is working on its codification in order to establish a general writing system in Arabic script. Ceuta has the highest school dropout rate of any Spanish region, and this is particularly acute in schools where the majority of students are bilingual. Many experts recommend that teachers teach in the native language of their pupils, at least at the beginning of their education. In this paper we put forward some proposals for the recognition of Ceuta Arabic as codified by the movement of Moroccan intellectuals who are already developing a dictionary, a grammar, text collections, and translations of works of European literature into Moroccan Arabic.
The ultimate goal should be its inclusion in the educational and administrative services of the city, as well as official status in the future, duly recognized by the Spanish Constitution.
  Language Revitalization: The case of Judeo-Spanish varieties in Macedonia
    Abstract: Judeo-Spanish is a secondary dialect of Spanish that evolved from older standard Spanish in the course of its southward expansion. Although the language has a heritage and presence of over five centuries in the Balkans, it now faces language death, with the severity varying by region. In Macedonia, the two varieties of Bitola and Skopje, last documented by Kolonomos (1962), must now be labelled "moribund" or "nearly extinct". This paper outlines some aspects of the author's doctoral research, which aims to document the current status of Judeo-Spanish in Macedonia. The discussion examines the reasons for language endangerment and at the same time evaluates possibilities and opportunities for language revitalization: what priorities should be set, what roles linguists and especially the community should play, what approach to take, and what skills, methods, and steps must be considered to ensure not only the documentation of the language but also, and above all, its conservation and revitalization.
  The sociolinguistic evaluation and recording of the dying Kursenieku language
    Abstract: From the time of the Teutonic Order until 1923, the Curonian Peninsula was part of Prussia, and later of Germany. Migrations of Baltic tribes of varying intensity took place here. In the 16th century, newcomers from Latvian-speaking Courland began to dominate, moving to the spit in several waves up to the 18th century; at the same time, people from the mainland (most of them Germanized Prussians), colonists from other German lands, and Lithuanians from the Klaipeda area settled in the region. The Kursenieku language, also known as New Curonian (German Nehrungskurisch), can be categorized as a mixture of Latvian Curonian dialects with Lithuanian, German, and elements of the now extinct Old Prussian. Since it had no written form, Kursenieku was roofed by Lithuanian and later by German, which had long functioned as the languages of religion and education. The community disintegrated at the end of World War II. After the Kursenieki left their homeland and settled in various towns and villages in Germany, there was no practical reason to maintain Kursenieku. A chronological reconstruction of Kursenieku is possible and useful for Baltic studies; however, there is no motivation for revitalization, since today there is no community willing to use the language. This article briefly presents the development of the Kursenieku language in its ethnocultural context. It also discusses its status (variety or language), outlines its sociolinguistic characteristics, describes the work that has been done on the language, and presents urgent goals and research perspectives.
SP08: The Art and Practice of Grammar Writing

  On the role and utility of grammars in language documentation and conservation
    Abstract: The National Science Foundation warns that at least half of the world's approximately seven thousand languages are soon to be lost. In response to this impending crisis, a new subfield of linguistics has emerged, called language documentation or, alternatively, documentary linguistics. The goal of this discipline is to create lasting, multipurpose records of endangered languages before they are lost forever. However, while there is widespread agreement among linguists concerning the methods of language documentation, there are considerable differences of opinion concerning what its products should be. Some documentary linguists argue that the outcome of language documentation should be a large corpus of extensively annotated data. Reference grammars and dictionaries, they contend, are the products of language description and are not essential products of language documentation. I argue, however, that grammars (and dictionaries) should normally be included in the documentary record, if our goal is to produce products that are maximally useful to both linguists and speakers, now and in the future. I also show that an appropriately planned reference grammar can serve as a foundation for a variety of community grammars, the purposes of which are to support and conserve threatened languages.
  Corpus linguistic and documentary approaches in writing a grammar of a previously undescribed language
    Abstract: Drawing on her experiences with writing a grammar in the course of the Teop language documentation project, the author explores how corpus linguistic methods can be employed for the analysis and description of a previously undescribed language. After a short introduction to the creation of a digital corpus and to complex corpus search methods, the chapter focuses on the importance of creating a diversified corpus. It demonstrates that different text varieties, such as spoken and written legends, procedural texts and descriptions of objects, show different preferences for certain ways of expression and thus represent valuable resources for various grammatical phenomena. Accordingly, a grammar based on texts should account for this variation by incorporating a detailed description of the corpus, giving references and metadata for each example, and providing information on the kinds of contexts with which particular grammatical features are usually associated.
  Grammar writing from a dissertation advisor's perspective
    Abstract: Anyone who intends to produce a grammar of a previously little-described language needs to (1) plan the scope, methods and timetable of the data gathering process, (2) think about the conceptual framework that will shape data gathering and analysis, (3) gather and organize the data, (4) analyse the data, (5) plan the structure of the written account, and (6) write the grammar. The steps are not simply sequential but are to some extent cyclical. This chapter looks at an advisor's role in guiding a PhD student through these steps. It focuses on the following questions: What kinds of data, and how much, are sufficient to base a grammar on? What is a realistic size for a PhD dissertation grammar? What are the main alternative ways of organizing a grammatical description, e.g. in terms of topic divisions and sequencing? What are the dos and don'ts to follow in order to make the grammar as descriptively adequate and user-friendly as possible? What are the main reasons why some students take so long to complete the analysis and writing process?
  Walking the line: Balancing description, argumentation and theory in academic grammar writing
    Abstract: This chapter explores how to incorporate linguistic typology, argumentation, and theoretical innovation into a reference grammar. It provides recommendations on how to produce a balanced grammar that is firmly grounded in theory, responsible to the unique structures of the language, and comprehensible now and over time. Linguistic typology provides a set of widely recognized linguistic categories used in the classification of grammatical patterns. These can be taken as starting points from which the structures of the language can be compared, contrasted, explored, and explained, profiling the unique shapes of language-particular categories. Argumentation for particular analyses provides clarification and explanation, although excessive argumentation can obscure descriptive facts. Simply asserting facts is appropriate for lower-level linguistic features, simple canonical structures, or uncontroversial elements or their functions. Argumentation is appropriate when structures differ from typologically expected patterns, when the analysis counters descriptions in the literature, and in cases of multiple interpretations of a structure. Grammar writing immerses researchers in the structure of a language, revealing new vistas of understanding and novel ways of interpreting structure. Theoretically innovative analyses that reflect these insights can be incorporated as long as they are motivated, well-explained, and balanced by a typologically informed descriptive base.
  Introduction
  Endangered domains, thematic documentation and grammaticography
    Abstract: When setting out to document a language with the intended goal of describing it (typically through a grammar and dictionary), fieldworkers prefer to collect an array of linguistic data, ranging from elicited words and paradigms to an assortment of texts based on conversations, narratives, procedures and so forth. Capturing a wide variety of speech acts provides a clearer record of the language and its use, and thus offers the potential for a richer description of the language at hand. However, without controlling for content, one may collect linguistic data on an open-ended range of topics or themes. The purpose of this chapter is to introduce the notion of endangered linguistic domains and themes in language documentation and description. Even in thriving minority languages, domains such as indigenous music or knowledge of flora and fauna come under pressure from the same forces that eventually lead to language endangerment. Gathering linguistic data on a particular domain or body of specialized knowledge can generate a corpus useful to a wider audience without sacrificing the needs of linguists. Just as thematic dictionaries exist in lexicography, this approach introduces thematic grammars to grammaticography.
  The data and the examples: Comprehensiveness, accuracy, and sensitivity
    Abstract: Good grammars are read by diverse audiences with a wide variety of interests. One might not write a reference grammar in exactly the same way for all potential users, but particularly in the case of under-documented and endangered languages, it is likely that whatever is produced now will be consulted for answers to questions beyond those originally anticipated. A good grammar can provide more than descriptions of patterns the grammarian has noted at the time of writing; the examples it contains can provide a basis for future discoveries and new uses. It thus makes sense to consider the types of data that might best meet the needs of current and future readers, some of which we cannot even imagine at present. For some purposes, sensitive, typologically-informed elicitation is necessary, while for others, material drawn from unscripted connected speech is crucial. Here the potential contributions of examples of each type are considered for descriptions of phonetics, phonology, morphology, syntax, discourse, prosody, language change, and language contact.
  Toward a balanced grammatical description
    Abstract: The writer of a grammatical description attempts to accomplish many goals in one complex document. Some of these goals seem to conflict with one another, causing tension, discouragement and paralysis for many descriptive linguists. For example, all grammar writers want their work to speak clearly both to general linguists and to specialists in the descriptive tradition of their language area. Yet a grammar that addresses universal issues may not be detailed enough for specialists, while a highly detailed description written in a specialized areal framework may be incomprehensible to those outside that tradition. In the present chapter, I describe four tensions that grammar writers often face and provide concrete suggestions on how to balance them effectively and creatively. These tensions are: comprehensiveness vs. usefulness; technical accuracy vs. understandability; universality vs. specificity; and a ‘form-driven’ vs. a ‘function-driven’ approach. By drawing attention to these potential conflicts, I hope to free junior linguists from the unrealistic expectation that their work must fully accomplish all of the ideals that motivate the complex task of describing the grammar of a language. The goal of a descriptive grammar is to produce an esthetically pleasing, intellectually stimulating, and genuinely informative piece of work.
  Sounds in grammar writing
    Abstract: While grammar writing has received much attention in recent years, relatively little has been written on the place of sounds and their patterning in grammars. In this chapter I provide an overview of some of the challenges of writing about sounds, and discuss the kinds of information on sounds that are generally included in grammars. I then address what a grammar might ideally include on the sounds of a language, advocating the inclusion of sound files to augment the usual topics, increasing both the scientific merit and the human value of the grammar.
SP07: Language Endangerment and Preservation in South Asia

  5. The lifecycle of Sri Lanka Malay
    Abstract: The aim of this paper is to document the forces that led first to the decay and then the revival of the ancestral language of the Malay diaspora of Sri Lanka. We first sketch the background of the origins of the language in terms of intense contact and multilingual transfer; then analyze the forces that led to a significant language shift and consequent loss, as well as the factors responsible for the recent survival of the language. In doing so we focus in particular on the ideologies of language upheld within the community, as well as on the role of external agents in the lifecycle of the community.
  Language Endangerment and Preservation in South Asia
  1. Death by other means: Neo-vernacularization of South Asian languages
    Abstract: Endangerment of a language is assessed by the shrinking number of its speakers and the failure to pass it on to the next generation. This approach views multilingualism in statistical terms. When multilingualism is defined by the functional relationship between languages, the meaning of endangerment expands to include functional reduction in languages. This takes place when the economic, political and cultural value of a language approaches zero. The language may still be spoken inter-generationally, but only for limited in-group communication. Such a language survives, but does not live. This situation can be found even in a language with a large population and official status. This paper illustrates such a situation with Tamil, a South Asian language. Tamil has a long literary history, is the official language of an Indian state and has political and cultural value. But its lack of economic value makes its speakers consider it a liability in education and for material progress, and this prevents it from functioning substantively. Such a language will not die but will become a vernacular. Most Indian regional languages, which were vernaculars in the first millennium when Sanskrit was the dominant language, may become vernaculars again in the third millennium, when English is the dominant language.
  Foreword
  2. Majority language death
    Abstract: The notion of ‘language death’ is usually associated with one of the ‘endangered languages’, i.e. languages that are at risk of falling out of use as their speakers die out or shift to some other language. This paper describes another kind of language death: the situation in which a language remains a powerful identity marker and the mother tongue of a country’s privileged and numerically dominant group with all the features that are treated as constituting ethnicity, and yet ceases to be used as a means of expressing its speakers’ intellectual demands and preserving the community’s cultural traditions. This process may be defined as the ‘intellectual death’ of a language. The focal point of the analysis undertaken is the sociolinguistic status of Punjabi in Pakistan. The aim of the paper is to explore the historical, economic, political, cultural and psychological reasons for the gradual removal of a majority language from the repertoires of native speakers.
  3. Ahom and Tangsa: Case studies of language maintenance and loss in North East India
    Abstract: North East India is probably the most linguistically diverse area on the Indian subcontinent, with long-established communities speaking languages of four different families: Austroasiatic, Indo-European, Tai-Kadai and Tibeto-Burman. Comparing Tai Ahom, the language of the rulers of a kingdom that covered what is now Assam, with the very diverse Tangsa varieties spoken on the India-Myanmar border, we discuss factors of language decline and language maintenance. Tai Ahom has not been spoken as a mother tongue for 200 years, but survives in a large body of manuscripts and in the language used in religious rituals. While both of these features have been necessary foundations of the ongoing revival of the language, neither was able to maintain the language in its spoken form. At least 35 different Tangsa sub-tribes are found in India, with more in Myanmar. Each has a distinct linguistic variety; many of these are mutually intelligible, while others are not. Despite having had no writing until very recently, each variety is still healthy. Since many Tangsas are now Christians, Bible translations are underway, and many Tangsa of all religions are interested in orthography and literacy development. This may lead to standardisation, which would represent a significant loss of diversity.
  4. Script as a potential demarcator and stabilizer of languages in South Asia
    Abstract: South Asia is rich not only in languages, but also in scripts. However, the various roles script can play in this region have been only marginally explored. Besides an overview of the most important examples from South Asia in which script has contributed to the strengthening or weakening of a language, or to the classification of a tongue as a language or a dialect, this paper offers initial input for a discussion of the role of script today in smaller speech communities that lack a long literary tradition. Especially in cases of script invention, script is not only assigned the role of an identity marker for the speech community, but also seems to be expected to strengthen the language itself and, ultimately, to act as a preserver of the minority language.

SP06: Microphone in the mud

  Microphone in the mud
    Abstract: A young woman battles armed terrorists, a kidnapper, malaria, a tsunami, and dial-up Internet as she documents the endangered languages of hunter-gatherers in the jungles of the Philippines.

SP05: Melanesian languages on the edge of Asia: Challenges for the 21st Century

  Even more diverse than we had thought: The multiplicity of Trans-Fly languages
    Abstract: Linguistically, the Trans-Fly region of Southern New Guinea is one of the least-known parts of New Guinea. Yet the glimpses we already have are enough to show that it is a zone with some of the highest levels of linguistic diversity in New Guinea, arguably exceeded only by those found in the Sepik and on the north coast. After surveying the sociocultural setting, in particular the widespread practice of direct sister-exchange, which promotes egalitarian multilingualism in the region, I give an initial taste of what its languages are like. I focus on two languages whose speakers are neighbours and regularly intermarry, but which belong to two unrelated and typologically distinct families: Nen (Yam Family) and Idi (Pahoturi River Family). I then zoom out to look at some typological features of the whole Trans-Fly region, exemplifying with the dual number category, and close by stressing the need for documentation of the languages of this fascinating region.
  Keeping records of language diversity in Melanesia: The Pacific and Regional Archive for Digital Sources in Endangered Cultures (PARADISEC)
    Abstract: At the turn of this century, a group of Australian linguistic and musicological researchers recognised that a number of small collections of unique and often irreplaceable field recordings, mainly from the Melanesian and broader Pacific regions, were not being properly housed and that there was no institution in the region with the capacity to take responsibility for them. The recordings were not held in appropriate conditions and so were deteriorating and in need of digitisation. Further, there was no catalog of their contents or their location, so their existence was known only to a few people, typically colleagues of the collector. These practitioners designed the Pacific and Regional Archive for Digital Sources in Endangered Cultures (PARADISEC), a digital archive based on internationally accepted standards (Dublin Core/Open Archives Initiative metadata, International Association of Sound Archives audio standards and so on), and obtained funding to build an audio digitisation suite in 2003. This is a new conception of a data repository, built into the workflows and research methods of particular disciplines, respecting domain-specific ethical concerns and research priorities, but recognising the need to adhere to broader international standards. This paper outlines the way in which researchers involved in documenting languages of Melanesia can use PARADISEC to make valuable recordings available both to the research community and to the source communities.
  Systematic typological comparison as a tool for investigating language history
    Abstract: Similarities between languages can be due to 1) homoplasies because of a limited design space, 2) common ancestry, and 3) contact-induced convergence. Typological or structural features cannot prove genealogy, but they can provide historical signals that are due to common ancestry or contact (or both). Following a brief summary of results obtained from the comparison of 160 structural features from 121 languages (Reesink, Singer & Dunn 2009), we discuss some issues related to the relative dependencies of such features: logical entailment, chance resemblance, typological dependency, phylogeny and contact. This discussion focusses on the clustering of languages found in a small sample of 11 Austronesian and 8 Papuan languages of eastern Indonesia, an area known for its high degree of admixture.
  Cross-cultural differences in representations and routines for exact number
    Abstract: The relationship between language and thought has been a focus of persistent interest and controversy in cognitive science. Although debates about this issue have occurred in many domains, number is an ideal case study of this relationship because the details (and even the existence) of exact numeral systems vary widely across languages and cultures. In this article I describe how cross-linguistic and cross-cultural diversity—in Amazonia, Melanesia, and around the world—gives us insight into how systems for representing exact quantities affect speakers’ numerical cognition. This body of evidence supports the perspective that numerals provide representations for storing and manipulating quantity information. In addition, the differing structure of quantity representations across cultures can lead to the invention of widely varied routines for numerical tasks like enumeration and arithmetic.
  Introduction: Linguistic challenges of the Papuan region
  From mountain talk to hidden talk: Continuity and change in Awiakay registers
    Abstract: When the Awiakay of East Sepik Province in Papua New Guinea left their village or bush camps and went to the mountains, they used a different linguistic register, ‘mountain talk’, in which several lexical items are replaced by their avoidance terms. In this way the Awiakay would prevent mountain spirits from sending sickness or dense fog in which they would get lost on their journeys. Over the last decade, people's trips to the mountains have become more frequent due to the eaglewood business. However, Christianity caused a decline in the use of ‘mountain talk’. Yet a linguistic register similar in form and function has sprung up in a different setting: kay menda, ‘different talk’, or what people sometimes call ‘hidden talk’, is used when the Awiakay go to town to sell eaglewood and buy goods. Like other cultural phenomena, linguistic registers are historical formations, which change in form and value over time. This paper aims to show how, although it is used in a different social setting, with an expanded repertoire and a slightly different function, kay menda is in some ways a continuation of ‘mountain talk’.
  Papuan-Austronesian language contact: Alorese from an areal perspective
    Abstract: This paper compares the grammar and lexicon of Alorese, an Austronesian language spoken in eastern Indonesia, with its closest genealogical relative, Lamaholot, spoken on east Flores, as well as with its geographical neighbours, the Papuan languages of Pantar. It focusses on the question of how Alorese came to have the grammar and lexicon it has today. It is shown that Alorese and Lamaholot share a number of syntactic features which signal Papuan influences that must have been part of Proto-Lamaholot, suggesting (prehistoric) Papuan presence in the Lamaholot homeland in east Flores/Solor/Adonara/Lembata. The data indicate that Proto-Lamaholot had a rich morphology, which was completely shed by Alorese after it split from Lamaholot. At the same time, lexical congruence between Alorese and its current Papuan neighbours is limited, and syntactic congruence is virtually absent. Combining the comparative linguistic data with what little is known about the history of the Alorese, I propose a scenario whereby Lamaholot was acquired as a non-native language by spouses from different Papuan clans who were brought into the Lamaholot communities that settled on the coast of Pantar at least 600 years ago. Their morphologically simplified language was transferred to their children. The history of Alorese as reconstructed here suggests that at different time depths, different language contact situations had different outcomes: prehistoric contact between Papuan and Proto-Lamaholot in the Flores area resulted in a complexification of Proto-Lamaholot, while post-migration contact resulted in simplification. In both cases the contact was intense, but the prehistoric contact with Papuan in the Flores area must have been long-term and involved pre-adolescents, while the post-migration contact was probably of shorter duration and involved post-adolescent learners.
  'Realis' and 'irrealis' in Wogeo: A valid category?
    Abstract: Finite verb forms in Wogeo, an Austronesian language of New Guinea, are obligatorily marked with a portmanteau prefix denoting person and number of the subject on the one hand, and a grammatical category that is conventionally glossed in the literature as realis–irrealis, on the other. In similar languages, the latter category is usually described as modal, with a certain range of meanings which is, in many cases, only vaguely defined. A more in-depth investigation of the verbal system of Wogeo and the functional distribution of the respective categories shows, however, that the language is quite different from a postulated prototypical realis–irrealis language. Central attributes of the supposed realis–irrealis semantics are not realized by the obligatory prefixes but by other morphosyntactic means, while the prefixes are restricted to only a small part of the assumed realis–irrealis domain.
  The languages of Melanesia: Quantifying the level of coverage
    Abstract: The present paper assesses the state of grammatical description of the languages of the Melanesian region based on a database of semi-automatically annotated, aggregated bibliographical references. One hundred and fifty years of language description in Melanesia have produced at least some grammatical information for almost half of the languages of Melanesia, spread almost evenly among coastal/non-coastal, Austronesian/non-Austronesian and isolates/large families. Nevertheless, only 15.4% of these languages have a grammar and another 18.7% have a grammar sketch. Compared with Eurasia, Africa and the Americas, the Papua-Austronesian region has both the largest number and the largest proportion of poorly documented languages. We conclude with some discussion and remarks on the documentational challenge and its future prospects.
  Projecting morphology and agreement in Marori, an isolate of Southern New Guinea
    Abstract: This paper is the first detailed investigation of agreement in Marori (Isolate, Papuan, Merauke-Indonesia), highlighting its significance for the cross-linguistic understanding of NUM(BER) expression and for the unification-based theory of agreement. Marori shows PERS and NUM agreement with distributed exponence in the DUAL. The paper proposes that DUAL is formed by two basic NUM features (SG, PL), each with binary values, and that DUAL is [-SG,-PL] (unmarked). The novel aspect of the analysis is the idea that the NUM feature is mapped onto a language-specific structured semantic space of NUM. A morpheme is analysed as carrying a feature bundle, with the semantic spaces referred to by the individual features possibly overlapping with each other. The proposed analysis can provide a natural explanation for NUMBER agreement in Marori and can be extended to account for unusual cases of NUM agreement and expression in other languages.
SP04: Electronic Grammaticography

  From Database to Treebank: On Enhancing Hypertext Grammars with Grammar Engineering and Treebank Search
    Abstract: This paper describes how electronic grammars can be further enhanced by adding machine-readable grammars and treebanks. We explore the potential benefits of implemented grammars and treebanks for descriptive linguistics, following the discursive methodology of Bird & Simons (2003) and the values and maxims identified by Nordhoff (2008). We describe the resources which we believe make implemented grammars and treebanks feasible additions to electronic descriptive grammars, with a particular focus on the Grammar Matrix grammar customization system (Bender et al. 2010) and the Fangorn treebank search application (Ghodke & Bird 2010). By presenting an example of an implemented grammar based on a descriptive prose grammar, we show one productive method of collaboration between grammar engineer and field linguist, and propose that a tighter integration could be beneficial to both, creating a virtuous cycle that could lead to more effective and informative resources.
  Deconstructing descriptive grammars
    Abstract: Much work within digital linguistics has focused on the problem of developing concrete methods and general principles for encoding data structures designed for non-digital media into digital formats. This work has been successful enough that the field is now in a position to move past "retrofitting" digital solutions onto analog structures and to consider how new technologies should actually change linguistic practice. The domain of grammaticography is looked at from this perspective, and a traditional descriptive grammar is reconceptualized as a database of linked data, in principle curated from distinct sources. Among the consequences of such a reconceptualization is the potential loss of two valued features of traditional descriptive grammars, here termed coverage and coherence. The nature of these features is examined in order to determine how they can be integrated into a linked data model of digital descriptive grammars, thereby allowing us to benefit from new technology without losing important features intrinsic to the structure of the traditional version of the resource.
  Advances in the accountability of grammatical analysis and description by using regular expressions
    Abstract: This paper discusses the representativeness, coextensitivity and scientific accountability of corpus-based grammatical descriptions of previously unresearched languages. While a grammatical description of a previously unresearched language can hardly be representative of any of its varieties, it can be adequate in coextensitivity if it covers the linguistic phenomena present in the corpus. In order to allow other researchers to retrieve the examples in their context and check the analysis, the corpus should contain not only text collections but also the elicited data, should provide metadata, and should be accessible to other researchers. Scientific accountability, however, can only be achieved if the description facilitates the replicability of the analysis. This presupposes that the authors' corpus linguistic search methods are documented, so that readers can find other, if not all, examples of the described phenomena and scrutinize the search methods, the analysis and the description. As this paper illustrates, a suitable query language for this kind of scientific grammatical analysis and description is provided by so-called regular expressions, which are implemented in the annotation tool ELAN.
  From corpus to grammar: how DOBES corpora can be exploited for descriptive linguistics
    Abstract: The principles and techniques of language documentation developed during the last one and a half decades, and the sheer number of corpora that have been compiled for endangered languages so far, will have an impact on grammar writing, in particular with respect to the database of grammars. At the same time, advances in computer technology allow a closer link between the corpus data that are the basis for generalizations and the grammatical description itself. In the future, the grammatical description of a language will not only present selected illustrative examples but will also be linked to the entire set of corpus data that forms its empirical basis. This makes generalizations transparent to the reader and open to falsification by the scientific community. The article critically examines the relations between the DOBES corpus, the analysis and the grammatical description itself. Special attention is paid to how the two fundamental perspectives of a semasiological and an onomasiological grammar can be translated into the various kinds of search and concordancing routines to be executed in the corpus analysis. We present a typology of the searches descriptive linguists need to apply. This typology defines requirements for the functionality of specific software to be developed. In the second part, the article presents a technical solution: a preliminary version of a database/concordancing software specifically designed to fulfill the functions and principles outlined in the preceding sections.
  Electronic Grammars and Reproducible Research
    Abstract: It is time for grammatical descriptions to become reproducible research. In order for this to happen, grammar descriptions must be testable, not only by the original author, but also by other linguists. Given the complexity of natural language grammars and the ambiguity of prose descriptions, that testing is best done using computational tools to verify a computationally implementable grammar. At the same time, grammars need to be useful (and testable) for the foreseeable future; that is, they must be archivable. Yet if a computational grammar is tied to particular computational tools, it will inevitably become obsolete. This paper describes a means of creating computationally interpretable grammars which are not tied to particular computational tools, nor (to the extent possible) to any particular linguistic theory, and which can therefore be expected to remain useful into the future. In order to make such formal grammars simultaneously understandable to humans, they are embedded into descriptive grammars of a more traditional sort, using the technique of Literate Programming. The implementation of this technology for morphology and phonology is described. It has been used to create morphological grammars for Bangla, Urdu and Pashto which are both human-readable and computationally testable.
  Digital Grammars -- Integrating the Wiki/CMS approach with Language Archiving Technology and TEI
    Abstract: Although intrinsically closely related to the new field of language documentation, grammaticography is still mostly oriented to the book model, usually falling short of making use of related digital resources and hypertext functionalities. In this contribution, we show and discuss possible or easily achievable advances that can be built on top of existing technology such as Language Archiving Technology (LAT) as developed at The Language Archive at the MPI-PL: exemplars and examples can be found in multimedia corpora of natural speech events annotated with ELAN and visualized with ANNEX, words and word forms can be linked to lexical entries in LEXUS online databases, and the precise meaning of theoretical concepts can be given in ISOcat entries or related terminological databases. Independently of LAT, Wiki technology provides online collaboration and version control and even opens up the possibility of addressing different audiences in related sets of pages, but it also poses challenges for the overall didactic structure of a descriptive work. As one of the formats, at least for export and exchange, the XML-based TEI may provide a suitable framework, although many specialized tags would still have to be introduced, and formatting and functionality for these tags would still have to be implemented. Generally, synchronization between different versions (e.g., online and offline) poses the most intriguing difficulties, but the advantages (also in terms of Nordhoff's maxims) of hypertext grammars as proposed here are overwhelming.
  Language description and hypertext: Nunggubuyu as a case study
    Abstract: Any reasonably complete description of a language is a complex object, typically composed of a grammar, a dictionary, and a text collection, with internal relationships that can be represented as hyperlinks. Represented this way, the information would be fully searchable, links between text and media could be implemented, and the presentation would be based on a well-defined data structure with advantages for archiving and reusability. We present a small fragment from Heath's Nunggubuyu text collection with links to parts of the other elements of the description to demonstrate the benefit this approach can bring. This initial step involves a certain amount of hand-coding but establishes a basis for the necessary data structure, which will then be used in a second phase in which we develop techniques for the automatic processing of scanned versions of Heath's work. Grammatical descriptions written with the kinds of structure we are developing ('born digital'), or capable of being converted to that structure, are likely to be in short supply. Presentations of old materials in new formats will inform new electronic grammars and help win the linguistic community's acceptance of preferred formats.
  Grammars for the people, by the people, made easier using PAWS and XlingPaper
    Abstract: The task of documenting the minority languages of the world, many of them endangered, is daunting. Moreover, it is unrealistic to expect that linguists can go to every language and write a reference grammar for it. At the same time, indigenous people are becoming more educated and more interested in working on their own languages. This paper describes a computational tool that teaches native speakers about various linguistic constructions, has them enter data from their language and answer simple questions about it, and then produces a draft of a practical grammar of the language. This grammar can be edited for publishing electronically and/or on paper and is useful to the people themselves as well as to linguists. The underlying XML technology allows much of the complexity to be hidden from the user, while making multiple views and outputs possible from the same data. The marked-up XML files are archivable and usable by many XML editors. Localization and customization are also possible.
  The grammatical description as a collection of form-meaning-pairs
    Abstract: This paper analyzes the structure of books containing grammatical descriptions and builds on work by Good (2004). It argues that the discussion of morphology, syntax, semantics, and intonation found in grammatical descriptions can be seen as a collection of interdependent form-meaning-pairs. These form-meaning-pairs are part of the larger structure of frontmatter, mainmatter and backmatter (Mosel 2006) and themselves have an internal structure which includes, among other things, linguistic examples as formalized by Bow et al. (2003).
  Reference grammars for speakers of minority languages
    Abstract: Most of the work done in grammaticography focuses on the writing of grammars for an audience of linguists, and more specifically, typologists. In this paper, we present a grammaticographic model designed mainly to take into account the needs of minority language speakers, because they play a central role in the preservation of their language. However, since in minority language situations it is not possible to produce as many grammars as there are different potential end users, we propose a multilevel grammar, based on our experience as grammarians of Innu, a First Nation language spoken in Quebec (Canada). In this type of grammatical description, the first (main) level is addressed to non-specialist users, the speakers of the language being described, whereas grammatical material aimed at other users (such as linguists) is presented in secondary levels and is limited to core information. Our grammaticographic model was initially conceived for paper (printed) grammars, but we believe that electronic publication offers interesting solutions for multilevel grammars, whereas paper (printed) grammatical descriptions have greater limitations.

SP03: Potentials of Language Documentation: Methods, Analyses, and Utilization

  Information structure, variation and the Referential Hierarchy
    Abstract: Silverstein’s (1976) hierarchy of features and ergativity (Referential Hierarchy) was proposed to capture apparent systematic variation with respect to word class (pronouns versus nouns) in the expression of the grammatical functions Subject and Object and the semantic roles Agent and Undergoer linked to these functions. The original hierarchy assumed that marking was obligatory rather than optional (i.e. involving a choice between a marker and its absence). Optionality is often associated with a semantic/pragmatic force in addition to the straightforward expression of grammatical function. This additional meaning may trigger reanalysis and subsequent change in the morphosyntactic expression of Subject/Object/Agent/Undergoer. Along the way, apparent counter-examples to the Referential Hierarchy may be created. To understand the counter-examples, and to test the descriptive adequacy of the Referential Hierarchy, better language documentation is needed.
  Prospects for e-grammars and endangered languages corpora
    Abstract: This contribution explores the potentials of combining corpora of language use data with language description in e-grammars (or digital grammars). We present three directions of ongoing research and discuss the advantages of combining these and similar approaches, arguing that the technological possibilities have barely begun to be explored.
  Supporting linguistic research using generic automatic audio/video analysis
    Abstract: Automatic analysis can speed up the annotation process and free up human resources, which can then be spent on theorizing instead of tedious annotation tasks. We describe selected automatic tools that support the most time-consuming steps in annotation, such as speech and speaker segmentation, time alignment of existing transcripts, automatic scene analysis with respect to camera motion, face/person detection, and the tracking of head and hands, as well as the resulting gesture analysis.
  On the sociolinguistic typology of linguistic complexity loss
    Abstract: The nature of the human language faculty is the same the world over, and has been so ever since humans became human. This paper, however, considers the possibility that, because of the influence which social structure can have on language structure, this common faculty may produce structurally different types of language under different sociolinguistic conditions. Changing sociolinguistic conditions in the modern world are likely to have the consequence that, in time, the only languages remaining in the world will be severely atypical of how languages have been throughout most of human history.
  Tours of the past through the present of eastern Indonesia
    Abstract: The past twenty years have seen a variety of data being collected from largely undocumented languages in eastern Indonesia, an area hitherto almost unknown. Such data are valuable in reconstructing the history of this area at a macro-level. In addition, as research in particular areas becomes more fine-grained, it is possible to combine linguistic data with data from oral history and ethnographic observation in order to reconstruct the migration histories of specific speaker groups. A case study of such a micro-level reconstruction is presented here.
  Unsupervised morphological analysis of small corpora: First experiments with Kilivila
    Abstract: Language documentation involves linguistic analysis of the collected material, which is typically done manually. Automatic methods for language processing usually require large corpora. The method presented in this paper uses techniques from bioinformatics and contextual information to morphologically analyze raw text corpora. This paper presents initial results of the method when applied to a small Kilivila corpus.
  Using language documentation data in a broader context
    Abstract: On the one hand, we have never seen as much fieldwork and recording of small and endangered languages as we have over the past decade. On the other hand, linguists are now also much more aware of the need to create records that can be reused by the people we record and that will still be available for their descendants. Our own descendants, the future researchers who will use our records, will also need to be able to find and make use of our research. The fragility of digital records means we need to pay attention to their curation over time and create suitable repositories if they do not already exist. For these aims to be achieved, we need to establish work practices now that allow the data to move easily from creation to the archive and to community use.
  How to measure frequency? Different ways of counting ergatives in Chintang (Tibeto-Burman, Nepal) and their implications
    Abstract: The frequency of linguistic phenomena is standardly measured relative to some structurally defined unit (e.g. per 1,000 words or per clause). Drawing on a case study on the acquisition of ergativity by children in Chintang, an endangered Tibeto-Burman language of Nepal, we propose that from a psycholinguistic point of view, it is sometimes necessary to measure frequencies relative to the length of the time windows within which speakers and hearers use the language, rather than relative to structurally defined units. This approach requires that corpus design control for recording length and that transcripts be systematically linked to timestamps in the audiovisual signal.
  Online presentation and accessibility of endangered languages data: The General Portal to the DoBeS Archive
    Abstract: Data depositories containing language documentation corpora are generally well structured, well maintained, and include large collections of many under-researched languages. However, they are not yet conceived of as resources that can be easily consulted on scientific or non-scientific questions pertaining to one of those languages. A general portal to the DoBeS archive has been created to facilitate access to the data, to attract more users to the archive, and to lower the threshold for users outside the linguistic community to access the data. The structure and the main features of this portal will be presented in this paper.
  Data from language documentations in research on referential hierarchies
    Abstract: This paper outlines potentials of documentary linguistics for typological research in referential hierarchies. Specifically, I will demonstrate how the analysis of original text data from the Oceanic language Vera’a enhances knowledge about referential hierarchy effects in the domains of number marking and morphosyntactic properties of objects. With this language-specific research as a background, I will outline ways in which original text data from language documentation projects can be used in cross-corpus investigations of aspects of referential hierarchies across languages.
  From language documentation to language planning: Not necessarily a direct route
    Abstract: In this paper I will consider how documentary linguists can provide support for community language planning initiatives, and I will discuss some issues. These relate partly to the process of language documentation: what and who we choose to document, how we define ‘a language’, and how we deal with language variation and change; and partly to community attitudes and dynamics.
  Bilingual multimodality in language documentation data
    Abstract: Most people in the world speak more than one language, making bilingualism the norm rather than the exception. Furthermore, speakers generally also move their hands (that is, they gesture) in coordination with speech and language in nontrivial ways. Bilingualism and multimodality should thus be on research agendas focused on the nature of linguistic systems and language use in context, yet they are often overlooked. Moreover, research and theorizing on bilingualism and multimodality are often based on Western European, standardized languages, and little is known about other linguistic contexts. This paper makes the point that language documentation data have the potential to inform theoretical and empirical studies of linguistics, bilingualism and multimodality in entirely new ways, and, conversely, that documentation work would benefit from taking the bilingual and multimodal nature of its data into account.
  Creating educational materials in language documentation projects – creating innovative resources for linguistic research
    Abstract: In its first two sections this paper briefly discusses two models of language documentation projects: the hierarchical model, in which the language documentation corpus (LDC) serves as a resource for the development of educational materials (EMs), and the integrative model, which integrates the production of EMs into the LDC and makes them a resource for linguistic research. The third and the fourth section describe how the integrative model was applied in the Teop Language Documentation Project and what kind of linguistic research topics it provides.
  A corpus linguistics perspective on language documentation, data, and challenge of small corpora
    Abstract: This paper deals with issues of corpus design that might prove problematic for the study of under-resourced languages, e.g. in language documentation. It argues that it is not yet well understood which linguistic and extra-linguistic (predictor) variables cause linguistic variation (i.e. the response variable), which means that the scope of a linguistic finding cannot always be assessed. In order to deal with this problem, it is argued that we need a flexible corpus architecture with the option of adding meta-data to corpora/sub-corpora at any point in time.
  Language archives: They're not just for linguists any more
    Abstract: While many language archives were originally conceived for the purpose of preserving linguistic data, these data have the potential to inform knowledge beyond the narrow field of linguistics. Today language archives are being used by people without formal linguistic training for purposes not necessarily envisioned by the original creators of the language documentation. The DoBeS Archive is particularly well-placed to become an important resource for cultural documentation, since many of the DoBeS projects have been interdisciplinary in nature, documenting language within its broader social and cultural context. In this paper I present a perspective from a legacy archive created well before the modern era of digital language documentation exemplified by the DoBeS program. In particular, I describe two types of non-linguistic uses which are becoming increasingly important at the Alaska Native Language Archive.
  Language-specific encoding in endangered language corpora
    Abstract: The paper addresses problems of corpus building and retrieval resulting from code-switching, which is a characteristic feature of endangered language recordings. The typical appearance of code-switching phenomena is first outlined on the basis of data collected in the DoBeS ‘ECLinG’ project, which dealt with three endangered Caucasian languages spoken in Georgia: Tsova-Tush (Batsbi), Udi, and Svan. The problem of language-specific retrieval is illustrated with examples showing the usage of the word da in Tsova-Tush contexts, where, as a homonym, it represents either a native copula form (‘it is’) or the Georgian conjunction ‘and’. The subsequent section discusses the annotation requirements necessary to automatically distinguish the languages involved in code-switching, with a focus on the emerging ISO standard 639-6. It is argued that the fine-grained distinction of varieties and subvarieties and their interrelationship, as aimed at in this standard, requires thorough reconsideration if it is to be applied in the markup of corpus data.
  Visualization and online presentation of linguistic data
    Abstract: This contribution gives an introduction to state-of-the-art techniques for the visualization and online presentation of linguistic data and world-wide linguistic diversity, such as linguistic maps and online dictionaries, using a software environment called R. The aim is to draw linguists’ attention to the possibilities offered by these techniques and to give some practical hints as to how they can be used specifically for linguistic and language documentation data.
  The threefold potential of language documentation
    Abstract: In the past 10 or so years, intensive documentation activities, i.e. compilations of large, multimedia corpora of spoken endangered languages have contributed to the documentation of important linguistic and cultural aspects of dozens of languages. As laid out in Himmelmann (1998), language documentations include as their central components a collection of spoken texts from a variety of genres, recorded on video and/or audio, with time-aligned annotations consisting of transcription, translation, and also, for some data, morphological segmentation and glossing. Text collections are often complemented by elicited data, e.g. word lists, and structural descriptions such as a grammar sketch. All data are provided with metadata which serve as cataloguing devices for their accessibility in online archives. These newly available language documentation data have enormous potential.

SP02: Fieldwork and Linguistic Analysis in Indigenous Languages of the Americas

  Chapter 2. Sociopragmatic influences on the development and use of the discourse marker vet in Ixil Maya
    Abstract: In this paper we explore the functions of the particle vet in Ixil Maya and argue that it is a discourse marker used to perform both structural and pragmatic functions. Vet serves as a structural marker indicating temporally or causally interdependent items; it also has sociopragmatic functions, allowing speakers to present an evaluation of a discourse that invites interlocutors to also take a stance both on the information presented and on their roles in particular sociocultural activities. These functions of managing negotiations among interlocutors range from agreements on descriptive terms to calls for social action among entire groups, in all cases highlighting the social nature both of discourse and of group activity. The overlapping of the structural and pragmatic functions of vet demonstrates the grammaticalization cline ranging from adverb to discourse marker proposed by Traugott (1997). Our examination of vet in a range of genres produced by the Mujeres por la Paz of Nebaj, El Quiché, Guatemala, a cooperative formed in 1997 by Ixil Maya women who were widowed or left fatherless during the Guatemalan civil war, suggests that the effects of the participants’ individual and group identities and motivations outweigh anticipated genre effects.
  Chapter 6. Multiple Functions, multiple techniques: The role of methodology in a study of Zapotec determiners
    Abstract: Field linguists use a combination of techniques to compile a grammatical description, starting with various types of targeted elicitation and followed by the study of more natural speech in the form of recorded texts. These usual techniques were employed in my work on Teotitlán del Valle Zapotec, an Oto-Manguean language spoken in Mexico, but in an unusual order: because of the priorities of the documentation project I was part of, texts (mainly folk tales and life histories told by community elders) were collected and analyzed first. This paper examines the role that methodology played in the investigation of one small area of the grammar, a set of noun phrase-final determiner clitics. These clitics make both spatial and temporal distinctions, raising theoretical questions regarding the role of a temporal marker in the NP. At the same time, the investigation brought to light some interesting issues surrounding methodology in fieldwork: how does the method of collection affect the type of data gathered, and does the order in which different methodologies are employed affect the final outcome?
  Chapter 8. Studying Dena'ina discourse markers: Evidence from elicitation and narrative
    Abstract: This paper is concerned with discourse markers in Dena’ina Athabascan. One problem for transcribers and translators of Dena’ina texts is the great number of particles (i.e., words that cannot be inflected) that, according to speaker judgments, “have no meaning” or “mean something else in every sentence.” This suggests that these particles are discourse markers, whose function is to relate discourse units to each other and to the discourse as a whole. The paper contrasts two different forms of linguistic inquiry: direct inquiry in the field, by elicitation of the meaning and function of the discourse markers, and indirect inquiry, by study of a corpus of Dena’ina narratives. While elicitation is helpful in obtaining an initial gloss for the discourse markers, it is shown that only the study of texts gives insight into the function of such particles and allows us to understand the important differences between particles that, at first sight, appear to be synonymous.
  Chapter 4. Noun class and number in Kiowa-Tanoan: Comparative-historical research and respecting speakers' rights in fieldwork
    Abstract: The Kiowa-Tanoan family is known to linguists for two characteristic features: a) a package of complex morphosyntactic structures that includes a typologically marked noun class and number marking system, and b) the paucity of information available on the Tanoan languages due to cultural ideologies of secrecy. This paper explores both of these issues. It attempts to reconstruct the historical noun class-number system based on the diverging, yet obviously related, morphosemantic patterns found in each of the modern languages, a study that would benefit greatly from fieldwork and the input of native speakers. At the same time, it reviews the language situation among the Kiowa-Tanoan-speaking communities and some of the difficulties of doing this kind of fieldwork in the Pueblo Southwest, touching on the myriad complex issues involving the control of information and the speech communities’ rights over their own languages, as well as the outside linguist’s role in such a situation. The paper underscores these points by using only language data examples from previous field research that are already available to the public, so as not to compromise native speakers’ sensitivity to new research on their languages.
  Foreword
  Chapter 7. Middles and reflexives in Yucatec Maya: Trusting speaker intuition
    Abstract: In this paper we provide a characterization of the middle construction in Yucatec Maya (YM) and show that the apparently unpredictable distribution of middle voice in YM corresponds to a neatly identified, and quite limited, system of absolute events, i.e., events in which no energy is expended (Langacker 1987). This strategy is not exploited by other related Mayan languages, which tend to encode all absolute events as simple intransitive verbs. The semantic coherence of middle voice in YM is only discernible by combining analysis of narrative texts and direct elicitation with attention to speaker intuition in a variety of situational contrastive contexts, guided by cognitive principles which are known to determine the behavior of middle voice systems in other languages.
  Fieldwork and Linguistic Analysis in Indigenous Languages of the Americas
  Chapter 5. The story of *ô in the Cariban Family
    Abstract: This paper argues for the reconstruction of an unrounded mid central/back vowel *ô to Proto-Cariban. Recent comparative studies of the Cariban family encounter a consistent correspondence of : o : : e, tentatively reconstructed as *o2 (considering only pronouns; Meira 2002) and *ô (considering only seven languages; Meira & Franchetto 2005). The first empirical contribution of this paper is to expand the comparative database to twenty-one modern and two extinct Cariban languages, where the robustness of the correspondence is confirmed. In ten languages, *ô merges with another vowel, either *o or *. The second empirical contribution of this paper is to more closely analyze one apparent case of attested change from *ô > o, as seen in cognate forms from Island Carib and dialectal variation in Kari’nja (Carib of Surinam). Kari’nja words borrowed into Island Carib/Garífuna show a split between rounded and unrounded back vowels: rounded back vowels are reflexes of *o and *u, unrounded back vowels reflexes of *ô and *. Our analysis of Island Carib phonology was originally developed by Douglas Taylor in the 1960s, supplemented with unpublished Garifuna data collected by Taylor in the 1950s.
  Chapter 10. Revisiting the source: Dependent verbs in Sierra Popoluca (Mixe-Zoquean)
    Abstract: Sierra Popoluca (SP) is a Mixe-Zoquean language, spoken by about 28,000 individuals in southern Veracruz, Mexico. The objectives of this paper are (1) to explore the structures of dependent verb constructions in SP and the contexts in which they occur and (2) to highlight the stages in which data is gathered and the interplay between text collection, elicitation, and analysis. SP is an ergative, polysynthetic, head-marking language. It has five dependent verb construction types. Early analyses suggested that dependent verbs were non-finite, nominalized forms. Further research indicated that the verbs are components in complex predicates that share inflection for aspect/mood, person, and number. Implicated in the analysis of these constructions are: the prosodic system; the alignment system, which is hierarchically driven with split ergativity; and the number system, also hierarchically driven. The teasing apart of the various grammatical features led to a multi-step process of analyzing and collecting data. By looking at a complex grammatical structure, this paper highlights the interdependency of corpus building, text analysis, and elicitation and the strategies used to negotiate between naturally occurring speech, in which data may be obscured by phonology, and elicited data, which frequently produces periphrastic constructions or alternative utterance types.
  Chapter 3. Classifying clitics in Sm'algyax: Approaching theory from the field
    Abstract: Sm’algyax (British Columbia and Alaska) is a highly ergative VAO/VS language with an uncommonly wide range of clitics. This chapter has the two-fold function of demonstrating how Anderson’s (2005) constraint-based analysis of clitics gives insight into the complex behavior of Sm’algyax clitics, and how the clitics themselves afford empirical means of testing such a theory. The Sm’algyax data are drawn from both field research and published texts, reflecting a community-based approach to language documentation that has evolved through a long-term, collaborative relationship with the Tsimshian (Sm’algyax) communities. Building on Stebbin’s (2003) definitions of intermediate word classes in Sm’algyax and Anderson’s Optimality Theoretic approach, we determine that, in terms of their varying phonological dependence, Sm’algyax clitics include internal, phonological word, and affixal clitics. The existence of affixal clitics in Sm’algyax, however, calls into question the viability of the Strict Layer Hypothesis (Selkirk 1984) as an inviolable rule when describing clitics. Furthermore, Sm’algyax provides strong evidence that the direction of clitic attachment is more clitic-specific than language-specific. In characterizing the behavior of Sm’algyax clitics, we find not only that linguistic theory helps sharpen our understanding of the fieldwork data, but also that field linguistics has consequences for linguistic theory.
  Chapter 9. Be careful what you throw out: Gemination and tonal feet in Weledeh Dogrib
    Abstract: The Weledeh dialect of Dogrib (Tłı̨chǫ Yatıì) is spoken by people of the Yellowknives Dene First Nation, in and around Yellowknife, Northwest Territories. Within the formal framework of Lexical Phonology (Kiparsky 1982), this paper argues for an over-arching generalization in the phonology of Weledeh Dogrib: the constraint NoContour-Ft, which prefers (High-High) and (Low-Low) feet, but militates against (High-Low) and (Low-High) feet. NoContour-Ft is satisfied differently in different morphophonological domains: vowel deletion at the Stem Level, gemination at the Word Level, and High to Mid tone lowering at the Postlexical Level. This analysis requires that consonant length be treated as phonological in Dogrib—that is, consonant length contributes to syllable weight and mora count—even though there are no minimal pairs based on consonant length. Similarly, the distinction between High and Middle tone does not distinguish any lexical items, but is nevertheless important for the prosody of the language. Thus the paper makes a methodological point about the importance of allophonic alternations for phonological theory. Our view of what counts as contrastive or allophonic, however, is to a large extent theory-dependent; therefore, the paper also emphasizes the importance of phonetic measurements when doing fieldwork.
  Chapter 1. Introduction: The Boasian tradition and contemporary practice in linguistic fieldwork in the Americas

SP01: Documenting and Revitalizing Austronesian Languages

  Chapter 5. Local Autonomy, Local Capacity Building and Support for Minority Languages: Field Experiences from Indonesia
    Abstract: This chapter discusses the complexity of language/cultural maintenance and revival, highlighting the significance of building and supporting long-term local capacity. These complex issues are discussed in the current context of rapid political change towards greater local autonomy in Indonesia. After some background on aims and regulations of decentralization, the Balinese in Bali and Rongga in Flores are compared and discussed based on the author’s field experiences. It is argued that capacity building and support must include more than simply developing human resources. Strengthening, reforming, and/or restoring relevant institutions, particularly in relation to customary adat systems, are equally important. While a macro perspective must be adopted, priority must be given to a community-based approach and to long-term capacity building and support at the most local level. The comparison of the Rongga and Balinese helps clarify how a range of inter-related socio-political and economic variables at the local and regional levels play a significant role in providing and/or inducing good conditions for bottom-up community-based initiatives in language/cultural maintenance and revival.
  Chapter 11. On Designing the Formosan Multimedia Word Dictionaries by a Participatory Process
    Abstract: Digital archiving is important work for an endangered language, because if an endangered language disappears, associated cultural assets will disappear altogether. Several digital archiving projects are being conducted in Taiwan, and many tribal teachers are now involved in them. Based on the needs of these tribal teachers, this chapter presents an easy-to-use system for digitally archiving Formosan languages. The proposed approach takes advantage of the Internet and the newly launched Web 2.0 sharing platform. This chapter gives details of the development and structure of the online dictionary system. Currently, several archiving projects in Taiwan are using this system to teach tribal teachers how to develop their own language resources and online dictionaries.
  Chapter 3. Training for Language Documentation: Experiences at the School of Oriental and African Studies
    Abstract: Since 2003 the Endangered Languages Project at SOAS has been involved in various types of training for documentation of endangered languages, ranging from one-day workshops through to MA and PhD post-graduate degree programmes. The training events have been attended by specialists, research grantees, students, and members of the general public, and have covered a wide range of topics and involved delivery in a range of contexts and delivery modes, including hands-on practical sessions and e-learning in the Blackboard framework. We have covered both theory and practice of language documentation and endangered language support, including the development of multimedia and curriculum materials for language teaching, some of it experimental and, we think, quite innovative. In this chapter I discuss some of our experiences in developing and running these training workshops and courses, reporting on the models, and successes (and failures) over the past three and a half years. My goal is to share our accumulated knowledge and experience with others with similar interests, and in doing so to advance our understanding of the possibilities for language documentation training.
  Documenting and Revitalizing Austronesian Languages
  Chapter 10. WeSay, a Tool for Engaging Communities in Dictionary Building
    Abstract: This chapter introduces WeSay, an open source software application designed to involve language community members in the description and documentation of their language. Intended for rugged, low-power hardware, WeSay's simplified user interface removes many barriers that typically prevent the direct involvement of community members. In this chapter, we describe the dictionary-building features of WeSay that allow a linguist to tailor a sequence of language documentation tasks to engage community members. These tasks reduce a production step to its simplest form, enabling focused training and division of labor. Word-gathering tasks use semantic domains, word lists, or patterns of likely words to build up the dictionary. Successive tasks add specific content, such as glosses and example sentences, to the entries. In addition, the program can prepare simple paper publications designed to promote community support for the effort and can transfer the raw data to the linguist for further processing with more powerful tools.
  Chapter 7. E-learning in Endangered Language Documentation and Revitalization
    Abstract: This chapter analyses the application of e-learning in the revitalization of endangered languages. It outlines the areas in which e-learning is efficacious, the attitudes of the indigenous language teachers to e-learning, the feelings of the Yami community toward this kind of pedagogy, and the reactions of the users, mostly young and adolescent learners of Yami. The findings are based on the results of surveys and in-depth studies in the Yami community and also on surveys made in a nation-wide seminar that enrolled teachers of the majority of the still-spoken aboriginal languages in Taiwan. Both qualitative and quantitative methods were used to gather empirical data to address questions in the following three areas: (1) the contexts of developing e-learning materials for endangered indigenous languages in Taiwan, (2) the indigenous language teachers’ perceptions of e-learning in Taiwan, and (3) the attitudes of the Yami community on Orchid Island toward e-learning. This chapter provides a model for the many language revitalization projects underway in Taiwan and worldwide that wish to take advantage of e-learning. It also provides guidelines that enable each project to better understand the kinds of e-learning that work, so as to make e-learning acceptable and efficacious.
  Chapter 8. Indigenous Language–informed Participatory Policy in Taiwan: A Socio-political Perspective
    Abstract: This chapter highlights the importance of incorporating indigenous language and its daily practice in the local context of newly transformed indigenous policy in Taiwan. Currently, the official indigenous people’s language policy is largely confined to curriculum development and certification of indigenous peoples’ language abilities, with little consideration of language practices in real socio-political situations. This chapter questions whether the revitalization of endangered indigenous languages can rely on language policy alone. Participatory action research (PAR) is employed as the main research method in communities inhabited by the Atayal. This chapter is divided into three main parts: first, a brief socio-political history of indigenous people in Taiwan is provided; second, two official socio-political projects related to traditional territory sovereignty are analyzed, and their failure is shown to stem from the neglect of indigenous language and local participation; third, a case from an Atayal village, Smangus, is presented to show how indigenous languages can be revitalized by combining the villagers’ daily practices and participation. In conclusion, this chapter argues for combining language policy with other socio-political policies so as to create environments in which indigenous peoples can speak their own languages.
  Chapter 2. The Language Documentation and Conservation Initiative at the University of Hawai'i at Mānoa
    Abstract: Since its inception in 1963, the Department of Linguistics at the University of Hawai‘i at Mānoa (UHM) has had a special focus on Austronesian and Asian languages. It has supported and encouraged fieldwork on these languages, and it has played a major role in the development of vernacular language education programs in Micronesia and elsewhere. In 2003, the department renewed and intensified its commitment to such work through what I shall refer to in this chapter as the Language Documentation and Conservation Initiative (LDCI). The LDCI has three major objectives. The first is to provide high-quality training to graduate students who wish to undertake the essential task of documenting the many underdocumented and endangered languages of Asia and the Pacific. The second is to promote collaborative research efforts among linguists, native speakers of endangered and underdocumented languages, and other interested parties. The third is to facilitate the free and open exchange of ideas among all those working in this field. In this chapter, I discuss each of these three objectives and the activities being conducted at UHM in support of them.
  Chapter 12. Annotating Texts for Language Documentation with Discourse Profiler's Metatagging System
    Abstract: This chapter introduces a systematic and robust way to annotate (or ‘tag’) texts with discourse information. To date there has not been a method for annotating texts for language documentation with discourse-text information. This is the first paper to systematically describe the capabilities and the annotating methodology of the Discourse Profiler’s metatagging system as a means of annotating endangered languages’ texts in a Toolbox database. Since there is a division of labor between Toolbox and Discourse Profiler, the Toolbox database can be the basis for the archival tasks, whereas the Discourse Profiler software is a computer assisted discourse-text analytical tool that mines the Toolbox discourse-text annotated database in order to produce two primary capabilities: 1) to create a representative interactive compressed representation or ‘map’ of the structure and elements of a text, and 2) to quantify texts based on this special metatagging system with an array of sixteen different possible statistical outputs (including both referential distance and topic persistence statistics). Although the main focus of this chapter is on the multipurpose annotation system, I will introduce the basics of the Discourse Profiler software in order to illustrate the range of analytical possibilities that this annotation system incorporates.
  Chapter 9. Teaching and Learning an Endangered Austronesian Language in Taiwan
    Abstract: This chapter provides a case study of the process of endangered language acquisition, which has not been well studied from the viewpoint of applied linguistics. It describes the context of teaching Chinese adult learners in Taiwan an endangered indigenous language, the teachers’ pedagogical approaches, the phonological and syntactic acquisition processes the learners were undergoing, and applications to other language documentation and revitalization programs. Both qualitative and quantitative methods were used to address the research questions. This study demonstrates cogently that language is a complex adaptive system. In phonological acquisition, the trill was the most difficult phoneme to learn. Systematic variations for the variables () and (s) were found to be constrained by both markedness and interference. Furthermore, learners also tended to interpret Yami orthography based on their knowledge of English. In word order acquisition, learners performed much better than expected, partially because the present tense, coded by the SV word order, is the norm in Yami conversations. However, students still inaccurately associated word order with sentence type rather than with tense distinction. The Yami case provides an integrated model for endangered language documentation, revitalization and pedagogical research, which would be of interest to people working with other languages and the language documentation field in general.
  Chapter 6. Documenting and Revitalizing Kavalan
    Abstract: The purpose of this chapter is to provide a two-dimensional approach to language documentation (Himmelmann 1998). In addition to building a database, we also conducted a sociolinguistic survey designed to document the state of health of a language in a particular spatio-temporal frame. Our goal is to share our fieldwork experience of documenting Kavalan, a seriously endangered language in southeastern Taiwan now spoken by fewer than a few dozen speakers. We first discuss our field experiences in working with speakers of Kavalan in Sinshe village, the only significant Kavalan settlement left in Taiwan, and the state of the Kavalan language, based in part on Huang and Chang’s (1995) earlier sociolinguistic survey and in part on a recent, more in-depth village-wide survey of language use in the community. Next, we introduce the NTU Corpus of Formosan Languages, part of which incorporates our corpus data in Kavalan. The NTU Corpus of Formosan Languages aims to establish a standard for the creation of linguistic corpus databases through the application of information technology to linguistic research. The creation of this linguistic database enables us both to preserve valuable linguistic data and to provide a systematic recording of these languages, for the benefit of future linguistic research.
  Chapter 4. SIL International and Endangered Austronesian Languages
    Abstract: SIL International has been partnering with Austronesian language communities in language development for over fifty years. This chapter briefly reviews that history, situates it in the current environment of international concern for the documentation and revitalization of endangered languages, and looks at ways in which SIL might assist endangered Austronesian language communities of today. Two aspects of language development are considered—one more “academic” in nature, focusing on products primarily of interest to linguists and other researchers; the other more “development” in nature, focusing on language resources and competencies of greater interest and relevance to language communities. The chapter summarizes some recent studies related to language endangerment/vitality, and considers how language development relates to language revitalization and documentary linguistics. SIL can continue to learn from and link with others in describing and documenting endangered Austronesian languages, in providing consulting and training at the request of language communities and others, and in designing and developing affordable language software to help accomplish related tasks.
  Chapter 1. Introduction: Documenting and Revitalizing Austronesian Languages
    Abstract: This chapter provides an overview of the issues and themes which emerge throughout this book. It begins with a brief description of language revitalization activities which are taking place in the Pazeh, Kahabu and Thao aboriginal communities in the mountains and plains of Taiwan. The activities of elders in these communities exemplify the growth of language activism. These case studies lead to a discussion of changes in the field of linguistics and the alliances which are being built between linguists and community language activists. The 11 chapters in the book are then reviewed within the key themes of international capacity building initiatives, documentation and revitalization activities, and computational methods and tools for language documentation.