SP09: Language Documentation and Conservation in Europe


Edited by Vera Ferreira and Peter Bouda

University of Hawai‘i Press
ISBN-13: 978-0-9856211-5-5




Europe is a continent with low linguistic diversity, and the number of minority and endangered languages is small in comparison to other parts of the world. Consequently, Europe is not a focus for researchers working on language documentation. Apart from some “major” minority languages in Europe (Catalan, Galician, Breton, Welsh, Basque, etc.), several of the European endangered languages are not known in detail (even in academia) or documented in a concise and comprehensive way. Primary data on these languages, reflecting their everyday use, is almost non-existent. Moreover, the linguistic diversity in Europe is also unknown to the general public.
With this in mind, and in order to raise awareness of minority and endangered languages in Europe and to foster dialog between researchers working on European endangered languages and on language documentation all over the world, CIDLeS – Interdisciplinary Centre for Social and Language Documentation (http://www.cidles.eu/) organized a two-day conference titled Endangered Languages in Europe (ELE 2013) in October 2013. ELE 2013 aimed to provide an interdisciplinary forum in which scholars from language documentation, language technology, and experts on European endangered languages could exchange ideas and techniques on language documentation, archiving, and revitalization; to further methodological discussions and collaborative research into linguistic diversity in Europe; and to reflect on language policy issues.


  • Brief considerations about language policy: An European assessment (Paulo Carvalho Vicente; Francisco Carvalho Vicente)
    Abstract: The rise of language policy worldwide is a consequence of a globalized world and the openness of borders. Even countries with relative cultural homogeneity nowadays face new challenges regarding massive migration flows and the results of growing awareness of endangered languages and cultures, notably in Europe. This is being noticed around the Old Continent, where diversity has always proved to be a distinct value. In this paper we reflect on the scope of cultural identity and multilingualism to shed new light on language policy and consequently refresh our understanding of a key policy, which is already a decisive public policy for the European peoples.
  • Bridging divides: A proposal for integrating the teaching, research and revitalization of Nahuatl (Justyna Olko; John Sullivan)
    Abstract: This paper discusses major historical, cultural, linguistic, social and institutional factors contributing to the shift and endangerment of the Nahuatl language in Mexico. As a practical proposal, we discuss our strategy for its revitalization, as well as a series of projects and activities we have been carrying out for the last several years. Crucial to this approach are several complementary elements: interdisciplinary research, including documentary work, as well as investigation of both the historical and the present state of Nahua language and culture; integration of both Western and native-speaking indigenous researchers as equal partners and the provision of space for indigenous methodologies; creation of teaching programs for native and non-native speakers oriented toward the preparation of language materials; and close collaboration with indigenous communities in developing community-based programs. The operability of this strategy will depend greatly on our ability to foster collaboration across academic, social, and ideological boundaries, to integrate theory, methodology and program implementation, and to efficiently combine grassroots and top-down approaches. An important aim is to restore the culture of literacy in Nahuatl through our monolingual Totlahtol series, publishing works from all variants of the language and encompassing all genres of writing. We also strive to strengthen the historical and cultural identity of native speakers by facilitating their access to the alphabetical texts written by their ancestors during the colonial era.
  • The first Mirandese text-to-speech system (José Pedro Ferreira; Cristiano Chesi; Daan Baldewijns; Daniela Braga; Miguel Dias; Margarita Correia)
    Abstract: This paper describes the creation of base NLP resources and tools for an under-resourced minority language spoken in Portugal, Mirandese, in the context of the generation of a text-to-speech system, a collaborative citizenship project between Microsoft, ILTEC, and ALM – Associaçon de la Lhéngua Mirandesa. Development efforts encompassed the compilation of a large textual corpus, definition of a complete phone-set, development of a tokenizer, inflector, TN and GTP modules, and creation of a large phonetic lexicon with syllable segmentation, stress mark-up, and POS. The TTS system will provide an open access web interface freely available to the community, along with the other resources. We took advantage of mature tools, resources, and processes already available for phylogenetically-close languages, allowing us to cut development time and resources to a great extent, a solution that can be viable for other lesser-spoken languages which enjoy a similar situation.
  • BaTelÒc: A text base for the Occitan language (Myriam Bras; Marianne Vergez-Couret)
    Abstract: Language Documentation, as defined by Himmelmann (2006), aims at compiling and preserving linguistic data for studies in linguistics, literature, history, ethnology, sociology. This initiative is vital for endangered languages such as Occitan, a Romance language spoken in southern France and in several valleys of Spain and Italy. The documentation of a language concerns all its modalities, covering spoken and written language, various registers and so on. Nowadays, Occitan documentation mostly consists of data from linguistic atlases, virtual libraries from the modern to the contemporary period, and text bases for the Middle Ages. BaTelÒc is a text base for modern and contemporary periods. With the aim of creating a wide coverage of text collections, BaTelÒc gathers not only written literary texts (prose, drama and poetry) but also other genres such as technical texts and newspapers. Enough material is already available to foresee a text base of hundreds of millions of words. BaTelÒc not only aims at documenting Occitan, it is also designed to provide tools to explore texts (different criteria for corpus selection, concordance tools and more complex enquiries with regular expressions). As for linguistic analysis, the second step is to enrich the corpora with annotations. Natural Language Processing of endangered languages such as Occitan is very challenging. It is not possible to transpose existing models for resource-rich languages directly, partly because of the spelling, dialectal variations, and lack of standardization. With BaTelÒc we aim at providing corpora and lexicons for the development of basic natural language processing tools, namely OCR and a Part-of-Speech tagger based on tools initially designed for machine translation and which take variation into account.
  • Language Landscape: Supporting community-led language documentation (Sandy Ritchie; Samantha Goodchild; Ebany Dohle)
    Abstract: Different groups have differing motivations for participating in language documentation projects. Linguists want to increase our knowledge of languages and linguistic theory, but constraints on their work may lead to issues with their documentation projects, including their representations of the languages they study. Native speakers participate to maintain and develop their language, and may choose to represent it in a way which showcases their culture and attitudes. In order to encourage more native speakers to take part in documentation projects, a simple integrated system is required which will enable them to record, annotate and publish recordings. Language Landscape, our web-based application, enables native speakers to publish their recordings, and Aikuma, a mobile application for documentation, enables them to record and orally translate recordings, in both cases with minimal cost and training required. Language Landscape benefits communities by allowing them to document their language as they see fit, as demonstrated by our outreach program, through which some London school children created their own projects to document their own languages and those spoken around them.
  • Reflections of an observant linguist regarding the orthography of A Fala de Us Tres Lugaris (Miroslav Valeš)
    Abstract: A Fala has never had a standardized orthography as it is a language of oral tradition and almost all written documents have always been produced only in Spanish. The few documents which exist in A Fala use orthographies that vary considerably, especially when indicating the phonemes which are absent in standard Spanish. However, in the past decades there have been signs of an increasing interest regarding the language and cultural identity in the three villages and there have also been attempts to establish organizations to promote the language, such as A Fala y Cultura, U Lagartu Verdi, and A Nosa Fala. This increase in language awareness leads inevitably to situations in which the speakers want to express their linguistic identity in written form, and the lack of a written standard makes this task rather difficult. The objective of this paper is to analyze the public inscriptions, direction signs and street names written in A Fala. The appearance of these signs expresses the willingness of the speakers of A Fala to claim their linguistic identity. At the same time, their inconsistent orthography reveals the problems that arise in the course of writing their language. There are two main causes of these difficulties: the influence of Spanish, as all the speakers are bilingual in Spanish, and variation within the language itself. Regarding the first cause, the main issues include the uncertainty about how to write the phonemes that do not exist in standard Spanish, and also whether the phonemes that do exist in Spanish should be written in the same way or not. In respect of the second cause, the signposts and street names reflect the three main varieties: Valverdeñu, Lagarteiru and Mañegu. They also partially reflect the ideas of those who created them and testify to a certain evolution in time. In general, the linguistic data in the form of street names and direction signs provide relevant information about the options for writing those phonemes which do not have an equivalent in Spanish, as well as geographical (diatopic) variation, and the changes of ideas regarding the orthography. This paper will use this valuable linguistic material to reflect on the issues that are involved in the establishment of an orthographical standard.
  • Multilingualism and structural borrowing in Arbanasi Albanian (Jana Willer-Gold; Tena Gnjatović; Daniela Katunar; Ranko Matasović)
    Abstract: In this paper we present a brief overview of the history of linguistic contacts of Arbanasi Albanian, a Gheg Albanian dialect spoken in Croatia, with Croatian and Italian. Then we discuss a number of contact-induced changes in that language. We show that Arbanasi Albanian was subject to strong influences from Croatian (and, to a lesser extent, from Italian) on all levels of linguistic structure. Using the data from our own fieldwork, we were able to show that there were also influences on the level of syntax, including the borrowing of certain constructions, such as analytic causative and imperative constructions, as well as the extension of the use of infinitive in subordinate clauses.
  • El árabe ceutí, una lengua minorizada. Propuestas para su enseñanza en la escuela (Francisco Moscoso García)
    Abstract: The Arabic of Ceuta is the native language of 40% of the Spanish population of Ceuta, which also speaks Spanish. The remaining 60% is mostly monolingual and their native language is Spanish. There is also 1% of bilingual citizens whose native tongue is Sindhi. The Arabic of Ceuta is Moroccan Arabic, the native language of 60% of the population of the neighboring country and, specifically, it shares common features with the northern dialect area (Yebala region and the Atlantic coast down to the city of Larache). But its use in Spanish territory since the second half of the 19th century gave rise to two phenomena: Spanish borrowings and code-switching in the case of bilingual speakers. The Arabic of Ceuta is an oral language, like Moroccan Arabic, which has never been standardized from the political sphere, in contrast with literal Arabic (also called cultivated, standard, modern or classic), which is not the native language of any Arab in the world and has emerged as the only means of educational, political, and cultural expression due to political and religious power. Despite this, there is a whole literary tradition, oral and written, in Moroccan Arabic, especially from the 20th century. Currently, there is a group of Moroccan professors and intellectuals working on its coding in order to generalize a writing system in Arabic script. Ceuta is the Spanish region with the highest school dropout rate in Spain, and this is particularly acute in schools where the majority of students are bilingual. Many experts recommend that teachers and professors teach in the native language of their pupils, at least at the beginning of their education. In this paper we will put forward some proposals for the recognition of Ceuta Arabic as coded by the movement of Moroccan intellectuals who are already working on the development of a dictionary, a grammar, text collections, and translations of works from European literature into Moroccan Arabic. The ultimate goal should be its inclusion in the educational and administrative services of the city as well as to achieve an official status in the future, rightly recognized by the Spanish Constitution.
  • Language Revitalization: The case of Judeo-Spanish varieties in Macedonia (Esther Zarghooni-Hoffmann)
    Abstract: Judeo-Spanish is a secondary dialect of the Spanish language having evolved from the ancient standard Spanish in the course of its expansion southwards. Although the language enjoys a heritage and presence in the Balkans of over five centuries, it is now facing language death – its acuteness depending on the region. In Macedonia, the two varieties of Bitola and Skopje last documented by Kolonomos (1962) need to be labelled “moribund” or “nearly extinct”. This paper aims to point out some of the aspects relevant to the author’s doctoral research study, in which a documentation of the current language status of Judeo-Spanish in Macedonia is envisaged. The deliberations look at the reasons for language endangerment and at the same time evaluate possibilities and opportunities for language revitalization – what priorities are to be set, what role do linguists and especially the community play, what is the approach, what are skills, methods, and steps to be taken into consideration to ensure not only a documentation of the language, but also and foremost its conservation and revitalization.
  • The sociolinguistic evaluation and recording of the dying Kursenieku language (Dalia Kiseliūnaitė)
    Abstract: Since the times of the Teutonic order until 1923, the Curonian Peninsula was a part of Prussia, and later – a part of Germany. Baltic tribes’ migration processes of different intensity occurred here. In the 16th century the newcomers from Latvian speaking Courland started to dominate, moving to the spit in several waves up to the 18th century; at the same time, people from the continental part (the majority of them were Germanized Prussians), colonizers from other German lands, and Lithuanians from the Klaipeda area settled in the region. The Kursenieku language, also known as New Curonian (German Nehrungskurisch) can be categorized as a mixture of Latvian Curonian dialects with Lithuanian, German, and elements of the now extinct Old Prussian. Since it had no written form, Kursenieku was roofed by Lithuanian and later by German, which had functioned as languages of religion and education for a long time. The community disintegrated at the end of World War II. After the Kursenieki community left their homeland and settled in different towns and villages of Germany, there was no practical use for the maintenance of Kursenieku. The chronological reconstruction of the Kursenieku is possible and useful for the Baltic studies; however, there is no motive for revitalization: nowadays, there is no community willing to use this language. This article briefly presents the development of the Kursenieku language in its ethnocultural context. Moreover, it raises the discussion around its status (variety or language), provides its sociolinguistic characteristics, describes the work that has been done with the language, and presents urgent goals and research perspectives.
  • Identity and language shift among Vlashki/Zheyanski speakers in Croatia (Zvjezdana Vrzić; John Victor Singler)
    Abstract: The language Vlashki/Zheyanski, spoken in two areas – the Šušnjevica area and Žejane – of the multilingual, multiethnic Istrian peninsula of Croatia, evinces strong loyalty on the part of its elderly speakers, yet in both areas a language shift to Croatian is well underway. Vlashki/Zheyanski is a severely endangered Eastern Romance language known in the linguistic literature as Istro-Romanian. In order to study the domains and frequency of use of the language and equally to examine speaker attitudes about language and identity, we administered a questionnaire to speakers in both locations. Our sample included responses from individuals in four age groups. Our discussion here focuses on 16 men and women from the two older groups, 51–70 and 71-and-older. In Žejane, speakers saw knowledge of the language and family lineage as defining components of being a “real” member of the community. The name for the language, Zheyanski, comes from the village name. Hence, someone who speaks the language asserts that village belonging and village affiliation are at the core of speakers’ identity. In terms of national identification, whether Croatian, Italian, and/or Istrian, Zheyanski speakers by and large showed little enthusiasm for any of the three choices. In terms of language use, all respondents continue to use the language on a daily basis but report that they speak mostly Croatian to their grandchildren. In the Šušnjevica area, people used the same criteria, language knowledge and family lineage, to define group membership and feel close affiliation to their home village. Unlike in Žejane, the name of the language, “Vlashki”, does not correspond to a unitary group name accepted and liked by all. In terms of larger identity, villagers embraced identities that they share with their Croatian-speaking neighbors: Most felt “extremely Istrian”, and at least “fairly Croatian”. The language shift to Croatian is also more advanced here: All the speakers report speaking mostly Croatian to their children. While speakers in both Žejane and the Šušnjevica area endued their language with a critical role in their identity, this attitude toward Vlashki/Zheyanski does not manifest itself in their communication with younger generations where other social forces have caused the shift to the use of Croatian.
  • Kormakiti Arabic: A study of language decay and language death (Ozan Gulle)
    Abstract: Kormakiti Arabic (also called Cypriot Maronite Arabic) is a language with approximately 150–200 speakers in Kormakitis, a village in north-western Cyprus. Kormakiti Arabic is highly endangered, not only due to its low number of speakers but more importantly because younger Maronites with their roots in Kormakitis do not acquire Kormakiti Arabic naturally any more. Kormakitis itself is almost only inhabited by elderly Maronites who lived there before the separation of Cyprus in 1974. This paper is on language death and language decay of Kormakiti Arabic. Several historical sources are used in order to illustrate the historical and socio-linguistic environment in which this language has survived until today. The linguistic evidence is then compared with the Gaelic-Arvanitika Model of Sasse (1992a) in order to show parallels, as well as the differences between Arvanitika and Kormakiti Arabic.
  • New speakers of Minderico: Dynamics and tensions in the revitalization process (Vera Ferreira)
    Abstract: From the sixteenth century on, the blankets of Minde, a small village in the center of Portugal, became famous all over the country. The wool combers, blanket producers, and traders of Minde began to use Minderico in order to protect their business from “intruders”. Later, this secret language extended to all social and professional groups and became the main means of communication in the village. During this process, Minderico turned into a full-fledged language with a very characteristic intonation and a complex morphosyntax, differentiating itself from Portuguese. However, the number of speakers declined drastically during the last 50 years. Minderico is now actively spoken by 150 speakers, but only 23 of them are fluent speakers. More than half of the fluent speakers are new speakers of the language. New speakerness is a relatively new phenomenon in the Minderico speaking community and a direct result of the revitalization process which was initiated in 2009. This paper examines the role of the new speakers in the revitalization of Minderico, considering issues of authenticity and socio-linguistic legitimacy.
  • Lemko linguistic identity: Contested pluralities (Michael Hornsby)
    Abstract: In their efforts to organize as a recognized minority within the Polish state, the Lemkos have faced a number of obstacles, both internal and external to the community. This article explores three aspects of self-representation of the Lemko community – group membership, victimhood and “speakerhood” – and examines how these representations are contested on a number of levels.
  • Authenticity and linguistic variety among new speakers of Basque (Jacqueline Urla; Estibaliz Amorrortu; Ane Ortega; Jone Goirigolzarri; Belen Uranga)
    Abstract: This paper argues that the type of variety learned and used by Basque language learners is a key element in their self-perception as “true” or authentic speakers of Basque. Drawing on focus groups and individual interviews, we find that new speakers are for the most part strongly oriented towards the value of authenticity epitomized by local varieties. While new speakers report the utility of their mastery over the new standard Basque variety, they are not inclined to view this mastery as granting themselves greater authority or ownership over Basque. Rather they strongly valorize the informal and vernacular speech forms indexing colloquial speech and local dialect most identified with native speakers. The new speaker’s sociolinguistic context and motivations for learning Basque seem to be predictive of the strength of this orientation. The findings of this study point to the necessity of further study and documentation of local vernacular as well as the urgency for language educators to find ways of incorporating the acquisition of local and dialectal features into language instruction.