Language Documentation

By some estimates, roughly 6,900 languages are currently spoken on earth, approximately half of them in the countries of Asia and the Pacific; more than 700 are spoken in Indonesia alone. If the minimum needed for any kind of effective language learning or teaching is a dictionary and a reference grammar, then most of the world's languages must be called "undocumented," or at the very least under-documented. To take the Philippines as one example, Filipino (also known as Tagalog), Cebuano (also known as Visayan or Bisayan), and Ilokano are well documented, but another half dozen languages, each with between 300,000 and one million speakers, are under-documented by any standard. These include Maguindanao, Maranao, Masbatenyo, Yakan, and Tausug (there are no entries for any of these in the language materials database maintained by UCLA). The last two are spoken in areas where the Abu Sayyaf terrorist organization operated for years, as well as in parts of Malaysia and Indonesia. Other languages that have become critically important to the US in recent years for security reasons, and that were under-documented at the time, include Somali and the languages of Afghanistan. Language documentation is also an issue for scholars who need to conduct research in countries where often only the national language is well documented, as well as for the NRCs that support such scholars.

Scores of dictionaries, grammars, and textbooks for languages taught at UH have been produced through research at the UH Department of Linguistics, which has a special focus on the languages of the vast Austronesian family. UH offers a language documentation specialization or track within the MA in Linguistics, and a high percentage of its doctoral students engage in language documentation work in Asia or the Pacific region. In late 2003, graduate students in the department initiated a project to train native speakers of undocumented or under-documented languages to document their own languages. Participants in the program spend a term attending classes on language documentation techniques and issues and are paired with graduate students, with whom they design and carry out projects. Two years later, two dozen previously under-documented languages of the world have received much-needed attention, and the effort has won three awards: the Jacob Peace Memorial Award, the NAFSA "Partnership in Excellence Award," and first prize in the 2005 Small Business Plan Competition organized by the UH College of Business. The languages included in the language documentation project to date are the following: Balinese, Cham, Chuukese, Ema, Fataluku, Ilokano, Javanese, Kalmyk (Russia), Kemak, Keres, Kerinci, Konkani, Lamaholot, Lungtu, Makasae Fatumaka, Makasae Osoroa, Minangkabau, Oirat (Xinjiang, China), Okinawan, Pingilapese, Selayar, Tibetan (Lhasa), Tibetan (Tsetang), Tiwa, Truku, and Waima'a.

The language documentation project has been supported since its inception by the NFLRC and the UH NRCs for Southeast Asia, East Asia, and the Pacific Islands, primarily by providing modest stipends to the native-speaking participants who are the focus of the project. In addition, one set of instructional materials (for Pingilapese, a Pacific island language) has been disseminated by the NFLRC, and another, for Kemak (East Timor), is in preparation. In preparation for the pending 2006-2010 grant cycle, all of the Title VI centers have been meeting regularly to plan an articulated effort to increase the project's impact nationally and internationally. The three UH NRCs will each concentrate their support on the languages of their region by continuing to provide stipends to student informants. The NFLRC will take the lead in supporting a major effort to enhance the capacity for language documentation. The major steps planned are as follows:

  • Fall 2005: Establish a UH Language Documentation and Conservation Advisory Council, consisting of faculty, students, and department chairs of Linguistics and Second Language Studies, a member of the UH Board of Regents, and the directors of the NFLRC and the three NRCs at the university.
  • Spring 2006: Host a small working conference to survey documentation activities at key institutions, make plans to establish an e-journal for language documentation, plan an international conference (2008) and workshop (2010), and possibly create an international association that will act as the sponsor of subsequent activities.
  • 2006-2007: Launch an online refereed journal for language documentation, to be supported by the NFLRC, that will deal with such topics as the goals of documentary linguistics, assessing ethnolinguistic vitality, problems of data collection, orthography design, reference grammar design, lexicography, literacy, archiving, and ethical issues.
  • 2007-2008: Host an international conference on language documentation.
  • 2009-2010: Host a summer institute to disseminate what has been learned to individuals and teams from other institutions in the US.

In addition to the UH NRCs, our partners in this project include the University of Minnesota LRC and the Indiana University LRC. Minnesota has agreed to send Louis Janus, who directs its LCTL project, to participate in planning meetings, organize a colloquium on “pedagogical issues in learning and teaching un(der)-documented languages” as part of the 2008 international conference, and offer a course or course module on teaching LCTLs in the 2010 Summer Institute. Indiana will send one or more persons to participate in the 2008 conference and will share information from its own research projects on some of the less documented languages of Central Asia.
