Summary: Readily available collections of written or spoken language and easy-to-use analytical tools offer exciting possibilities for using the techniques of corpus linguistics to improve language teaching and learning. This project focuses on Korean, for which there has been explosive growth in corpora and software but virtually no prior work on pedagogical corpus linguistics. Products will include a volume of studies applying corpus linguistics to problems in Korean language teaching and learning and a technical report on the uses of language corpora in teaching foreign languages in general.
Corpus linguistics takes advantage of the existence of large collections of language production (written or spoken language) in order to investigate a language. It bases its descriptions on the empirical characteristics of language production (rather than chiefly on theory or speaker intuition). The past decade, and especially the past several years, have seen an explosion in corpus linguistic studies. This is due to several causes. First, personal computers now have the speed and storage capacity to process huge corpora (often involving tens or hundreds of millions of words -- the equivalent of hundreds or thousands of thousand-page books) in a few seconds. If one wants to find out how a word is used, for example, one can pull up hundreds of examples in context, in a convenient display, almost instantly, using readily available and inexpensive tools. Second, there now exist easily accessible and scientifically prepared collections of language -- large and well-structured corpora -- which the individual can easily use on a personal computer. Third, the World Wide Web itself now contains an enormous amount of language, again readily accessible to the individual user. The Web has also made the distribution of scientific corpora and corpus tools easy and convenient, and has provided a forum for corpus linguists to interact -- thus driving the field forward. Fourth, the field of natural language processing by computer (and artificial intelligence in general) has been exploring the ways in which probabilistic models can improve processing. These probabilistic models require tools that investigate the statistical structure of language output, and this of course involves corpus studies.
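To make the idea of pulling up examples of a word "in context" concrete, the following is a minimal sketch of a keyword-in-context (KWIC) display, the kind of output a concordance tool produces. It is an illustration only, not one of the tools discussed in this project: the function name `kwic`, the `width` parameter, and the English sample text are all assumptions chosen for the example. It also matches whole words by simple pattern matching, which would not be adequate for Korean, where particles attach to words and morphological analysis would normally be needed.

```python
import re


def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per whole-word match of keyword.

    Hypothetical helper for illustration; 'width' is the number of
    characters of context shown on each side of the match.
    """
    # Collapse line breaks and runs of spaces so each hit fits on one line.
    text = re.sub(r"\s+", " ", text)
    # \b restricts matches to whole words; matching ignores case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines


# Invented sample text, standing in for a real corpus.
sample = (
    "The bank raised its rates. She sat on the river bank. "
    "A bank of clouds rolled in, and the bankrupt firm closed."
)

for line in kwic(sample, "bank"):
    print(line)
```

Aligning the keyword in a single column is what lets a teacher or learner scan many occurrences at once and notice which words tend to appear to its left and right. A real corpus would be read from files rather than from an inline string.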
Foreign language pedagogy is now beginning to see new possibilities for applying recent advances in corpus linguistics to improve language teaching and learning. To be sure, the results of older "classic" corpus-based studies of word frequency have long been of interest to language teaching, informing decisions about materials development, grading of materials, and assessment. These early studies, from the period before the mid-1990s, were produced by specialists working with mainframe computers at major universities and research institutes. Corpus linguistics was certainly not something that an ordinary classroom teacher or learner could undertake. What we are seeing now is something quite different and potentially revolutionary. Readily available corpora and easy-to-use tools can now be used on the spot in a language teaching context, by teachers and learners without extensive training in computational linguistics, and studies of linguistic features can be tailored to specific pedagogic contexts and learning requirements. Thus corpus linguistics fits in with the current emphasis on authentic materials and on task-based language teaching -- emphases of other Hawai‘i NFLRC projects.
This NFLRC project, to be accomplished during the first two years of the grant cycle, will have limited aims. We will not attempt to develop either corpora or corpus-analytic tools. Both of these are expensive and time-consuming endeavors which would require substantial funding from additional sources (although such corpora and corpus tools are being developed at a rapid rate for many languages). Rather, the goal of this project is to demonstrate the potential of corpus-based studies for foreign language pedagogy. This will be accomplished through three primary foci:
The project will concentrate on Korean as the primary demonstration language, chosen for several reasons: the presence at UH of large numbers of advanced graduate students, many experienced teachers, and specialists interested in corpus studies of Korean; the fact that Korean is a language of great and growing national importance; and the fact that we are now seeing the beginning of what we confidently expect will be an explosive growth in available corpora and software. Finally, there is as yet virtually no existing work in pedagogical corpus linguistics for Korean. In the second year of the project, some attention will also be given to projects in Japanese, for which all of the same arguments can be made. In addition, the NFLRC will provide some assistance in the form of lecturer release time to Professor Yuphaphann Hoonchamlong, Professor of Thai, who is experienced in Thai corpus linguistics and is seeking external funding for a project to create a corpus of spoken Thai that will be useful for Thai language pedagogy.