Reading in a Foreign Language
Volume 15, Number 2, October 2003
Technical vocabulary in specialised texts
Teresa Mihwa Chung and Paul Nation
Victoria University of Wellington
This article describes two studies of technical vocabulary, one using an anatomy text and the other an applied linguistics text. Technical vocabulary was found by rating the words in the texts on a four step scale. It was found that technical vocabulary made up a very substantial proportion of both the different words and the running words in the texts, with one in every three running words in the anatomy text, and one in every five in the applied linguistics text being a technical word. A considerable number of technical words were from the first 2000 words of English and the Academic Word List. The article ends with suggestions for helping learners notice and learn technical vocabulary.
There are several approaches to identifying technical vocabulary. One approach is to use the intuition of a subject expert. This can be done in three ways: by using a rating scale, as in this study (Baker, 1988; Farrell, 1990); by using a technical dictionary compiled by a subject specialist or group of specialists (Nation, 2001: 201; Oh et al., 2000); or by making use of the clues that the most relevant specialist, the actual writer of the text, used to mark the words considered important for the message of the text. When new terms are introduced in a text, writers deliberately provide contextual clues to help readers manage the new terminology (Bramki and Williams, 1984; Flowerdew, 1992; Williams, 1981). Another approach is corpus comparison, in which word frequencies in a technical text are compared with those in a different corpus (Becka, 1972; Yang, 1986; Baker, 1988; Farrell, 1990; Sutarsyah et al., 1994). Technical words should be much more frequent in the technical corpus.
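The corpus-comparison idea can be sketched in a few lines of Python. This is only an illustration, not the procedure of any of the studies cited: the toy word lists, the function names `relative_freq` and `frequency_ratio`, the per-thousand normalisation, and the `floor` value used for words missing from the comparison corpus are all assumptions made for the example.

```python
from collections import Counter

def relative_freq(tokens):
    """Map each word type to its frequency per 1,000 running words."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: 1000 * n / total for word, n in counts.items()}

def frequency_ratio(word, technical_tokens, comparison_tokens, floor=0.5):
    """Relative frequency in the technical corpus divided by the relative
    frequency in the comparison corpus.

    `floor` is a hypothetical stand-in frequency for words that never occur
    in the comparison corpus, so that the ratio stays finite.
    """
    tech = relative_freq(technical_tokens).get(word, 0.0)
    comp = relative_freq(comparison_tokens).get(word, floor)
    return tech / comp

# Toy data invented for illustration only.
technical = "the thorax protects the heart and the thorax moves".split()
general = "the dog and the cat saw the bird in the garden".split()

thorax_ratio = frequency_ratio("thorax", technical, general)
the_ratio = frequency_ratio("the", technical, general)
print(thorax_ratio, the_ratio)
```

A word such as thorax, which is frequent in the technical sample and absent from the comparison sample, gets a high ratio, while a function word such as the gets a ratio close to 1. Where to set the cut-off between technical and non-technical words is a separate decision that this sketch does not address.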
Research on technical vocabulary (Sutarsyah, Nation and Kennedy, 1994; Chung, 2003; Chung and Nation, forthcoming) has shown a significant underestimation of the role played by technical vocabulary in specialised texts and a lack of information about how technical vocabulary relates to other types of vocabulary. This article reports on the significance of this research for language learners and teachers.
One description of the various levels of vocabulary with the goal of designing the vocabulary component of a language course (Nation, 2001) divides vocabulary into four levels: high frequency words; academic vocabulary; technical vocabulary; and low frequency words. High frequency words are the most frequent 2,000 words of English. West (1953) called these words a general service vocabulary because they were of use (or service) no matter what the language was being used to do. This vocabulary typically covers around 80% of the running words of academic texts and newspapers, and around 90% of conversation and novels. It includes virtually all of the function words of English (around 176 word families), but by far the majority of high frequency words are content words (Nation, 2001: 13-16). For learners with academic goals, the 570 word family Academic Word List (Coxhead, 2000) is like a specialised extension of the high frequency words. It covers on average 8.5% of academic text, 4% of newspapers and less than 2% of the running words of novels. This vocabulary has been called academic vocabulary (Martin, 1976), sub-technical vocabulary (Cowan, 1974) or semi-technical vocabulary (Farrell, 1990). There has been a lot of discussion and some research on academic vocabulary (Nation and Coxhead, 2001). This vocabulary is common to a wide range of academic fields but is not what is known as high frequency vocabulary and is not technical in that it is not typically associated with just one field. It is however more closely related to high frequency vocabulary than to technical vocabulary. It was thought that the third level of vocabulary, technical words, covered about 5% of the running words in specialised texts, and was made up of words that occurred frequently in a specialised text or subject area but did not occur or were of very low frequency in other fields (Nation, 2001: 18-19). 
Technical vocabulary is largely of interest and use to people working in a specialised field. The fourth level of vocabulary consists of all the remaining words of English, the low frequency words. There are thousands of these words (Goulden, Nation and Read, 1990) and they typically cover around 5% of the running words in texts.
While there is considerable research evidence about the nature and coverage of high frequency and academic words, there has been little investigation of technical vocabulary and low frequency words. One of the reasons for this is that there has been little agreement about what technical vocabulary is and about how to count it reliably. This article describes a study of the technical vocabulary in an anatomy text and the technical vocabulary in an applied linguistics text. It provides at least initial answers to the following questions: how big is a technical vocabulary, what kinds of words make up a technical vocabulary, how important is technical vocabulary in specialised texts, and how can learners be helped to cope with technical vocabulary?
Words were classified as being technical or non-technical words by rating them on a four point scale designed to measure the strength of the relationship of a word to a particular specialised field. The scale used to do this is in Table 1, which has the scale for the anatomy vocabulary. The applied linguistics scale was essentially the same with different examples and suggested areas. Items classified at steps 3 and 4 were considered to be technical words. Items at steps 1 and 2 were not.
Words at Step 3 may have polysemes that occur in general use, and in some cases occur in general use with little change in meaning, for example breathe and bony. Step 4 includes words like thorax and mammary which may be known in other fields but which have a technical flavour. Even though they are used outside anatomy they could be thought of as being anatomical terms.
The inter-rater reliability check
To make sure that the scale could be applied consistently by other researchers and was applied consistently in the present research, an inter-rater reliability check was carried out. The raters' task in the inter-rater reliability check was to assess the degree of specificity of the meaning of the words in the text to the field of anatomy. Sixty words (fifteen from each of the four steps) were randomly chosen to be used from one section of the text. The researcher had already classified these items using the scale.
The rater in the main inter-rater reliability check was also a qualified and experienced ESOL teacher who is a native speaker of English. The rater's task was to assign the test words to the four steps of the scale depending on the degree of relationship of the meaning to the field of anatomy.
The researcher explained the objectives of the study, the aim of the reliability check, and how to consider the semantic relationship in order to place the words in the four-point scale. At the training stage, the rater was provided with the text in which the words to be rated were already marked. Forty words (ten from each of the four steps as classified by the researcher) were randomly chosen to be used. The researcher and the rater went through the words one by one together. Each time the rater's results were compared with those of the researcher. When discrepancies were found, they were discussed by the researcher and the rater, and all were resolved. The training session took about 35 minutes.
Then, sixty randomly selected words, fifteen from each of the four steps, were provided for the rater to analyze independently. The rater used the same text as the one used at the training stage, but with different words marked. This number of words (15) at each step is much greater than the minimum of three needed to establish rating accuracy from four groups at the 0.05 level of significance (Rosenthal, 1987: 64).
The reliability accuracy score was used to estimate the degree of agreement between the researcher's results and the rater's. The degree of agreement of rating at each step of the rating scale was compared to find any tendencies of bias at particular steps. Rosenthal (1987: 67) states that a raw accuracy score of 0.7 is desirable for rating items in four groups.
Table 2: Inter-rater reliability accuracy score calculated by the number of words assigned
In Table 2 above, we see that of the 15 words assigned to each of the four steps by the researcher, the rater agreed on the assignment of 13 items out of 15 at Step 1 and 14 out of 15 words at Step 2. At Steps 3 and 4, the rater agreed on the assignment of all 15 of these items. Thus, the total agreement score is (13+14+15+15=) 57 out of 60.
The totals of 16 in the second and fourth rows of the last column of Table 2 indicate that the rater has a slight bias towards rating words at the more technical end of the scale. Since a raw accuracy score of 0.7 is acceptable, this result of 0.95 accuracy (57 out of 60) is very satisfactory. Clearly, the scale can be applied consistently and others can learn how to use it consistently.
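The raw accuracy score is simply the proportion of items on which the researcher and the rater agree. The following Python sketch recomputes it from a confusion matrix. The diagonal cells (13, 14, 15, 15) come from the text; the off-diagonal cells are an inference from the stated agreements and the two rater totals of 16, not figures copied from Table 2, and the names `confusion` and `raw_accuracy` are illustrative.

```python
# Rows: step assigned by the researcher (15 words per step).
# Columns: step assigned by the rater.
# Off-diagonal cells are inferred from the prose, not copied from Table 2.
confusion = [
    [13, 2, 0, 0],
    [0, 14, 0, 1],
    [0, 0, 15, 0],
    [0, 0, 0, 15],
]

def raw_accuracy(matrix):
    """Proportion of items on the diagonal, i.e. rated identically by both."""
    agreed = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return agreed / total

# Number of words the rater placed at each step (column sums).
rater_totals = [sum(column) for column in zip(*confusion)]

print(raw_accuracy(confusion))  # 57 / 60
print(rater_totals)
```

Under this reconstruction the rater's totals for Steps 2 and 4 are both 16, which is the pattern the text describes as a slight bias towards the more technical end of the scale.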
How big is a technical vocabulary?
The texts chosen for analysis were Clinically Oriented Anatomy (Moore and Dalley, 1999, 4th edition) and Learning a Second Language through Interaction (Ellis, 1999). They were chosen because the first author has tertiary qualifications in nursing and applied linguistics and could thus bring specialist knowledge to bear on the classification of the words. The texts were of different lengths and were probably intended for different kinds of audiences -- the anatomy text being largely intended for those new to the field while the applied linguistics text may have been intended for those who already have some knowledge of applied linguistics. These differences undoubtedly affected the results. However, the primary purpose of the research was to see if technical vocabulary could be reliably distinguished from other vocabulary, and to gain some indication of the size and density of technical vocabularies. Table 3 contains the data about the size and various levels of the vocabulary in the two texts. The unit of counting is the word type. The counting of word types (Table 3) and tokens (Table 4) was done by a computer program called RANGE, which is available free at http://www.vuw.ac.nz/lals/staff/paul_nation/index.html. Word types, rather than word families, were used as the unit of counting because the fact that one or two members of a family were technical words did not mean that all of them were (e.g., frequency and frequent). A word type is a single word form, such as agree or agrees. When types are counted, agree and agrees are counted as different types. A word family, on the other hand, includes a collection of formally related and semantically related word types. So, the agree family could include agree, agrees, agreed, agreeing, agreement, disagree, disagreement (Bauer and Nation, 1993).
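The difference between counting types and counting families can be shown with a short Python sketch. The family membership below is taken from the agree example in the text; the sample sentence and the names `families`, `type_to_family`, and `family_heads` are invented for illustration, and a real analysis would need a complete family list.

```python
# Family membership taken from the example in the text.
families = {
    "agree": {"agree", "agrees", "agreed", "agreeing",
              "agreement", "disagree", "disagreement"},
}

# Reverse lookup: word type -> headword of its family.
type_to_family = {
    word_type: head
    for head, members in families.items()
    for word_type in members
}

# Invented sample sentence, for illustration only.
tokens = "they agree but she agrees that no agreement was agreed".split()

word_types = set(tokens)
# A type with no listed family is treated as a family of its own.
family_heads = {type_to_family.get(t, t) for t in word_types}

print(len(word_types), len(family_heads))
```

Here agree, agrees, agreement, and agreed count as four types but collapse into one family, which is why a family-based count would hide cases where only some members of a family are technical.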
The figures exclude the preface, acknowledgements, table of contents, and bibliography. Anatomy clearly has a large technical vocabulary -- 4,270 word types making up 37.6% of the total word types. The applied linguistics text is much shorter than the anatomy text (see Table 4, 93,445 tokens compared with 452,192 tokens), and its 835 technical word types make up a much smaller proportion (16.3%) of the total word types. This is not an effect of text length but of the less specialised and more generally accessible nature of applied linguistics.
What kinds of words make up a technical vocabulary?
Let us look at some examples of the technical vocabulary from each of the texts. Word types at step 3 on the scale in the anatomy text include canal, disease, clinical, and pulse, and at step 4, laminae, xiphoid, spinous, and subcostal. In the anatomy text, 35.6% of the technical vocabulary is at step 3 on the scale, while the majority, 64.4%, is at step 4. Words from step 3 on the scale in the applied linguistics text include instruct, interrogatives, paraphrase, and glossing, and at step 4, hyponymy, morphosyntax, lexicon, and polysyllabic. A rather large percentage, 88.4%, of the technical vocabulary in the applied linguistics text is at step 3 on the scale, while a much smaller proportion, 11.6%, is at step 4.
It should be clear from these examples that some technical words are common in ordinary English. Technical words at step 3 of the scale can come from the high frequency words or the Academic Word List. In the anatomy text 16.3% of the word types at step 3 were from the GSL or AWL. No words from step 4 were. In the applied linguistics text, 50.5% of the words at step 3 were from the GSL or AWL. Applied linguistics clearly uses an accessible vocabulary -- 6.2% of the words at step 4 on the rating scale (Table 1) were from the GSL or AWL. The figures given in Tables 3 and 4 have had this technical vocabulary removed from them. For example, the 2000 high frequency words of English cover 77.7% of the tokens or running words in the applied linguistics text. But 11.9% of these words are actually high frequency words that are closely related to the field of applied linguistics, most of which are at step 3 of the rating scale. Similarly, the AWL covers 13.1% of the applied linguistics text, but 47.3% of these tokens are technical words in applied linguistics. The same is true to a much lesser extent in anatomy, and has been noted in other studies. For example, words like cost, demand, price, and supply are common technical terms in economics (Sutarsyah, Nation and Kennedy, 1994). Words like circuit, field, energy, and plate are common technical words in electronics (Farrell, 1990).
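The effect of removing technical words from the word-list figures can be checked with simple arithmetic. The Python sketch below uses only percentages and token counts quoted in this article (93,445 total tokens and 63,992 non-technical tokens from the first 2000 words in the applied linguistics text); the function name `non_technical_coverage` is illustrative, and the calculation assumes that the quoted technical share applies to tokens.

```python
def non_technical_coverage(list_coverage_pct, technical_share_pct):
    """Coverage left for a word list once its technical tokens are set aside."""
    return list_coverage_pct * (1 - technical_share_pct / 100)

# Applied linguistics text, percentages as quoted in the article.
gsl_remaining = non_technical_coverage(77.7, 11.9)  # first 2000 words
awl_remaining = non_technical_coverage(13.1, 47.3)  # Academic Word List

# Cross-check against the token counts quoted for the same text.
gsl_from_counts = 100 * 63992 / 93445

print(round(gsl_remaining, 1), round(awl_remaining, 1), round(gsl_from_counts, 1))
```

On these figures, roughly 68.5% of the running words are non-technical high frequency words, whether the figure is derived from the percentages or from the token counts, and the non-technical part of the AWL covers a little under 7% of the text.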
Clearly, there is a striking difference between anatomy and applied linguistics not only in the amount of technical vocabulary but also in the nature of that vocabulary. Anatomy has a much larger technical vocabulary (4,270 different words in Table 3) and when that vocabulary was rated on the scale in Table 1 it was found that almost two-thirds of that technical vocabulary (64.4%) is made up of word forms that are largely restricted to the field of anatomy. Applied linguistics has a smaller technical vocabulary and most of that vocabulary (88.4%) is made up of words that are largely familiar to people with no specialist knowledge of the field.
How important is technical vocabulary in specialised texts?
We have looked at how large a technical vocabulary might be and the kinds of words that it can consist of. Let us now look at how often these words occur in specialised texts. The unit of counting is the token. The previous sentence contains seven tokens and six types. The word type the occurs twice (has two tokens) in the sentence. Thus in Table 4 the first 2000 word families account for 239,790 tokens in the anatomy text and 63,992 tokens in the applied linguistics text.
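The token and type counts for the example sentence can be reproduced with a few lines of Python. The tokenizer here is a deliberately simple assumption (lowercase, alphabetic strings only) and is not a description of how RANGE splits text.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

sentence = "The unit of counting is the token."
tokens = tokenize(sentence)
type_counts = Counter(tokens)

print(len(tokens))         # number of tokens (running words)
print(len(type_counts))    # number of types (different word forms)
print(type_counts["the"])  # tokens of the type "the"
```

Counting The and the as the same type depends on lowercasing first; a case-sensitive count of this sentence would give seven types rather than six.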
Table 4 shows that the 2000 word family General Service List (West, 1953), which has had technical terms in anatomy removed (for example, neck, arm, stomach), accounts for 239,790 of the 450,000 running words in the anatomy text, which is 53.3% of the total tokens. The most striking figure is the 31.2% coverage by technical terms: almost one out of every three words in the anatomy text is a technical term. The sample of marked anatomy text in Figure 1 shows this.
The first 2000 high frequency words are unmarked, words in the AWL are in bold, low frequency words (i.e., those not in the first 2000 high frequency words or the AWL) are in italics, and technical terms are underlined. So chest in line 1 is a technical word from the first 2,000 words of English.
In addition to this, quite a large proportion of the running words (11.8%) consists of low frequency words. The picture is somewhat different in the applied linguistics text, but the technical terms still account for quite a large proportion of the text (20.6%) -- one word in every five. Figure 2 is a marked sample of the applied linguistics text.
The low frequency words make up only a small proportion of the words in the applied linguistics text (4%) and among other words include foreign words used in examples, examples of language errors, and names used in references to research.
In both texts, technical words make up a very large proportion of the running text -- much larger than the 5% suggested in Nation (2001: 12). These technical words are less noticeable in the applied linguistics text because a large proportion of them are words from the high frequency list and the AWL which can be technical words in applied linguistics. Nation's estimates were not based on data and were the best guesses from what was left after high frequency, academic, and low frequency vocabulary was counted. At the time there was no reliable way of distinguishing technical words.
How can learners be helped to cope with technical vocabulary?
The rating scale used in this study distinguishes two kinds of technical words -- those that may occur in general non-specialised usage, and those that are largely unique to a particular specialised field. These two kinds of technical words pose different kinds of problems for learners in recognising that a word is a technical word, and in learning technical words.
Recognising technical words
The most obvious technical words are those which have Greek or Latin based forms and which do not occur outside of the specialised area. In the applied linguistics text, these included words like multicollinearity, interlingual, connotative, and BICS, and in the anatomy text, words like perichondrium, ramus, synchondrosis, and viscera. In a separate part of a larger study that this article is drawn from, it was found that for over half of the different technical words in the anatomy text, the writers provided clues that the word was special in some way (Chung and Nation, forthcoming). These clues included (1) the word being defined in the text (what Bramki and Williams call lexical familiarisation), (2) the word being written in bold or italics, and (3) the word appearing as a label in a diagram. Readers need to be familiar with these clues. Definitions can take a very large variety of forms (Bramki and Williams, 1984; Flowerdew, 1992), and some of these may be difficult to notice.
Here are some definitions from the anatomy text and the applied linguistics text.
The presence of such definitions is a very strong clue that the word is technical. Recognizing such definitions is particularly important where a common word is used in a specialized way with a restricted meaning, for example negotiation or input in applied linguistics. The definition signals that this known word is now being used with a restricted meaning. Definitions may be accompanied by some typographical marking (using bold or italics as in the examples below) of the technical term being defined.
Repetition also provides a clue that a word may be technical. Technical vocabulary typically occurs much more frequently in a specialized text than in general usage. When a word is used often, learners should check if it has a restricted meaning in that subject area.
Learning technical words
Whenever a word is met and it is used in a way which is different from previously met uses, it is worthwhile for the teacher to draw attention to the way that this particular use relates to other uses of the word. Where technical terms are extensions of words in general use, it is useful for learners to see how the technical sense of the words relates to the core meaning of the word, as in girdle and cavity:
supports the pectoral girdle
the contents of the thoracic cavity
This is most easily done by getting the learners to look at all the senses of the word in a dictionary and see what core meaning they have in common. Ruhl (1989) would see this as an application of the "monosemic principle", that is, the assumption that formally related words have related meanings. For example, the entry for girdle in the Collins English Dictionary (Hanks, 1984) has the following senses:
The core meaning in this example is the second item in the entry and can be found in all the other senses listed. There are other activities which encourage learners to relate senses of words to each other (Visser, 1989). The value of seeing how particular uses relate to a core meaning is that it makes later meetings with the word easier to understand, and brings under one concept items that may be represented by different concepts in the first language.
It is also useful to note if the technical use of the word involves a collocation or a grammatical form that differs from its other uses. Most technical vocabulary needs to be learned productively by learners specializing in that area and learning common collocations and grammatical patterns helps this.
Words which are based on Greek or Latin roots should be analyzed where possible and the meanings of the word parts should be related to the meaning of the word. This can be done by the teacher, but it is also a strategy that learners can work on (Anglin, 1993). In the anatomy text there are affixes which are useful in anatomy and these are worth learning.
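Word-part analysis of this kind can be illustrated with a small Python sketch. The example words come from the two texts discussed above; the prefix glosses are standard Greek and Latin combining-form meanings supplied for this sketch rather than taken from the anatomy text, and the names `PREFIXES` and `split_prefix` are hypothetical.

```python
# Glosses are standard combining-form meanings supplied for this sketch;
# they are not quoted from the anatomy text.
PREFIXES = {
    "peri": "around",
    "sub": "below",
    "syn": "together",
    "inter": "between",
}

def split_prefix(word):
    """Return (prefix, gloss, remainder) for the longest known prefix,
    or None when the word starts with none of the listed prefixes."""
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(prefix) and len(word) > len(prefix):
            return prefix, PREFIXES[prefix], word[len(prefix):]
    return None

for word in ["perichondrium", "subcostal", "synchondrosis", "ramus"]:
    print(word, split_prefix(word))
```

Matching on spelling alone is crude: it would also split a word such as subject, where the prefix no longer contributes a transparent meaning, so the output is best treated as a prompt for learners to check against the meaning of the whole word.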
There are two major problems for teachers in helping learners deal with technical vocabulary. Firstly, the English teacher does not usually have specialist knowledge of the learners' technical areas. Secondly, technical vocabulary needs to be worked on while getting on top of the specialized field. However, in spite of these limitations, teachers can play a small but useful role in preparing learners for coping with technical vocabulary. This can be done by helping learners gain the more general skills of recognizing technical words, interpreting definitions, relating senses to a core meaning, and learning word parts. Teachers can provide learners with the tools for dealing with technical words. In this way teachers need not get involved in trying to teach in a technical area, but can direct their attention to vocabulary strategies.
This research has used a scale especially developed for the study to examine the nature and amount of technical vocabulary in two quite different technical texts. The study shows that while some subject areas are likely to have very heavy technical vocabulary loads, a significant amount of this vocabulary is not necessarily new for beginners in the field. This is largely because some technical words are used outside the technical area with much the same if not identical meanings. Moreover, some words which are typically high frequency words or academic words can function as technical words in certain fields (Flowerdew, 1993). Technical vocabularies can be large, and they can also account for a very large proportion of the running words or tokens in a text. The present piece of research has looked at two texts to examine something of the nature of technical vocabulary. A much larger, representative corpus of a technical field would be needed to come near to listing a definitive technical vocabulary for that field.
Anglin, J. M. (1993). Vocabulary development: A morphological analysis. Monographs of the Society for Research in Child Development, Serial No. 238, Vol. 58, No. 10.
Baker, M. (1988). Sub-technical vocabulary and the ESP teacher: An analysis of some rhetorical items in medical journal articles. Reading in a Foreign Language, 4(2), 91-105.
Bauer, L. & Nation, I. S. P. (1993). Word families. International Journal of Lexicography, 6(4), 253-279.
Becka, J. (1972). The lexical composition of specialized texts and its quantitative aspect. Prague Studies in Mathematical Linguistics, 4, 47-64.
Bramki, D. & Williams, R. C. (1984). Lexical familiarization in economics text, and its pedagogic implications in reading comprehension. Reading in a Foreign Language, 2(1), 169-181.
Chung, T. M. (2003). Identifying technical vocabulary. Unpublished Ph.D. thesis, Victoria University of Wellington.
Chung, T. M. (in press). A corpus comparison approach for terminology extraction. Terminology.
Chung, T. M. & Nation, I. S. P. (forthcoming). Identifying technical vocabulary.
Cowan, J. R. (1974). Lexical and syntactic research for the design of EFL reading materials. TESOL Quarterly, 8(4), 389-400.
Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34(2), 213-238.
Ellis, R. (1999). Learning a second language through interaction. Amsterdam: John Benjamins.
Farrell, P. (1990). Vocabulary in ESP: A lexical analysis of the English of electronics and a study of semi-technical vocabulary. CLCS Occasional Paper No. 25. Trinity College.
Flowerdew, J. (1992). Definitions in science lectures. Applied Linguistics, 13(2), 202-221.
Flowerdew, J. (1993). Concordancing as a tool in course design. System, 21(2), 231-244.
Goulden, R., Nation, P., & Read, J. (1990). How large can a receptive vocabulary be? Applied Linguistics, 11(4), 341-363.
Hanks, P. (Ed.). (1984). Collins dictionary of the English language. London: Collins.
Martin, A. V. (1976). Teaching academic vocabulary to foreign graduate students. TESOL Quarterly, 10(1), 91-97.
Moore, K. L. & Dalley, A. F. (1999). Clinically oriented anatomy. (4th edition) Philadelphia: Lippincott, Williams & Wilkins.
Nation, I. S. P. (2001). Learning vocabulary in another language. Cambridge: Cambridge University Press.
Nation, I. S. P. & Coxhead, A. (2001). The specialised vocabulary of English for academic purposes. In J. Flowerdew and M. Peacock (Eds.), Research perspectives on English for academic purposes (pp. 252-267). Cambridge: Cambridge University Press.
Oh, J., Lee, J., Lee, K., & Choi, K. (2000). Japanese term extraction using dictionary hierarchy and a machine translation system. Terminology, 6, 287-311.
Rosenthal, R. (1987). Judgment studies: Design, analysis, and meta-analysis. Cambridge: Cambridge University Press.
Ruhl, C. (1989). On monosemy: A study in linguistic semantics. Albany: State University of New York Press.
Sutarsyah, C., Nation, P. & Kennedy, G. (1994). How useful is EAP vocabulary for ESP? A corpus based study. RELC Journal, 25(2), 34-50.
Visser, A. (1989). Learning core meanings. Guidelines, 11(2), 10-17.
West, M. (1953). A general service list of English words. London: Longman, Green & Co.
Williams, R. (1981). Lexical familiarization in content area textbooks. In L. Chapman (Ed.), The reader and the text (pp. 49-59). London: Heinemann Educational Books Ltd.
Yang, H. (1986). A new technique for identifying scientific/technical terms and describing science texts. Literary and Linguistic Computing, 1, 93-103.
About the Authors
Teresa Mihwa Chung studied for her doctoral dissertation at Victoria University of Wellington. She has studied nursing and applied linguistics, and has taught public health and English in Korea.
Paul Nation teaches in the School of Linguistics and Applied Language Studies at Victoria University of Wellington, New Zealand. He has taught in Indonesia, Thailand, the United States, Finland, and Japan. His specialist interests are language teaching methodology and vocabulary learning. His latest book is Learning Vocabulary in Another Language (Cambridge University Press, 2001). e-mail: Paul.Nation@vuw.ac.nz