Admiraal, W., Westhoff, G., & de Bot, K. (2006). Evaluation of bilingual secondary education in the Netherlands: Students' language proficiency in English. Educational Research & Evaluation, 12(1), 75-93.
keywords: quasi-experimental; longitudinal; English; Dutch; bilingual programs; Netherlands; secondary education
annotation: Since 1989, the number of Dutch-English bilingual secondary schools in the Netherlands has grown substantially. In a six-year longitudinal study funded by the Dutch Ministry of Education, Admiraal, Westhoff, and de Bot compared lower secondary learners (12- to 15-year-olds) in bilingual and non-bilingual schools on English proficiency (vocabulary, pronunciation, reading, and oral ability), subject knowledge (history and geography), and Dutch ability. The researchers gathered data with national standardized exit tests for intermediate secondary education (Cito, MAVO, etc.) and with the EFL Vocabulary Test created by Meara (1992), supplemented by learner background and attitudinal surveys. Students in the bilingual schools outperformed the control group on the reading, oral, and pronunciation tests, while vocabulary and subject knowledge were comparable across groups. The authors caution against overinterpreting the results for three reasons: (1) the limited availability of complete data for the Dutch and subject tests, (2) the motivation of the students who enrolled in the pioneering bilingual programs, and (3) the societal perception of English as a prominent language in Dutch society.
Alderson, J. C. (1992). Guidelines for the evaluation of language education. In J. C. Alderson & A. Beretta (Eds.), Evaluating second language education (pp. 274-304). Cambridge: Cambridge University Press.
keywords: guidelines; planning; stakeholders; negotiation; utilization; Critical Path Analysis; reporting
annotation: Drawing on the case studies in the other chapters of the book, Alderson reviews the issues of “who, what, when, how, how long, to evaluate” and aims “to point the way forward to further developments…in the methodology and practice of language education evaluation” (p. 274). He cautions would-be evaluators that evaluation is reflexive and depends on its purpose, the nature of the program, the individuals involved (personalities and politics), time constraints, and available resources. Perfect objectivity in evaluation is unattainable, since stakeholders hold different perspectives and values, and this reality must shape the design, implementation, and interpretation of any evaluation. Alderson presents a set of guiding questions for planning an evaluation study: (a) Why is the evaluation required (consider official and hidden agendas)? (b) Who is the evaluation for (identify stakeholders' purposes)? (c) Who participates in the evaluation process? (d) What expertise is required of the evaluator? (e) What is the focus of the evaluation, as determined through discussion and negotiation with the stakeholders? (f) How is the program to be evaluated (adapt various methodologies and triangulate methods)? (g) When is the evaluation to take place (the purposes of the evaluation determine when to evaluate)? (h) How long should the evaluation last? (i) What happens to the evaluation report (agree with stakeholders on what the study will deliver, to ensure utilization)? During implementation, Critical Path Analysis (identifying and stating key points and periods of time) can help evaluators decide how much flexibility their plans allow. Because interpretation and reporting must serve various needs, it is “important for the evaluator to devise ways in which the different interpretations of data that are both theoretically inevitable and practically and politically important can be gathered as part of the evaluation” (p. 296).
To increase the likelihood that the evaluation report will be used, the evaluation should be relevant to the stakeholders, the result of negotiation with them, based on adequate resources and a feasible implementation plan, kept to a timeframe, adequately interpreted in terms of educational policy, and adequately reported.
Alderson, J. C., & Beretta, A. (Eds.). (1992). Evaluating second language education. Cambridge: Cambridge University Press.
keywords: bilingual education; EFL; ESP; case study; university; elementary; secondary; overview; framework; guideline; outsider; insider; participatory evaluation; political
annotation: Alderson and Beretta's edited collection of 10 chapters covers theoretical and methodological issues in language program evaluation and presents case studies from a variety of contexts. The chapters reflect the realities of program evaluation, highlighting the conflicts and compromises that arise at various decision points in the process. The book comprises three sections: (a) an overview article by Beretta examining 25 years of previous evaluation studies in language teaching (since the 1960s); (b) eight case studies of current practice (by Alderson & Scott; Lynch; Mitchell; Palmer; Ross; Slimani; Coleman; Beretta, in chapter order), including summaries of evaluation findings and useful templates, instruments, etc., each followed by a postscript from the editors; and (c) guidelines by Alderson on the design of evaluation projects.
The book is a practical resource for would-be evaluators who want to learn from and reflect on what previous evaluators have experienced in the field. The language contexts featured in the book include English as a second/foreign language (primary, secondary, and college level), college-level English for specific purposes, Gaelic-English bilingual education (primary level), and German (college level).
Alderson, J. C., & Scott, M. (1992). Insiders, outsiders and participatory evaluation. In J. C. Alderson & A. Beretta (Eds.), Evaluating second language education (pp. 25-57). Cambridge: Cambridge University Press.
keywords: ESP; university; Brazil; perception data; survey; interview; journal; report; participatory evaluation
annotation: Alderson and Scott report on the participatory, nationwide evaluation of English for Specific Purposes (ESP) courses at 45 universities in Brazil, with a particular focus on reading skills. The evaluation was commissioned after seven years of development and implementation of the ESP project, to inform decisions about continued funding from the Overseas Development Administration. Supported by external and internal funding, this unique, large-scale evaluation required considerable time, cooperation, and manpower. Project coordinators, teachers, research assistants, and a British Council consultant worked together democratically to design the evaluation, construct and pilot the instruments, collect the data, and draft the report. The following factors were evaluated: context, methodology, implementation of methodology, project achievement, teacher-training implementation, and exchange of ideas and experience. The study gathered information from multiple sources with multiple instruments to describe program outcomes. "Perception data" were collected from current ESP students, ESP graduates, subject specialists/teachers, and ESP teachers. For triangulation, the evaluators also collected ESP students' reports of class discussions, ESP teachers' reports on the same discussions, ESP teachers' post-questionnaire interviews, and statistics on the use of the language center. The authors acknowledge problems with the sampling, the questionnaire, and the analysis of qualitative data, as well as the absence of classroom observation and of any testing of student outcomes. Despite these problems, they demonstrate how evaluation can empower local stakeholders and build their capacity, including the capacity to conduct future internal evaluations. As the authors note, "Many, if not most, of the teachers involved also seemed to learn a great deal about evaluation: how it might be planned, how data might be collected, and how results might be interpreted" (p. 52). The book's appendix provides an outline of this evaluation project's proposal, instruments, and results, which may serve as a reference for designing future language program evaluations.
Anderson, J. (1998). Managing and evaluating change: The case of teacher appraisal. In P. Rea-Dickins & K. P. Germaine (Eds.), Managing evaluation and innovation in language teaching: Building bridges (pp. 159-186). London: Longman.
keywords: Turkey; university; EFL; teacher appraisal; teacher training; management; meeting; discussion; questionnaire; observation; improvement
annotation: Anderson, writing from the perspective of a management team member, documents the introduction, implementation, and evaluation of a teacher appraisal scheme (TAS) at the Bilkent University School of English Language in Turkey. The purposes of appraisal were teacher accountability and professional development. The appraisal cycle had three stages: an initial meeting to identify teachers' interests, skills, and needs (Teacher Profile form); a progress review meeting; and an end-of-year meeting for reflection. The four classroom observations were linked to the targets identified in the Teacher Profile form. Anderson identifies several factors that affect the quality of teacher performance: teaching experience; qualifications; familiarity with and understanding of the curriculum; colleagues; available resources; the ability to evaluate, reflect, and change; and motivation towards the job. During the implementation of the appraisal system, various tensions arose (concerning the purpose of appraisal, ownership, the cultural diversity of staff members, motivation, opportunity cost, reward for participation, operational issues, the continuity of ownership, and monitoring and quality control by the management team). After two years of implementation, the TAS was evaluated to verify and improve its quality. Based on existing appraisal documents and on data obtained from a half-day workshop for teachers (discussion, open-ended written responses, and a questionnaire), the evaluation revealed that 50% of the teachers had benefited from the TAS, and it identified the areas where the TAS needed improvement. The appraisal system was then refined and reworked based on the evaluation, illustrating innovation as an evolving process.
Arnold, N. (2009). Online extensive reading for advanced foreign language learners: An evaluation study. Foreign Language Annals, 42(2), 340-366.
keywords: US; university; German; reading; formative; process; product; qualitative; questionnaire; case study
annotation: Arnold reports on an evaluation of an online extensive reading program implemented in an advanced German as a foreign language course at a US university. This pilot program incorporated certain modifications that distinguished it from traditional extensive reading programs. The purpose of the evaluation was to determine whether the program met its goals and to investigate the effects of the modifications. To gain a deep understanding of the program and its effects, the evaluators focused on process as well as product and used qualitative data that illuminated the students' experiences. Data collection instruments included student questionnaires, reading reports, and reflection journals. Based on the findings, the evaluators concluded that the program could be implemented on a large scale, but they also made suggestions for improvement. They note that the evaluation was limited to student perceptions and suggest that future evaluations add tests of linguistic gains and a longitudinal component.
Bachman, L. F. (1989). The development and use of criterion-referenced tests of language ability in language program evaluation. In R. K. Johnson (Ed.), The second language curriculum (pp. 242-258). Cambridge: Cambridge University Press.
keywords: criterion-referenced; testing; communicative language ability; proficiency scale
annotation: Bachman is concerned with which learner outcomes to measure and how to use them in the evaluation of language programs. He points out the inadequacies of norm-referenced testing for the needs of program evaluation, as well as inadequacies in definitions of language proficiency. For formative evaluation, Bachman suggests that it is necessary to identify specific instructional objectives and to gather information on students' achievement of them. For summative evaluation, he argues, information is needed on both stated and unexpected outcomes that are consistent with the broader goals of educational systems and society. For program evaluation in general, he advocates testing that "involves the combination of the criterion-referenced approach to test development with a current specification of the domain of language proficiency" (p. 251), which he calls 'communicative language ability' (language competence, strategic competence, and psychophysiological skills). To ensure comparability across programs, abstract proficiency scales that are independent of the contextual features of language use should be defined. Bachman also stresses the need to test this framework empirically.
Barr, D., Leakey, J., & Ranchoux, A. (2005). Told like it is! An evaluation of an integrated oral development pilot project. Language Learning & Technology, 9(3), 55-78.
keywords: Canada; CMC; classroom; computer; oral; French; testing; questionnaire; journal
annotation: Barr, Leakey, and Ranchoux conducted a methods comparison study of computer-mediated communication (CMC) and face-to-face instruction in a conversation class. They were interested in: (a) whether computer technology enhances students' progress in oral language development; (b) the factors that may affect students' oral language development when using computers; and (c) staff and student reactions to using computer technology in conversation classes. Four groups of 5 to 11 university students (29 in total; Arts students as the control group and Applied Languages students as the treatment group) attended French conversation classes for one hour per week over 12 weeks. Data collection included a language experience questionnaire, an information and communications technology use questionnaire (use of email and the web), student journal logs (self-assessments of their linguistic development and of the class), and a pre- and post-test (a pronunciation task, personal questions, a listening comprehension exercise, and an oral résumé of a television documentary). Students in the traditional classroom setting did better than students in the CMC environment. Because some class time was spent getting used to the computer-based environment and software, the amount of content covered differed between the treatment and control classes. The qualitative findings showed that CMC students appreciated the individualized opportunities to practice pronunciation but rated discussions and debates, the parts requiring the least technology, as the best aspect of the oral classes. Tutors in the CMC classes felt that technology had a dehumanizing effect on oral classes. Results regarding the role of technology in CMC oral communication classes were inconclusive, since technology was used only for oral drills and not for meaningful communication.
Beretta, A. (1986a). A case for field-experimentation in program evaluation. Language Learning, 36(3), 295-309.
keywords: experimental; field research
annotation: Beretta discusses the limitations of adapting a laboratory research methodology to language teaching program evaluation, arguing instead for what he calls "field-experimentation". Field research is a "long-term, classroom-based inquiry into the effect of complete programs, the degree of control being partly dependent on whether correlational or experimental information is sought" (p. 296). Because field research is concerned with the generalizability of findings to classroom contexts (external validity), he argues that it yields findings relevant to immediate pedagogic concerns. Nevertheless, the evaluator must keep in mind that the choice of methodology depends on the purpose of the evaluation study, the kinds of questions stakeholders pose, the feasibility of the methodology, and the resources available for evaluation.
Beretta, A. (1986b). Program-fair language teaching evaluation. TESOL Quarterly, 20, 431-445.
keywords: norm-referenced testing; criterion-referenced testing; program fair testing; bias; validity
annotation: Beretta gives examples of how tests that are not program-fair can favor one teaching methodology over another. Previous program comparison studies of the effectiveness of teaching methodologies have been criticized for using content-biased tests, using standardized tests inappropriate for students' actual levels, and using instruments insensitive to the actual features of effectiveness. Beretta recommends criterion-referenced (program-specific) tests in order to increase sensitivity to the features of the program and to strengthen the content validity of the assessment. He also cautions against test bias in judging educational effectiveness, though it is unclear on what basis such bias might be identified.
Beretta, A. (1986c). Toward a methodology of ESL program evaluation. TESOL Quarterly, 20(1), 144-155.
keywords: method; experimental; contextual factors; causality
annotation: Beretta argues that, from the perspective of external validity, rigorous experimental methods fit language teaching program evaluation poorly. Rather than pursuing causality, he advocates applied inquiry whose outcomes are more directly relevant to pedagogy. He suggests that "(a) we conduct our investigations in the field rather than in artificially controlled 'laboratory' settings, (b) we consider the effect of total programs rather than isolated components of them, (c) the duration of the studies be long-term rather than short-term, (d) randomization is not always practicable or crucial" (p. 145). Note that Beretta's view may conflict with the more recent expectation of rigorous research methodology set out by the U.S. Department of Education's Institute of Education Sciences, which treats randomized controlled trials as the gold standard.
Beretta, A. (1990a). Implementation of the Bangalore Project. Applied Linguistics, 11(4), 321-340.
keywords: India; EFL; implementation; retrospective; questionnaire; specialist; teachers; Communicational Teaching Project
annotation: Beretta reports on an evaluation of pedagogical implementation in the Bangalore Project (also known as the Communicational Teaching Project, or CTP), using retrospective protocols in which teachers described how they had taught. The protocols of 15 teachers (4 regular public school teachers [RTs] and 11 non-regular, highly qualified teachers [NRTs]) were coded into three levels of pedagogic implementation: (1) orientation (not fully aware of the CTP), (2) routine (operating comfortably with the CTP), and (3) renewal (seeking ways to improve the CTP). Beretta concludes that the RTs and the NRTs had a different sense of 'ownership' of the CTP, which was reflected in their different implementation levels: the RTs acted as if they were new to the CTP, while the NRTs seemed comfortable with the CTP as routine practice. Because the study was conducted after implementation, no classroom observations were included, so it is hard to say what frame of reference the teachers used when answering the survey. No other project stakeholders were surveyed, so only a partial view of implementation was possible. The study's main contribution may be to raise language program evaluators' awareness of the importance of evaluating not only the product but also the implementation process of a program.
Beretta, A. (1990b). The program evaluator: The ESL researcher without portfolio. Applied Linguistics, 11, 144-155.
keywords: dialogue; negotiation; user-oriented; utilization; policy
annotation: This article brings into ESL program evaluation several major issues discussed in the broader field of educational evaluation: the importance of negotiation (dialogue) between the evaluator and the stakeholders, the utilization of evaluation, the localization (contextualization) of evaluation, and the examination of the policies attached to evaluation. Beretta highlights the conflicts the evaluator may face between the policy-shaping community, which makes pragmatic decisions, and scholars, who judge the standards of research. To conduct a user-oriented evaluation that is fully transparent and falsifiable, he suggests that the evaluator: (a) obtain a hearing, (b) identify the policy-shaping community, (c) negotiate reasonable research questions, (d) design the study and collect data, and (e) communicate the findings.
Beretta, A. (1992a). Evaluation of language education: An overview. In J. C. Alderson & A. Beretta (Eds.), Evaluating second language education (pp. 15-24). Cambridge: Cambridge University Press.
keywords: overview; model; quantitative; qualitative; history
annotation: This chapter provides a comprehensive overview of trends and models in educational program evaluation since the early "behavioral objectives approach", which "compare[s] intended outcomes with actual outcomes" (p. 13). Large-scale evaluation studies in the 1960s and 1970s tended to compare one program with another using an experimental approach, but they were inconclusive because of methodological design flaws (internal consistency, comparison groups, randomization, etc.). Moving beyond the inadequate dichotomy of quantitative vs. qualitative, a more eclectic, pragmatic philosophy emerged in response to the various purposes of program evaluation. Professional evaluation standards developed in the 1980s recognized "the heterogeneity of evaluation needs and approaches," which led to "the four attributes for evaluation: utility, feasibility, propriety and accuracy" (p. 18).
Beretta summarizes what has been learned and developed in the field of education and indicates future directions for evaluation in language education: (a) the most appropriate evaluation methods should be chosen according to what the audience (the policy-shaping community) wants to know; (b) in program evaluation, user-relevant information should take precedence over advancing language learning theory; (c) evaluation should be built into the design of the program from the outset; (d) the first step should involve negotiating the aims of the evaluation, prioritizing questions in terms of the capacity of the program evaluation (time, cost, learnability, impact), and translating policy questions into evaluation questions; (e) appropriate methodologies should be adopted based on the clients' information needs and deadlines; and (f) findings should be translated back into the language of policy, with different forms of reporting to accommodate different audiences.
Beretta, A. (1992b). What can be learned from the Bangalore Evaluation. In J. C. Alderson & A. Beretta (Eds.), Evaluating second language education (pp. 250-271). Cambridge: Cambridge University Press.
keywords: India; EFL; Bangalore Project; Communicational teaching project; method comparison; conflict
annotation: Beretta reported on the evaluation of the Bangalore Project in a variety of publications (see Beretta, 1990a, and Beretta & Davies, 1985, for a detailed explanation of the Bangalore Project, its aims, and the evaluation findings). Here, Beretta offers a retrospective "if I had known then" account of the evaluation activities he conducted. He highlights the need to negotiate and clarify the purpose, methods, and specific information to be collected during the planning stage, and he explains why the intended use of evaluation outcomes should be identified before evaluation design and data collection. However, he points out that external evaluators often lack adequate time to negotiate plans with program stakeholders and to develop the instruments needed. Because the evaluation took place towards the end of the project, information was scattered or unavailable, and Beretta could not obtain a rich, thick description of the project. Furthermore, the values of a few select stakeholders drove much of the decision making about the evaluation before it commenced. Ideally, evaluation should be integrated within the curriculum and the actual program context, but Beretta cautions that in the real world the evaluator may face severe limitations on planning, resources, data, and the like.
Beretta, A., & Davies, A. (1985). Evaluation of the Bangalore Project. ELT Journal, 39(2), 121-127.
keywords: India; EFL; testing; proficiency; achievement; experimental; Communicational Teaching Project; ODA
annotation: Beretta and Davies report on the evaluation of the Bangalore Project (also known as the Communicational Teaching Project, or CTP), an accountability-driven evaluation commissioned by the Overseas Development Administration. The evaluation was conducted by an external evaluator towards the end of the project. Its purpose was to compare the CTP methodology (task-based) with the traditional (structural) methodology to determine the CTP's effectiveness. Because of constraints imposed by the principal investigator of the CTP, the evaluation used a quasi-experimental design, and the authors recognized the difficulty of adopting a rigorous design in educational contexts. Students from four schools (each with one experimental and one control class) took both achievement tests (a structure test and a task-based test) and proficiency tests (contextualized grammar, dictation, and listening/reading comprehension). The proficiency tests were administered as neutral measures to overcome any test content bias. The validity of this experiment was questioned because of the instability of the educational context, the lack of reference points for comparing the two groups, and the absence of descriptions and observations of classroom practice that could account for the differences between the groups.
Bernhardt, E. B. (2006). Student learning outcomes as professional development and public relations. Modern Language Journal, 90(4), 588-590.
keywords: assessment; multiple languages; OPI; SOPI; course evaluation; university
annotation: Bernhardt reports on the successful expansion of the Stanford Language Center as a direct result of the increased visibility of its student learning outcomes assessment and student course evaluations. In 1996, the Stanford Language Center began administering SOPI (Simulated Oral Proficiency Interview) assessments at the end of the language requirement to evaluate program outcomes for language learners. In addition to the assessment data, course evaluations were gathered to gauge students’ satisfaction with their language courses. Implementing the assessment system involved two factors: (1) a cultural change in how learner performance assessment was understood, and (2) the need to build teacher capacity through OPI (Oral Proficiency Interview) certification training. Learner outcomes and positive course evaluations supported the language center’s assessment efforts. They also gave the university justification for allocating additional funds for teacher development and assessment programs, as well as for increased staffing and a professional reward system for instructors. Systematically documenting and publishing the results provided strong evidence of the efficacy of the language programs offered by the Stanford Language Center.
Bernhardt, E. B. (2008). Assessment as a keystone for language and literature programs. ADFL Bulletin, 40(1), 14-19.
keywords: US; university; foreign language; assessment; systemic; systematic; SOPI; oral proficiency
annotation: Bernhardt’s essay advocates “systemic and systematic student assessment” (p. 14) as a means of strengthening university foreign language programs and helping them integrate language and literature. She describes how the Stanford Language Center uses the SOPI (Simulated Oral Proficiency Interview) to assess all students at the end of their first year of language study. The SOPI provides the center with data for internal program evaluations, as well as clearly defined outcomes that can be presented to those outside the program to meet demands for accountability. Moreover, the ability to demonstrate student improvement against nationally recognized standards has strengthened the position of the language center within the university and supported its funding requests. Bernhardt asserts that developing a systemic assessment procedure can also help programs struggling to unify literature and language, because articulating outcomes requires departments to analyze their expectations for both language and literature skills.
Birckbichler, D. W. (Ed.). (2006). Evaluating foreign language programs: Content, context, change. Columbus, OH: Foreign Language Center, the Ohio State University.
keywords: guidelines; framework; ethnography; communication; stakeholders; focus group; observation; interview; proficiency testing; reporting; multiple languages; university
annotation: The edited book provides practical guidelines for conducting foreign language program evaluation at the postsecondary level. The chapters are divided into three parts: framing the evaluation (Chapters 1-3), asking the right questions (Chapters 4-6), and reporting for change (Chapters 7-8). Each chapter offers guidelines and example tools for different steps within the program evaluation process.
Part One of the book offers suggestions for setting the stage for program evaluation. In chapter one, Costner gives a bird’s eye views of different program evaluation approaches that are applicable to foreign language education. He proposes a content-specific approach to evaluation, where evaluators obtain specific knowledge of foreign language education (e.g., knowledge of the language taught) and reflect program culture in evaluation design. In Chapter Two, Kawamura advocates an ethnographic approach to evaluation, which requires thick descriptions of various program elements, a balance between emic (program internal) and etic (program external) perspectives, and data collection from multiple sources and levels. To unveil a program’s culture, Kawamura describes two ethnographic data collection methodologies: participant observation and ethnographic interview. In Chapter Three, “Communication: An essential tool in program evaluation,” Lang reminds evaluators that communication is essential to obtain buy-in, support, and cooperation from program stakeholders. To make the evaluation process transparent and to maintain professional distance with the stakeholders, evaluators need to use various communication tools and strategies, such as an email list, a website with plans and updates, etc.
Part Two covers planning data collection methodologies for program evaluation. Kawamura, Dassier, and Costner approach data collection with stakeholder participation and collaboration in mind. In the three methodology chapters, Kawamura, Dassier, and Costner outline reflective questions in defining the framework and scope of data collection methodology. Focus groups and proficiency testing are covered in detail in Chapters Five and Six, respectively. In planning for a focus group session, carefully sequenced questions, moderator skills, and logistics (e.g., site, time, recording) are important elements to consider. For data analysis, thematic concept mapping and key word identification can be conducted to find common and contradicting ideas that emerge from the transcript (see examples in the chapter). In Chapter Six, Dassier suggests making use of the Proficiency Guidelines and the Standards articulated by the American Council on the Teaching of Foreign Languages for framing what to test. She warns test developers to contemplate validity, reliability, and practicality of the test and offers a practical check-list for choosing and developing appropriate tests.
The last two chapters focus on what to do after interpreting the data with the stakeholders. In Chapter Seven, Lang explains six rules of thumb for reporting: (1) include a concise executive summary; (2) state a clear rationale for the evaluation design, instruments, and interpretation; (3) contextualize the program under evaluation; (4) format the report effectively; (5) consider a variety of forms for reporting findings (e.g., a website); and (6) be tactful and fair. Chapter Seven also contains a detailed list of what to include in an evaluation report. In the last chapter, Birckbichler emphasizes the importance of using evaluation findings to take programmatic action, as well as the ongoing, cyclical nature of program evaluation.
Brindley, G. (1998). Outcomes-based assessment and reporting in language programs: A review of the issues. Language Testing, 15(1), 45-85.
keywords: Australia; England; outcomes-based assessment; summative
annotation: Outcomes-based assessment relates summative, classroom-external assessment to classroom-based assessment of learning as a way of responding to different stakeholders who need to understand what students achieve in terms of well-defined learning outcomes. This article outlines problems that past assessment practices have encountered and suggests strategies for developing and implementing outcomes-based reporting. Brindley suggests the following: (a) collect a comprehensive range of information; (b) hold a dialogue with stakeholders to clarify the purpose of the assessment reform, identify actual information needs, and increase ownership of the reform; (c) research the relationship between outcome statements and assessment tasks; (d) link assessment tasks to levels of achievement by training teachers and creating banks of assessment tasks; (e) consult a variety of reporting methods for use among key stakeholders so that the educational value of reported information is not reduced; (f) use multiple raters, multiple scorings, and multiple sources to overcome variability in judgments of performance; and (g) provide ongoing support for teachers, with continual open review and critique of the assessment process. Outcomes based on benchmarks (exemplary student performances) may evolve hand in hand with what students are actually doing in their coursework. However, once standardized testing is introduced as a criterion for determining student achievement, there is a danger that instruction will become standardized as well.
Brindley, G. (2001). Outcomes-based assessment in practice: Some examples and emerging insights. Language Testing, 18(4), 393-407.
keywords: Australia; adult immigrant education; outcomes-based assessment; teachers
annotation: Reporting on adult immigrant education and schooling in Australia, Brindley illustrates challenges that arise in outcomes-based assessment. The two major problem areas are (a) the collision between political and educational perspectives and (b) the quality (validity and reliability) of teacher-constructed assessment tasks. The first problem derives from the typical governmental emphasis on system accountability rather than on learning, which leads education authorities to narrow the curriculum (focusing on students' minimum competencies rather than on a more challenging curriculum). The second problem derives from a lack of adequate teacher training in outcomes-based assessment and in the principles of good assessment practice. However, ongoing projects, such as creating a well-researched task bank that consistently reflects different levels of achievement, developing professional task development guidelines, and providing appropriate professional development, will assist teachers in using performance-based assessments.
Brown, J. D. (1989). Language program evaluation: A synthesis of existing possibilities. In R. K. Johnson (Ed.), The second language curriculum (pp. 222-241). Cambridge: Cambridge University Press.
keywords: model; history; approaches; dimensions; product; process; static characteristic; decision facilitation; curriculum process model
annotation: Brown defines evaluation as "the systematic collection and analysis of all relevant information necessary to promote the improvement of a curriculum, and assess its effectiveness and efficiency, as well as the participants' attitudes within the context of the particular institutions involved" (p. 223). He reviews past approaches to and dimensions of evaluation and proposes a systematic approach to evaluation integrated into curriculum design. Over forty years of development in educational psychology, four approaches emerged, each building on the previous one: (a) the product-oriented approach, which measures whether program goals and instructional objectives are met; (b) the static characteristic approach, which is conducted by outside experts and describes the nature of programs; (c) the process-oriented approach, in which evaluation is used both to improve the curriculum (formative purposes) and to judge the program (summative purposes); and (d) the decision facilitation approach, which feeds information into decision making. The three dimensions of evaluation (formative vs. summative, product vs. process, quantitative vs. qualitative), together with the available approaches, inform initial decisions about evaluation procedures. The 'systematic approach' that Brown proposes has six components for designing and maintaining a language curriculum, from conducting needs analysis to setting goals and objectives, testing, developing materials, and teaching, all interconnected with ongoing (formative) evaluation from the beginning to the end (summative) of curriculum development.
Brown, J. D. (1995a). The elements of language curriculum: A systematic approach to program development. Boston, MA: Heinle & Heinle.
keywords: ESL; EFL; model; curriculum development; needs analysis; goals and objectives; tests; materials; teaching effectiveness; efficiency; attitude; China; US
annotation: In the final chapter of this book, Brown discusses language program evaluation as part of ongoing curriculum development. His systematic approach to designing and maintaining language curriculum (also see Brown, 1989; Pennington & Brown, 1991) posits evaluation as a component that can "utilize all the information gathered in the processes of (1) developing objectives; (2) writing and using the tests; (3) adopting, developing, or adapting materials; and (4) teaching" (p. 24) for improving the curriculum. The quality of each program component can be analyzed in terms of effectiveness, efficiency, and attitude. Brown also offers an overview of evaluation approaches (goal-attainment, static-characteristic, process-oriented, and decision-facilitation) and of three dimensions that shape the perspective taken on evaluation (formative vs. summative, process vs. product, and quantitative vs. qualitative). Within this viewpoint framework, he outlines appropriate data sources and evaluative questions. Brown concludes with insights from program evaluation projects at the Guangzhou English Language Center in China, illustrating how his framework was fine-tuned to that context.
Brown, J. D. (1995b). Language program evaluation: Decisions, problems and solutions. Annual Review of Applied Linguistics, 15, 227-248.
keywords: second language; foreign language; evaluation decisions; positivistic; interpretive; quantitative; qualitative; overview
annotation: Brown defines program evaluation as "[a] systematic collection and analysis of information necessary to improve a curriculum, assess its effectiveness and its efficiency, and determine participants' attitudes within the context of a particular institution" (p. 227). He reviews the field of program evaluation, specifically the work on second and foreign language programs between 1986 and 1994, extending Beretta's (1992) survey of the evaluation methodology implemented between 1967 and 1985. Brown also outlines some decisions and problems that previous evaluators considered or encountered when planning evaluations. There are six types of decisions evaluators have to make: (a) What is the purpose of the evaluation: to judge the success of the program (summative evaluation, at the end) and/or to improve it (formative evaluation, during)? (b) What amount and type of expertise is necessary: bringing in outside experts for credibility, involving all stakeholders in the program, or having insiders and outsiders work together (participatory model)? (c) What form will the evaluation take: field research (long-term, classroom-based) and/or laboratory research (short-term, test-based)? (d) When is the evaluation performed: during or after the program, or both, and for how long? (e) What type of data is collected, and for what purpose: quantitative data and/or qualitative data (interview, observation, journal, correspondence)? (f) Is there a need to know the process involved in the curriculum and/or the resulting products? When conducting evaluation, Brown cautions evaluators to consider (a) sampling, (b) teacher effects, (c) test practice effects, (d) the Hawthorne effect, (e) the reliability of the instruments, (f) finding valid, program-fair instruments, (g) the politics involved in evaluation, and (h) other potential problems.
Brown, J. D., & Pennington, M. C. (1991). Developing effective evaluation systems for language programs. In M. C. Pennington (Ed.), Building better English language programs: Perspectives on evaluation in ESL (pp. 3-18). Washington, DC: NAFSA.
keywords: systematic; participatory model; stakeholders
annotation: Brown and Pennington view evaluation as "a process of determining the value of the individual aspects of an organization as a basis for ongoing change and development within that organization" (p. 4). They argue for "program evaluation to be a team effort involving many different personalities and varied input into the review process from others, both within and outside the program" (p. 13) rather than a top-down process. A participatory model makes it more likely that evaluation will be responsive to local factors, and it contributes to the growth of "self-determination and professionalism" (p. 15) among teachers as well as to the positive evolution of the organization. The authors list the conditions required for fair and effective evaluation: (a) information is gathered from multiple sources; (b) different types of instruments are used; (c) all stakeholders understand evaluation as an ongoing process; (d) all stakeholders understand the evaluation criteria and processes and their link with the philosophy and goals of the program; (e) administrators believe in the productivity of interaction with the instructors and the leadership of education; and (f) all stakeholders see evaluation as a means for achieving balance between administrative control and individual autonomy.
Byrd, P., & Constantinides, J. C. (1991). Self-study and self-regulation for ESL programs: Issues arising from the associational approach. In M. C. Pennington (Ed.), Building better English language programs (pp. 19-35). Washington, DC: NAFSA.
keywords: self-study; participatory model; administrator; NAFSA; TESOL; professional organization; force field analysis
annotation: Byrd and Constantinides describe how NAFSA (for higher education) and TESOL (for all levels) developed similar approaches to influencing institutions internally through the use of self-study. Self-study is a study that focuses on a single aspect of the program and is continued cyclically, with evolving emphases, through continuous data collection. It entails adjustment to changes in the outside environment, review and evaluation of current program practice as part of planning, and the alignment of external with internal evaluation. To choose the right design for self-study, ‘force field’ analysis can facilitate the process: it uncovers all forces, both within and outside the program, that favor or oppose program self-study and change. Self-study helps the program to (a) clarify goals; (b) identify existing problems; (c) learn about the program, its procedures, and its resources; and (d) identify and produce needed changes. It can also be used to familiarize new administrators with the program, build better understanding of the ESL program within a larger institution through participation in the institution's accreditation reviews, provide information for external reviews, and give ESL program staff a better understanding of the parent institution.
Byrnes, H. (2008). Owning up to ownership of foreign language program outcomes assessment. ADFL Bulletin, 39(2&3), 28-30.
keywords: university; US; foreign language; outcomes; assessment; ownership; transformative evaluation
annotation: While outcomes assessment has traditionally been viewed negatively by many university foreign language departments, Byrnes argues that departments can take ownership of the process and use it to improve and strengthen their programs. Byrnes traces the professional discussion of this issue and recounts three important contributions toward the reconceptualization of outcomes assessment: John Norris’s 2006 ACTFL presentation on the transformative potential of outcomes assessment; the 2006 Perspectives column in the Modern Language Journal, which explores the why and how of outcomes assessment; and the 2007 summer institute on foreign language program evaluation for university faculty held by the University of Hawaii National Foreign Language Resource Center. Byrnes also introduces two case studies that contribute to the professional discussion. Both are examples of university foreign language programs taking ownership of the outcomes assessment process: Windham’s (2007) study of Elon University’s efforts to align its foreign language curriculum with the ACTFL guidelines, and Carstens-Wickham’s (2007) account of the role assessment played in departmental improvements at Southern Illinois University-Edwardsville.
Carstens-Wickham, B. (2008). Assessment and foreign languages: A chair’s perspective. ADFL Bulletin, 39(2&3), 36-43.
keywords: US; university; foreign language; outcomes; assessment; NCATE; teacher education; proficiency; standards; internal review; program improvement; case study
annotation: Carstens-Wickham reports on the improvements made in the Department of Foreign Language and Literature (FLL) at Southern Illinois University-Edwardsville as a result of the department’s participation in the NCATE process and an internal program review. Although the NCATE process focused on foreign language teacher education, it revealed areas for improvement that benefited the entire foreign language program. During the process, the department reassessed its goals and objectives, brought its curriculum in line with national standards, and identified necessary curricular changes. The process also helped the department identify study abroad as an area for further emphasis and development. Furthermore, because of NCATE and the internal review, the department was able to demonstrate the need for a state-of-the-art foreign language training center and to obtain staffing support for the center from the university. Although originally wary of program assessments such as the NCATE review, faculty in the FLL now see them as useful tools for program improvement.
Chase, G. (2006). Focusing on learning: Reframing our roles. Modern Language Journal, 90(4), 583-588.
keywords: Student learning outcomes assessment; cross-disciplinary; stakeholders; roles; responsibilities; university
annotation: Chase agrees with and extends Norris’s (2006) claim that a much-needed conceptual shift in assessment practice (i.e., viewing assessment as an opportunity to support student learning rather than simply as a response to accountability pressures) cuts across diverse disciplines in higher education. Chase advocates that the different stakeholders of a department or program reconceptualize their roles and responsibilities to create a learning community of professionals and learners. Faculty members should actively engage in and commit to understanding and improving student learning in ways that reflect each faculty member’s unique strengths and views. Administrators, in turn, have an obligation to integrate and promote student learning as a core organizational goal, to provide support structures for faculty development, and to create funding models that integrate student learning as a key component. Finally, Chase argues that students also need to take initiative in their own learning and provide feedback to the program on their learning outcomes.
Chaudron, C., Doughty, C., Kim, Y., Kong, D., Lee, J., Lee, Y., Long, M. H., Rivers, R., & Urano, K. (2005). A task-based needs analysis of a tertiary Korean as a foreign language program. In M. H. Long (Ed.), Second language needs analysis (pp. 105-124). Cambridge: Cambridge University Press.
keywords: US; Korean; foreign language; university; task-based language teaching; needs analysis; materials development; module development; utilization
annotation: Chaudron et al. describe the first stage (needs analysis) of a three-year, federally funded project on task-based teaching of Korean as a foreign language (KFL), conducted at the University of Hawaii at Manoa. The chapter is a detailed demonstration of how task-based needs analysis can be carried out and used to create prototypical task-based instruction. First, unstructured interviews were conducted with instructors and a stratified random sample of students enrolled in KFL courses. These interviews sought to obtain demographic information, reasons for studying Korean and for going to Korea, current and anticipated future uses of Korean, the language skills students expect to need, and the task performance abilities necessary for language use in Korea and for future jobs. Based on the interviews, a questionnaire was formulated and administered to the entire KFL student population at the university. Target tasks were identified from the survey results. For demonstration purposes, two target tasks were chosen (asking for directions and shopping for clothes); these were used to collect target discourse samples in the U.S. and in Korea. The chapter explains the process of identifying prototypical discourse structures within the samples and provides extensive discourse excerpts. The last section of the chapter demonstrates how the findings were applied to develop task-based language teaching modules, which consist of pedagogic tasks of increasing complexity based on the two target tasks. Included are a sample needs analysis questionnaire, a consent form, and the model task-based language teaching module.
Coleman, H. (1992). Moving the goalposts: Project evaluation in practice. In J. C. Alderson & A. Beretta (Eds.), Evaluating second language education (pp. 222-246). Cambridge: Cambridge University Press.
keywords: EFL; Indonesia; university; ODA; British Council; conflict resolution; evaluator's role; political
annotation: Coleman provides an insider's view of the evaluation history of the Key English Language Teaching (KELT) Project at Hasanuddin University, Indonesia, and of the related conflicts he experienced while developing it. The project was funded by the Overseas Development Administration and overseen by the British Council. Arriving as a new project officer without prior knowledge of the project's actual status, Coleman encountered conflicting objectives and aims from different parties, each with different expectations for the evaluation. The original purpose of the project was to prepare materials and courses for staff designated for overseas training, prior to their departure. However, a needs analysis led to a shift in emphasis towards (a) modifying undergraduates' attitudes towards English, (b) training English teaching staff, and (c) eventually creating a course called Risking Fun. Coleman examined "the extent to which [the Project] achieved the objectives which it laid down for itself after it had begun to operate" (pp. 236-237). When different parties expect different objectives from a project, and those objectives evolve over time, interpretations of the project may differ dramatically. It falls to the evaluator to identify the various objectives as they change over time and to document how and why they changed and how they were achieved.
Coombe, C., Al-Hamly, M., Davidson, P., & Troudi, S. (Eds.). (2007). Evaluating teacher effectiveness in ESL/EFL contexts. Ann Arbor, MI: The University of Michigan Press.
keywords: teacher evaluation; guidelines; standards; NCATE; performance indicators; self-evaluation; observation; portfolio; teacher training; ESL/EFL; K-12; university; adult education
annotation: The book introduces various teacher evaluation practices in English as a foreign/second language programs around the globe. The 15 chapters are organized into four sections: teacher evaluation standards (Part One), case studies (Part Two), research in teacher evaluation (Part Three), and resources and tools (Part Four).
Part One covers two projects on teacher/professional standards in depth: (1) the development and implementation of professional standards for newly qualified teachers of English in Egypt, and (2) the use of TESOL/NCATE standards as a resource for teachers to build professional independence in South Asia.
Part Two of the book compiles six case studies of teacher evaluation practices and guiding principles across diverse contexts, from a comprehensive NCATE accreditation review of an education department in the U.S. to teacher appraisal programs with teacher-driven and collaborative approaches in Canada and the United Arab Emirates. Many chapters include sample performance indicators and standards, along with descriptions of the methodologies used in each context.
Part Three of the book showcases four research projects that investigate various issues and practices of teacher evaluation: (1) an action research study of adult ESL teachers’ understanding and practices of multiliteracy; (2) a survey study of university students’ perceptions of the usefulness, use, and focus of teacher evaluation; (3) a two-year study investigating the development and training of teacher effectiveness from multiple perspectives (teachers, students, and trainers); and (4) a survey study of teachers’ attitudes towards three teacher evaluation methods (student evaluations of teaching, classroom observations by administrators, and teacher portfolios).
Part Four of the book examines tools for assessing teacher effectiveness, including a self-evaluation tool for pre-service teachers, a standards-based classroom observation tool, and a district-wide teaching portfolio assessment tool for formative purposes. Teacher trainers and administrators who conduct teacher evaluations will benefit from the book's practical guidelines and ample examples of methods situated in various educational contexts.
Dassier, J. P., & Powell, W. (2001). Formative foreign language program evaluation: Dare to find out how good you really are. Dimension 2001: The odyssey continues. Selected proceedings of the 2001 Conference of the Southern Conference on Language Teaching, Birmingham, AL, 15-30.
keywords: US; university; foreign language; Spanish; French; language requirement; intrinsically motivated; questionnaire; focus group; proficiency; improvement; testing
annotation: Formative evaluation is a constructive process to "form the foundation for decision-making on central issues of curriculum development and thus more effectively address the kinds of issues raised in the introductory scenarios" (p. 93). Dassier and Powell report on a two-year formative evaluation study of the college-level French and Spanish courses that make up the four-semester language requirement at the University of Southern Mississippi (USM). The evaluation was initiated by faculty members (i.e., it was not mandated) to "provide information, substantiate or reject intuitions, assess program impact, quality, and effectiveness" (p. 97), and to convince administrators (the chair of the department, the dean of their college, and others) of the value of such a study. All students enrolled in second-semester and fourth-semester French and Spanish courses completed a questionnaire (demographic and attitudinal questions) and a College-Level Examination Program (CLEP) test. A spoken language proficiency test was also administered in the first year of the study, but not in the second year, due to cost. Students who took the CLEP test were randomly selected and invited to take part in focus-group discussions, which collected rich perspectives on issues surrounding the required courses. The data suggested the need to (a) create a system to distinguish true beginners from false beginners (require placement exams, review high school records, enhance advising, and/or create another 100-level class for true beginners); (b) close the gaps in study between high school, community college, and university language programs through cooperative articulation; (c) communicate curricular objectives and philosophies between USM and the schools students came from (external articulation); and (d) clarify and articulate goals and the curricular framework, with greater coordination and internal consistency among instructors.
The lack of validity of the CLEP test results, owing to students' low investment in the test (a "buy-in" effect), suggested the need to integrate an assessment mechanism into the existing curriculum for future ongoing evaluation. This study demonstrates how intrinsically motivated, ongoing evaluation can lead to proposals for program improvement based on rich information; the article does not report on the subsequent implementation of changes.
Elder, C. (2009). Reconciling accountability and development needs in heritage language education: A communication challenge for the evaluation consultant. Language Teaching Research, 13(1), 15-33.
keywords: Australia; bilingual education; K-12; accountability; developmental evaluation; external; heritage language; Mandarin; Vietnamese; Arabic
annotation: Elder reflects on three evaluations of heritage language programs in government schools in Australia. She describes the circumstances of each evaluation, the challenges faced by the team members and their varying degrees of success. In lessons learned, she emphasizes the need to negotiate and clarify the following issues before the evaluation begins: resources and funds; the purpose, scope and audience of the evaluation; the roles of the evaluator and evaluands; and what will constitute evidence. She also stresses that evaluators need to be flexible and responsive to feedback from participants and stakeholders. Finally, Elder argues that while there is often tension between the accountability and developmental functions of evaluations, the two do not have to be mutually exclusive. Thus, external evaluators who have been hired primarily for accountability purposes can conduct evaluations that also contribute effectively to internal program development. Building productive relationships and maintaining open and effective communication with participants and stakeholders are two key strategies for simultaneously addressing both external accountability and internal development needs.
Elley, W. B. (1989). Tailoring the evaluation to fit the context. In R. K. Johnson (Ed.), The second language curriculum (pp. 270-285). Cambridge: Cambridge University Press.
keywords: pragmatic; method comparison; comparative analysis
annotation: Elley offers pragmatic suggestions at various decision points in planning and implementing evaluation. The planning stage involves identifying the evaluator, purpose, intended outcomes, design, sample size, sampling, and instrumentation. To tailor implementation, Elley recommends that evaluators (a) organize a committee to discuss the plans at each stage and ensure objectivity in data collection and analysis; (b) allocate time and effort according to the importance of the information to be gained; (c) infer aims from instructional materials and lesson plans when clearly defined aims are lacking; (d) if there are no comparison groups, conduct pre- and post-test comparisons or survey large representative samples as a baseline before the program is introduced; (e) consider the homogeneity of the populations and samples involved; (f) survey the assigned school to look for schools with similar student composition, or survey potential counterparts at the next highest grade level for later comparison; (g) determine the weight of skills tested, the source of test materials, question types, the length of the test, and the forms of the test; and (h) pilot and improve the test items. The timing of administration and the clarity of test specifications (procedures) should be considered, as should careful marking of the test after administration. Monitoring of experimental and control groups is also necessary to ensure that the program is implemented as intended. When analyzing results, it is also important to consider loss of cases, equating of groups, ‘ceiling effects’, differences between subgroups or classes, and the behavior of extreme groups.
Eskey, D. E., Lacy, R., & Kraft, C. A. (1991). A novel approach to ESL program evaluation. In M. C. Pennington (Ed.), Building better English language programs: Perspectives on evaluation in ESL (pp. 36-53). Washington, DC: NAFSA.
keywords: US; university; ESL; reliability; face validity; academic success
annotation: The chapter reports on an evaluation of the American Language Institute (ALI), an ESL program at the University of Southern California (USC) that prepares students for academic English with a heavy emphasis on content. Most of the students were matriculated students who had been placed into the ALI through a five-part, program-specific examination. The program's advanced courses were tied to the needs of academic units (schools and departments at the institution). Therefore, both the ALI and related academic units were involved in the evaluation. The authors evaluated the effectiveness of the ALI by comparing the academic success of students who were released (exited) from the ALI with that of other populations at the university. Further, the release criteria (based on writing skills) were validated by analyzing the relationship between ALI writing scores (on a nine-point scale) and GPA-based success rates. The results indicated that the ALI-released students were capable of academic success, while those who failed in the ALI but still enrolled at USC mostly dropped out (only 5 out of 55 successfully completed the program), supporting the validity of the ALI release criteria. This chapter demonstrates how the decision point of releasing students from an ESL program into mainstream university courses is an important criterion that calls for validation, and one that is apparently closely related to students' success in the university.
Fall, T., Adair-Hauck, B., & Glisan, E. (2007). Assessing students’ oral proficiency: A case for online testing. Foreign Language Annals, 40(3), 377-406.
keywords: Oral proficiency assessment; online; ACTFL; French; German; Japanese; Spanish; large-scale; K-12
annotation: Fall, Adair-Hauck, and Glisan report on their longitudinal project to develop, implement, and validate an online, district-wide oral proficiency assessment for K-12 students in Pittsburgh, Pennsylvania, called the Pittsburgh Public Schools Oral Ratings Assessment for Language Students (PPS ORALS). The project was funded by the U.S. Department of Education, and its goal was to create accessible, feasible, and easy-to-use online testing aligned with the ACTFL Oral Proficiency Guidelines. Teachers across the district and language consultants collaborated to create tasks and assessment rubrics for the PPS ORALS, a process that empowered teachers and equipped them with a greater understanding of proficiency-based instruction. Teachers’ involvement and rater training had a positive washback effect on the curriculum and classroom practices. In the appendix, the authors include detailed examples of speaking tasks, a rubric, and a can-do checklist for different proficiency levels.
Fox, R. P. (1991). Evaluating the ESL program director. In M. C. Pennington (Ed.), Building better English language programs (pp. 228-240). Washington, DC: NAFSA.
keywords: ESL; program administration; directors
annotation: Fox describes previous literature on the evaluation of ESL program directors as "stress[ing] the concept of accountability through performance evaluation, professional development and personal growth, and reward for outstanding performance" (p. 235). He then considers how the evaluation of an ESL program director should be conducted and who should conduct it. Fox suggests forming a committee (e.g., the immediate dean, vice-provost, faculty, staff, sponsors, and students) and setting performance criteria. He cites 12 types of performance criteria (problem analysis, judgment, organizational ability, decisiveness, leadership, sensitivity, range of interests, personal motivation, educational values, stress, oral communication skills, and written communication skills) as a basis for developing the evaluation instrument. After responses are collected, analyzed, and reported, "one of the most important results of the evaluation is the development of an improvement plan by the ESL program director, which should form the basis of interim informal evaluations" (p. 238). Thus, evaluating the director also contributes to the further development of the program.
Gattullo, F. (2000). Formative assessment in primary (elementary) ELT classes: An Italian case study. Language Testing, 17(2), 278-288.
keywords: Italy; elementary; EFL; formative; implementation; teachers; interview; assessment
annotation: This work-in-progress case study of an Italian elementary school (3rd and 4th grade) takes a discourse-analytic perspective on teachers' formative classroom assessment practices. Within 150 hours of classroom interaction data, assessment events were identified, transcribed, and coded into nine teacher feedback categories: questioning/eliciting, correcting, judging, rewarding, observing process, examining product, clarification request, task criteria, and meta-cognitive questioning (in rank order). Gattullo emphasizes the value of meta-cognitive questioning (the lowest-ranked category in the data), which can help students articulate their understanding better and thus [...]
[...] Second, a context inventory should be conducted to decide which issues to prioritize and how to carry out the program evaluation (covering factors such as availability of comparison groups, reliable/valid language measures, evaluation expertise, and instructional materials and resources; background of students and staff; student selection process, size, intensity, perspectives, and purpose of the program; timing of evaluation; and the social and political climate of the program). Third, developing a preliminary thematic framework will also clarify the conceptual framework of the program, the salient issues, and the aspects to be evaluated. Fourth, a data collection design and system have to be selected based on the questions that need to or can be answered; feasibility information from the context inventory may also limit the methodology (a useful decision-making chart for data collection design/system is provided as an example). Fifth, collect data based on the clear purpose of the design; data collection may be eclectic, since new questions and issues can emerge and the purpose of the evaluation can evolve with the program.
Sixth, ideally, analyze data with a "multiple analysis strategy [which] can strengthen the evaluation by avoiding the possibility of bias associated with any particular technique" (p. 36). Finally, formulate and tailor the evaluation report for a particular audience.
Gorsuch, G. (2009). Investigating second language learner self-efficacy and future expectancy of second language use for high-stakes program evaluation. Foreign Language Annals, 42(3), 505-540.
keywords: university; accreditation; program development; self-efficacy; quantitative; questionnaire; program theory; foreign language; process; case study
annotation: This article describes a university foreign language program evaluation focusing on student self-efficacy. The evaluation was part of a larger summative evaluation being carried out for accreditation purposes. It was developed by a team of second language faculty to evaluate one of the department’s core competency statements, “Students of Arabic, Chinese, French, German, Italian, Japanese, Portuguese, Russian, and Spanish will demonstrate confidence in using the second language in their language classrooms, and their future expectancies of their ability to use the second language in real-life contexts” (p. 506). The team developed a Likert-style questionnaire for students based on the literature on self-efficacy and on faculty input regarding outcomes and expectations. The questionnaire was administered to fourth-semester students in all languages. Quantitative analysis was performed to determine the extent of students’ perceived self-efficacy for various outcomes, and the team developed specific suggestions for improvement based on the results. Gorsuch notes that the process of developing the evaluation was as beneficial as the findings, because it required faculty members from different language divisions to examine their assumptions about what students should be able to do and to articulate a shared program theory. Furthermore, the evaluation successfully served two purposes: accountability and program development.
Gottlieb, M., & Nguyen, D. (2007). Assessment and accountability in language education programs: A guide for administrators and teachers. Philadelphia: Caslon Publishing.
keywords: accountability; assessment; portfolio; English; bilingual education; K-12
annotation: To begin, Gottlieb and Nguyen review national and local perspectives on accountability and assessment in English language learner education (i.e., dual-language, transitional bilingual, and ESL programs). The authors propose an assessment framework called the Balanced Assessment and Accountability System, Inclusive and Comprehensive (BASIC), a model implemented in Schaumburg, Illinois, School District 54. The model links state, district, program, and classroom-level assessments with curriculum and instruction. In planning for assessment, the model emphasizes considering the effect of internal and external contextual factors on program design and assessment practices. Such factors include learning goals, benchmarks and standards, characteristics of the program constituents, and program mission and vision. Once contextual factors are identified and learning goals are listed, the next step is to match the purposes and types of assessment tools to each goal. Gottlieb and Nguyen suggest the use of student portfolios, called “pivotal portfolios,” which involve the systematic collection of different types of student learning and achievement data based on agreed-upon common assessment tools. The book showcases examples of the uses of the “pivotal portfolio” for classroom and program decision-making in dual language programs and transitional bilingual programs. The authors point out that “pivotal portfolio” data can be used not only for student assessment but also to respond to authentic (i.e., locally relevant) accountability pressures and to improve instruction and student learning. The book includes worksheets and checklists to help educators design a contextualized assessment framework.
Grosse, C. U. (2004). Competitive advantage of foreign languages and cultural knowledge. Modern Language Journal, 88(3), 351-373.
keywords: US; MBA; foreign language; culture; alumni survey
annotation: Grosse describes how graduates of the Thunderbird business school, which requires a minimum of four semesters of foreign language study, view the advantage that their knowledge of foreign languages and cultures gives them in the international business community. An online survey was distributed by email to 2,500 alumni who graduated between 1970 and 2002, and 581 responded. Over 80% of the respondents said that foreign language skills and cultural knowledge gave them a competitive advantage, suggesting the value of such knowledge in business, despite a mismatch between the languages studied at Thunderbird and the languages respondents needed in their workplaces. The article addresses the importance of foreign language study in MBA programs in general. It also shows how a program's graduates can be an important source of expertise on how learning in the program is applied in the real world.
Hajjaj, A., & Al-Najjar, B. (1989). ESL program evaluation: Realities and perspectives. In J. E. Alatis (Ed.), Georgetown University Round Table on Languages and Linguistics, 1989 (pp. 133-141). Washington, DC: Georgetown University Press.
keywords: Kuwait; EFL; university; questionnaire; framework
annotation: Hajjaj and Al-Najjar describe issues that arose in ESL program evaluation at Kuwait University (KU). They characterize ESL program evaluation in the 1980s by introducing the then-emerging notions of process and product evaluation, program-fair evaluation, and shared or negotiated evaluation. Further, they note that considerable attention was beginning to be paid to evaluating affective and cognitive aspects of learning, as well as to including evaluation as an integral part of curriculum development. The authors present the results of survey research on the realities of ESL evaluation in Arabian universities, and they set out a framework for future ESL program evaluation there: (a) an overall comprehensive evaluation plan (not only testing) should be developed and communicated among stakeholders; (b) by monitoring progress throughout the program, evaluation will become an integral part of the learning process; (c) an evaluation should gather appropriate and relevant data; (d) evaluation should be action-oriented, feeding back into program development; and (e) the validity of the evaluation study should be clarified.
Hargreaves, P. (1989). DES-IMPL-EVALU-IGN: An evaluator's checklist. In R. K. Johnson (Ed.), The second language curriculum (pp. 35-47). Cambridge: Cambridge University Press.
keywords: checklist; method; model; purpose; agent; curriculum development; theoretical
annotation: Hargreaves argues that approaches to curriculum planning have often treated "design, implementation, and evaluation" as a linear process, positioning evaluation as a post-hoc matter. He proposes a cyclical, integrated view and presents a checklist of twelve mutually dependent factors that can be used when planning an evaluation: target audience (non-specialist / specialist), purpose (formative / summative), focus (direct / indirect), criteria (global / relative), method (a priori / empirical), means and instrument (a priori / empirical), agents (internal / external), resources (staffing / funding), time factors (timing and time scales), findings (nature and status of findings), presentation of results (formats), and follow-up (action). Integrating evaluation into the curriculum is essential at all stages of curriculum development.
Harklau, L., & Norwood, R. (2005). Negotiating researcher roles in ethnographic program evaluation: A postmodern lens. Anthropology and Education Quarterly, 36(3), 278-288.
keywords: ethnography; evaluator role; postmodernism; stakeholder; secondary education; minority learners
annotation: Harklau and Norwood discuss how evaluators’ roles and reflexivity are shaped by and relative to institutional and societal discourses. In the official role of external evaluators, Harklau and Norwood conducted an ethnographic evaluation of a month-long summer college readiness program (including ESL and math among other subjects) for underrepresented middle school students. The researchers reflect on the fluid multiple roles they were positioned in by the stakeholders of the program: as insiders, as outsiders, as colleagues, as teaching staff, as program benefactors, and even as ornamental researchers. Throughout the study, Harklau and Norwood negotiated their evaluator roles and power with the stakeholders of the program and occasionally resisted their positionality. Taking a postmodern perspective, they suggest that: (a) evaluation is a performative act, in particular recognizing that “science is a representation, not a transparent reality” (p. 285); and (b) policy makers should acknowledge multiple ways of knowing, including the value of ethnography as an evaluation method.
Harris, J. (1990). The second language programme-evaluation literature: Accommodating experimental and multifaceted approaches. Language, Culture, and Curriculum, 3(1), 83-92.
keywords: Ireland; Irish; bilingual education; elementary school; experimental; multi-faceted; observation; questionnaire
annotation: Harris discusses how the purpose of evaluation differs between two distinct evaluative approaches, experimental and multi-faceted. The experimental approach is often used for theory-oriented evaluation, which generates theory, tests hypotheses, and seeks causal relationships, rather than making practical decisions about an individual program. Many bilingual education studies carried out in Canada are theory-oriented evaluations that serve policy-shaping purposes. The multi-faceted approach focuses on short-term, applied, practical decision making, though it may eventually shape policy as well. Harris presents an example of a nation-wide evaluation of Irish-language programs in Irish primary schools to illustrate how a long-term study can take both approaches, depending on the focus. The evaluation began as a test-development project with an "output-oriented quasi-experimental approach" (p. 87) oriented toward decision-making and policy. However, a thorough investigation of schools' expectations and a greater focus on process issues (classroom observation and student, teacher, and parent questionnaires) emerged, reflecting the pedagogic concerns of participants. In a follow-up evaluation, a hypothesis-testing study was carried out to verify "the relationship between general ability, amount of naturalistic use of Irish in school and achievement in different aspects of spoken Irish" (pp. 90-91). Accommodating both approaches can bring greater depth and generalizability to program evaluation.
Harris, J. (2009). Late-stage refocusing of Irish-language programme evaluation: Maximizing the potential for productive debate and remediation. Language Teaching Research, 13(1), 55-76.
keywords: Ireland; Irish language; national language; language policy; context; political; data analysis; case study; criterion-referenced; assessment; K-12
annotation: Harris discusses a series of Irish language program evaluations in Ireland and their role in the public debate on language policy and program development. Focusing on two studies in particular, he describes how adjustments were made during the evaluation process to achieve a more comprehensive understanding of the findings and to prevent their misuse or misinterpretation. For example, when faced with potentially problematic findings, the evaluation team performed a more thorough analysis of the data and looked at contextual factors that helped explain the findings. By clarifying and contextualizing the data, the evaluators were able to produce reports that contributed constructively to the public debate on Irish language education policy. Harris emphasizes the political nature of evaluation and calls on evaluators to consider the political implications of their work and to take responsibility for clarifying findings so that they are not misinterpreted.
Hedge, T. (1998). Managing developmental evaluation activities in teacher education: Empowering teachers in a new mode of learning. In P. Rea-Dickins & K. P. Germaine (Eds.), Managing evaluation and innovation in language teaching: Building bridges (pp. 132-158). London: Longman.
keywords: university; teacher training; self-evaluation; developmental evaluation
annotation: Hedge highlights the outcomes of developmental evaluation activities (reproduced in the appendices) in a 50-hour core teacher education course in a postgraduate programme. Two case studies show how self-evaluation activities help teachers develop awareness of how to manage group work through collaborative tasks. "The reflective investigation and developmental evaluation activity [through experiential learning] can build strong motivation among participants on teacher education courses" (p. 150).
Heining-Boynton, A. L. (1990). The development and testing of the FLES program evaluation inventory. Modern Language Journal, 74(4), 432-439.
keywords: US; elementary school; FLES; rating scales; testing
annotation: Heining-Boynton reports on the development and testing of a multifaceted FLES (Foreign Language in the Elementary School) program evaluation instrument, a survey distributed to FLES teachers, principals, administrators, students, elementary classroom teachers, and parents. The instrument covered issues that FLES programs had faced historically, as well as concerns elementary foreign language programs had at the time: teacher qualifications, goals and objectives, pedagogy, articulation, homework and grades, parent support, FLES teacher acceptance by colleagues, workload, at-risk students, and student satisfaction.
Hill, Y.Z., & Tschudi, S. (2008). A utilization-focused approach to the evaluation of a web-based hybrid conversational Mandarin program in a North American university. Teaching English in China: CELEA Journal, 31(5), 37-54.
keywords: US; university; Mandarin; utilization focused; web-based; hybrid; improvement; formative; questionnaire; interview; participatory; case study
annotation: Hill and Tschudi recount their experience conducting a utilization-focused evaluation of a hybrid, conversational Mandarin course. As the course instructor and tutor, the authors identified themselves as the primary intended users. Because the course was new, they sought to understand its various components, determine how well it was functioning, and identify areas for improvement. The evaluation was divided into two stages. During the first stage, the evaluators collected data using questionnaires and student interviews. Using the data, the evaluators made changes to the next semester’s course, and the second stage of the evaluation was conducted at the end of that semester. Based on the findings, the evaluators decided to increase the amount of natural conversation practice, include more interesting topics for students, and explicitly address online learning strategies. The authors note that while Tschudi began the project primarily as a stakeholder, the participatory nature of the evaluation led him to gradually take on the role of co-evaluator.
Horwitz, E. K. (1985). Formative evaluation of an experimental foreign-language class. Canadian Modern Language Review, 42(1), 83-90.
keywords: US; university; French; classroom observation; interview; curriculum; formative; innovation
annotation: Horwitz illustrates the formative evaluation of an experimental, communicatively focused French as a foreign language class at the University of Illinois at Urbana-Champaign. The evaluation was conducted by an outside evaluator using structured interviews and systematic classroom observation. An observational coding system captured how many turns each student took (response, question, or comment), whether each utterance was spontaneous, whether it was in English or the target language, and whether it was long or short. By comparing the characteristics of students' communicative behavior in one activity with those in another, the teacher can modify a new activity to engage students more effectively. Beyond the quantitative aspects of utterances, the coding system can be modified to examine their quality, for example by including types of feedback and error modification. The results of Horwitz's observational coding were triangulated with student and teacher interviews. As a result of the formative evaluation, the instructor incorporated student-initiated topics, adapted grammar lessons (because students were concerned that the communicative approach would put them at a disadvantage in the second semester), and addressed habitual off-topic comments.
Horwitz argues that, when a teacher is trying to implement a new approach, formative evaluation can provide useful and timely feedback (provided the observation system is not too time-consuming or labor-intensive). In particular, it can help to monitor both teacher and student classroom behavior, inform instructional modifications, adjust pedagogical tasks, and, not least, raise unanticipated issues in time to address them.
Houston, T. (2005). Outcomes assessment for beginning and intermediate Spanish: One program’s process and results. Foreign Language Annals, 38(3), 366-376.
keywords: US; university; Spanish; foreign language; outcomes assessment; ACTFL; standards; oral proficiency; survey; placement exam; portfolio; case study
annotation: Houston describes a university Spanish language program’s process for articulating and assessing learning outcomes. Using the ACTFL Guidelines and the Standards for Foreign Language Learning in the 21st Century as rough guides, the program developed both proficiency goals and general program goals. To assess outcomes, the program looked at student gains on the placement exam, student satisfaction surveys, and oral proficiency interviews and tasks. The proficiency assessments indicated that the program was generally successful, although there was some conflicting data on students’ grammar skills. However, the student satisfaction surveys showed that students felt some general program goals were not met. The program used this feedback to make improvements.
Hudson, T. D. (1989). Mastery decisions in program evaluation. In R. K. Johnson (Ed.), The second language curriculum (pp. 259-269). Cambridge: Cambridge University Press.
keywords: criterion-referenced; testing; mastery decision
annotation: Hudson addresses the use of criterion-referenced measurement (CRM) to assess student performance (mastery or non-mastery) in relation to program goals for program evaluation purposes. Mastery testing establishes the absolute standing of students' performance against the objectives of instruction. Hudson discusses issues of reliability/dependability in CRM, noting that the consistency of mastery decisions can be addressed through various statistical approaches. He also suggests that content-based or contrasting-groups methods may inform the validity of decision standards in CRM. The process of developing CRM through input from instructors, materials writers, and administrators can strengthen the curriculum by calling for a rationalization of instructional goals and methods. Further, by analyzing the results of CRM, "the evaluator can determine the extent to which the program is i) producing the desired results and ii) realistic in its goals" (p. 265). A major difference between Bachman's (1989) and Hudson's approaches to criterion-referenced testing is whether testing involves reference to other programs or to the program itself. The choice of approach may depend on whether there is a strong need for generalizable evaluation outcomes (e.g., for accountability purposes).
Jacobson, P. L. H. (1982). Using evaluation to improve foreign language education. Modern Language Journal, 66, 284-291.
keywords: improvement; accountability; triangulation; personal factor; political; utilization
annotation: Jacobson describes constraints on the availability of valid evaluative information, shows how program evaluation can aid foreign language education, and provides suggestions for improving the utilization of evaluation information. Jacobson refers to summative evaluation as "the most authoritative and defensible information," while arguing that ongoing formative evaluation as "an integral part of a foreign language program is a sine qua non for providing valid data to decision makers" (pp. 288-289). She advocates the use of evaluability/evaluative assessment (to determine the likelihood that an evaluation will succeed), needs assessment (to determine the gap between the desired status and the current status), and implemental evaluation (to determine the reality of the implementation process). When it comes to the utilization of evaluation, personal factors (responsibility, leadership, enthusiasm, determination, etc.), the political climate, and the format of the report all interact to determine whether the evaluation outcomes will be used by the program's stakeholders.
Jenks, F. L. (1991). Designing and assessing the efficacy of ESL promotional materials. In M. C. Pennington (Ed.), Building better English language programs: Perspectives on evaluation in ESL (pp. 172-188). Washington, DC: NAFSA.
keywords: ESL; promotional materials; administration
annotation: Jenks discusses the purposes and effectiveness of ESL program promotional materials, such as videos, brochures, program advertisements, posters, newspapers, and newsletters. English language programs that target international clients need to seek ways to improve their promotional strategies and materials through constant formative assessment that balances cost and effectiveness. The effectiveness of each type of promotional material can be evaluated using the following strategies: (a) place a code on the application page of the brochure to track distribution; (b) chart the number and country of origin of the returned preprinted forms; (c) attach a tear-off pad or postal card to the poster to tally responses later; and (d) tally the number of inquiries received through newspapers and newsletters (the author notes that the effectiveness of video is difficult to assess). Online advertisements and web pages, which were not prominent at the time of publication, are not covered.
Johnson, R. K. (Ed.) (1989). The second language curriculum. Cambridge: Cambridge University Press.
keywords: curriculum planning; ends/means specification; program implementation; classroom implementation; faculty development
annotation: This book offers a collection of papers arguing for a cohesive curriculum and emphasizing the interdependence of various stages (curriculum planning, specification of ends and means, program implementation, and classroom implementation) throughout the development and evaluation process. Evaluation is understood as a "necessary and integral part of each and all of the stages" (p. xiii). The chapters in the book cover all aspects of curriculum, and some focus on evaluation. Chapters include: "A decision-making framework for the coherent language curriculum" (Chapter 1: Johnson); "Syllabus design, curriculum development and polity determination" (Chapter 2: Rodgers); "DES-IMPL-EVALU-IGN: an evaluator's checklist" (Chapter 3: Hargreaves); "Needs Assessment in Language Programming: From Theory to Practice" (Chapter 4: Berwick); "The Role of Needs Analysis in Adult ESL Programme Design" (Chapter 5: Brindley); "Service English Programme Design and Opportunity Cost" (Chapter 6: Swales); "Faculty Development for Language Programs" (Chapter 7: Pennington); "The Evolution of a Teacher Training Programme" (Chapter 8: Breen, Candlin, Dam, & Gabrielsen); "Appropriate Design: The Internal Organisation of Course Units" (Chapter 9: Low); "Beyond Language Learning: Perspectives on Materials Design" (Chapter 10: Littlejohn & Windeatt); "Hidden Agendas: The Role of the Learner in Programme Implementation" (Chapter 11: Nunan); "The Evaluation Cycle for Language Learning Tasks" (Chapter 12: Breen); "Seeing the Wood AND the Trees: Some Thoughts on Language Teaching Analysis" (Chapter 13: Stern); "Language Program Evaluation: A Synthesis of Existing Possibilities" (Chapter 14: Brown); "The Development and Use of Criterion-Referenced Tests of Language Ability in Language Program Evaluation" (Chapter 15: Bachman); "Mastery Decisions in Program Evaluation" (Chapter 16: Hudson); and "Tailoring the Evaluation to Fit the Context" (Chapter 17: Elley).
This collection, and its many examples, will be of key interest to anyone who is concerned with developing the various components of language programs. Only chapters that cover the issues of language program evaluation are annotated here.
Karava-Doukas, K. (1998). Evaluating the implementation of educational innovations: Lessons from the past. In P. Rea-Dickins & K. P. Germaine (Eds.), Managing evaluation and innovation in language teaching: Building bridges (pp. 25-50). London: Longman.
keywords: Greece; secondary; innovation; overview; trends; methods; participatory evaluation; communicative language teaching; EFL
annotation: Karava-Doukas examines the implementation issues associated with language program innovation and provides an example implementation study of an English teaching curriculum innovation in secondary schools in Greece. Factors that influence the success of educational innovations include: (a) teachers' attitudes and beliefs about education (attitude clarification and refinement); (b) the clear articulation of an innovation proposal (goals and means specified in non-technical terms); (c) teacher training (systematic, ongoing, and long-term training that clarifies teacher beliefs, makes teachers innovators, and accommodates teachers' existing knowledge); (d) communication and support (administrative and peer support); and (e) the compatibility of the innovation with the contingencies and constraints of the classroom and wider educational contexts (time, resources, organizational constraints, teachers' perception of needs, and teaching style). The Greek EFL case study used classroom observation, questionnaires, interviews, and reports of classroom practice to examine how the innovation was implemented. Key findings included the gap between the intended innovative curriculum and actual classroom practice, as well as the disjuncture between a communicative approach and teachers' beliefs about and understanding of the approach.
Kennedy, C. (1988). Evaluation of the management of change in ELT projects. Applied Linguistics, 9, 329-342.
keywords: innovation; management; stakeholders; theoretical; process
annotation: Kennedy addresses the notion of innovation theory in program evaluation, considering a program as a systematic organization in which various factors interact. He suggests that "[in] evaluating any project we should be concerned not only to evaluate the outcome of the project ... but the process of innovation itself, the stages it passes through, from the identification of a problem to the selection of the innovation and its final incorporation, acceptance, and diffusion" (p. 329). To create innovative change in the curriculum, the program manager/developer has to: (a) understand the underlying attitudes and beliefs of the program stakeholders; (b) monitor and adjust the process of change; (c) investigate the relationship of the process to the outcomes; (d) incorporate any information found that can help bring about the change; and (e) "return to projects some time ... to see whether the change has been incorporated to the system" (p. 330). Kennedy also emphasizes that innovations need to be adaptable to local conditions and that all participants should be involved or consulted so that they see the benefits the innovation would bring. In classroom innovation, cultural, political, administrative, educational, and institutional factors interrelate and cannot be ignored. A good match between the innovation and the existing program in terms of feasibility, acceptability, and relevance can generate acceptance from the stakeholders. Kennedy concludes with questions that can be asked throughout the innovation process to evaluate change management.
Kiely, R. (1998). Programme evaluation by teachers: Issues of policy and practice. In P. Rea-Dickins & K. P. Germaine (Eds.), Managing evaluation and innovation in language teaching: Building bridges (pp. 78-104). London: Longman.
keywords: UK; EAP; Europe; university; overview; methods; participatory evaluation; ethnography
annotation: Kiely bridges ethnography with program evaluation. He presents a case study evaluation of a 12-week EAP program at a British university, conducted for both improvement and accountability purposes. The evaluation draws on ethnographic methods (interviews, classroom observation, field notes, questionnaires, structured discussions, and program documents). Kiely also seeks to illuminate the relationship between evaluation and pedagogy through evaluation dialogues between the teacher and the students in a classroom. This approach turns both students and teachers into active participants in a democratic classroom evaluation process.
Kiely, R. (2006). Evaluation, innovation, and ownership in language programs. Modern Language Journal, 90(4), 597-601.
keywords: assessment; ownership; marketization; EFL; university; UK
annotation: Kiely stresses that engagement in meaningful, in-depth evaluation depends on educators’ perceptions of their departmental culture, their disciplinary orientations, and their own professional roles and expertise as teachers and academics. As an illustrative example, he describes internally driven program evaluation and innovation studies of the English as a Foreign Language and English for Academic Purposes programs at Thames Valley University in the UK. Three major motivations for conducting the evaluation studies were faculty members’: (1) awareness of the need to respond to changing market conditions in education (e.g., enrollment, needs); (2) understanding of assessment as a central activity for informing the curriculum and internal and external stakeholders about the program; and (3) buy-in for conducting research on student learning. Kiely concludes that an educator’s sense of ownership of the program is the key factor in generating cycles of program evaluation, development, and innovation.
Kiely, R. (2009). Small answers to the big question: Learning from language programme evaluation. Language Teaching Research, 13(1), 99-116.
keywords: UK; university; EAP; ESL; learning; program development; context; innovation; ethnography; case study
annotation: Kiely’s article explores the concept of learning from program evaluation. He discusses various issues that can hinder learning, including multiple and sometimes conflicting purposes for evaluation, lack of interaction and collaboration between stakeholders, and lack of clarity about learning from versus learning through evaluation. He also traces the historical trends of program evaluation and their limitations with regard to their contributions to learning. Kiely advocates for evaluations that focus more on understanding the contextual features of programs. Three contextual features that he views as particularly important are innovation, teachers at work, and the quality of the student learning experience. To demonstrate the impacts of the various issues and contextual features on evaluation, he analyzes an evaluation of learning materials being used in an EAP program at a British university. His analysis includes an ethnographic study of the evaluation which reveals not just what was learned from the evaluation, but also learning opportunities that were missed. Finally, Kiely concludes that to maximize learning, program evaluation needs to become “a socially-situated cycle of enquiry, dialogue, and action” (p. 99).
Kiely, R., & Rea-Dickins, P. (2005). Program evaluation in language education. New York: Palgrave Macmillan.
keywords: history; case study; method; framework; teacher-led; management-led; impact analysis; standard; development; management; SLA; research; ESL; EAP; immersion; Africa; Asia; US; Europe; Canada; Australia
annotation: This book introduces principles and procedures that may be adapted to a variety of language program evaluation contexts. It consists of four sections: background (Part 1), case studies (Part 2), framework (Part 3), and resources (Part 4). In Part 1, the authors discuss the nature of program evaluation (chapter 1), its history and developments in design and methodology (chapter 2), its context and use (chapter 3), and theory development in language learning (chapter 4). Kiely and Rea-Dickins argue that "[e]valuation is about the relationships between different program components, the procedures and epistemologies developed by the people involved in programs, and the processes and outcomes which are used to show the value of a program – accountability – and enhance this value – development" (p. 5). Program evaluation is also complex, and the authors address five challenges: (a) clarifying the purpose of evaluation, (b) articulating and understanding stakeholders' values and the factors that mediate them, (c) identifying evaluation criteria, (d) collecting valid data, and (e) ensuring that evaluation is used for program development, management, and research advancement. In Part 2, seven case studies illustrate program/project context, aim, and scope, as well as evaluation design, procedures, sample instruments, and implementation. These cases help readers understand the relationship between specific programs and evaluation practices, and they provide useful bases for relating evaluations to other language program contexts.
The case studies vary widely, including: (a) a nation-wide study of teachers' English language skills in an EAL/ESL context (chapter 5); (b) a multinational evaluation of the language component of the Science Across Europe project in secondary schools (chapter 6); (c) a large-scale evaluation of the contribution of native speaker teachers in secondary schools in Hong Kong (chapter 7); (d) a multi-site evaluation of foreign language teaching pilot programs in primary education in Ireland (chapter 8); (e) a quality management evaluation (chapter 9) and an evaluation of students' experiences (chapter 10) in an EAP program at a British university; (f) a document evaluation of national and state assessment standards for EAP learners in different contexts (Canada, USA, and Australia) (chapter 11); and (g) an impact evaluation by external evaluators of the Centre for Canadian Language Benchmarks (chapter 11). These case descriptions clarify why each evaluation study was conducted (evaluation for accountability, learning, sense-making, curriculum development) and how analytic frameworks were adopted and applied (ethnographic, survey, classroom observation, document evaluation). In addition to the case studies, chapter 12 provides a comprehensive discussion of how stakeholder participation can be sought in evaluation. Part 3 examines three different impetuses for evaluation: large-scale evaluation (chapter 13), teacher-led evaluation (chapter 14), and management-led evaluation (chapter 15). When conducting a large-scale evaluation, understanding the construct, design (validity, questions, procedures, analysis), and implementation (capacity, constraints, and ethical issues) is especially important.
Teacher-led evaluation is effective when: (a) it is linked with pedagogic concerns; (b) teachers perceive a need for change and/or perceive evaluation as an opportunity for improvement; and (c) there is sufficient time and teachers are involved in quality management. Sample teacher-led evaluation projects are described, focusing on a course textbook, a curriculum innovation, and a teacher education research methods course. The discussion of management-led evaluation is particularly relevant for language programs in the US, which face evaluation mandates from external accreditation bodies. Management involvement may help develop and make use of links between program evaluation and the management processes of performance assessment and professional development (p. 255). Sample evaluations of an EAP program in South Africa (a ten-stage procedure) and of the management and use of resource centers in Eastern and Central Europe by the British Council illustrate example frameworks. The last section (Part 4) lists resources that may be useful for extending knowledge about evaluation in general, including books, journals, professional associations, research ethics, electronic mailing lists, and internet resources.
Lett, J. A. (2005). Foreign language needs assessment in the US military. In M. H. Long (Ed.), Second language needs analysis (pp. 105-124). Cambridge: Cambridge University Press.
keywords: US; Defense Language Institute; foreign language; needs analysis; proficiency; subject matter experts; reliability; validity
annotation: This chapter reports on how the Defense Language Institute Foreign Language Center (DLIFLC), in the US, has been conducting a systematic foreign language needs assessment to set foreign language proficiency requirements for different career fields. The purpose is to: (a) ensure that funds are well spent educating students to an appropriate level; (b) manage military linguists (identify how they are deployed, assign appropriate tasks, set criteria for keeping the job, etc.); and (c) identify what proficiency level is required to accomplish distinct military jobs. Lett describes analysis procedures in detail and discusses issues of reliability and validity. In particular, he reports on how career group subject matter experts (SMEs) discussed job tasks, conditions, and standards. He also proposes two methods for examining the reliability of judgments about proficiency and task requirements: (a) a modified split-half procedure, in which two groups of SMEs each discuss the tasks and the tasks' language requirements, then compare their results and reach consensus; and (b) a surrogate or partial test-retest design, in which a video-taped discussion is shown to another group of participants and the two groups' proficiency judgments are compared. In addressing validity, he suggests (a) ensuring that the task list reflects the SMEs' understanding of their career fields, and (b) comparing the task requirements (high and low proficiency tasks) with supervisors' perceptions of task performance by high and low proficiency individuals. These approaches to reliability and validity in task and language specification can be applied in any needs analysis study, in addition to verifying interpretations through multiple sources and methods.
Liskin-Gasparro, J. E. (1995). Practical approaches to outcomes assessment: The undergraduate major in foreign languages and literatures. ADFL Bulletin, 26(2), 21-27.
keywords: US; foreign language; outcomes assessment; portfolio; oral proficiency test; interview; university
annotation: Liskin-Gasparro emphasizes using outcomes assessment to incorporate a reflective component into the program and to give direction for change and improvement. She describes the development and use of instruments for assessing students' content knowledge, attitudes about the program, postgraduate activities, and linguistic knowledge, skills, and performance. She also notes how the aim of measuring both the growth/process of student learning and specific skills has led to the use of portfolios, which can be aligned with departmental objectives. Two case studies of the development of foreign language outcomes assessment are presented. The University of Iowa (Department of Spanish and Portuguese) utilized a variation of the simulated oral performance assessment (later replaced by a Spanish speaking test), a writing assessment (later replaced by a portfolio), an exit interview, and a questionnaire for enrolled students and alumni to reveal learners' needs. The second case study, at Bates College (Spanish and French sections of the Department of Classical and Romance Languages and Literatures), was an internally motivated assessment plan using portfolios. The appendices include an exit interview protocol from the University of Iowa and an example portfolio program from Bates College.
Llosa, L., & Slayton, J. (2009). Using program evaluation to inform and improve the education of young English language learners in US schools. Language Teaching Research, 13(1), 35-54.
keywords: US; K-12; reading; ESL; outcomes; quasi-experimental; multiple methods; context; data collection; data analysis; NCLB; political
annotation: Llosa and Slayton describe their evaluation of the Waterford Early Reading Program, a reading intervention program for at-risk kindergarten and first-grade students in an urban school district in California. The evaluation sought to (a) determine the effectiveness of the program on reading ability, (b) understand how the program was being implemented, and (c) look specifically at its effectiveness and implementation for English language learners. Llosa and Slayton provide a detailed description of the context, the quasi-experimental design, and the findings. They then discuss the conditions and strategies that allowed them to carry out the evaluation successfully and to provide results that were useful to various stakeholders and decision makers. They stress that in a difficult political environment, evaluators can make their studies more useful by expanding the focus beyond outcomes measurement. Specifically, by utilizing multiple methods of data collection, incorporating qualitative data, and thoroughly investigating the context as well as the outcomes, evaluators can better understand the reasons for certain outcomes and frame findings and recommendations in ways that can be appropriately acted upon.
Long, M. H. (1984). Process and product in ESL program evaluation. TESOL Quarterly, 18, 409-425.
keywords: product; process; summative; formative; classroom observation; second language acquisition
annotation: Long, a second language acquisition researcher, draws upon a classroom research perspective to inform language program evaluation. He describes the necessity of looking at "what is actually going on in classrooms as opposed to what is thought to be going on" (p. 422) to supplement the evaluation of program/classroom products. Many of the comparative studies of teaching methodology in the 1970s and 1980s tended to focus on student outcomes alone and did not distinguish between evaluation and assessment. However, "the product evaluations cannot distinguish among the many possible explanations for the results they obtain because they focus on the product of a program while ignoring the process by which that product came about" (p. 413). Long defines process evaluation as "the systematic observation of classroom behavior with reference to the theory of (second) language development which underlies the program being evaluated" (p. 415), and he distinguishes it from formative evaluation. Long's process evaluation focuses on illuminating actual classroom behaviors in order to understand the program more comprehensively.
Long, M. H. (2005a). Methodological issues in learner needs analysis. In M. H. Long (Ed.), Second language needs analysis (pp. 19-76). Cambridge: Cambridge University Press.
keywords: needs analysis; sources; methods; triangulation; outsider; insider; expert; non-expert; validity; reliability; questionnaire; interview; journal log; language audit; ethnographic; observation; tests
annotation: Long provides a comprehensive overview of methodological issues in needs analysis (NA), focusing on sources of information, methods for eliciting information, and source × method combinations for improved interpretations. Sources of information used in previous NA studies include learners, teachers, applied linguists, domain experts, (un)published literature, and others. Long also discusses the advantages and disadvantages of various methods, including expert and non-expert intuitions, interviews, questionnaire surveys, language audits, ethnographic methods, observations, journal logs, and tests. He then presents an example NA study identifying the tasks and language demands of airline flight attendants, which reveals the importance of collecting data from various sources (e.g., the flight attendants themselves) rather than relying solely on the intuitions of experts (applied linguists, in this study). Each data collection method (written introspections, unstructured interviews, and surreptitious recordings of target discourse) showed different advantages (time, labor, specificity, and conciseness) in relation to different types of data (task types, lexis, and language use). Interactions of source and method were also found: written information best described the tasks and language; insiders provided richer information on tasks and technical terms than outsiders did; written introspection was a more efficient way than unstructured interviews to collect information on tasks and language use, but only for outsiders, not insiders; and surreptitious recordings were better for baseline data on language use. Long calls for more research on NA itself in different contexts, both to understand the effective use of sources and methods and to attend to the reliability and validity of information and interpretations.
Long, M. H. (Ed.) (2005b). Second language needs analysis. Cambridge: Cambridge University Press.
keywords: needs analysis; validity; reliability; SLA; rationale; methodology; case studies
annotation: This book provides useful insights and examples for those who intend to use needs analysis for research purposes and/or for curriculum development and evaluation. In his introduction, Long provides a rationale for conducting needs analysis to inform effective course design and to hold programs accountable. While needs analysis may vary by context, the book outlines a comprehensive research approach, principally through a discussion of methodological considerations (especially chapter 1) that can be generalized to most contexts. The collected case studies (chapters 3-11) reflect needs analyses applied in a variety of societal, occupational, vocational, and academic domains, and they emphasize the link between setting, purpose, and methodology. For foreign language educators, chapter 3 ("Foreign language needs assessment in the US military" by Lett) and chapter 7 ("A task-based needs analysis of a tertiary Korean as a foreign language program" by Chaudron et al.) may be of key interest. Other cases cover societal-level language needs identification for policy shaping (chapter 2), the English-language needs of hotel maids in Hawaii (chapter 4) and of journalists in Spain (chapter 6), foreign-language (German) needs in business firms (chapter 5), the Dutch-language needs of foreign professional footballers in the Netherlands (chapter 8), target task identification for naturalization interviews in the U.S. (chapter 9), dialogue analysis of coffee service encounters in Hawaii (chapter 10), and analysis of small-talk features in the workplace in New Zealand (chapter 11).
Loughrin-Sacco, S. J., Matthews, S. A., Sweet, W. M., & Miner, J. A. (1990). Reviving language skills: A description and evaluation of Michigan Tech's summer intensive French course. ADFL Bulletin, 21(2), 34-40.
keywords: US; French; immersion; intensive course; false beginners; curriculum; questionnaire; placement; university
annotation: The authors report on a curriculum and its evaluation in the context of a summer intensive French program at Michigan Technological University. A previous ethnographic study had revealed that 56% of the students who had studied French for at least a year (false beginners) enrolled in elementary French courses to raise their grade point averages. To revive false beginners' language skills and entice them to enroll in higher-level classes, Michigan Technological University set up two-week intensive immersion courses in French, German, and Spanish. The authors first describe the French program in detail (schedule, student-to-teacher ratio, content, course materials), then report on the results of the students' course evaluations and give suggestions on how the program can be applied in other contexts. For the evaluation, an ETS advanced-placement test was administered before and after immersion to gauge the success of the course; an end-of-course questionnaire was also administered to collect students' impressions of the course and their motivation toward foreign language study. Based on the survey, the program rescheduled some of the activities that received low ratings. Most of the students from the French and Spanish intensive courses continued their foreign language study at the intermediate or advanced level, indicating partial fulfillment of the program's initial purpose.
Lynch, B. K. (1990). A context-adaptive model for program evaluation. TESOL Quarterly, 24(1), 23-42.
keywords: context-adaptive model; framework; method; reporting; audience; purpose; university
annotation: Lynch proposes a context-adaptive model for program evaluation consisting of seven steps. First, because the reasons for program evaluation differ for each audience, the audience and their goals have to be identified to ensure that the audience gets the most out of the evaluation outcomes and can make use of the information later. Second, a context inventory should be conducted to decide which issues to prioritize and how to carry out the evaluation (covering factors such as the availability of comparison groups, reliable/valid language measures, evaluation expertise, and instructional materials and resources; the background of students and staff; the student selection process, size, intensity, perspectives, and purpose of the program; the timing of evaluation; and the social and political climate of the program). Third, a preliminary thematic framework is developed to clarify the conceptual framework of the program, the salient issues, and the aspects to be evaluated. Fourth, a data collection design and system has to be selected based on the questions that need to/can be answered; feasibility information from the context inventory may also limit the methodology (a useful decision-making chart for data collection design/system is provided as an example). Fifth, data are collected according to the purpose of the design; data collection may be eclectic, since new questions and issues can emerge and the purpose of the evaluation can evolve with the program. Sixth, data are ideally analyzed with a "multiple analysis strategy [which] can strengthen the evaluation by avoiding the possibility of bias associated with any particular technique" (p. 36). Finally, the evaluation report is formulated and tailored for a particular audience.
Lynch, B. K. (1992). Evaluating a program inside and out. In J. C. Alderson & A. Beretta (Eds.), Evaluating second language education (pp. 61-99). Cambridge: Cambridge University Press.
keywords: ESP; Mexico; university; quantitative; qualitative
annotation: Lynch describes both quantitative and qualitative approaches that were utilized for formative and summative evaluation of the University of Guadalajara (UdeG)/University of California, Los Angeles (UCLA) Reading English for Science and Technology (REST) Project, which sought to improve reading skills for chemical engineering students.
Quantitative data consisted of norm-referenced measures (the English as a Second Language Placement Exam [ESLPE] and a fill-in-the-blank cloze) and a criterion-referenced test (a multiple-choice cloze). However, the ESLPE was the only pre- and post-test completed by both the treatment and control groups. For the qualitative data, teacher/researcher journals, administrative logs, observations and program documents, interviews, questionnaires, and meeting notes were utilized. The data were coded and reduced into an Effects Matrix and a Site Dynamics Matrix to "characterize the various outcomes and changes associated with the program" (p. 78) and to surface and discuss perceived dilemmas and problems. One problem the researcher faced was that qualitative data collection was intensive for the REST (experimental) program but not for the control group, which did not receive any English instruction.
The researcher launched the project with evaluation in mind, which is not always the case in language curriculum development projects; his approach thus demonstrates how to incorporate evaluation from the outset. The REST project also shows that collecting both quantitative and qualitative data enables a richer interpretation of a program. Lynch also calls attention to the need to link intended analyses with the kinds of data collected, though the further link between data analyses and their use for evaluative purposes is less clearly articulated.
Lynch, B. K. (1996). Language program evaluation: Theory and practice. Cambridge: Cambridge University Press.
keywords: EFL; university; method; measures; research design; positivistic; naturalistic; quantitative; qualitative; context adaptive model; REST project; Mexico
annotation: Lynch provides an overview of the conflict between quantitative (positivistic) and qualitative (naturalistic) research paradigms in program evaluation, and he finds a middle ground by introducing the context-adaptive model (CAM) for evaluation, "a flexible, adaptable heuristic - a starting point for inquiry into language education programs that will constantly reshape and redefine itself, depending on the context of the program and the evaluation" (p. 3).
Lynch begins by highlighting the potential contribution of program evaluation research to the field of second language acquisition (SLA). Language program evaluation is "the systematic attempt to gather information in order to make judgments or decisions" (p. 2) for justifying and/or improving a program that targets the development and use of learners' language abilities (chapter 1). Chapter 2 discusses the debate between positivistic and naturalistic paradigms and describes the history of how the two resulting methodologies have been utilized in program evaluation. Lynch encourages careful articulation of what counts as evidence in the two paradigms, and he presents the associated issues of internal and external validity in chapter 3. In the following chapters, quantitative and qualitative research designs (chapters 4 and 5) and data gathering and analysis procedures (chapters 6 and 7) are presented through examples of how past program evaluations utilized those methods. Lynch argues that combining the two approaches will allow an evaluator to acquire rich and thorough information about the program. In the case of the Reading English for Science and Technology (REST) Project in Mexico, where a mixed-methods approach was used to evaluate an English for specific/academic purposes program, qualitative data helped to explain apparent contradictions in the quantitative data (chapter 8).
In chapter 9, Lynch provides a useful checklist of the seven steps that CAM follows: (a) determine the audience, goals, and purpose of the evaluation; (b) gather information for a context inventory; (c) identify a preliminary thematic framework (emergent themes and issues); (d) select appropriate data collection designs/methodology to answer the questions; (e) collect data following the chosen design; (f) analyze and interpret data (negotiating multiple perspectives to increase relevance and understanding); and (g) tailor the reports for intended audiences.
Lynch, B. K. (2000). Evaluating a project-oriented CALL innovation. Computer Assisted Language Learning, 13(4-5), 417-440.
keywords: Australia; CALL; university; foreign language; observation; questionnaire; documents; logs; qualitative; validity; context-adaptive model
annotation: Lynch describes an evaluation of the Project-Oriented Computer Assisted Language Learning (PrOCALL) innovation in foreign language courses (Chinese, French, German, Indonesian, Japanese) at the University of Melbourne, where he acted as both evaluator and administrator. Following the context-adaptive model, the goals, audiences, preliminary thematic framework, and data collection design/system were determined. However, the evaluation evolved with the project's implementation: the thematic framework and research questions were revised based on the changing perspectives of the participating teachers and the project director. Data were collected from class documents (emails, web pages, other texts), project documentation (brochures, ethics and grant proposals), teacher and director logs, teacher interviews, student focus group interviews, semi-structured classroom observations, quality of teaching surveys, and open-ended student feedback questionnaires. Beyond his own analysis of the data, Lynch enhanced the validity of interpretations not only by triangulating data from multiple sources, but also by consulting the project participants on interpretation and going back to the data for "counter examples and rival explanations" (p. 437). Some recommendations are offered for teaching contexts similar to PrOCALL.
Lynch, B. K. (2003). Language assessment and programme evaluation. Edinburgh: Edinburgh University Press.
keywords: assessment; evaluation; paradigm; purposes; design; measures; analysis; interpretivist; qualitative; positivist; quantitative; validity; ethics
annotation: Lynch highlights the relationships and differences between assessment and evaluation, and he presents "the range of paradigms (positivist vs. interpretivist), perspectives, designs, purposes, methods, analyses, and approaches to validity and ethics that currently define language assessment and programme evaluation" (p. vii). He defines evaluation as "the systematic inquiry into instructional sequences for the purpose of making decisions or providing opportunity for reflection and action" and assessment as "the range of procedures used to investigate aspects of individual language learning and ability, including the measurement of proficiency, diagnosis of needs, determination of achievement in relation to syllabus objectives and analysis of ability to perform specific tasks" (p. 1). He also positions evaluation as the superordinate category. Lynch claims that the purposes of assessment (assessment for decision making and/or learning) and evaluation (summative and formative purposes) interact with different methodological approaches. He simplifies these into two paradigm clusters: positivist (seeking to determine objective causal relationships) and interpretivist (seeking to understand socially constructed, complex, and fluid relationships). The chapters in the book flesh out how these paradigms influence assessment and evaluation designs (chapter 2), instruments (chapters 3 and 5), analyses (chapters 4 and 6), and validity and ethics (chapter 7).
Mackay, R. (1988). Position paper: Program evaluation and quality control. TESL Canada Journal, 5(2), 33-42.
keywords: definition; principles; stakeholders; evaluator role; utilization; pragmatic
annotation: The article offers useful and practical guiding principles for deciding the who, the what, and the why of program evaluation. Mackay defines program evaluation as the “purposeful and systematic collection, analysis, and interpretation of information about one or more components of a particular program” (p. 34) to resolve practical programmatic concerns. Throughout, Mackay takes a utilitarian and pragmatic approach to program evaluation centered on what he refers to as “principal stakeholders” (p. 35), or those responsible for the program. In Mackay’s approach, the evaluator’s role is to serve the needs of the principal stakeholders, or of other stakeholders affected by the evaluation when no principal stakeholders can be clearly identified. Serving the principal stakeholders includes facilitating discussion of the program components to be evaluated, the evaluation purposes and uses, and the appropriate timing of evaluation, among other important decisions necessary for program evaluation. To support principal stakeholders, Mackay advises the evaluator to provide information (evidence) that is responsive, timely, relevant, credible, and comprehensible. This article is a good starting point for beginning evaluators, particularly for considering the roles and responsibilities entailed in conducting useful and effective program evaluations.
Mackay, R. (1994). Undertaking ESL/EFL programme review for accountability and improvement. ELT Journal, 48(2), 142-149.
keywords: Indonesia; intrinsically motivated; extrinsically motivated; improvement; accountability; ODA; program/project based review model; framework
annotation: Mackay distinguishes extrinsically motivated evaluation, a top-down bureaucratic approach undertaken mainly for accountability purposes, from intrinsically motivated evaluation, which is undertaken by program personnel with a focus on program improvement. Drawing on his experience evaluating projects in fourteen language centers in Indonesia, Mackay proposes a program/project-based review model. This model is built on an intrinsically motivated evaluation system but seeks to satisfy both (internal) improvement and (external) accountability demands. The model involves the following steps: (a) conceptualization of the program/project as a whole; (b) review of the program/project components and the possible foci (e.g., staff, resources, curriculum) within each component; (c) identification of key areas for each focus (e.g., quality of teaching, each course); (d) setting performance indicators for the key areas to estimate effectiveness; (e) appropriate and credible data collection on each performance indicator; and (f) examination and interpretation of the data among program/project personnel to arrive at judgments on the strengths and weaknesses of each key area. The gathered information can serve the interests and concerns of both the program/project staff and the bureaucracy.
Mackay, R., & Bosquet, M. (1981). LSP curriculum development: From policy to practice. In R. Mackay & J. D. Palmer (Eds.), Language for specific purposes: Program design and evaluation (pp. 1-28). Rowley, MA: Newbury House.
keywords: LSP; curriculum development; program maintenance; needs analysis; questionnaire
annotation: Mackay and Bosquet offer suggestions on how educational decision making and curriculum development can be carried out, drawing examples of constraints from language for specific purposes programs. They describe three stages of curriculum development. (1) At the pre-program development stage, the educational goal and its rationale are decided by the administrative body in authority, and the intention to develop the program is communicated to all stakeholders. The main purpose of this dissemination is to maximize the chances of obtaining valuable information. (2) At the program development stage, the various factors that affect the program are identified, weighed, and considered. The processes of this stage are similar to those of the task-based language teaching framework. The five phases involved in this stage are: (a) the basic information-gathering phase, where learners' needs and target situations are identified; (b) the goal-specification phase, where the gathered information is transformed into specified objectives; (c) the production phase, where materials and tests are created based on appropriate target language samples identified in the needs analysis, the syllabus is specified, and appropriate methodological procedures are devised and implemented; (d) the teacher-training phase, where teachers are trained in the new innovations and where students' and teachers' perceptions of the program's effectiveness are used to adjust pedagogical instruments; and (e) the trial phase, where formative and summative evaluation take place. (3) At the program maintenance and quality control stage, the quality of instruction, appropriateness of the goals, teacher training, and testing procedures are monitored. Although the three stages are presented hierarchically, they may intertwine throughout the development process.
Appended are useful examples of a student needs survey to elicit problems in listening skills, and a teacher feedback questionnaire for lesson evaluation.
Mackay, R., Wellesley, S., & Bazergan, E. (1995). Participatory evaluation. ELT Journal, 49(4), 308-317.
keywords: Indonesia; ODA; participatory; performance indicator
annotation: Mackay, promoting a participatory approach to project evaluation, acted as an external consultant for the English Language Teaching Projects Unit (ELTPU) in twenty-five language centers in Indonesia. A performance indicator (PI) framework was introduced at the ELTPU workshop as a diagnostic approach to program evaluation. The PI framework starts by diagnosing the focus of the project, dividing it into key areas and defining performance indicators for each. The indicators are then broken down into two or three critical ‘themes’ that indicate what information the evaluation needs to gather. The relative strengths and weaknesses of each PI revealed by the analysis then provide future direction for the language center, with a view to long-term sustainability. A key factor in the PI framework is stakeholder participation. Stakeholders are involved in all decision-making processes, which makes evaluative outcomes contextually appropriate. The advantage of the PI framework is that it gives stakeholders opportunities to collaboratively clarify and reflect on issues, which can raise staff self-awareness and lead to empowerment.
Mackay, R., Wellesley, S., Tasman, D., & Bazergan, E. (1998). Using institutional self-evaluation to promote the quality of language and communication training programmes. In P. Rea-Dickins & K. P. Germaine (Eds.), Managing evaluation and innovation in language teaching: Building bridges (pp. 111-131). London: Longman.
keywords: EFL; participatory model; self-evaluation; Programme Based Review; performance indicator
annotation: Mackay, Wellesley, Tasman, and Bazergan illustrate how a language center in Indonesia benefited from undergoing a process of participatory self-evaluation called Programme-Based Review (PBR). This approach emphasizes "participative monitoring and evaluation activities initiated within the unit to facilitate periodic or continuous improvement by programme staff themselves" (pp. 111-112). Because it is intrinsically motivated, PBR can generate direct and relevant information for the continuous improvement of program management and teaching, as well as information that satisfies the accountability demands of supervisory bodies. The contextually bound performance indicators (PIs) clarify how the program is expected to work, based on the language center's goals and policies, and serve as a framework for making evaluative judgments. PBR also gives stakeholders opportunities to take an informed role in planning and implementing the evaluation, making suggestions based on evaluative outcomes, and acting for improvement.
Mathews, T. J., & Hansen, C. M. (2004). Ongoing assessment of a university foreign language program. Foreign Language Annals, 37(4), 630-640.
keywords: US; French; German; Spanish; university; portfolio; NCATE; ACTFL; assessment; top-down; accreditation
annotation: Mathews and Hansen report on the process and results of the first year of assessment of the foreign language programs (French, German, and Spanish) at Weber State University. All students in lower-level courses (each semester) and all potential teaching majors and minors take modified or unmodified ACTFL Oral Proficiency Interviews (OPIs). The ACTFL OPI was selected as the primary assessment tool because of the assessment standards implemented in 2004, developed by the National Council for Accreditation of Teacher Education (NCATE) and ACTFL. Beyond this immediate demand, the foreign language department in this study had already taken the initiative in 1998 to develop a departmental mission statement and set student learning outcomes with reference to the National Standards. In 2000, the department proposed a senior assessment portfolio (a computerized oral proficiency test and writing samples) and created rating rubrics. For this study, all graduating seniors' portfolios were assessed for achievement of the stated outcomes. The purposes of the evaluation study were: (a) to examine to what extent the department's curriculum and requirements help students maximize proficiency; (b) "to check department's progress in incorporating the National Standards into the curriculum" (p. 630); and (c) to assess students' achievement of the department's outcomes and judge whether the goals are reasonable.
Matthies, B. F. (1991). Administrative evaluation in ESL programs: “How’m I doin’?” In M. C. Pennington (Ed.), Building better English language programs: Perspectives on evaluation in ESL (pp. 241-256). Washington, DC: NAFSA.
keywords: ESL; program administration; directors; professional development
annotation: Like Fox (see chapter 11), Matthies illustrates the importance of directors' ongoing improvement throughout their careers, building on areas highlighted through evaluation. Program administration can be assessed against three distinct types of criteria: (a) professional guidelines and other ESL institutions (national standards); (b) the parent institution's mission (the institution); and (c) the ESL program itself. Based on survey research, the author identifies six key job skills for administrators: communicating, planning, educating, organizing, evaluating, and negotiating. Support staff, instructional staff, students, and parent institutions that deal directly with the ESL director should be able to evaluate administrative effectiveness. Matthies provides several example evaluation measures: self-evaluation of current job status, formal and informal feedback, written feedback, a checklist response (Appendix A), and a student survey (Appendix B).
McAlpine, D., & Dhonau, S. (2007). Creating a culture for the preparation of an ACTFL/NCATE program review. Foreign Language Annals, 40(2), 247-259.
keywords: NCATE; accreditation; assessment; OPI; electronic portfolio; university
annotation: McAlpine and Dhonau reflect on and illustrate the acculturation process for NCATE/ACTFL review in the Department of International and Second Language Studies at the University of Arkansas at Little Rock. Since their inception in 2002, the ACTFL/NCATE Program Standards for the Preparation of Foreign Language Teachers have heightened teacher education programs’ attention to quality assurance of teacher candidates’ knowledge, skills, and dispositions in language, culture, and literature, their knowledge of assessment, and their professionalism. McAlpine and Dhonau offer six suggestions based on their experience undertaking NCATE/ACTFL accreditation: (1) foster collaboration and engagement between the foreign language department(s) and the College of Education; (2) build capacity, infrastructure, and a culture of oral proficiency testing (ACTFL oral proficiency test); (3) familiarize faculty with the Standards for Foreign Language Learning; (4) revise and align the curriculum around the ACTFL/NCATE Content Standards; (5) implement an assessment system to gather evidence on the content standards; and (6) create a data and artifact management system using technology such as electronic portfolios. The article also provides a useful timeline and activity list for preparing for NCATE/ACTFL program review in an appendix.
Middlebrook, G. C. (1991). Evaluation of student services in ESL programs.
In M. C. Pennington (Ed.), Building better English language programs:
Perspectives on evaluation in ESL (pp. 135-154). Washington, DC: NAFSA.
keywords: student service; administration; ESL
annotation: Middlebrook addresses an administrative aspect of language program evaluation, namely, the need to evaluate student services components. He provides lists of useful evaluative questions for each component of student services in an ESL program (recruitment, admissions, orientation, employment, advising, financial aid, and housing). The guidelines he proposes highlight the need for: (a) a "thoughtful, pragmatic and well-articulated institutional policy" (p. 148) that allows the program to specify its goals and objectives; (b) staff who possess the "requisite skills and knowledge" (p. 148); and (c) adequate funding for student services components to function as stated in program policy.
Milleret, M. (2008). The trials and tribulations of comprehensive program evaluation. ADFL Bulletin, 39(2&3), 44-48.
keywords: US; university; Portuguese; needs analysis; program improvement; program development; qualitative; quantitative; survey; focus group; IRB; case study
annotation: Milleret recounts a program evaluation that was part of a program development project in a Portuguese program at a US university. Members of the department initiated the project to address three issues: poor course articulation, the need for program growth, and concern about the needs of Spanish-speaking students in the program. The needs analysis used both qualitative and quantitative data, and the main data collection instruments were surveys, focus groups, and program documents. The evaluation team encountered numerous challenges, including very limited funds, poor response to focus group requests, and delays due to the IRB process. Despite these challenges, the evaluation produced data to guide program development, including the design of new courses to meet student interests and the development of a special course sequence for Spanish-speaking students. The Portuguese program continues to use data from the evaluation as it pursues continued growth and improvement.
Mitchell, R. (1989). Second language learning: Investigating the classroom
context. System, 17(2), 195-210.
keywords: Scotland; French; Gaelic; bilingual education; secondary; elementary; foreign language; communicative approach; classroom observation; interview; assessment; action research; retrospective; political
annotation: Mitchell first reviews foreign language classroom research (1976-1986) and then introduces an evaluation study of Gaelic-English bilingual programs (1984-1986), undertaken at the University of Stirling, Scotland. The purposes of the projects varied. First, the classroom research studies were mainly concerned with actual instructional practices (teachers' use of the target language, skills used in instruction that influence learners' FL experience, the use of message-oriented activity types, and the routinization of communicative methodology) and with teachers' views on methodology during the shift toward communicative language teaching in Scotland. Some of the instruments created for the classroom research included systematic observation instruments, teacher questionnaires and interviews, and assessments of students' achievement and attitudes.
Second, amid political tensions between the Local Education Authority and the Scottish Education Department over the continuation of the Bilingual Education Project (BEP), an evaluation study of the Gaelic-English bilingual program was undertaken. This independent, retrospective evaluation utilized teacher and parent interviews, qualitative classroom observation, and assessment of students' Gaelic and English proficiency to document the implementation of the BEP and to examine learners' achievements and the factors that influence them. This study is one of the early examples of a non-experimental approach to language program evaluation. Mitchell notes that "Evaluators have a duty to address all process and product variables which are important for participants in the programme, and not only those few which can be experimentally linked; the 'clear picture' may involve ethnographic portrayals as well as quantitative accounts: and evaluation reports must be informative not only for decision-makers in the specific context of the programme under study, but also for others considering the future development of similar programmes" (p. 207).
Mitchell, R. (1990). Evaluation of second language teaching projects and
programmes. Language, Culture, and Curriculum, 3(1), 3-15.
keywords: Scotland; bilingual education; Gaelic; English; qualitative
annotation: Mitchell argues that language program evaluation studies should move away from experimental and quasi-experimental research, even when internal validity is supported by the systematic gathering of information about the process of language acquisition (as suggested by Long, 1984). She argues for this shift because of the complexity of what a program actually involves; studies that are not bound by experimental expectations can address a wider range of questions. As an example of a non-experimental study, she describes an evaluation with a multifaceted design conducted in a Gaelic-English bilingual primary school program in Scotland. Evaluative techniques included unstructured classroom observations, teacher and parent interviews, and writing and speaking tests. Mitchell emphasizes that evaluation is not about revealing causal relationships among a small number of variables, but rather about inferring the most likely relationships among complex, interacting events by monitoring intended outcomes, identifying the unexpected, and proposing untried solutions (p. 11).
Mitchell, R. (1992). The "independent" evaluation of bilingual
primary education: A narrative account. In J. C. Alderson & A. Beretta
(Eds.), Evaluating second language education (pp. 100-140). Cambridge:
Cambridge University Press.
keywords: Scotland; Gaelic; English; elementary; bilingual education; retrospective; political; outsider
annotation: Mitchell presents an independent, retrospective evaluation of a Gaelic-English bilingual primary school program in Scotland, in which Mitchell and her colleague, as ‘outsiders’, had to negotiate a rough political climate during the planning and implementation of the evaluation project. Local administrative bodies restricted the instruments. Evaluators were not to question students' attitudes toward culture. Further, the Scottish Education Department required the use of a quantitative-experimental methodology. These restrictions curtailed the evaluators' options. To select ten site schools for study, the evaluators conducted a preliminary interview survey with the head teachers of all schools, querying the history of each school's involvement with the bilingual education project (BEP), attitudes toward BEP policies, and the current status of program implementation. Two classrooms within each school were selected and observed, and participants were interviewed. Students' Gaelic and English proficiency, perceptions, and attitudes toward the BEP (but not toward culture) were measured. In addition, a limited number of parents were interviewed about their knowledge of, involvement in, and attitudes toward the BEP. This case provides a good example of potential confrontation between evaluators and educational/political authorities, especially when the stakes of an evaluation are high.
Morris, M. (2006). Addressing the challenges of program evaluation: One department’s experience after two years. Modern Language Journal, 90(4), 585-588.
keywords: assessment; multiple languages; electronic portfolio; modified SOPI; interview; university
annotation: Morris reflects on an evaluation of how well the Department of Foreign Languages and Literatures at Northern Illinois University prepares its majors. The main data collection tools were: (a) a best-works electronic portfolio consisting of several components, including artifacts that demonstrated learners’ ability in cultural understanding, reading, speaking, and writing; (b) a written exit questionnaire; (c) a follow-up alumni questionnaire; and (d) a modified SOPI test. In choosing appropriate instruments, the faculty carefully modified existing instruments to reflect the departmental student learning outcomes. Evaluation results informed curriculum improvement, including the creation of a new course and changes in class scheduling, and also raised questions about the efficacy of the evaluation methods for cultural learning and oral proficiency. Morris’s narrative illustrates the organic, responsive, and reactive nature of program evaluation (its “messiness,” in Morris’s words). Engaging in program evaluation also increased faculty awareness of, and communication about, the program.
National Research Council of the National Academies. (2007). International education and foreign languages: Keys to securing America’s future. Washington, DC: The National Academies Press.
keywords: Title VI; Fulbright-Hays; national security; global economy; large-scale evaluation; language centers; federal programs
annotation: This is a national-level evaluation report of the Title VI and Fulbright-Hays (Title VI/FH) programs, federal programs that support higher education with the aim of building foreign “language abilities or knowledge of world regions and international issues” (p. 60). The National Academies formed the Committee to Review the Title VI/FH International Education Programs.
Part One of the book reviews the need for foreign language, area, and international expertise in the U.S., the historical context and implementation of Title VI/FH programs, and the distinctive role of Title VI/FH relative to other federal programs, such as its broader role compared with federal programs aimed at capacity building for national security.
Part Two addresses the effectiveness and adequacy of eight key performance areas, referred to as program missions, defined by the U.S. Congress. Examples of the eight areas include “reducing shortages of foreign language and area experts” (p. 2) and “supporting research, education, and training in foreign languages and international studies” (p. 2). The Committee first created a logic model specifying the “resources/inputs, activities, outputs, [short-term and long-term] outcomes, and impact expected of the program” (pp. 313-314). The assessment of expected outcomes was limited by insufficient data and the scarcity of systematic program evaluation studies. The Committee therefore drew on the few available program evaluation studies together with program monitoring data, grant applications, funding data, commissioned papers and targeted analyses, written commentaries from experts, public testimonials, and site visits. Although the evidence was too limited to support definitive conclusions and recommendations, the Committee indicated that Title VI/FH programs have played an important role as a foundation-builder for “internationalizing higher education” (p. 242). Furthermore, Title VI/FH programs were shown to have focused universities’ attention on building pedagogical resources and teaching foreign languages and area studies, with particular focus on the less commonly taught languages.
The final section, Part Three, offers the Committee’s recommendations for future Title VI/FH programming strategies. The appendices provide detailed information on legislative history of Title VI/FH, logic models, summaries of Title VI/FH evaluation studies, and site visit interview questions.
Norris, J. M. (2006). The why (and how) of assessing student learning outcomes in college foreign language programs. Modern Language Journal, 90(4), 576-583.
keywords: student learning outcomes; assessment; evaluation; measurement; definition; university
annotation: Norris calls on college foreign language (FL) educators to reconceptualize the assessment of student learning outcomes (SLOs) as a means of assuring not only educational quality and effectiveness but also the improvement, development, and even defense or survival of existing programs. Norris’s reconceptualization also stresses evaluation as an opportunity for programs to define and articulate their values. Drawing on lessons from evaluation work to date, Norris argues that SLO assessment, to be informative and action-oriented, must be participatory, feasible, useful, credible, relevant, timely, understandable, and clear to its intended users. To achieve such useful SLO assessment practices, Norris suggests three key steps. First, FL departments must resolve terminological confusion by distinguishing among measurement, assessment, and evaluation. Norris emphasizes that SLO assessment should be freed from a technocratic preoccupation with measurement issues and should instead rely on a system that bridges learner information and program use, a use that “[helps] educators deliver better programs and [helps] students achieve valued learning outcomes” (p. 582). Second, FL professionals should treat the accountability movement as an opportunity to build assessment capacity and to rethink and redirect programs. Third, educators need to foster programmatic and evaluative thinking for program development, as well as an understanding of the role of assessment in their programs. Norris illustrates these points with three examples of university-level FL program evaluation that focus on SLOs: Byrnes (2002), Dassier & Powell (2001), and Liskin-Gasparro (1995).
Norris, J. M. (2008). Validity evaluation in language assessment. New York: Peter Lang.
keywords: placement test; program evaluation; validity; improvement; curriculum; C-test; German; university
annotation: Norris illustrates the challenges college foreign language educators face in evaluating and ensuring the quality of educational assessments. Instead of applying conventional validity criteria from psychometric traditions to evaluate educational assessments, Norris advocates the reconceptualization of assessment validation as “validity evaluation.” Validity evaluation requires: (a) the treatment of an educational assessment as a coherent program; (b) the acceptance of a variety of purposes for assessment validity evaluation; (c) the prioritization and contextualization of evaluation purposes; and (d) the selection and articulation of evaluation methods to meet prioritized purposes. To exemplify validity evaluation, Norris applies the framework to the development and implementation of a placement assessment program in the Georgetown University German Department. Facilitated by Norris, the department specified the intended uses of assessments in its program, developed an assessment program aligned with the curriculum, implemented and revised the placement assessment program, and sustained the assessment program by taking action based on evaluation findings. The comprehensive validity evaluation of the assessment program not only had positive consequences for the innovative curriculum but also helped faculty members reconceptualize and transform assessment and educational practices in their program.
Norris, J. M. (2009). Understanding and improving language education through program evaluation:
Introduction to the special issue. Language Teaching Research, 13(1), 7-13.
keywords: improvement; accountability; context; participation; multiple methods
annotation: In his introduction to a special issue on evaluation, Norris emphasizes the need to focus on the use of evaluation for improving language programs and teaching practices. He argues that the increased demand for evaluation resulting from a greater emphasis on accountability gives evaluators an opportunity to raise awareness of evaluation for developmental purposes. Norris provides an overview of the five articles in the issue, each of which shows how evaluators responded to various constraints and pressures yet managed to conduct evaluations that were useful to stakeholders, sensitive to political contexts, and informative for program improvement. Despite their diverse contexts, the examples share the following characteristics: (a) the participation of language teachers in the evaluation; (b) the use of multiple methods of data collection, which provided a better understanding of the various factors and perspectives involved; (c) the contextualization of findings to prevent their misuse or misinterpretation; and (d) communication with stakeholders to increase the likelihood that findings are understood and used. In proposing next steps for the field, Norris advocates that evaluators take a more proactive, instructional role, helping stakeholders, participants, and audiences see evaluation as a means of improving language programs and teaching practices rather than only as an accountability tool.
Palmer, A. (1992). Issues in evaluating input-based language teaching programs.
In J. C. Alderson & A. Beretta (Eds.), Evaluating second language
education (pp. 144-166). Cambridge: Cambridge University Press.
keywords: German; university; experimental curriculum; Krashen; comparative method; testing; attitude; journal; questionnaire
annotation: Palmer describes various decision-making issues he and his colleagues faced during the evaluation of an eight-month experimental first-year German course at the University of Utah. The study examined whether applying Krashen's input and affective filter hypotheses to language teaching was feasible, productive, and appealing. Attitudinal information about the program was obtained through student journals and activity ratings, ongoing conversations with teachers, papers written by the teachers, and a questionnaire administered to students, teachers, and administrators. Traditional language tests and students' self-ratings of their performance measured learners' listening, speaking, reading, writing, grammar, and vocabulary. Teachers perceived the input-driven approach as feasible, while students, over time, felt the need for output practice. Statistical analysis of the test results showed that the control group performed better than the experimental group. The evaluators faced many dilemmas in balancing interpretability against practicality and had to make compromises in deciding how to test; for instance, "due to constraints on time, money and personnel, tests had to be easy to develop, administer, and score" (p. 152). Test design issues may have produced apparent differences between the groups of learners. Such methodological issues led the author to embark on a series of further evaluation studies.
Pawan, F., & Thomalla, T. G. (2005). Making the invisible visible: A
responsive evaluation study of ESL and Spanish language services for immigrants
in a small rural county in Indiana. TESOL Quarterly, 39(4), 683-705.
keywords: ESL; U.S.; Spanish; community service; immigrant; responsive evaluation; participatory; SWOT analysis; purposive sampling; stakeholder
annotation: Pawan and Thomalla report the implementation and results of a responsive evaluation of ESL and Spanish language services, initiated by the County Alliance for Community Education (CACE) in a rural county in Indiana. The impetus for the evaluation study was a prediction that the number of immigrants and immigrant workers was likely to grow, due to the influx of immigrants into neighboring counties and a dairy company's plan to locate a facility in the county. The purposive sample of participants included sponsors (the board members and staff of CACE), community leaders, service providers, and language service clients. Initial interviews were conducted with 12 stakeholders to elicit concerns and issues and to set standards for the evaluation. These standards then served as a guide for eliciting information from a larger population (63 individuals). Multiple sources of information about the language service providers, such as interview notes (checked by the interviewees for accuracy), activity observations, and document and multimedia analysis, were used for triangulation. Two meetings with ten representative stakeholders were held to jointly discuss the report in terms of strengths, weaknesses, opportunities, and threats (SWOT). The SWOT analysis enabled stakeholders to engage in interpreting the findings. Lastly, the authors laid out short-term and long-term recommendations for stakeholders. The responsive evaluation approach involved collaborative decision making between the evaluation specialist and the stakeholders, and it offered insights from multiple perspectives into the existing situation and the complexities of providing language services.
Peacock, M. (2009). The evaluation of foreign-language-teacher education programmes. Language Teaching Research, 13(3), 259-278.
keywords: Hong Kong; teacher education; program improvement; university; EFL; qualitative; quantitative; questionnaire; interview; case study; evaluation procedure
annotation: This article describes an evaluation of a foreign-language teacher education program at the City University of Hong Kong. The evaluation aimed to determine the program's strengths and weaknesses and how well it met the needs of the students. Based on a review of the evaluation and foreign language teaching literature, the evaluator developed his own evaluation procedure. The steps in the procedure were: (a) review the literature and produce a set of questions; (b) establish appropriate sources of data for the setting; (c) choose and design data collection methods and instruments; (d) collect and analyze each set of data against the questions; (e) construct an account by relating each interpretation to the others (p. 262). The evaluation used both qualitative and quantitative data collection methods, including questionnaires, interviews, and document analysis. After discussing the results of the evaluation and recommendations for program improvement, the author reflects on the strengths and weaknesses of the procedure and makes suggestions for improving the evaluation in the future.
Pennington, M. C. (1991). Building better English language programs: Perspectives on evaluation in ESL. Washington, DC: NAFSA.
keywords: administrator; faculty evaluation; class observation; student service; self-study
annotation: The book is a collection of articles discussing approaches to English as a second language (ESL) program evaluation (system construction, self-study), evaluation of curriculum and content (participatory placement and cultural aspects of ESL programs), non-instructional aspects (student services, database construction), and administrative aspects (administrators and teachers). The book is particularly broad in its coverage of the diverse elements that make up language programs and enable them to function. The chapters are: "Developing effective evaluation systems for language programs" by Brown and Pennington (chapter 1); "Self-study and self-regulation for ESL programs: Issues arising from the associational approach" by Byrd and Constantinides (chapter 2); "A novel approach to ESL program evaluation" by Eskey, Lacy, and Kraft (chapter 3); "Unifying curriculum process and curriculum outcomes: The key to excellence in language education" by Pennington and Brown (chapter 4); "Participatory placement: A case study" by Spaventa and Williamson (chapter 5); "Evaluation of culture components in ESL programs" by Winskowski-Jackson (chapter 6); "Evaluation of student services in ESL programs" by Middlebrook (chapter 7); "Creating and operating a statistical database for evaluation in an English language program" by Ponder and Powell (chapter 8); "Designing and assessing the efficacy of ESL promotional materials" by Jenks (chapter 9); "Procedures and instruments for faculty evaluation in ESL" by Pennington and Young (chapter 10); "Evaluating the ESL program director" by Fox (chapter 11); "Administrative evaluation in ESL programs: How'm I doin'?" by Matthies (chapter 12). Chapters include useful appendices, such as checklists for evaluating the cultural components of a program, classroom observation sheets, sample C-tests for placement purposes, and faculty evaluation instruments.
Pennington, M. C., & Brown, J. D. (1991). Unifying curriculum process and curriculum outcomes: The key to excellence in language education. In M. C. Pennington (Ed.), Building better English language programs: Perspectives on evaluation in ESL (pp. 57-74). Washington, DC: NAFSA.
keywords: model; Curriculum Process Model; Curriculum Outcomes Model; quality control; needs analysis; objectives; testing; materials; teaching; consistency; efficiency; effectiveness
annotation: Curriculum development is a cyclical process of interrelated activities, including needs analysis, objectives setting, testing, materials, teaching, and evaluation. In the Curriculum Process Model (Brown, 1989), evaluation is a "process devoted to continually improving each component of a program on the basis of what is known about all other components separately as well as collectively" (p. 65). In addition to the Curriculum Process Model, Pennington and Brown add another dimension, the Curriculum Outcomes Model, which takes a quality control approach to ensuring excellence in language programs. A language program will achieve targeted outcomes if consistency, efficiency, and effectiveness are unified at all programmatic stages (as covered in the Curriculum Process Model). By "developing a more unified vision of the curriculum and greater cooperation among members of a language program" (p. 71), the purpose of evaluation will also be clarified. Evaluative outcomes, in turn, contribute to a more unified understanding of the program.
Pennington, M. C., & Young, A. L. (1991). Procedures and instruments for faculty evaluation in ESL. In M. C. Pennington (Ed.), Building better English language programs: Perspectives on evaluation in ESL (pp. 191-227). Washington, DC: NAFSA.
keywords: ESL; faculty development; faculty evaluation
annotation: Pennington and Young address the evaluation of faculty as a way to reinforce program quality. They reflect on the use of different kinds of assessment at different stages of teachers' careers. Pennington and Young identify two kinds of instruments that can be used, depending on the purpose: fluid instruments (conversations, letters, essay questionnaires) and fixed instruments (fixed-response questionnaires, rating scales, tests, and various descriptive data). Both types have advantages and disadvantages; to respond fully to the purpose of evaluation, multiple instruments from multiple sources should be used. They suggest four steps for a performance evaluation interview: (a) substantiate performance; (b) reach an understanding of (teaching) job requirements and responsibilities; (c) gain acknowledgement of the issues discussed in the interview; and (d) set goals, action steps, and a timetable for professional development towards the goals. Teachers are an essential part of any educational program, and faculty evaluation can offer substantial information for further development. Appendices include samples of an essay questionnaire, rating scales, a teacher observation form, a form for self-evaluation of a lesson, a student evaluation form, faculty standards of performance, categories for evaluation of research/teaching/service, and a format for annual teacher performance review.
Ponder, R., & Powell, B. (1991). Creating and operating a statistical database for evaluation in an English language program. In M. C. Pennington (Ed.), Building better English language programs: Perspectives on evaluation in ESL (pp. 155-171). Washington, DC: NAFSA.
keywords: ESL; data collection; database
annotation: Evaluation involves systematic data collection over extended periods of time. The intended uses of the database (tracking individuals, informing language learning theory and pedagogical decisions, and supporting administrative and business decisions) determine the types of variables included. The authors present four typical cases, in narrative form, of problems that frequently arise in language program evaluation (e.g., placement) and show how an example statistical database can be used to solve them. Ponder and Powell advise that, before implementing an evaluation, one should conceptualize which variables will be collected and decide what record-keeping (management) system, database format, and analyses will be used.
Rea-Dickins, P., & Germaine, K. P. (1998). The price of everything and the value of nothing: Trends in language program evaluation. In P. Rea-Dickins & K. P. Germaine (Eds.), Managing evaluation and innovation in language teaching: Building bridges (pp. 3-19). London: Longman.
keywords: UK; Europe; overview; trends; methods; participatory evaluation
annotation: Rea-Dickins and Germaine illustrate the growth of interest in program evaluation in the 1990s, evident from the increase in publications on evaluation (including macro- and micro-evaluation studies), the emergence of an active professional evaluation community, and the establishment of various external accreditation organizations (especially in the UK). With the expansion of evaluation functions (accountability, developmental, awareness-raising, and management), encouragement to use a variety of triangulated data elicitation methods, and the engagement of various stakeholders in the evaluation process, program evaluation has become much more dynamic than early traditions that focused on pre-determined measurable outcomes from an empiricist paradigm. Evaluation is argued here to be information and knowledge generation both for short-term immediate use and for policy shaping, thereby building bridges across domains and stakeholders to "promote professional development and validity" (p. 16).
Rea-Dickins, P. (2001). Mirror, mirror on the wall: Identifying processes of classroom assessment. Language Testing, 18(4), 429-462.
keywords: EAL; English; assessment; elementary; process; classroom observation; feedback; teachers
annotation: Rea-Dickins presents a working framework of processes (planning, implementation, monitoring, and recording/dissemination stages) and strategies in classroom assessment decisions. She then applies the framework to examine classroom assessment practice in an elementary-level English as an Additional Language (EAL) classroom. The evaluator observed, video- and audio-recorded, took field notes on, and transcribed three kinds of assessment in classroom interaction (formal assessment, informal whole-class assessment, and informal small-group assessment). The evaluation was repeated in three school settings for one week each over three school terms. The evaluator also conducted semi-structured interviews with two language support teachers and one mainstream class teacher before and after the assessments were administered. In addition, two learners in each of four classes were tracked in detail to shed light on students' assessment experiences. Analyses revealed three purposes of the assessment: (a) bureaucratic (providing information for external agencies), (b) pedagogic (making instructional decisions based on learners' achievements), and (c) learning (developing learner awareness, understanding, and knowledge). Though in-depth evaluation of formative assessment practices may be difficult for teachers to carry out on a daily basis, its cyclical use can raise awareness of the types and roles of formative assessment.
Rea-Dickins, P., & Germaine, K. (1992). Evaluation. Oxford: Oxford University Press.
keywords: teacher training; curriculum development; accountability; method; purpose; procedures; framework; participatory; principles
annotation: Rea-Dickins and Germaine view evaluation as "the means by which both teaching and learning may function more efficiently and quality be assured" (p. xii). The perspective of evaluation for accountability, for curriculum development and innovation, and for professional development is consistent throughout the book. Section one ("Explanation") sets out principles of educational evaluation (innovation, management, and context), exploring evaluation purpose, design, and framework. Section two is a collection of short summaries of 15 case studies, with attention to the context, aim (purpose), design, and procedures of each. The examples range from evaluation of a project, an intensive ESL program, secondary schools, treatment of oral errors, materials, teachers, and learner outcomes (process and product) to syllabus evaluation. The last section ("Exploring evaluation potential") is devoted to applying the previously presented frameworks, methodology, and other aspects of evaluation through tasks. Rea-Dickins and Germaine emphasize the involvement of teachers and stakeholders throughout the evaluation process, parallel to curriculum development. Rather than presenting a theoretical argument, this book serves as a guide for teachers to clarify the principles and carry out evaluation in practice. The tasks (125 in total) provide opportunities for practitioners to reflect and to build awareness for conducting evaluation in their own contexts. Although the book covers many aspects of the evaluation process in a limited space, its treatment of the managerial, political, and personal aspects of evaluation falls short (they are mentioned only briefly, in four pages, in section 1.4). Readers may want to consult Rea-Dickins and Germaine's (1998) edited book, titled "Managing evaluation and innovation in language teaching: Building bridges."
Rea-Dickins, P., & Germaine, K. P. (Eds.). (1998). Managing evaluation and innovation in language teaching: Building bridges. London: Longman.
keywords: UK; Europe; ESL; EFL; innovation; management; implementation; teacher education; ethnography; culture
annotation: The book is a collection of 11 chapters related to innovation and change in English language programs around the world, primarily in European contexts. After an introductory chapter giving an overview of trends in language program evaluation, the chapters are divided into three sections: (1) Evaluating innovation in language education (3 articles), (2) Managing evaluation and innovation (3 articles), and (3) Views from the bridge (4 articles). The first two sections are reviewed in this annotation, since they address different approaches to program evaluation and provide real-world evaluation examples in a variety of settings. Both sections build bridges to other disciplines, thereby expanding the methodologies and approaches available in language program evaluation. The annotated book chapters are: "The price of everything and the value of nothing: Trends in language programme evaluation" (Rea-Dickins and Germaine, chapter 1); "Evaluating the implementation of educational innovations: Lessons from the past" (Karavas-Doukas, chapter 2); "Language and cultural issues in innovation: The European dimension" (Roberts, chapter 3); "Programme evaluation by teachers: Issues of policy and practice" (Kiely, chapter 4); "Using institutional self-evaluation to promote the quality of language and communication training programmes" (Mackay, Wellesley, Tasman, & Bazergan, chapter 5); "Managing developmental evaluation activities in teacher education: Empowering teachers in a new mode of learning" (Hedge, chapter 6); and "Managing and evaluating change: The case of teacher appraisal" (Anderson, chapter 7).
Ricardo-Osorio, J. (2008). A study of foreign language learning outcomes assessment in U.S. undergraduate education. Foreign Language Annals, 41(4), 590-610.
keywords: US; university; survey; quantitative; performance-based; outcomes assessment; ACTFL; oral proficiency interview; foreign language
annotation: This article reports on a survey of student learning outcomes assessment in university foreign language programs in the US. The study investigated which performance-based assessments are commonly used, how frequently the ACTFL guidelines and National Standards are used, and which obstacles impede the use of performance-based assessments. A Likert-style, web-based questionnaire was developed and quantitative data analysis was performed. The results indicated that faculty-designed multiple choice tests were the most common assessment method, followed by student papers and projects. Translation was more common than the oral proficiency interview or portfolios. ACTFL guidelines were often used for developing speaking assessments, but rarely for other purposes. Lack of time and lack of faculty knowledge were given as the main obstacles to using performance-based assessment. The article also provides a thorough literature review of the recent history and common types of performance-based assessment.
Richards, J. (2001). Approaches to evaluation. In J. Richards (Ed.), Curriculum development in language teaching (pp. 286-309). Cambridge: Cambridge University Press.
keywords: theoretical; formative; summative; illuminative; evaluation questions; method; development; accountability; stakeholder identification
annotation: Richards situates evaluation as a key element throughout the curriculum development process, functioning as a "reflective analysis of the practices" (p. 286). This chapter briefly covers three types of evaluation purposes: formative evaluation for ongoing development and improvement, illuminative evaluation for deeper understanding of the program, and summative evaluation for determining program effectiveness. It then turns to issues in program evaluation (identification and involvement of stakeholders, the use of quantitative and qualitative measurements, documentation of process information, and adequacy of the evaluation plan and implementation) and to the advantages and disadvantages of methods that can be used for data gathering. The evaluative questions Richards lists as examples for formative, illuminative, and summative evaluation focus primarily on "what has happened" rather than "what we shall do from now." Appendices include two examples of program evaluation (EFL courses in primary schools and language courses in a private language institute) with a focus on the audiences, methodology, and reporting of the evaluation.
Roberts, C. (1998). Language and cultural issues in innovation: The European dimension. In P. Rea-Dickins & K. P. Germaine (Eds.), Managing evaluation and innovation in language teaching: Building bridges (pp. 51-77). London: Longman.
keywords: UK; Europe; university; overview; trends; methods; participatory evaluation; culture; ethnography
annotation: Roberts argues for ethnographic methodologies in evaluating foreign language and culture learning in modern language degree programs at the tertiary level in the UK. In particular, she focuses on study abroad in the final year of the degree, through the so-called ‘Language Learners as Ethnographers’ project. The following tools were used within this ethnographic framework: interviews with students and lecturers, course diaries, end-of-course questionnaires, products of the ethnographic project (a written report), meetings with students abroad, classroom observations, joint assessment meetings on the projects and project drafts, staff-student discussions, and observation of project supervision (field notes were taken during observations and meetings). She illustrates the value of eliciting rich, thick description of the program through an ethnographic approach. She also argues that, via ethnography, evaluation becomes a context-bound process of understanding (p. 75) for educational purposes, rather than a set of facts from which other successful projects can be straightforwardly predicted and replicated (pp. 75-76) for accountability purposes.
Ross, S. (1992). Program-defining evaluation in a decade of eclecticism. In J. C. Alderson & A. Beretta (Eds.), Evaluating second language education (pp. 167-195). Cambridge: Cambridge University Press.
keywords: Japan; university; EFL; quantitative; audio-lingual; functional-notional; grammar-based; self-access; task-based; materials; classroom observation; testing; checklist; teacher observer
annotation: Ross demonstrates a program-formative evaluation approach applied at a Japanese junior college program, which he characterizes as a laissez-faire English Language Teaching environment. The goal of the study was to generalize the findings to other Japanese second language teaching contexts. He examined how observed differences in the methodological characteristics of five teaching approaches (audio-lingual, functional-notional, grammar-based, self-access pair learning, task-based), as determined by the materials used in the English as a foreign language courses, related to differences in outcomes. The project used teachers as participant observers to reduce teachers' anxiety about outsider observation. The researcher created a low-inference coding scheme to analyze classroom activity types (student activities, sources of input to students, student behavior, and the distribution of classroom time). Four observations were conducted by four different teachers, each recording activities and behaviors in the four sections of the coding scheme. Their observations were tallied and summed for cluster analysis. Specific hypotheses were then formulated on the basis of the observations and tested against the outcome measures (a grammar test, a listening cloze test, a partial dictation test, a narrative discourse test, and a structured oral interview test) using analysis of covariance, with pre-test scores, self-reported extracurricular contact with native speakers, and attendance rate as covariates. A link between process and product data was found for listening input and the development of listening skills on the post-test, but not for grammar input or pair work. Because the quantitative observation data could only partially reveal the methodological features of the instructional setting, Ross notes that attention to affective and linguistic aspects of language learning is needed.
Ross, S. (2003). A diachronic coherence model for language programme evaluation. Language Learning, 53(1), 1-33.
keywords: Japan; student mastery; achievement; proficiency; learning outcomes; testing; model
annotation: Ross warns that the use of norm-referenced testing can lead to incorrect inferences about program success and students' mastery. However, syllabus-based assessments have been considered insufficient as "hard evidence of generalized proficiency gains" (p. 6). This conflict has not been resolved, and "no single approach has been able to assess achievement and proficiency simultaneously" (p. 7). The study analyzes the relationship between program-internal assessment (composite grades from portfolio self-, peer, and teacher assessments, as well as syllabus content testing) and program-external assessment, using data from six cohorts totaling 1,820 undergraduates in an EAP/EFL program. Standardized proficiency testing (TOEFL) was administered at the beginning and end of the year, and achievement testing was undertaken twice per year (approximately every 80 classroom hours). Path analysis was used to reveal direct and indirect links between achievement and proficiency. The results revealed that listening skills developed independently of classroom instruction, while academic literacy appeared to be more program-dependent, requiring greater learner effort. Results for the second instructional year showed an overall weak link between the pre- and post-test reading proficiency measures. Also, the program-internal achievement tests had no direct impact on the post-test reading proficiency scores, but the note-taking course did. These findings led to a reform of the reading curriculum and better coordination of assessment criteria. The Diachronic Coherence Model that Ross proposes reveals the strength of the relationships among the internal achievement tests, which are based on the learning outcomes, and between proficiency and achievement tests. This model can serve both external accountability and internal formative purposes. However, its scope is limited to program evaluation based on student outcomes assessment.
Schneider, A. I. (2000). Title VI funding for undergraduate international study programs: Long-term impact on language offerings. ADFL Bulletin, 32(1), 42-47.
keywords: US; university; undergraduate; international study; grant impact; accountability
annotation: Schneider reports on an evaluation study of all U.S. Department of Education Undergraduate International Studies and Foreign Language Programs. The evaluation was conducted in response to federal accountability requirements. The focus here is on the language instruction component of the overall evaluation study. A questionnaire was distributed to the 107 funded projects (75% return rate), and site visits were made to 51 of the respondents (64%) to determine the impact of funding. Results indicated not only a strong and long-lasting overall impact on curriculum development and the campus environment, but also an indirect impact on student participation through the establishment of, or an increase in, overseas study. The programs were also found to lack data management and to need help with the systematic collection and analysis of grant-impact data (e.g., information on enrollment, revised or added courses, methods of instruction).
Schulz, R. A. (2007). The challenge of assessing cultural understanding in the context of foreign language instruction. Foreign Language Annals, 40(1), 9-26.
keywords: Intercultural competence; assessment; student learning outcomes; portfolio; German; university
annotation: Schulz reviews the literature on intercultural competency outcomes and problematizes the inconsistencies of past operationalizations of intercultural learning and teaching in foreign language education. Based on her review, she proposes five fundamental objectives for cross-cultural awareness and understanding for pre-collegiate and college introductory level foreign language programs. The five objectives focus on the development of students’ awareness of (1) factors that “impact…cultural perspectives, products, and practices” (p. 16), (2) situational factors influencing interaction and behavior, (3) stereotypical views of the home and target culture, (4) culture-specific images and connotations of expressions, and (5) possible causative factors for cultural misunderstandings. To assess process- and product-oriented intercultural learning outcomes, she suggests using portfolio assessment that can be integrated into an existing course. The Appendix includes concrete instructions, tasks, and assessment criteria that are aligned with Schulz’s five fundamental cultural objectives for introductory college-level German courses.
Slimani, A. (1992). Evaluation of classroom interaction. In J. C. Alderson & A. Beretta (Eds.), Evaluating second language education (pp. 197-221). Cambridge: Cambridge University Press.
keywords: conversation analysis; uptake; checklist; learner perspective
annotation: Slimani investigates whether topicalization (learning opportunities) of English linguistic items in classroom interaction leads to ‘uptake’ by the learners. Here, ‘uptake’ was understood as what individual learners claim to have learned from the interactive classroom events which have just preceded (p. 199). Slimani operationalized ‘uptake’ through an Uptake Recall Chart distributed at the end of the observed lesson and an Uptake Identification Probe administered three hours later. Student interviews were also conducted but did not produce responses that were sufficiently precise to be interpreted in relation to what might account for their claims (p. 205). Findings indicated that learners' perceptions of topicalization and uptake were more salient when topics were initiated by learners, and also that learners' perceptions are highly idiosyncratic. The study highlights the potential value of including the learners' perspective regarding what they learned in classroom interaction. Tapping their perspectives through learning-focused measures (like the uptake charts) may provide an important supplement for the interpretation of learning processes and outcomes in evaluation studies.
Snow, M. A., & Brinton, D. M. (1988). Content-based language instruction: Investigating the effectiveness of the adjunct model. TESOL Quarterly, 22(4), 553-574.
keywords: US; ESP; university; adjunct model; questionnaire; domain expert; content-based
annotation: Snow and Brinton examine the effectiveness of the adjunct model (students concurrently taking a general education course and a sheltered language course, linked by content) in the 7-week Freshman Summer Program (FSP) at UCLA. The language course was developed based on a needs analysis of the content discipline (instructors' feedback, analysis of the language and content materials, review of previous curricula and assignments, and input from other specialists), and it was adjusted throughout the instructional period through weekly staff meetings. Of 224 former FSP students, 79 responded to a retrospective survey requesting demographic information, views on the global benefits of FSP courses, and the usefulness of specific activities and skills learned in FSP. Open-ended responses revealed three types of positive attitudes toward FSP (ease of adjustment, self-confidence, and learning to get help) as well as constructive comments (follow-up support is needed after FSP, and the program focused more on social science and the humanities than on natural science). A follow-up study featured interviews with new graduates of FSP, who reflected on the beneficial effects of FSP in equipping them for the academic demands they faced. The authors also compared FSP and non-FSP students' performance on the placement exam and on an end-of-semester simulated academic task. Despite the FSP students' lower and more widely distributed placement test scores, they performed similarly to ESL students on the simulated final examination. However, the incompatibility of the initial and final tests and the difference in the timing of the placement test make the results difficult to interpret. Although sampling former students may be difficult, the study shows that they can be an important source of information on program effectiveness.
Spaventa, L. J., & Williamson, J. S. (1991). Participatory placement: A case study. In M. C. Pennington (Ed.), Building better English language programs: Perspectives on evaluation in ESL (pp. 75-97). Washington, DC: NAFSA.
keywords: ESL; university; process-oriented; placement test; testing
annotation: Spaventa and Williamson illustrate how a new placement test and associated procedures were created for a 10-week ESL program. The program had encountered several problems in placing students, including: (a) failure of the existing test (Michigan English Language Placement Test, MELPT) to reflect students' oral competence; (b) a "lack of standardization of evaluative criteria" (p. 80) for oral testing; (c) excessive time and energy spent administering and scoring tests; and (d) lack of teacher involvement in student placement. To address issues of scorability, economy, and administrability, a C-test (a text in which one or more letters after the first letter are deleted from every other word), an oral placement procedure (a combination of students' self-assessment and teacher-student discussion), and a 10-minute writing test were developed for placement decisions. Where needed, the MELPT, which correlated relatively highly with the C-test, was administered to inform potential changes in placement level. Spaventa and Williamson summarize their process model of participatory testing, which incorporates students' and teachers' voices in the placement decision-making process.
Sullivan, J. H. (2006). The importance of program evaluation in collegiate foreign language programs. Modern Language Journal, 90(4), 590-593.
keywords: assessment; teacher certification; model; NCATE; university
annotation: As a member of the National Council for the Accreditation of Teacher Education (NCATE) Board of Examiners, Sullivan stresses the importance of “collaboration and collegiality” (p. 592) in conducting and sustaining effective program evaluation. As outlined in the NCATE/ACTFL guidelines for accreditation, programs are expected to create a professional learning community and implement a locally contextualized program evaluation. Sullivan introduces and exemplifies a template for the NCATE/ACTFL six-step approach to evidence gathering on teacher-candidates’ performance. Sullivan also stresses the autonomy and willingness of faculty to take control of departmental self-study as keys to supporting claims about the value of faculty members’ educational efforts.
Tucker, G. R., & Cziko, G. A. (1978). The role of evaluation in bilingual education. In J. E. Alatis (Ed.), International dimensions of bilingual education (pp. 111-124). Washington, DC: Georgetown University Press.
keywords: bilingual education; Canada; Nigeria; Philippines; experimental
annotation: The authors highlight three bilingual education (BE) programs, in Canada, Nigeria, and the Philippines, as examples of evaluation applied to BE. Many bilingual education programs adopted an experimental, comparative approach to evaluation, which may be susceptible to problems with random assignment, teacher effects, and uncontrolled or unmeasured factors. In addition, many programs had not articulated locally agreed-upon goals for education in terms of "affective, cognitive, linguistic or social development" (p. 432), which affected the choice of evaluation strategy. Instead of making judgmental decisions about the programs, the authors suggest that evaluators seek to "evaluate the relative strengths and weaknesses of a variety of program alternatives and to specify the conditions under which each might be more or less successful" (p. 433) for the purpose of knowledge formation. Conducting a formal evaluation will lead program stakeholders to collaboratively specify, operationalize, and implement program goals and objectives. Thus, evaluation can also be seen as knowledge formation serving both formative and summative purposes.
The authors introduce the notion of contextually tailored testing, which can be adapted to different contexts and updated as teaching and learning situations evolve.
Weir, C., & Roberts, J. (1994). Evaluation in ELT. Oxford: Blackwell.
keywords: overview; theoretical; project evaluation; program evaluation; accountability-oriented; development-oriented; formative; summative; external evaluator; internal evaluator; ODA; case study; self-report; observation; political dimension; personal dimension; methods; English
annotation: Weir and Roberts provide a broad overview of the development of theory and practice in language program evaluation, based primarily on their experiences as internal evaluators and as external evaluators for the Overseas Development Administration (ODA). Both accountability- and development-oriented dimensions are discussed throughout the book. The book consists of four parts and includes extensive appendices.
Part I (chapters 1 and 2) provides a comprehensive overview of approaches to evaluating second language projects and programs. In chapter 1, the authors review accountability- and development-oriented approaches to formative and summative evaluation. They suggest an integrated approach to these two supposedly conflicting purposes of program evaluation, viewing "evaluation as contributing to understanding and thereby to general professional accountability and development as well as satisfying any contractual accountability requirements" (p. 10). During the planning stage, the purpose (why), the focus (what), the people (who), the duration (how long), the timing (when), and the method (how) of evaluation all have to be aligned. The authors also review two sets of standards for educational evaluation, by the Joint Committee (1981) and by Harlen and Elliott (1982). Chapter 2 focuses on baseline evaluation design, with particular attention to project evaluation (contractual, accountability-driven projects funded by ODA). Baseline data are gathered at the appraisal stage to determine the appropriateness, viability, and sustainability of the project, and at the implementation stage so that the effectiveness of the project can later be judged against the pre-determined outcomes.
Part II is a collection of three case studies. The authors reflect on their experiences as external evaluators (chapters 3 and 5) and as insider evaluation recipients (chapter 4). The first case (chapter 3) is an evaluation of a four-week ODA-funded EFL in-service teacher training project in Nepal, following a summative, external paradigm with a 'cost-benefit' approach. The purpose of the evaluation was to examine whether teacher training, which was expected to affect pedagogical practice, made a difference in students' performance on the School Leaving Certificate English examination. Chapter 4 (the second case study) describes a formative evaluation of a twelve-week pre-sessional EAP program conducted for "understanding, action and improvement" (p. 84). The authors, as insiders carrying out an inspection of the program, experienced the tension and differing priorities between development- and accountability-oriented perspectives on evaluation. They address the need for a synthesized approach in the form of a utilization-focused evaluation. The third case study (chapter 5) is an evaluation of a two-year ODA-funded professional development program for secondary school teachers at a tertiary institution in Latin America. This is a case of an external evaluator playing a critical role in helping insiders set up feedback and evaluation structures to inform a self-evaluation study. A framework for integrating internal and external perspectives in development-oriented evaluation is proposed.
Part III (chapters 6 and 7) explores various discrete evaluation methods, including self-report (interviews and questionnaires) and classroom observation. The authors discuss the advantages and limitations of each method and provide step-by-step guides and examples. Rather than adhering to one paradigm of enquiry, they advise would-be evaluators to determine methodology by first clarifying: (a) the purpose of the evaluation; (b) what information is required to fulfill that purpose; (c) the availability of and access to resources; and (d) "the characteristics of informants" (p. 132).
Part IV, the final chapter, discusses the political and personal dimensions of evaluation, including issues of power, decision making, fairness, ownership, and climate. The extensive appendices are useful for understanding the criteria and instruments used in the three case studies.
Windham, S. (2008). Redesigning lower-level curricula for learning outcomes: A case study. ADFL Bulletin, 39(2&3), 31-35.
keywords: US; university; German; outcomes; assessment; ACTFL; proficiency; standards; program improvement; case study
annotation: This case study describes the modifications Elon University’s German department made to its curriculum in order to develop a program with more clearly defined and consistent outcomes. The program changes were motivated by an external review that revealed that, although individual courses had specific objectives, the program lacked articulation across and between levels. The department used the ACTFL proficiency guidelines as a basis for developing learning outcomes for each level. The adoption of new student learning outcomes also led to the development of new evaluation criteria, assessment tools, course content, and pedagogical practices. The author reports a clear improvement in student performance on oral proficiency assessments as a result. The author emphasizes that the ACTFL guidelines provided an overall framework, but the German department modified and supplemented the guidelines to meet its local needs. Thus, the ACTFL guidelines were used to inform the discussion of outcomes, but not as the final measure of outcomes.
Winskowski-Jackson, C. (1991). Evaluation of culture components in ESL programs. In M. C. Pennington (Ed.), Building better English language programs: Perspectives on evaluation in ESL (pp. 98-134). Washington, DC: NAFSA.
keywords: ESL; culture; acculturation; checklist
annotation: Cultural components are often intertwined with language learning in ESL classes. Winskowski-Jackson provides a variety of checklists (staff, curriculum, and student) for identifying the cultural components in any program. Proposed methods include a survey of the program staff's cultural competencies, an analysis of the curriculum infrastructure in terms of cultural content, and an assessment of students' cultural competence (covering their cognitive knowledge, affective development, and psychomotor ability). Although cultural competencies may vary with individual history, the author specifies several that can be evaluated at critical periods of development (the initial, three-month, six-month, one-year, and two-year stages). Evaluating the effectiveness of a cultural training program can be complex, since the degree of acculturation differs from individual to individual.
Wright, B. D. (2006). Learning languages and the language of learning. Modern Language Journal, 90(4), 593-597.
keywords: assessment; model; accreditation; university
annotation: Wright reconceptualizes assessment as a four-step inquiry and improvement process that “goes far beyond mere measurement” (p. 594). In step one, faculty define and clarify student learning outcomes and address questions about learning. In particular, Wright warns that learning outcomes must be specific to the program goals and not bound by assessment practices. In step two, faculty gather evidence on student learning. Decisions about evidence gathering must respond to the questions about the specific goals of student learning raised in step one. In step three, faculty analyze and interpret the gathered information. Lastly, in step four, faculty use the evidence to improve student learning. Unlike traditional input and output assessment models, Wright's model emphasizes both the use of findings and a commitment to taking action through assessment. Accreditation bodies and policymakers now require information on how well students achieve the outcomes, not just on what they learn. Finally, the author proposes that faculty actively engage in self-improvement and use assessment for meaningful purposes rather than passively going through the routine solely to meet accreditation standards.
Yang, W. (2009). Evaluation of teacher induction practices in a US university English language program: Towards useful evaluation. Language Teaching Research, 13(1), 77-98.
keywords: teacher induction; utilization-focused; useful; program improvement; formative; internal; university; case study; US; EAP
annotation: This case study provides an example of how a utilization-focused approach can help teachers conduct internal evaluations that lead to practical program improvements. Yang describes an internal, formative evaluation of the new teacher induction program in a US university’s EAP program. The evaluation was conducted by Yang and a new teacher, both working within the program, in an effort to improve the induction process for new teachers. Following Patton’s utilization-focused approach, Yang identified the program administrators as the primary intended users and worked closely with them to design and implement an evaluation that would meet their needs. She explains the utilization-focused evaluation approach and then describes the evaluation process in detail. She describes the preliminary negotiations with the primary intended users that led to the development of the research questions and evaluation design, reports findings about teacher induction practices, and then discusses how the findings and recommendations were eventually used for the benefit of teachers and the program.