A


Allaei, S. K., & Connor, U. (1991). Using performative assessment instruments with ESL student writers. In Hamp-Lyons, L. (Ed.). Assessing second language writing in academic contexts (pp. 227-240). Norwood, NJ: Ablex Publishing Corporation.

Allaei and Connor suggest that performative assessments of ESL student writing can offer a valuable alternative to large-scale, holistically or analytically scored assessments of L2 writing proficiency. Performative assessments require examinees to write in response to a particular task that necessitates a specific type of writing (e.g., persuasion). This writing is then evaluated subjectively according to detailed scoring rubrics. Such tests are therefore criterion-referenced, and they seek to differentiate the various skills involved in performing a given writing task. Allaei and Connor maintain that these tests have high content validity if the rubrics used for evaluation are based on identification of the skills that students will actually use in given tasks and if the test prompts elicit the same skills. They also claim construct validity for performative writing tests on the basis of this detailed identification of the task dimensions along which actual student writing performances vary.
Development of an L2 performative assessment involves: (1) identification of the knowledge and skills that a target audience is expected to command with respect to specific writing tasks, (2) creation of test tasks in which content is supplied and which elicit the desired skills and knowledge, and (3) development of scoring rubrics in terms of the sub-skills required and the constitution of levels of proficiency within each of these sub-skills. (Allaei and Connor give examples of differing scoring rubrics used for several specific writing tasks.)
This kind of performative assessment serves as an effective diagnostic tool, especially in program situations where certain types of writing are emphasized and identifiable. Performative assessments should generally focus on the written representation of content and meaning, but analytic scales can be used where specific points (of grammatical accuracy, for example) need to be assessed. The strength of performative assessments lies in their descriptive depth and potential for feedback to learners and teachers. Their success hinges on development of authentic prompts that "parallel what students are expected to do in academic writing tasks" (p. 239). Prompts should be kept specific so that comprehensive scoring rubrics can be written for them.

Arter, J. A., & Spandel, V. (1992). Using portfolios of student work in instruction and assessment. Educational Measurement: Issues and Practice, 11(1), 36-44.

Portfolios are being implemented in order to tap student knowledge and capabilities to a greater degree, investigate student learning and production processes, align instructional and testing emphases, examine students functioning in real-life situations, provide continuous developmental feedback, and encourage student engagement in and responsibility for learning. Portfolios should be purposeful (with goals and organization -- not just a folder of collected work), involve students in self-reflection (metacognitive analysis of work), employ a clear schema of evaluation criteria from the outset, include guidelines for work selection, and encourage student participation in this selection process. Portfolios can be used as assessment devices, as an integrative element of instruction and assessment, and as a story-telling device reflecting student work and growth.
General questions that must be answered before implementing portfolios as assessment devices include:
(1) To what extent is the portfolio representative of student work, growth, and ability?
(2) How and by whom are judgment criteria determined?
(3) What constitutes authentic work and performance, and to what extent is the portfolio reflective of the day-to-day authentic activities that should be occurring in the classroom?
(4) Who interprets the information provided by portfolios, and how are these interpreters trained?
In designing a portfolio system, the following questions should be considered:
(1) Who is responsible for the generation and design (stakeholders, top-down, students)?
(2) What is the purpose of the portfolio? Will it be used for assessment as well as instruction? Will it be used to provide information at the classroom level only, or for large-scale accountability decisions as well?
(3) What are the links to curriculum and instruction? Are criteria addressed in instruction? Are students responsible for and active in the learning process? Is self-reflection implemented in class as well as in the portfolios?
(4) What is the content involved in the portfolio? How, when, and by whom is it chosen?
(5) How is assessment incorporated? What is the goal of the assessment? What are the criteria? Is assessment standardized? Who does the assessing?
(6) Who is responsible for the following: selecting portfolio material, storing the portfolio, accessing the portfolio? Who "owns" the portfolio, and how is this ownership operationalized?
(7) How are teachers trained to implement portfolios, both in instruction and as an assessment tool?
In concluding, Arter and Spandel present a checklist for portfolio design.

Aschbacher, P. A. (1991). Performance assessment: State activity, interest, and concerns. Applied Measurement in Education, 4(4), 275-288.

Aschbacher reports on the use of performance assessment at the state level across the United States. She cites the inability of traditional standardized assessment to provide the necessary information for making informed decisions about system effectiveness, and she refers to the undesirable washback effects that standardized testing has had on tertiary education. Although there exists no single method of alternative assessment, alternative or performance assessments share several common characteristics: (1) they require higher level thinking and problem-solving; (2) the tasks are also worthwhile instructional activities; (3) they involve real-world contexts or simulations; (4) they assess process as well as product; and (5) performance standards and criteria are made public prior to assessment. Aschbacher provides a list of states with involvement or interest in performance assessment, and she then reflects on the major state-level concerns with implementing performance assessments.
States are most concerned with the costs of performance assessment. These costs include test development, training of teachers and raters, extensive rating sessions, transportation, public education about changing assessments, and score reporting, among others. Remote scoring and computer networking might alleviate some of these costs. The logistics of implementing performance assessments also seem to be a major concern. Performance assessments involve large amounts of collected material, special equipment, storage and transportation of tests, and increased time commitment. Technical reliability and validity are also cited as important issues, especially at the system level, where decision-making must be defended to multiple interested parties. Finally, support for implementation is lacking in most states. This is due to the difficulty of comparing traditional, inexpensive, easy-to-administer, norm-referenced tests with new alternatives. The public, which demands educational reform, is generally unaware of the issues involved in test development. If performance assessment is to be applied beyond the classroom level, then extensive training and public promotion will be necessary. Further changes will also be necessary, given the continued dominance of standardized forms of assessment with critical gatekeeping functions (e.g., the SAT, GRE, and ACT).


B


Bailey, K. M. (1985). If I had known then what I know now: Performance testing of foreign teaching assistants. In Hauptman, P. C., LeBlanc, R., & Wesche, M. B. (Eds.). Second language performance testing (pp. 153-180). Ottawa: University of Ottawa Press.

Bailey reviews the development of an English language performance test for foreign teaching assistants (FTAs), and she discusses subsequent research and areas for implementing change. In response to alleged problems with the in-class language ability of FTAs, it was decided that a situationally and functionally authentic performance test should be implemented as a means of diagnosing FTA weaknesses. Performance categories were determined through a needs analysis conducted by ESL teachers viewing videos of FTAs in teaching situations. The major categories for the test involved the areas of language proficiency, delivery, and communication of information; these areas were subdivided into 12 rating categories. For the test, the stimulus was a set of subject-specific vocabulary words from which the FTA examinee selected one for purposes of in-depth explanation. Examinees were allowed extensive range in choosing a word or concept (thereby avoiding score variability induced by knowledge differences). The test task involved explanation of the concept or word to a (role-played) student in an office hours setting. The "student" used questioning strategies to elicit at least five minutes of interaction, and the interaction was videotaped. The examinee was then rated according to the performance categories in a holistic fashion. That is, specific descriptors of ability levels within categories were not provided; rather, raters developed subjective agreements based on sample videos, and they normed their ratings on these samples. The raters were ESL teachers, FTA trainers, and undergraduate students. Pronunciation proved to be the most important rating factor for all raters. A generalizability theory study found that the rater-by-examinee component showed the largest amount of variance, suggesting that individual subject differences caused different reactions among the raters.
Further studies were conducted in order to better identify the difficulties encountered by specific groups of FTAs. It was found that students were reacting to more than just linguistic variables in FTA speech.
The performance test instrument was also re-analyzed in a comparison of actual classroom interaction with the interaction elicited by the test. The test and authentic interactions were found to differ in terms of language tone, audience size and dynamics, turn-taking, physical surroundings, paralinguistic possibilities (e.g., chalkboard, realia), and the combination of written and spoken modes (in actual classroom interaction).
In retrospect, Bailey suggests that more sociolinguistic aspects should be involved in the test, as purely linguistic measures of proficiency did not correlate highly with student assessments of FTA abilities. In order to further improve the test's ability to assess FTAs in the functions and situations in which they must actually perform, Bailey also recommends the following revisions: (1) use a classroom setting with at least six (role-playing) students; (2) lengthen the test and expand the functions that it represents; (3) designate subjectively agreed upon criteria for rating the effectiveness of the FTA performances by using informants from all perspectives (including ESL teachers, undergraduate students, and faculty from the FTA's department).
Bailey concludes that, although linguistic competence of a minimal level is certainly a necessary condition for FTA effectiveness, it is not a sufficient one. Other sociolinguistic aspects, as well as situational and functional aspects directly related to authentic performance conditions, must be included in an effective FTA performance test.

Brandt, R. (1987). On assessment in the arts: A conversation with Howard Gardner. Educational Leadership, 45, 30-34.

Brandt and Gardner discuss possibilities for alternative assessment in arts education. In order to capture what the arts promote in students (personal meaning, emotional content), assessment should occur while students are engaged in authentic tasks, not in artificially constructed testing situations. Ability in an artistic realm cannot be separated from achievement, because of intervening variables such as students' lack of contact with artistic modes of thought and artistic processes. Learners should therefore be assessed after they have had enough exposure to artistic thinking, but under the same conditions in which they learn. In order to measure, or monitor, aesthetic growth, the focus should be on three factors: production (creative and individual), linked with perception (learning to see, hear, and discriminate critically) and reflection (self-critique). Portfolios offer an effective medium for exploring and documenting personal aesthetic growth. Gardner suggests:
"What we need in America is for students to get more deeply interested in things, more involved in them, more engaged in wanting to know; to have projects they can get excited about and work on over longer periods of time; to be stimulated to find things out on their own" (p. 33).
Based on his theory of seven kinds of intelligences (mathematical, linguistic, musical, spatial, bodily-kinesthetic, interpersonal, and intrapersonal), Gardner concludes that we should be more flexible in what is assessed and how things are assessed.

