E


Elbow, P., & Belanoff, P. (1991). State University of New York at Stony Brook portfolio-based evaluation program. In Belanoff, P., & Dickson, M. (Eds.). Portfolios: Process and product (pp. 3-16). Portsmouth, NH: Boynton/Cook Publishers.

Elbow and Belanoff describe the use of portfolios as instruments for making pass/fail decisions about freshman university writers. The proficiency tests previously used to determine whether students could exit (or be exempted from) introductory writing courses consisted of hour-long, one-shot responses to single test prompts. Such exams did not offer a good idea of what students are really able to do with writing as it most often occurs under real-world circumstances. They also detracted from the writing practices that are stressed in writing instruction. In order to maintain standards of student writing, to remove the onus of pass/fail decisions from individual teachers (i.e., to lend a sense of objectivity to the evaluation process), and to bring classroom practice in line with assessment, SUNY Stony Brook instituted a portfolio system of end-of-semester student evaluation.
The portfolio comprised three modes of writing: a narrative, descriptive, or expressive piece or informal essay; an academic essay organized around a main idea; and an academic essay that analyzes another academic essay. Each of these pieces was to be produced during the semester and could be revised, edited, and reworked at will. In addition, one in-class essay was required (in order to provide a point of comparison and maintain academic honesty).
Teachers other than the student's own rated the submitted portfolio selections (one selection was rated at mid-term and could serve as a benchmark for students). Raters made only pass/fail judgments; actual course grades were determined by the classroom teacher on the basis of cumulative classroom variables. Judgments were therefore binary and holistic (with all four pieces being used to reach one decision). If a single paper caused a portfolio to fail, the student could revise and resubmit it. Consistent standards were maintained through mid-semester and end-of-semester rater calibration sessions using sample passing and failing portfolios.
This evaluation process offers a degree of objectivity and standardization to writing assessment. It also demands a higher quality of effort and work from students than did the proficiency exams. It moves away from norm-referenced standardization of testing and interpretation of student abilities, and towards criterion-referenced and mastery-based evaluation of student products. The actual work that students do for a class becomes the basis on which they are judged, yet system-wide standards can be maintained through the use of trained class-external raters.
Problems include more work for teachers as raters, more pressure on teachers to help students succeed, teaching to the few portfolio pieces, and too much emphasis on revision (which is too easy on students). Strengths, however, include reflection of the complexities of writing processes, the value of revision and critique from various perspectives, transformation of the teacher into helper as opposed to judge, and emphasis on the complexities of writing for unknown audiences.


F


Frederiksen, J. R., & Collins, A. (1989). A systems approach to educational testing. Educational Researcher, 18(9), 27-32.

Frederiksen and Collins propose a new measure, systemic validity, for educational tests that are intended to motivate change in instructional systems. An extended notion of construct validity must consider the effects (both intended and unintended) of a test on a system of instruction. "A systemically valid test is one that induces in the educational system curricular and instructional changes that foster the development of the cognitive skills that the test is designed to measure" (p. 27). Frederiksen and Collins find that most high-stakes educational testing does not maintain a high degree of systemic validity. Many tests measure a construct that is specified in such a way that "instructional adaptations that do not contribute to the development of these cognitive skills" can be employed to score well on the test. These tests further emphasize skill components and knowledge of discrete items, develop test-taking skills in lieu of problem-solving or creative skills, displace learning goals with a focus on test score improvement, and direct student learning away from the use of cognitive skills. Any claim of systemic validity must account for such changes in instruction and learning.
Systemically valid tests should be direct tests of cognitive abilities instead of indirect tests of characteristics that can be correlated with learning objectives. Indirect tests do not enable the measurement of higher order thinking skills used in response to real-world tasks. Direct tests are more systemically valid in that instruction that improves test scores will also lead directly to improvement in the corresponding cognitive ability. One good example of this kind of test is the National Assessment of Educational Progress writing exam, which places examinees in authentic writing situations and rates their responses according to primary trait characteristics of the given task. In primary trait scoring, raters make subjective judgments against a previously well-defined set of criteria that constitute different levels of performance on the task. Raters have been able to achieve a high degree of reliability with such scoring (91%-95%), especially when level exemplars are used in rater training. Rater training materials and primary trait characteristics are also effective tools for teaching performance to students. Thus both the desired product of instruction and the process are positively affected by the test.
Components of a systemically valid test include a set of tasks representative of the domain to be tested, the corresponding primary traits and rating criteria, a set of exemplars of different levels of performance on the given tasks (accessible to all involved in the testing process), and a system for training scorers (including self-scorers, teachers, and master scorers). Tests should be direct, cover a wide scope of task-related characteristics, be reliable (using primary trait scoring), and have transparent criteria. Tests should also encourage self-assessment, be repeated, provide extensive and directed feedback, and include multiple levels of success.
Frederiksen and Collins close with examples for the implementation of systemically valid tests in student assessment, and they suggest that extensive research needs to be conducted into the effectiveness of different ways of carrying out their recommendations.

