No entries for "K"
LeMahieu, P. G., Eresh, J. T., & Wallace, R. C., Jr. (1992). Using student portfolios for a public accounting. The School Administrator, 49(11), 8-15.
LeMahieu et al. report on the PROPEL project, a collaboration among Pittsburgh Public Schools, Harvard University, and ETS that aimed to develop and implement more authentic assessments in the public schools. The project sought to establish "a resonance between the forms of assessment and the curricular and pedagogical approaches that we want to support in the classroom" (p. 9). The chosen procedure used portfolios as instruments for measuring student accomplishment, growth, and development.
In writing classes, students reflect on their progress throughout the year by applying a set of selection criteria to choose representative pieces for their portfolios. They must select pieces from the following categories: important, satisfying, unsatisfying, biographical, free pick, and negotiated free pick. Students should be well aware of the judgment criteria as they pick, and they should reflect on and justify their selections. Assessment along these lines focuses learner attention, facilitates practice and revision, encourages collaboration and consultation with peers and teachers, offers the chance for self-evaluation, and is in itself a meaningful and interesting endeavor.
For the purposes of public accountability, portfolios can be judged using an interpretive and evaluative framework. An assessment scheme for writing was developed in the PROPEL project, focusing on three major dimensions: accomplishment; use of resources, processes, and strategies; and engagement, growth, and development as a writer. Evaluators employed a six-point scale for judging the various categories for each sample. Raters were trained in the use of the scheme with example "benchmark" portfolios, then rated a large and representative sample of portfolios from across the school system. Rater agreement was very high at 92% (the manner of calculation was not reported). Additional corroboration was sought from external auditors, including a variety of community members, who were asked to challenge and critique the system. Auditors' concerns centered on the standards that had been developed for the evaluative approach, the disciplined application of these standards, and the mechanics of scoring and reporting results.
LeMahieu et al. conclude that the process is well worth the effort, given its positive washback effects (on instruction, teaching, and learning), which do not issue equally from traditional achievement testing.
Linn, R., Baker, E. L., & Dunbar, S. (1991). Complex, performance-based assessment: Expectations and validation criteria. Educational Researcher, 20(8), 15-21.
Linn et al. offer eight evaluative criteria for alternative assessment approaches, i.e., "performance" or "complex" assessment. To deal adequately with the assessment of complex learning and processes, evaluation needs to move beyond the well-established, traditional (technical psychometric) components: efficiency, reliability, and comparability. When evaluation emphasizes these characteristics, performance assessments compare unfavorably. However, along with alternative assessments comes an alternative set of values for evaluative procedures. Traditional components certainly still play a role, although perhaps not as the primary criteria for judging the quality and usefulness of an assessment. They should, however, still be included, especially in their highly refined state involving new techniques like Item Response Theory and Generalizability Theory.
The eight suggested alternative evaluation criteria produce the following "framework that is consistent with both current theoretical understanding of validity and with the nature and potential uses of new forms of assessment" (p. 16). These critical issues for the validity of performance-based assessment are:
(1) Consequences -- investigate the influences of the assessment in all directions, including program washback effects, intended versus unintended effects on subsequent student learning, and intended versus unintended influence on teaching decisions.
(2) Fairness -- in terms of access, content, familiarity, exposure, cultural bias, rater bias, rating criteria bias, range of topic, and method domains.
(3) Transfer and Generalizability -- having fewer tasks on performance assessments increases the variability due to task, and not due to examinee performance; this can be compensated for by increasing the number of assessments or by using matrix-sampling and cross-comparisons of different tasks and different students. Task specifications and transferability from a single task domain to broader, real-world domains should be evidenced.
(4) Cognitive Complexity -- Performance assessment emphasizes problem solving, reasoning, and critical thinking tasks. But these emphases create problems for the establishment of judgment criteria for what constitutes equitable performances on tasks, and they demand extensive analysis of problem-solving approaches for different tasks.
(5) Content Quality -- Use of subject matter experts for the selection and design of tasks and scoring criteria is essential.
(6) Content Coverage -- Breadth of tested content must be maintained to avoid unintended washback effects on teachers and students, i.e., teaching to a limited content domain found on a given test.
(7) Meaningfulness -- Tests should be educational experiences with meaningful problems; otherwise they may become problematic for teacher and student understanding and motivation.
(8) Cost and Efficiency -- Is it worth all the effort for the probable effects?
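Criterion (3) notes that variability due to task can be offset by increasing the number of assessments. The sketch below illustrates that point under a simple persons-by-tasks design from Generalizability Theory, where the generalizability coefficient is the person variance divided by the person variance plus the residual variance averaged over tasks. The variance components, function names, and target value are hypothetical and chosen only for illustration; they do not come from Linn et al.

```python
def generalizability_coefficient(var_person, var_residual, n_tasks):
    """Relative G coefficient for a persons-by-tasks design.

    var_person   -- variance attributable to examinees (the signal)
    var_residual -- person-by-task interaction plus error variance
    n_tasks      -- number of tasks each examinee completes
    """
    if var_person <= 0 or var_residual < 0:
        raise ValueError("variance components must be positive")
    if n_tasks < 1:
        raise ValueError("n_tasks must be at least 1")
    # Averaging over more tasks shrinks the task-related error term.
    relative_error = var_residual / n_tasks
    return var_person / (var_person + relative_error)


def tasks_needed(var_person, var_residual, target):
    """Smallest number of tasks whose coefficient reaches the target."""
    if not 0 < target < 1:
        raise ValueError("target must be strictly between 0 and 1")
    n_tasks = 1
    while generalizability_coefficient(var_person, var_residual, n_tasks) < target:
        n_tasks += 1
    return n_tasks


# Hypothetical variance components: task-related error is three times
# the examinee variance, as might happen with few, complex tasks.
VAR_PERSON = 0.25
VAR_RESIDUAL = 0.75

for n in (1, 2, 4, 8, 16):
    coefficient = generalizability_coefficient(VAR_PERSON, VAR_RESIDUAL, n)
    print(f"{n:2d} tasks -> coefficient {coefficient:.2f}")
```

With these assumed components a single task generalizes poorly, and the coefficient rises as tasks are added, which is the trade-off against criterion (8), Cost and Efficiency.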