NFLRC NetWorks: FREE Instructional Materials and Resources

S

Shaklee, B. D., & Viechnicki, K. J. (1995). A qualitative approach to portfolios: The early assessment for exceptional potential model. Journal for the Education of the Gifted, 18(2), 156-170.

Shaklee and Viechnicki discuss the development of a portfolio assessment approach, specifically applied to the assessment of gifted children. They suggest that portfolio assessment is essentially synonymous with non-traditional assessment. It offers the opportunity to engage students actively in displaying acquired or emerging abilities. Teachers should facilitate and guide the portfolio construction process, but the systematic observation process should be based on authentic interaction and activities. Portfolios should be purposeful collections that use multiple sources of evidence drawn over time from various learning opportunities.
In order to develop their portfolio assessment model, the researchers chose to follow a qualitative strategy of investigation. They were searching for universal indicators of exceptional potential (i.e., giftedness), and they chose to involve classroom teachers at every level of the process (from inception to product). Four criteria that parallel quantitative validity criteria formed the standards required of their methods: credibility, transferability, dependability, and confirmability.
Their final product was a set of identifiers for exceptional potential. Videotape data collection and analysis processes were implemented to accompany the identifiers. The stakeholders involved in the assessment should be teachers, the children themselves, peers, and parents/guardians. By triangulating sources of data, stakeholders, and methodology, the information gathered could be considered credible and confirmable. Final portfolio data included teacher anecdotal records, systematic teacher observations, a parent/guardian home-community survey, samples of student products, a peer/self nomination questionnaire, and any additional available information. Finally, teacher preparation in using the assessment tools was incorporated in implementation.

Shohamy, E. (1992). Beyond proficiency testing: A diagnostic feedback testing model for assessing foreign language learning. Modern Language Journal, 76(4), 513-521.

Shohamy outlines a rationale for and model of language testing that provides information for multiple kinds of decision-making. She suggests that there exist two kinds of language tests, those that are used in the school context and those that are used in a school-external (or real-world) context. Recently, the proficiency movement has involved tests originally created for external purposes in an effort to drive language education in the school context in a certain direction. The use of existing proficiency testing methods does not necessarily provide the intended washback effects on instruction and learning. This lack of effect is due to several factors, primary among them being the fact that teachers are not involved in the creation of the tests and that the tests provide little valuable feedback on instruction and learning (in terms of the single holistic scores that are generated). In order to implement more positive and effective washback effects, Shohamy discusses the requirements necessary for an effective test.
Such tests should be considered as part of a dynamic educational system. They should: (1) stress both achievement and proficiency assessment (reflecting on the school context as well as the real-world use of language); (2) provide multi-dimensional diagnostic information; (3) connect teaching and learning through effect on instruction; (4) involve teachers; (5) supply both norm-referenced and criterion-referenced information; (6) be based on theories of language learning; and (7) be repeated in order to effect change. Implementation of a corresponding test would involve the following phases: (1) description of curriculum specifications for language learning objectives (including skills, content sources, etc.); (2) development of tests that cover the four language skills from both a school-based criterion perspective and a real-world or authentic use perspective; (3) administration of the test; (4) multiple analyses of the resulting test data based on the various kinds of decision-making that will result; (5) summary and interpretation of the analysis results; (6) reform based on discussion of the results (e.g., new textbooks, redirection of classroom emphasis, etc.).
Shohamy concludes that tests alone should not be used to implement change. Rather, tests should be understood as part of a dynamic system in which they can provide valuable information (if properly utilized) for effecting improvement in instruction and learning.

Shohamy, E. (1995). Performance assessment in language testing. Annual Review of Applied Linguistics, 15, 188-211.

Shohamy reviews the genre of language tests falling under the rubric "performance assessment," which result from a desire for improved methods of "assess[ing] a more valid construct of what it really means to know a language" (p. 188). Performance tests (PTs) follow on the heels of the "communicative era" of the 1970s, and were initially a response to criticisms of traditional testing. Tests needed to be real-life, direct, authentic settings for spontaneous language use. PTs were to "replicate, as much as possible, the type of language used in non-testing situations" (p. 189). Tests needed realistic tasks in context-specific situations. Forms of PTs have been direct tests (real-life contexts), work-samples (realistic but controlled tasks and contexts), and simulations (representative role-plays of real-life contexts). The pay-off for PTs comes in the form of predictive validity with respect to future language use situations. Accountability is enhanced by realization of the unavoidability of test-effect, and therefore by inclusion of non-testing procedures and indicators (record reviews, observations, work portfolios, etc.). Other instructional pay-off comes in the form of washback effects of PTs.
Theoretically, PT development draws from the following line of thought: Chomsky (1965 -- competence vs. performance), Hymes (1972 -- communicative competence plus ability for use), Oller (1976 -- unitary language ability factor), Bachman & Palmer (1982 -- grammatical and pragmatic competence, i.e., two language ability factors), Canale & Swain (1980 -- grammatical, sociolinguistic, discourse competencies), Canale (1983 -- addition of strategic competence), Bachman (1990 -- Communicative Language Ability model), McNamara (forthcoming -- knowledge and skills plus other cognitive and affective areas should drive PTs).
PTs generally involve rating scales and expert raters. Research findings point to several problems: discrepancies between level descriptors and rater decisions; excessive influence of accuracy in the decision-making process; the question of who counts as an expert; the question of what constitutes authenticity, and for whom (contextual bias); live vs. other; test situation effects; interviewer effects; and the use of the native speaker as model or expert.
Implementation of PTs should follow: (1) needs analysis (criteria, content, context, task or item pool, experts); (2) nature of instrument (which and how many tasks, duration, frequency, etc.); (3) raters (who and how many); (4) integration of skills with content; (5) student input in selection of content; (6) methods for accountability (self-assessment, portfolio, multiple judgments, etc.).
Performance assessment needs to turn away from trying to define language competence, and focus instead on development of a performance model and valid constructs. The future will be determined by validation studies, cost/benefit analyses, and critical debate among raters (experts) with different backgrounds.

Short, D. J. (1993). Assessing integrated language and content instruction. TESOL Quarterly, 27(4), 627-656.

In an effort to maintain ESL student learning on a par with other, non-ESL students, many programs have moved towards integration of content and language instruction. The integrated context, where students learn to use the language to express their knowledge about content, facilitates transition into non-ESL classes. The problem for assessment of integrated language/content objectives resides in the difficulty of isolating one from the other. Short provides a framework for determining language skill and content mastery. Nonstandardized, alternative assessment (incorporating open-ended questions, portfolios, authentic assessments, and performance-based measures) offers a more accurate picture of student knowledge and ability than traditional (short-answer and multiple-choice) assessment. In ESL education, assessment plays a vital gatekeeping role. Therefore, it should be commensurate with the academic demands that students will face, and it should correspond to instruction in a direct and obvious way. In order to incorporate alternative assessment techniques into the classroom, objectives should be generated prior to instruction. Assessment should then be built into the lesson plan, should enable students to frequently demonstrate growth, should be varied for different styles, needs, and levels, and should refer to criteria that are expressed in advance. Short provides the following framework for separation and assessment of language and subject-area concepts. Although overlap will occur, by selecting types of assessment that focus on the objectives, confusion can be minimized.
Skills that can be assessed: problem solving, content-area skills, concept comprehension, language use, communication skills, individual behavior, group behavior, and attitude.

Measurement techniques: skill and concept checklists, reading and writing inventories, anecdotal records, teacher observations, self-evaluation, portfolios, performance-based tasks, manipulatives, written work (essays, reports, and projects), oral presentations, and interviews.
Short provides a variety of applications of the matrix created by the alignment of skills to be measured with measurement techniques. Potential assessment activities are paired with plausible objectives. The matrix offers a broad range of possibilities for alternative assessment activities, and it enables control and responsibility for testing to be maintained by teachers and students alike. There is certainly space for addition of other skills or techniques (according to technical advances, for example).
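As a rough illustration of how such a skills-by-techniques matrix might be represented, the sketch below builds one cell for every pairing of the skills and measurement techniques listed above. It is a minimal sketch, not Short's own instrument: the names `SKILLS`, `TECHNIQUES`, `matrix`, and `activities_for` are assumptions, and the two sample activities are invented for illustration.

```python
# Skills and techniques as listed in the summary above.
SKILLS = [
    "problem solving", "content-area skills", "concept comprehension",
    "language use", "communication skills", "individual behavior",
    "group behavior", "attitude",
]
TECHNIQUES = [
    "skill and concept checklists", "reading and writing inventories",
    "anecdotal records", "teacher observations", "self-evaluation",
    "portfolios", "performance-based tasks", "manipulatives",
    "written work", "oral presentations", "interviews",
]

# One cell per (skill, technique) pair; each cell holds a list of activities.
matrix = {(skill, tech): [] for skill in SKILLS for tech in TECHNIQUES}

# Hypothetical activities, invented for illustration only.
matrix[("language use", "oral presentations")].append(
    "present lab results using sequence words"
)
matrix[("problem solving", "performance-based tasks")].append(
    "solve a word problem and explain each step aloud"
)


def activities_for(skill):
    """Return the non-empty cells for one skill, keyed by technique."""
    return {
        tech: acts
        for (s, tech), acts in matrix.items()
        if s == skill and acts
    }
```

Keeping every cell in the matrix, including the empty ones, leaves room for adding further skills, techniques, or activities later, in line with the point that the matrix is open to extension.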

Smit, D., Kolonosky, P., & Seltzer, K. (1991). Implementing a portfolio system. In Belanoff, P., & Dickson, M. (Eds.). Portfolios: Process and product (pp. 46-56). Portsmouth, NH: Boynton/Cook Publishers.

In order to establish uniform standards for passing students from the first two semesters of composition classes, to bring semester-end assessment in line with process writing teaching practice, and to shift the role of writing teachers from that of grade-giver to advocate or coach, Smit et al. propose the implementation of a portfolio system of assessment. Four samples of writing from the semester of instruction must be submitted (along with all drafts, notes, comments, and the original assignment) to a second writing instructor for end-of-semester pass/fail decisions. Work accompanying the final draft is included to help the second reader (beyond the class instructor) to see student improvement over the semester and within a single writing process. Second readers rate the four pieces (one of which must have been an in-class assignment) based on: fulfillment of assignment, clarity of purpose in writing, detail appropriateness for audience and purpose, organization, and appropriateness of tone. Pieces should be virtually free of mechanical errors (as they had already been revised multiple times). A mid-semester trial run with one paper resulted in revision of the stress on mechanical errors, in order to de-emphasize the importance of this aspect to the student writing process. Pieces should fail on the basis of mechanical errors only when those errors are repeated, varied, and extensive.
Several students and teachers reacted negatively to the implementation of this portfolio certification for advancement procedure. Problems included: continuing lack of standardization in the reading and rating process (despite combined rater training), importance of mechanical errors in making pass/fail decisions, and time required of the teachers (reading and rating on multiple occasions while helping students prepare and revise a minimum of four pieces for submission).
Nonetheless, the portfolio system did seem to produce the following general benefits: a more rigorous approach to student work that indicated satisfactory advancement over the semester, the establishment of minimum standards of classroom productivity, and a shift in the role of (and perception by students of) classroom teachers from adversaries to writing coaches. It also engendered a higher degree of responsibility among instructors, when assigning grades during the course of the semester, to be cautious with all aspects of a student's writing, and to encourage progress according to standards across these aspects.

Sommers, J. (1991). Bringing practice in line with theory. In Belanoff, P., & Dickson, M. (Eds.). Portfolios: Process and product (pp. 153-164). Portsmouth, NH: Boynton/Cook Publishers.

Sommers discusses portfolios as a means of evaluating student efforts and as a framework for response to student development through the writing process. Portfolio use in the writing classroom should lead students to discover writing as a form of learning, and it should engender an understanding of the value of the drafting and revision processes. It encourages students to do their best work at their own pace. However, teacher response to portfolio efforts can be motivated by two different models of the portfolio process.
One model is based on professional portfolios (like an artist's) which exhibit a collection of the portfolio owner's best representative work (but not all of the owner's work). Instructors would implement this model for responding to portfolios if they were concerned with maintaining certain standards and adhering to specific criteria for student work. This portfolio grading system encourages high standards of work from students, as students are aware that they will be graded on the several samples that they work up, revise, and present as final drafts. It encourages students to assume responsibility for the finished product of their writing that is to be presented to a viewing public. When, what, and how often to grade become issues in teacher response for this model. Too much emphasis on grades for the portfolio products (or too often) can result in students focusing too much on the grading process instead of the writing process. What grades to assign is also a problem, as student work produced should be of a higher caliber than in traditional assessment-driven instruction. Paperwork and time demands on teachers are substantially increased (due to drafting, revisions, grading, etc.). According to this model, students enter into the drafting and revision process in order to produce a better piece of writing because they know that their grades will be positively affected -- concern with grades motivates improvement.
Another model for portfolio response is of a more holistic variety. This system focuses on the developmental process revealed by the work that emerges from a student's portfolio. All work done by a student over time (one semester, for example) is considered. The portfolio "more closely resembles an archivist's collection of a writer's entire oeuvre" (p. 160). This model leads away from the "myth of improvement" which holds that teachers can somehow engender maturation of a writer through their classroom efforts. It focuses instead on the variety of individual efforts at development that each writer produces. In this writing environment, grades are de-emphasized, and possibly best left to pass/no-pass decisions. Emphasis is on development as a writer, and this is held as a more important value than grades themselves.

Stiggins, R. J. (1987). Design and development of performance assessments. Educational Measurement: Issues and Practice, 6(3), 33-42.

According to Stiggins, performance assessment is "measurement based on observation and judgment" (p. 33). It involves trained raters evaluating key factors in a performance by referring to articulated standards and criteria. Performance assessments at all levels (daily judgments in the classroom to statewide testing to certification testing) rely on the judgmental rating of achievement. An examinee is responsible for providing an original response to the best of their ability. In order to ensure that an examinee's true capabilities are reflected in the judgments, and not bias in rater judgment, care must be taken in the design of the performance instrument. Tests should be "structured, preplanned events designed in advance to provide a decision maker with a specific piece of performance information" (p. 34). The following design process enables the creation of a blueprint for a performance test.
(1) Clarify reason(s) for assessment by: specifying decisions to be made (individual or group needs diagnosis, grading, grouping, selection, certification); determining who will be the decision makers; specifying how the results will be used (ranking examinees versus determining mastery); describing the examinees to be assessed.
(2) Clarify the performance to be evaluated by: specifying the content or skill focus; selecting the type of performance (behavioral process, performance product, etc.); listing performance criteria (this is the most crucial step -- it must include a definition and performance continuum of each performance dimension).
(3) Design exercises by: selecting the form (consider the availability of dependable evidence and the importance of the decision to be made based on this evidence); determining the obtrusiveness of the assessment (announced versus not, public or not, surreptitious observation, etc.); determining the amount of evidence needed (consider the importance of the decision, the need for representativeness, and the amount of time available).
(4) Design the performance rating plan by: determining the type of score needed (holistic, rank order, or analytic for diagnosis of particular needs); determining who will rate the performance (teacher, professional, expert, self, or peer rating); and clarifying the score recording method to be used (checklist, scale, anecdotes, portfolio, etc.).
The article includes a blueprint, self-test, and sample analytic scales for writing assessment.

Stiggins, R. J. (1988). Revitalizing classroom assessment: The highest instructional priority. Phi Delta Kappan, 69, 363-368.

Stiggins holds that educators should be most concerned about the daily classroom assessments that teachers engage in, and that efforts at assessment reform should therefore be refocused away from national norms and standards. Research indicates that teachers are involved in assessment-like activities every day and that they devote as much as 20% to 30% of their work time to assessment of one kind or another. As teachers generally tend to rely on self-developed assessment activities, and as they devote so much time to assessment, teacher training would be expected to involve extensive relevant preparation. In fact, it does not. Teachers are therefore generally unprepared for and uncomfortable with implementation of the kind of quality assessments that are being called for to meet instructional objectives (like the teaching of higher order thinking skills as opposed to the memorization of finite knowledge). Stiggins maintains, "Two keys to the success of this kind of judgment-based assessment are developing clear, explicit performance criteria and using systematic procedures for rating performances" (p. 365). However, in order to prepare teachers to meet these challenges, changes need to be introduced into the current educational assessment system. Expertise in classroom assessment must be introduced into the schools, priorities and procedures in teacher in-service training should be altered to address classroom assessment, and teachers should receive adequate preparation in classroom assessment methodology prior to entering the classroom. Teachers should understand the relationship between curricular goals and assessment activities, should receive technical assistance in assessment implementation, and should generally be prepared to engage in: diagnosis, placement, achievement, evaluation, motivation, feedback, and decision-making.

T

Tedick, D. J. (1990). ESL writing assessment: Subject-matter knowledge and its impact on performance. English for Specific Purposes, 9, 123-143.

Tedick investigates the differences in writing produced by ESL graduate students when faced with general subject-matter prompts and field-specific prompts. Because important admission and placement decisions are often based on writing assessments, creation of effective and fair topics is imperative. Research in psychology has demonstrated that differences in prior knowledge and experience affect comprehension and an individual's ability to construct meaning from an event (a test prompt, for example) and, therefore, to write well about it. Validity of various writing tests and prompts is endangered by the issue of examinee familiarity. One approach has been to provide general topics that assume no background knowledge or familiarity, in order to equate chances for success. Tedick seeks to show the effect of subject matter knowledge on elicited writing performance.
International graduate students were faced with two prompts, one general (arguing the relationship between progress and laziness) and one specific (debating a controversial issue from the examinee's field of study). The resulting essays were given holistic ratings, measured for length, and measured for syntactic complexity (mean number and length of T-units and error-free T-units). Results indicated that examinees tend to write more about topics with which they are familiar and tend to be scored higher in writing about topics with which they are familiar. More advanced examinees tend to produce significantly longer and syntactically more complex T-units on field-specific prompts (although this was not the case for lower level examinees). "Because the field-specific topic encouraged subjects to produce more syntactically complex utterances, it may have the power of eliciting a more accurate picture of the subjects underlying proficiency" (p. 137). Quite possibly, learners are more willing to take risks, and to show what they can do with the language, when the writing task is more familiar (and perhaps more meaningful) to them. The generation of topics that allow examinees to tap subject matter knowledge seems imperative, in order to facilitate an accurate picture of their writing ability and to lead to accurate judgments about that ability. The author suggests that future research should look into more detailed and varied descriptive analysis of writing in response to various prompts, as well as give consideration to the effect of different subject matter interpretations on test makers, takers, and raters.
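The syntactic complexity measures mentioned above (number and mean length of T-units and of error-free T-units) can be illustrated with a short calculation. The sketch below is not Tedick's procedure: it assumes the essay has already been segmented into T-units and judged for errors by a human analyst, it assumes length is counted in words, and the function name `t_unit_measures` and the sample T-units are invented for illustration.

```python
from statistics import mean


def t_unit_measures(t_units):
    """Summarize hand-coded T-units.

    t_units: list of (text, is_error_free) pairs. Segmentation and error
    judgments are assumed to come from a human analyst, not from this code.
    """
    # Assumption: T-unit length is counted in whitespace-separated words.
    lengths = [len(text.split()) for text, _ in t_units]
    error_free = [len(text.split()) for text, ok in t_units if ok]
    return {
        "t_units": len(lengths),
        "mean_length": mean(lengths),
        "error_free_t_units": len(error_free),
        "mean_error_free_length": mean(error_free) if error_free else 0.0,
    }


# Invented sample, for illustration only.
sample = [
    ("Progress makes daily life easier", True),
    ("but it make people depend on machines", False),
    ("Some researchers argue that this dependence reduces effort", True),
]
measures = t_unit_measures(sample)
```

Comparing these figures for the same writer across a general prompt and a field-specific prompt would mirror, in miniature, the kind of contrast the study reports.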
