Testing Experts Say It’s a Serious Mistake to Make Standardized Test Scores a Dominant Factor in Teacher Evaluation

Even as the craze for judging teachers by their students’ standardized test scores reaches new heights, now comes a sharply worded report from a distinguished group of testing experts who all agree that test-driven teacher evaluation is a really bad idea. Their new study, published by the Economic Policy Institute, explains in detail what’s wrong with the seemingly simple and hence seductive idea that a teacher’s impact on student learning can be measured by results on standardized tests.

The testing experts offer not only a forceful critique of the misuse of standardized test scores in teacher evaluation but also a constructive alternative. While they believe “legislatures should avoid imposing mandated solutions to the complex problem of identifying more and less effective teachers,” the authors say school districts should be encouraged to experiment and “professional associations should assume greater responsibility for developing standards of evaluation that districts can use.” They note that valuable work has been done already over the past 20 years in developing standards-based evaluations, such as the rigorous performance assessments conducted by the National Board for Professional Teaching Standards. “Such work,” they say, “should not be pre-empted by political institutions acting without evidence” and “distorting the entire instructional program by imposing a flawed system of standardized quantification of teacher quality.”

EPI’s summary of the scholars’ findings deserves to be quoted at length:

“Student test scores are not reliable indicators of teacher effectiveness, even with the addition of value-added modeling (VAM), a new Economic Policy Institute report by leading testing experts finds. Though VAM methods have allowed for more sophisticated comparisons of teachers than were possible in the past, they are still inaccurate, so test scores should not dominate the information used by school officials in making high-stakes decisions about the evaluation, discipline and compensation of teachers.

“The Obama administration has encouraged states to adopt laws that use student test scores as a significant component in evaluating teachers, and a number of states have done so already. The Los Angeles Times recently used value-added methods to evaluate teachers in the Los Angeles Unified School District based on the test scores of their students, and Secretary of Education Arne Duncan supported the paper’s decision to publicly release this information, asserting that parents have a right to know how effective their teachers are. But the conclusions of the expert co-authors of this report suggest that neither parents nor anyone else should believe that the Los Angeles Times analysis actually identifies which teachers are effective or ineffective in teaching children because the methods are incapable of doing so fairly and accurately.

“The distinguished authors of EPI’s report, Problems with the Use of Student Test Scores to Evaluate Teachers, include four former presidents of the American Educational Research Association; two former presidents of the National Council on Measurement in Education; the current and two former chairs of the Board of Testing and Assessment of the National Research Council of the National Academy of Sciences; the president-elect of the Association for Public Policy Analysis and Management; the former director of the Educational Testing Service’s Policy Information Center and a former associate director of the National Assessment of Educational Progress; a former assistant U.S. Secretary of Education; a former and current member of the National Assessment Governing Board; and the current vice-president, a former president, and three other members of the National Academy of Education.

“The co-authors make clear that the accuracy and reliability of analyses of student test scores, even in their most sophisticated form, is highly problematic for high stakes decisions regarding teachers. Consequently, policymakers and all stakeholders in education should rethink this new emphasis on the centrality of test scores for holding teachers accountable.

“Analyses of VAM results show that they are often unstable across time, classes and tests; thus, test scores, even with the addition of VAM, are not accurate indicators of teacher effectiveness. Student test scores, even with VAM, cannot fully account for the wide range of factors that influence student learning, particularly the backgrounds of students, school supports and the effects of summer learning loss. As a result, teachers who teach students with the greatest educational needs appear to be less effective than they are. Furthermore, VAM does not take into account nonrandom sorting of teachers to students across schools and students to teachers within schools.

“There are further negative consequences of using test scores to evaluate teacher performance. Teachers who are rewarded on the basis of their students’ test scores have an incentive to ‘teach to the test,’ which narrows the curriculum not just between subject areas, but also within subject areas. Furthermore, creating a system in which teachers are, in effect, competing with each other can reduce the incentive to collaborate within schools, and studies have shown that better schools are marked by teaching staffs that work together. Finally, judging teachers based on test scores that do not genuinely assess students’ progress can demoralize teachers, encouraging them to leave the teaching field.

“Evaluating teachers accurately is an extremely important piece of the effort to improve America’s schools, and VAM methods are appealing in that they seem to offer an objective and simplified way of comparing one teacher with another. However, as EPI’s report makes clear, ‘There is simply no shortcut to the identification and removal of ineffective teachers.’ The authors conclude that, ‘Although standardized test scores of students are one piece of information that school leaders may use to make judgments about teacher effectiveness, test scores should be only a small part of an overall comprehensive evaluation.’”