Computers Scoring STAAR Essays: Is Texas Sacrificing Quality for Efficiency? 

Texas has quietly implemented a major change to the scoring of the State of Texas Assessments of Academic Readiness (STAAR) exams, a move that signals a concerning shift towards machine-driven evaluation and the use of artificial intelligence (AI) in education. 

The Texas Education Agency (TEA) now uses automated scoring engines, powered by natural language processing and machine learning, to grade approximately 75% of student essays. The change is presented as a response to the grading challenges created by the increased volume of writing in the redesigned STAAR, which now includes essay questions at all grade levels. Even so, it has ignited a fierce debate among educators deeply concerned about transparency, algorithmic bias, and the impact on both writing instruction and student learning.

TEA’s lack of communication regarding the rollout of this system has frustrated many educators. Some argue that a change of this magnitude necessitates greater communication to build trust and understanding, and potentially a cautiously implemented pilot period to thoroughly evaluate the system’s reliability and potential for unintended consequences. 

Compounding their concerns is the troubling surge in zero-point scores on recent essay sections, a trend that contradicts TEA’s claims that the automated system mirrors human grading. During the fall round of STAAR testing, roughly 8 in 10 written responses on the English II End-of-Course exam received zero points. This contrasts sharply with the spring iteration, scored only by humans, in which roughly a quarter of responses in the same subject received zeroes. Agency officials attribute the difference to seasonal variations in student populations: many students who take STAAR in the fall are “re-testers” taking the test after not meeting grade level on a previous attempt.

The discrepancy also draws concerning parallels to Ohio’s experience with similar technology. School districts in Ohio reported irregularities after tests were scored by computers, with a suspiciously large number of students receiving zero points. Taken together, these issues raise questions about whether Texas’ new automated scoring system is contributing to the spike in low scores among students in our state.

One of educators’ primary concerns is the potential for automated scoring to exacerbate existing educational inequities. Algorithms, while often presented as neutral, are inherently shaped by the data on which they are trained. If this data reflects existing biases, the system risks perpetuating them. Additionally, the pressure to “teach to the test” could be intensified by automated scoring, as teachers may feel compelled to prioritize techniques and formulaic writing that game the algorithm rather than nurture students’ development of well-rounded writing skills. While TEA distances this scoring system from generative AI platforms like ChatGPT, the underlying technology still raises questions about the potential for unfairness. 

TEA maintains that these automated scoring engines are accurate and efficient, and that human oversight remains in place. The agency points to practices such as cross-checking approximately 25% of computer-scored essays and flagging low-confidence scores for review. However, a more detailed technical report promised by TEA has yet to be released, leaving some educators questioning the agency’s commitment to transparency and accountability.

Educators, as advocates for students and educational quality, have a responsibility to rigorously question the long-term implications of automated essay scoring. Using machines to assess nuanced human writing risks undermining assessment quality and devaluing writing instruction. This shift could also further incentivize “teaching to the test” over developing student creativity and voice. Additionally, concerns persist about whether these systems can ensure fairness and privacy. Texas’ experiment is facing intense scrutiny and further fueling debates about the role of technology in education, standardized testing, and the state’s A-F accountability system.