By John Kuhn.

With the debate over testing roiling Congress and state capitals nationwide, it is important to recognize the damage done to American pedagogy by high-stakes testing and the deleterious effects of punitive accountability on the students who depend on public schools.

In The Big Idea of School Accountability, their slick apologia for high-stakes testing and punitive accountability, both of which have dominated American education politics and pedagogy since the 1980s, Bill McKenzie and Sandy Kress start out on the high road. McKenzie is a high-ranking opinion-shaper at the George W. Bush Institute and a former editorialist for the Dallas Morning News. Kress was an architect of Bush’s No Child Left Behind Act of 2001 and, though he leaves this out of his bio attached to the essay, a long-time lobbyist for Pearson, the world’s leading vendor of K-12 standardized tests. The two edu-lobbyists begin their essay by mentioning historical moments in education policymaking and politics that would seem to appeal to a wide audience. They condemn segregation and celebrate Brown v. Board of Education. They praise the Elementary and Secondary Education Act of 1965 (ESEA) and they celebrate its noble intention that “schools in disadvantaged communities would receive the resources to provide their students a decent education.”

Pay close attention to that statement, because it is the last time the authors will refer to resources as a necessary element to ensuring quality education in disadvantaged communities. Through the sleight of hand that has been perfected by the modern education reformer (and McKenzie and Kress are education reformers of the highest order), the writers deftly pivot from any and all talk of the need to provide equitable educational resources across all communities so that schools in even the poorest areas can deliver on the promise of education, and they spend the remaining pages of their article discussing something much easier on the taxpayer’s pocketbook: accountability, or the careful creation of just the right punishments to make teachers and students succeed in making learning happen, without respect to the pesky details of resources available to them (or unavailable to them, as the case may be). In the next paragraph, without establishing that the equity of resources LBJ’s law intended to guarantee was ever successfully attained, the writers begin to speak of campuses being “held responsible,” of the need to “hold schools accountable” and of “what should happen if schools do not show progress for all their students.”

Pivot complete.

The authors have shifted totally from an inconvenient conversation about fair and equitable investment in children and communities—investment that is adequate and comparable regardless of a student’s zip code or skin color—to one about holding children and communities responsible for their own outcomes. Accountability is constructed on the principle of blame and consequences as leverage to move schools and kids forward (blame and consequences, it should be noted, entirely directed at the teachers and students, with no consequence whatsoever reserved for citizens outside the schoolhouse who may or may not provide adequate fiscal supports for schools and children). At the urging of testing advocates like the authors of this essay, educational improvement via punitive test-based policies has eclipsed humane concepts of shared assistance and support for hurting American children (particularly anything resembling the investment of tax receipts) as the “civil rights issue of our time.” Educational accountability is designed as a low-cost replacement for social responsibility.

Children in America’s poorest neighborhoods lack all manner of opportunities and resources from birth that many American families take for granted. This isn’t to say they can’t learn. Of course they can learn, but there are obstacles they must overcome that society has kindly ensured do not litter the path of many other children from middle and upper class areas. From birth weight forward, all the data in impoverished zones is stacked against children, and elevating accountability for schools as our primary lever for improving these children’s lives has the effect of squelching any urgency and attention directed at efforts to feed and clothe and love and help them outside the school. We hear reformers speak of “the fierce urgency of the now” when they speak of improving schools, but we never hear it when they speak of improving lives. More than anyone in the United States, the poor child needs a hand up. More than any organization in the United States, the public school—the place where our children gather, and where they come as they are—needs support.

Just as a critical care hospital costs more to operate than a walk-in clinic, so too does a school in a troubled neighborhood require extra resources to remediate problems children bring with them, critical issues that interfere with a successful education. But in America we regularly fund schools in zones of economic and social turmoil at significantly less than the funding rate of schools in our wealthy suburbs, where a larger proportion of students come to school well-prepared.

Nowhere in the voluminous writings of Bill McKenzie or Sandy Kress will one find mention of the tragic fact that the average “Unacceptable” school in Texas’s accountability system in 2012 received more than $1,000 less in per-pupil funding than the average “Exemplary” school. While testing advocates were busy beating up schools for not jumping high enough, the state funding system was busy hobbling them. Here is the school funding data from the TEA, broken down by accountability level:

Exemplary districts: $6,580 per weighted pupil

Recognized districts: $5,751 per weighted pupil

Acceptable districts: $5,662 per weighted pupil

Unacceptable districts: $5,538 per weighted pupil (1).
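For readers who want to check the arithmetic, here is a minimal sketch that tallies the gaps. It uses only the four per-pupil figures listed above; the dictionary and function names are illustrative and do not come from the TEA.

```python
# Per-weighted-pupil funding by 2012 Texas accountability rating,
# copied from the TEA figures listed above.
funding = {
    "Exemplary": 6580,
    "Recognized": 5751,
    "Acceptable": 5662,
    "Unacceptable": 5538,
}


def gap_from_top(table, top="Exemplary"):
    """Return each rating's dollar shortfall relative to the top rating."""
    return {rating: table[top] - amount for rating, amount in table.items()}


gaps = gap_from_top(funding)
for rating, gap in gaps.items():
    print(f"{rating}: ${gap:,} less per weighted pupil than Exemplary")
```

By this tally, the shortfall for “Unacceptable” districts comes to $1,042 per weighted pupil.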

Folks in the accountability camp like to say “we can’t throw money at the problem.” In fact, they apparently prefer actively pulling money away from the areas where the greatest problems exist. It is lunacy to believe that a testing program can do anything to help children who are being denied the same educational resources provided to their peers in wealthy communities. And to compare those under-funded students with their better-funded peers is nothing short of cruelty.

Thanks to the shift in focus toward testing and away from resourcing, in 2015 Texas sank to 49th in the nation in school funding (2). The accountability clique convinced lawmakers that funding was of little import; academic success could be forced upon children at a discount via test-based coercion and threats. An analogy might be if the Good Samaritan in the Bible story had stopped beside the injured traveler and, instead of lifting him out of the dirt and paying for his recuperation at an inn, had stood over him with a stopwatch and told him to hurry and get up, and assured him that he was comparing his time with that of uninjured people.

After quickly dismissing the topic of equitable funding for schools in poor areas, the authors praise bipartisanship and claim the legacy of Lyndon B. Johnson and Robert Kennedy as they discuss the debate surrounding ESEA. But even as they quote LBJ opining how his signature education law meant more “to the future of America” than anything he had signed before, they neglect to mention that what they are advocating in this essay (the continuation of required annual standardized testing in grades 3-8 and once in high school, along with significant punishments for schools, teachers, and students based on said tests’ results) was nowhere in LBJ’s bill.

The authors next note that the NAEP—a test given to a random sample of American students periodically, known as the nation’s report card—showed gains in the 1970s as desegregation took hold. A couple of paragraphs after acknowledging the gains in education that happened without nationwide standardized testing in place, the authors attempt to use NAEP results to prove that the test-and-punish policies of George W. Bush’s No Child Left Behind Act (the law that made standardized tests mandatory nationwide and made test-based punishments for schools the norm) “worked.”

In a table titled “Mathematics Proficiency in Texas by Race,” the authors show grade 8 results on the NAEP math test for white, black, and Hispanic students in Texas from 1990 through 2013. They don’t explain their reason for leaving out other states where NAEP was given, or for leaving out NAEP results from 1971-1990, or for leaving out other grade levels, or for leaving out results from the NAEP reading test. This carefully crafted chart is curious, particularly when one chooses to look at it in the context of other NAEP data.

The misuse of NAEP scores to justify education policy preferences has become so common in recent years that it has been given its own name: misnaepery. The term was coined by researcher Steven M. Glazerman, who said, “It’s clearly not NAEP’s fault people misuse it, but it happens often enough that I feel compelled to call [such instances] ‘misnaepery’” (3). Education journalist Stephen Sawchuk notes that “Results from the venerable exam are frequently pressed into service to bolster claims about the effect that policies, from test-based accountability to collective bargaining to specific reading and math interventions, have had on student achievement. While those assertions are compelling…they are also mostly speculative….the exam’s technical properties make it difficult to use NAEP data to prove cause-and-effect claims about specific policies” (3).

The graphic in the McKenzie and Kress essay apparently is intended to prove that annual tests and pressure on districts to raise test scores caused improved performance on NAEP for Texas students. For the tiny subset of NAEP results the authors of this essay chose to share here, the percentage of students scoring “Proficient” rose during that 23-year time span by 33 percentage points (white), 19 percentage points (black), and 25 percentage points (Hispanic).

As wonderful as those cherry-picked results are, a quick glance at other NAEP data tells a less tidy story. For example, below is a chart showing long-term trends on the NAEP Reading for all kids in America from 1971 to 2012, by age level.



The overall gains on NAEP over time are much less dramatic than what was seen in the hand-picked NAEP data from the authors’ report. As you can see, 17-year-olds’ average NAEP reading scores rose a whopping two points from 1971 to 2012. While reading NAEP scores for 13- and 9-year-olds rose eight points and 13 points, respectively, during the same time span, a quick glance reveals that the 13-year-olds’ scores rose five points between 1971 and 1992, long before the No Child Left Behind Act of 2001 was the law of the land, and the 9-year-olds’ scores saw a gain of seven points from 1971 to 1980. In fact, if we use the nearest available year to 2001 as the starting point (1999), reading scores dropped one point for 17-year-olds under No Child Left Behind, rose only four points for 13-year-olds, and rose nine points for 9-year-olds. In other words, for two of the three age groups, NAEP reading gains before NCLB matched or exceeded the gains after NCLB. Using NAEP results to justify test-and-punish policies simply doesn’t wash.

A similar picture emerges with NAEP math scores.



The NAEP math results for 17-year-olds rose two points from 1973 to 2012, and they actually dropped two points from the last result before No Child Left Behind became law until 2012. Scores for 13-year-olds and 9-year-olds rose more during the time span (19 points for 13-year-olds and 25 points for 9-year-olds), but again, over half the rise in scores came before test-and-punish accountability policies became the norm across the nation. For 13-year-olds, NAEP math scores rose 10 points from 1973 to 1999 and only nine points after 1999, and for 9-year-olds, NAEP math scores rose 13 points from 1973 to 1999 and only 12 points after 1999. Scores did indeed continue to rise at a similar pace during the NCLB era, but the notion that student learning was static until standardized testing came along and jolted schools into action is simply not true.
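The split described above can be checked with a few lines of arithmetic. This is only a sketch built from the point gains quoted in the preceding paragraph, with 1999 treated as the last pre-NCLB assessment year, as the essay does; the variable and function names are made up for illustration.

```python
# Point gains on the NAEP long-term trend math test, as quoted above.
# The 1999 dividing line is the essay's own choice of "last result before NCLB".
math_gains = {
    "age 9": {"1973-1999": 13, "1999-2012": 12},
    "age 13": {"1973-1999": 10, "1999-2012": 9},
}


def pre_nclb_share(gains):
    """Return (total gain, fraction of that gain earned before 1999)."""
    pre = gains["1973-1999"]
    total = pre + gains["1999-2012"]
    return total, pre / total


for group, gains in math_gains.items():
    total, share = pre_nclb_share(gains)
    print(f"{group}: {total} points total, {share:.0%} of it before NCLB")
```

For both age groups the pre-1999 share works out to slightly more than half of the total gain, which is all the claim above requires.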

McKenzie and Kress next turn their attention to the bipartisan origin of the No Child Left Behind Act. They mention the oft-quoted “soft bigotry of low expectations” talking point that President Bush used to great effect in selling the idea of test-based accountability to Congress, the media, and the nation. The idea behind that phrase is that poor children in poor schools nestled in poor communities are performing poorly because their teachers don’t have high expectations for them. No testing advocate ever talks about the harder bigotry of inequitable school funding. It is deemed not worth mentioning that these poor schools are usually funded by a formula based on property values, and that property values in these neighborhoods are low, so that, as a result, these schools often have dramatically less operating funds than schools in affluent areas (where expectations may be higher, but so are school budgets, and you can’t just choose one or the other as the causal factor you prefer to attribute student performance to).

The primary cause of poor student performance, it is argued by proponents of test-centric education policies, is the low expectations of their teachers. A complex socio-politico-economic tragedy has been boiled down to a simple “bad schools” narrative, with a single villain—the classroom teacher. The inequitable distribution of fiscal resources in our society has been purged from the dialogue about schools and learning. Neither tax receipts nor job opportunities nor incarceration rates nor poverty levels nor prenatal care is, in the context of education reform, permitted to be brought up in defense of schools and teachers doing difficult work in the poorest areas: these ideas are lumped together and called “excuses,” and they are no longer welcome in discussions about how to improve educational outcomes for poor children in America.

And so accountability was built on this premise: poverty doesn’t impede academic performance if a teacher works really, really hard. Inequitable school funding doesn’t matter, either. Poor children are capable of everything that rich children are capable of, and the idea that poor children should enjoy circumstances comparable to those of rich children in order to attain similar results as those obtained by rich children was and is a forbidden thought. Only people who believe poor and minority children to be incapable or inferior would insist that they require somewhat equitable situations in order to reach equal outcomes. The only circumstances up for critique are those that exist in the classroom.

What matters is rooting out the bad schools, and shuttering them. What matters is rooting out the bad teacher and firing him or her. Tests are necessary as the dowsing device for finding these bad teachers and schools. In fact, the authors sing the praises of localities that “evaluate the job performance of teachers” by using “student data” (which means standardized test scores).

The use of standardized test scores to evaluate teachers has been decried by a number of prominent research and statistical organizations, even when those scores are run through a complex algorithm intended to isolate the teacher’s impact on the score from other factors (known as a “value-added measure” or “VAM”). But that doesn’t matter to these authors. It doesn’t matter to them that a study commissioned by standardized-test-maker ETS concluded that formulas that attempt to capture a teacher’s impact on a student’s test score fail because “the evidence strongly suggests that these scores…measure not only how well teachers teach, but also whom and where they teach” (emphasis in original) (4). If you are to be a teacher whose professional aptitude is gauged by your students’ test score gains, it would behoove you to teach bright students, and preferably in a wealthy school district. And yet we know that students are not evenly distributed to teachers even on the same campus: some teachers handle challenging students well and are therefore assigned a greater share of potentially low test-scorers. The author of the ETS study goes on to conclude that “Teacher VAM scores should emphatically not be included as a substantial factor with a fixed weight in consequential teacher personnel decisions” because said scores “may be systematically biased for some teachers and against others” (emphasis in original) (4). The American Statistical Association issued a statement regarding the use of value-added models (VAMs) to judge teachers, in which the organization recommended caution and warned that evaluating teachers based on formulas tied to student test scores might mean “more classroom time might be spent on test preparation and on specific content from the test at the exclusion of content that may lead to better long-term learning gains or motivation for students” (5).

When it comes to students’ results on the tests being held up as a reflection of teacher quality, the statement had this to say: “The majority of the variation in test scores is attributable to factors outside the teacher’s control such as student and family background, poverty, curriculum, and unmeasured influences” (5).

Using student test scores for high-stakes purposes (whether firing teachers, closing schools, denying students diplomas, or, as was proposed in the legislature of one Bible Belt state, denying food stamps to students’ families) is a dangerous game that ignores both unintended consequences (such as teaching to the test, narrowing the curriculum, reducing time for activities like physical education and the arts, cheating scandals, the growth in test contract costs, and the growth in time demands for testing) and statistical best practices.

McKenzie and Kress return to NAEP scores in the section entitled “The Difference Accountability Has Made.” They look at the NAEP results of 9-year-olds in math and reading and continue to employ ‘misnaepery’ to make the points they need to make. They note that NAEP results showed improvement for black and Hispanic 9-year-olds from 1999 to 2008, but they fail to mention a drop in Hispanic scores from 2008 to 2010 (see the light blue line in the chart below)—which makes one wonder if they chose 2008 rather than 2010 as the endpoint for that particular data-supported rhetorical point so that the gains they claimed would be higher. They also failed to acknowledge the significant gains for black and Hispanic students from pre-1980 to 1990, gains that rivaled those they noted during the NCLB era but could in no way be attributed to their policy preferences. What is most amazing is that they failed to mention these facts from the data even though they embedded charts right beside this paragraph clearly showing the pre-NCLB gains. See below:



While McKenzie and Kress are eager to “prove” that test-and-punish policies resulted in rising NAEP scores, the truth is that NAEP scores have been rising fairly steadily since NAEP was first given. There does not appear to be a significant impact on the trend line of NAEP scores that is attributable to the costly approach (in terms of funding and student learning time lost) of giving every child in the nation a standardized test in reading and math in grades 3-8 and once in high school. For such a contentious and expensive policy, one would expect concrete evidence of efficacy, but none exists.

In seeking support for their point of view, McKenzie and Kress borrow a quote from researchers David Figlio and Susannah Loeb indicating “positive effects of the accountability movement.” Other quotes from these same researchers that McKenzie and Kress chose not to use include “For several reasons, however, school accountability systems might not generate higher achievement,” and “educators could teach very narrowly to the specific material covered on the tests, and little or no generalizable learning outside of that covered on the test would take place,” as well as “it could be that one of the major assumptions underlying standalone accountability programs—namely that teachers and schools are underperforming because of insufficient monitoring of their behavior—is incorrect” (6). Figlio and Loeb go on to acknowledge that “despite the theoretical prediction that school accountability systems will improve student achievement…such gains are not a foregone conclusion” (6). Figlio and Loeb are much more reserved in their support for test-based accountability than McKenzie and Kress let on in their essay, noting that district-specific, state-specific, and national studies “provide some evidence of a positive relationship between accountability and student achievement, though they are not universal in this conclusion,” and that “even though high-stakes tests may be associated with gains in math scores at the fourth-grade level, they may not be associated with gains as students progress from fourth grade to eighth grade and, hence, as the students confront more challenging material” (6).

The authors of this essay also ignore a great deal of research that comes to conclusions quite opposite of the quotation they provide from Figlio and Loeb. For example, in a report called “Tracking Achievement Gaps and Assessing the Impact of NCLB on the Gaps” published by Harvard University’s Civil Rights Project, Jaekyung Lee finds the following: “NCLB did not have a significant impact on improving reading and math achievement across the nation and states. Based on the NAEP results, the national average achievement remains flat in reading and grows at the same pace in math after NCLB than before.” Also, Lee notes that “NCLB has not helped the nation and states significantly narrow the achievement gap. The racial and socioeconomic achievement gap in the NAEP reading and math achievement persists after NCLB.” Lee cautions that “If we continue the current policy course, academic proficiency is unlikely to improve significantly” and that we may end up “shortchanging our children and encouraging more investment into a failed test-driven accountability reform policy.” Worst of all, in the words of Lee, the “problem can be more serious for schools that serve predominantly disadvantaged minority students. NCLB has shortchanged those schools with under-funded mandates and an over reliance on sanctions rather than a focus on capacity building” (7).

McKenzie and Kress eventually get around to acknowledging the current political reality surrounding high-stakes tests in our country in a section they call “Accountability in Retreat.” High-stakes testing and punitive accountability are extremely unpopular with educators, students, and parents, and grassroots efforts have sprung up around the nation to oppose these policies. In Texas, an organization called Texans Advocating for Meaningful Student Assessment (TAMSA) was nicknamed Mothers Against Drunk Testing by Austin officials and was largely responsible for a major rollback of testing requirements. In New York State, tens of thousands of students simply refused to take the tests. Stories of teachers leaving the profession due to the overemphasis on test results and the under-emphasis on learning, exploration, and joy have become a staple of education blogs. A state commissioner of education said that standardized tests had become “a perversion of their original intent” (8) and that test corporations (such as the one Kress served as a lobbyist for) had developed into a kind of “military-industrial complex” (9).

“School accountability is losing momentum,” they lament, and they offer a few reasons. None of the reasons offered is that it hasn’t lived up to its hype of getting 100% of our students to be proficient on standardized tests by 2014, or that testing has become the center of the schooling universe, or that the costs of the tests have risen to heights unimaginable, or that kids hate hours-long standardized tests that are solely used for punishing teachers and schools and serve no pedagogical or diagnostic purpose of benefit to the student. Nope, none of that.

First, they argue, getting results is hard and supporters of accountability are tired. Second, teachers wimped out because the bar kept getting higher. Third, local school districts overdid testing. Full stop: these two men have spent their careers arguing that the amount of testing called for by state and federal lawmakers—whether the annual testing of all 3rd through 8th graders in multiple subjects or the 15 (fifteen!) graduation-required STAAR End-of-Course tests once required of Texas high schoolers—was always appropriate. But district testing, that’s a different story. (I will pause to note here, as these authors blame local districts for too much benchmark testing, that benchmark testing every six weeks is required by my state as part of the improvement protocol when a school is labeled low-performing. I know this because my high school math scores were too low one year and we had to benchmark test all of our math students each six weeks and send the state an Excel spreadsheet of the benchmark scores next to the students’ six weeks grades, to show whether our teachers’ assigned report card grades matched the students’ benchmark scores.)

Do districts and campuses sometimes benchmark too much? Yes. Why is that? The same reason they sometimes narrow the curriculum and teach to the test—because the consequences levied for poor scores on the standardized tests that McKenzie and Kress love so much are so overwrought and over-punitive that school leaders and teachers fear for their livelihoods if their students do poorly. As such, they give frequent benchmarks to attempt to determine if students are on track to pass the standardized tests that McKenzie and Kress insist are the ultimate yardstick of successful schooling. They use benchmarks to determine who needs intensive remediation, because they want those kids to pass.

McKenzie and Kress and the rest of the Pearson fan club can’t have it both ways. If they are going to elevate standardized testing as the main source of worth for teachers, students, and administrators, then they are going to have to accept that standardized tests are going to become the main target of our labor. In order to give you the standardized test scores you insist make us worthy public servants—and in order to keep us employed and putting food on our tables—we will stop everything and pretest, retest, test, and posttest. And if our test scores are poor, we will pull kids out of class and give them more test prep. We will teach them test-taking strategies, and we will fill them full of free breakfasts and brain food prior to testing. We will do anything and everything within the law that our consciences will allow to attempt to get our students over this hurdle you have insisted we erect in front of them, so that they and we may be considered by you and your comrades in the judgment of educators as successful American people.

To their credit, McKenzie and Kress kindly acknowledge that “mistakes were made” in designing and implementing accountability systems. While they miss several dozen or hundred crucial mistakes that have harmed pedagogy and the climate in American schools, they do admit that school rating systems are convoluted and that data about the tests is often late-arriving. Sadly, they then veer into conspiracy territory by alleging that opposition to standardized testing is “orchestrated” and “well-financed.” They raise the specter of teachers’ union officials who have criticized the overemphasis on testing, as though that invalidates the opinions of thousands of parents, students, and teachers, and they strongly imply that opposition to over-testing and what it has done to our schools is simply a plot financed and “orchestrated” by unions who are unwilling to have their members held accountable for job performance.

Speaking of finances, the authors of this essay forgot to mention finances on the pro-standardized testing side, because there is some money there. They forgot, for example, the news story about Pearson using its non-profit foundation to send education commissioners on free vacations to exotic locales where they were allegedly met by Pearson corporate executives touting the benefits not of time-shares but of multimillion-dollar testing contracts between those commissioners’ states and Pearson, LLC (10). Kress’s former client (Pearson, in case you forgot) enjoys a testing contract in Texas that at one time approached half a billion dollars for five years of bubble tests. In fact, Texas has spent so much on tests from Pearson over the years that state officials can’t even say how much it has spent; it has lost count of the testing dollars (11). As for Pearson’s lucrative foundation work, after paying out $7.7 million to settle the case that alleged the charitable foundation “had engaged in activities to aid its for-profit business,” Pearson dissolved it (12). My point is that however well-financed the anti-standardized testing movement is or is not, the pro-standardized testing movement has some deep pockets of its own. Testing is supported by organizations like the Gates Foundation, the Walton Foundation, the US Chamber of Commerce, the Texas Association of Business, and many other corporate-funded entities.

Kress and McKenzie acknowledge the rollback of testing requirements occurring in different parts of the nation. In particular, they mention a state representative who lamented the changes by saying a common assessment holds “all the kids across the state to the same standard.” They didn’t mention any mechanism, however, for ensuring that all the kids across the state have access to the same standard of educational resources.

McKenzie and Kress begin to wrap up their essay by likening school accountability to the fight against heart disease. The difference, of course, is that treatment for heart disease is proven to help and not harm. Though the authors claim that the aim of school accountability is “to improve student achievement,” there is ample suspicion among parents, students, and teachers that the greater purpose of school accountability is to harm the reputation of public schools in order to justify a complete privatization of the system, so that for-profit entities can gain access to the pool of tax dollars that have been constitutionally reserved for the adequate provision of a free public school system to the citizenry. The authors urge that we “can’t let the search for answers slow us down.” I don’t think that makes any sense at all; it seems to mean something like “we may not know where we’re going, but we need to hurry and get there.”

The authors end their essay as they started it, by badly abusing data from the NAEP exam. They claim that the data is “flattening out” and that NAEP gains have “tapered off” for certain age groups, and they blame this on the “retreat from accountability.” Oddly, the latest NAEP data available to these writers is from 2013, or two years ago, which would seem unlikely to have been affected by, for example, Texas’s bill that reduced required testing and that McKenzie and Kress listed in their essay as a key example of the “retreat from high standards.” TEA announced initial assessment requirements under the new bill on June 12, 2013 (13). Even if that reduction in testing had the immediate negative impact the authors suggest, they would need to explain this language from the 2013 NAEP “Nation’s Report Card”: “Mathematics scores were higher in 2013 than in all previous assessment years at grades 4 and 8….Reading scores were higher in 2013 in comparison to all previous assessments at grade 8, and all but the 2011 assessment at grade 4” (14).

The insinuation that NAEP scores have begun “flattening” now that Americans are backing off of punitive test-based accountability just seems silly when considered next to this chart showing fourth-grade NAEP reading scores, which have been flat since 2002, one year after No Child Left Behind became the law of the land:



Near the end of their essay, McKenzie and Kress say something I can actually agree with. “To be sure,” they write, magnanimously, “there are ways to improve state accountability systems.” That may be the understatement of the year. Their prescriptions for improvement, however, sound a whole lot like a call for the Pearson-enriching, pedagogy-corrupting, test-hallowing status quo. To wit, they recommend:

  1. Annual statewide testing in reading and math. (Kress just dusted off his notes from 2001 for that one.)
  2. Don’t give benchmarks. (Just send those kids cold into the test that will determine if your schools stay open, with no pretests to see if they’re on track. Also, have your football team stop playing scrimmages before district games begin.)
  3. Don’t just spank schools for bad test scores; bribe them too. (As long as everything good or bad that happens to a school is based on a test purchased from a major multinational corporation, we’re good.)

McKenzie and Kress have not realized, even after writing ten pages of stuff, that asserting student achievement gains doesn’t prove they have actually occurred. They seem to believe that by presenting only carefully selected NAEP data that reinforces their preconceived notion that these policies have improved American education, they have somehow proven it. And so they conclude by saying that “we must preserve and build upon the principles that have led to gains in student achievement.” Like the impressive three-point gain on NAEP reading for fourth graders from 2002 to 2013? Then they warn that if we end accountability, we may “return to the world of leaving children behind.” You know, like the bad old days from 1973 to 1999, when the average NAEP reading scores for nine-year-olds jumped from 219 to 232.

And if we want to talk about closing the achievement gap, the argument that punitive test-based accountability is the best way to do that falls apart with a careful review of NAEP scores. The chart below shows the most dramatic gap-closing between African-American 17-year-olds and other races and ethnicities occurred not during the test-and-punish years following the No Child Left Behind Act of 2001, but rather between 1980 and 1988, when that subgroup’s average NAEP score skyrocketed by 30 points.













In conclusion, accountability based on standardized test scores is popular with corporations, CEOs, and white collar types, along with economists and people who want to end public education altogether. And Pearson. It is not, however, effective policy if your goal is to create schools marked by relevant, exciting, innovative learning and personal development, schools that prepare students for creative futures in industries that don’t yet exist. As Yong Zhao has noted, the American economy is built primarily on the willingness of Americans to take risks. Rows of children suffering through mass-produced standardized tests under the threat of government sanctions for insufficient conformity to ideals established by a cultural elite is a dystopian picture, but it’s real; it’s ripped from the pages of Aldous Huxley’s Brave New World, but it’s happening to your children and mine, and it promises to neuter America’s entrepreneurial spirit. We were not meant to fill in the bubbles; we were meant to blaze new trails.

Yes, there must be scrutiny of our schools, a meaningful system of accreditation, but the measuring must not overwhelm the doing, and the measurers must not exasperate the doers. Perhaps there is a fix for accountability. But if defenders of the testing status quo who are begging us to “mend, don’t end” accountability are to avoid being left behind, they must accept that the process starts by scrapping the whole thing and recreating it from the ground up, with leadership and legitimate input from the people inside the schools.



Note: This essay was written in response to a request for comment from Dr. Charles Foster Johnson of the advocacy organization Pastors for Texas Children.

John Kuhn is superintendent of schools in the Perrin-Whitt Consolidated Independent School District, in the state of Texas.


Anthony Cody

Anthony Cody worked in the high poverty schools of Oakland, California, for 24 years, 18 of them as a middle school science teacher. He was one of the organizers of the Save Our Schools March in Washington, DC in 2011 and he is a founding member of The Network for Public Education. A graduate of UC Berkeley and San Jose State University, he now lives in Mendocino County, California.


  1. Paul Nehrig    

    Thank you for this insightful and succinct summation of the primary reason I decided, as a college student in 1989, to become an educator. My goal has been and continues to be to advocate for public ed from a position of experience and credibility as an educator first and foremost, against the forces of test, punish, & privatize. As a newly minted principal with 22 years’ experience in public education, I am gratified every time I come across the work of a like-minded defender of public schools. Your essay serves as a primer for anyone unclear about what the real issues are in the battle over our schools. Thank you.
