By Anthony Cody.
When the history of modern education reform is written, one of the most shameful chapters will be the continued embrace of various forms of “Value Added Models” (VAM) for measuring the effectiveness of teachers at raising test scores. This month, the Department of Education is asking for comments on its intention to bring this pseudoscience to bear on the field of teacher education.
The proposal states that the new regulations will evaluate teacher education programs based on the following criteria:
Employment outcomes: New teacher placement and three-year retention rates in high-need schools and in all schools.
New teacher and employer feedback: Surveys on the effectiveness of preparation.
Student learning outcomes: Impact of new teachers as measured by student growth, teacher evaluation, or both.
Assurance of specialized accreditation or evidence that a program produces high-quality candidates.
This means that just as states have been required to develop systems that incorporate test scores into teacher evaluations, they will likewise be required to measure the “effectiveness” of teacher education programs based on the test scores earned by students of the teachers those programs produce.
I went to the link provided, and found a reference to a 2013 study by Goldhaber, Liddle and Theobald. The full study costs $19.95, so I did not purchase it. [Note: See update below] However, the summary of the study states this:
This paper presents the results of research investigating the relationship between teachers who graduate from different training programs and student achievement on state reading and math tests. Using a novel methodology that allows teacher training effects to decay, we find that training institution indicators explain a statistically significant portion of the variation in student achievement in reading, but not in math. (emphasis added)
How does this support the assertion in the graphic that the impact of learning gains in math was greater than the effect of poverty? What am I missing here?
The other sidebar graphic is this one, which cites a non-peer-reviewed report from TNTP, and makes the following assertion: “Teachers in the top 20% of performance generate 5-6 more months of student learning each year than low-performing teachers.”
Matthew DiCarlo at the Shanker blog took a look at the methods of this “study,” and found the following:
…most of their estimates (three out of four districts) are based on only one year of data, and while these scores are hardly useless, it’s not quite appropriate to draw any strong conclusions about teachers’ effectiveness with such small samples, and that includes grand labels such as “irreplaceable.” For instance, a decent-sized proportion of these teachers will not make the “irreplaceable” cut the following year, due mostly to error rather than “real” change in performance.
This brings us to one of the most egregious problems with VAM systems. Research shows that student test scores are affected by a large number of factors; teachers account for only about ten percent of the variation. The lion’s share of the effect comes from out-of-school factors related to poverty – which reformers have decided we are incapable of addressing in any meaningful way.
Given that students are not randomly assigned to teachers, in any given year a teacher might get a “tough class,” and their scores might plummet. So teachers fortunate enough to be designated “irreplaceable” one year may find themselves on the hit list the next.
This report from Linda Darling-Hammond, Audrey Amrein-Beardsley, Edward Haertel, and Jesse Rothstein found:
A study examining data from five school districts found, for example, that of teachers who scored in the bottom 20% of rankings in one year, only 20% to 30% had similar ratings the next year, while 25% to 45% of these teachers moved to the top part of the distribution, scoring well above average. (See Figure 1.) The same was true for those who scored at the top of the distribution in one year: A small minority stayed in the same rating band the following year, while most scores moved to other parts of the distribution.
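The instability described in that passage is exactly what you would expect from a noisy measure, and a toy simulation makes the point concrete. The sketch below is an illustration, not a reanalysis of any district’s data: it assumes a single-year VAM score is a teacher’s stable effectiveness plus independent yearly noise, with an assumed year-to-year reliability of 0.35 – a figure chosen for illustration, roughly in the range researchers report for single-year estimates.

```python
import random

random.seed(0)
n = 10_000
reliability = 0.35  # assumed year-to-year reliability of a single-year VAM score

# Each teacher has a stable "true" effect; each year's score adds fresh noise.
true_effect = [random.gauss(0, reliability ** 0.5) for _ in range(n)]
year1 = [t + random.gauss(0, (1 - reliability) ** 0.5) for t in true_effect]
year2 = [t + random.gauss(0, (1 - reliability) ** 0.5) for t in true_effect]

def quantile(xs, q):
    """Value at the q-th quantile of xs."""
    s = sorted(xs)
    return s[int(q * len(s))]

q20_y1 = quantile(year1, 0.20)   # bottom-20% cutoff, year 1
q20_y2 = quantile(year2, 0.20)   # bottom-20% cutoff, year 2
med_y2 = quantile(year2, 0.50)   # median, year 2

bottom = [i for i in range(n) if year1[i] < q20_y1]
still_bottom = sum(year2[i] < q20_y2 for i in bottom) / len(bottom)
moved_above_median = sum(year2[i] > med_y2 for i in bottom) / len(bottom)

print(f"stayed in bottom 20%: {still_bottom:.0%}")
print(f"moved above the median: {moved_above_median:.0%}")
```

Under those assumptions, only roughly a third of bottom-quintile teachers stay in the bottom quintile the following year, and a comparable share land above the median – the same churn pattern Darling-Hammond and her colleagues report, produced here by measurement noise alone.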
The other problem is that certain sorts of students have a harder time improving their test scores than others. Research – detailed here – has shown that special education students and English learners are especially hard to move upward.
Overall, the study found that, in this system:
Teachers of grades in which English language learners (ELLs) are transitioned into mainstreamed classrooms are the least likely to show “added value.”
Teachers of large numbers of special education students in mainstreamed classrooms are also found to have lower “value-added” scores, on average.
Teachers of gifted students show little value-added because their students are already near the top of the test score range.
Ratings change considerably when teachers change grade levels, often from “ineffective” to “effective” and vice versa.
We are already seeing the effects of using VAM scores to evaluate individual teachers. Teachers recognized as resourceful and creative are suddenly thrown out over low test scores.
What will happen when these systems are applied to teacher education?
First of all, programs will be obliged to treat effective test preparation as a necessary element of their institutional survival. That is the clear intention of this “reform.” Programs that do not enthusiastically support implementation of the Common Core and other test-centered reforms are likely to lose access to funds and support – both state and federal. Programs that serve low-income populations will be caught in a bind. While they may get some points for placing teachers in high-needs schools, the lower test scores and lower growth rates these students tend to show are likely to drag their “effectiveness ratings” downward.
A bit of history is in order. Where did this terrible idea come from?
There has been a longstanding frustration on the part of reformers with parts of the education system that are out of their direct control. Schools of Education are housed within universities, and are not directly funded by the Department of Education. Thus neither the bribes of Race to the Top billions nor the threat of NCLB waivers have been enough to completely overthrow many years of education scholarship. An organization called the National Council on Teacher Quality (NCTQ) was created to overcome this institutional resistance. Diane Ravitch was there at its birth, and has explained its genesis. According to Ravitch, NCTQ was created by the Thomas B. Fordham (TBF) Foundation because
We thought (schools of education) were too touchy-feely, too concerned about self-esteem and social justice and not concerned enough with basic skills and academics. In 1997, we had commissioned a Public Agenda study called “Different Drummers”; this study chided professors of education because they didn’t care much about discipline and safety and were more concerned with how children learn rather than what they learned. TBF established NCTQ as a new entity to promote alternative certification and to break the power of the hated ed schools.
Kate Walsh, the head of NCTQ, expressed this perspective when discussing a report the project issued in 2012:
A lot of schools of education continue to become quite oppositional to the notion of standardized tests, even though they have very much become a reality in K-12 schools. The ideological resistance is critical.
So NCTQ recommended what the Department of Education now plans to enact – that schools of education should be ranked according to their VAM scores, and be punished or rewarded according to these indicators of “effectiveness.”
This is the blunt force of federal dollars being used to overcome “ideological resistance” to the centrality of test scores.
Secretary Duncan occasionally makes speeches in which he says we should not teach to the test. He even wrote recently,
…testing should never be the main focus of our schools. Educators work all day to inspire, to intrigue, to know their students – not just in a few subjects, and not just in “academic” areas. There’s a whole world of skills that tests can never touch that are vital to students’ success. No test will ever measure what a student is, or can be. It’s simply one measure of one kind of progress. Yet in too many places, testing itself has become a distraction from the work it is meant to support.
We have heard this sort of talk before. But when this is followed by policies that reinforce the centrality of test scores as a means of measuring teacher effectiveness, these words are not just hollow, they are downright hypocritical.
There is a window of opportunity to comment on this proposal. Everyone associated with teacher education ought to comment. Professors – this is a perfect opportunity to acquaint your students with the policies that will impact their careers in the years to come. Student teachers, challenge your professors to take a stand. Comments should be sent to this address: OIRA_DOCKET@omb.eop.gov by January 2, 2015.
Update, Dec. 11, 2014: A reader provided me with a copy of the Goldhaber report cited in the graphic which claims that “the impact of teacher preparation in Math was considerably greater than the effect of poverty.” Here is the passage from that study that this assertion is based upon:
…there are a small number of programs that can be distinguished from teachers trained out-of-state, and the magnitudes of these differences are educationally meaningful. The point estimates, for example, suggest that the regression-adjusted difference between teachers who received a credential from a program with the lowest performing teachers and those who received a credential from the program with the highest performing teachers is about 12% of a standard deviation in math and 19% in reading. In math, this difference is 1.5 times larger than the regression-adjusted difference in performance between students eligible for free or reduced-price lunches and those who are not; in reading the difference is 2.3 times larger. So, while the bulk of our findings contribute to the growing literature demonstrating that observable teacher characteristics are only weakly correlated with teacher effectiveness, the striking differences in the effectiveness of teachers from programs at the tails of the distribution in Washington State hint at the potential of teacher training to influence student achievement.
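A little arithmetic shows how the comparison in that passage works. Taking the quoted figures at face value, the “effect of poverty” in the Department’s graphic is the regression-adjusted free-lunch gap, and the study’s own ratios let us back it out (a quick check using only numbers from the passage above):

```python
# Back out the free/reduced-price-lunch (FRL) gap implied by the study's ratios.
program_gap_math = 0.12     # best-vs-worst program difference, in SD units (quoted)
program_gap_reading = 0.19
ratio_math = 1.5            # program gap divided by FRL gap, per the study
ratio_reading = 2.3

frl_gap_math = program_gap_math / ratio_math           # implied FRL gap, math
frl_gap_reading = program_gap_reading / ratio_reading  # implied FRL gap, reading

print(f"implied FRL gap, math:    {frl_gap_math:.3f} SD")
print(f"implied FRL gap, reading: {frl_gap_reading:.3f} SD")
```

In other words, the “effect of poverty” being compared against is about 0.08 standard deviations – a regression-adjusted figure that is far smaller than raw poverty achievement gaps, because the model’s other controls have already absorbed much of the poverty effect. And the “greater than poverty” claim pits the extreme tails of the program distribution against that average gap.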
This finding is then blown up and turned into the assertion that teacher preparation can overcome the effects of poverty! The authors of the study state in their summary that there is NO statistically significant difference in math between different teacher training programs, yet the Department of Ed has cherry-picked one small finding and blown it out of all proportion. Is it any wonder that people are losing trust in this “data-driven” operation?