Multiple Measures or Multiple Data Points?

Multiple Measures

Perhaps no profession is as endlessly fascinated with evaluation as teaching.  The concepts of transparency and accountability are woven into the very fabric of our work as educators in a way that is unique among professions.  On the one hand this is laudable.

On the other hand, it leads to the building of elaborate evaluation systems that are costly, time-consuming, and frequently criticized for their lack of efficacy.  Too often these systems become exercises in bureaucratic hoop jumping, disconnected from improvements in actual practice.

The trend during the Race to the Top/NCLB “flexibility” era has been for states and localities to go down a rabbit hole of “multiple measures”, where a variety of components are added together to produce a single number by which teachers can supposedly be compared. That number then becomes the basis for various high-stakes employment decisions, including hiring, firing, promotion, tenure and compensation.

Is this whole less than the sum of the parts?

In many places student test scores (including the dreaded value-added or VAM approach) have become a large (or even the largest) component of the evaluation score.  This has created (at least) two problems:

1) The majority of teachers teach in subjects without standardized tests. How do you capture a test score component for these folks?

2) The use and misuse of student testing has spiraled out of control.  Parents are starting to wake up to the fact that their children are being tested not diagnostically and for their own benefit, but for the purpose of sorting and firing their teachers.

Because of the history and culture of our profession, we must be practical: teacher evaluation is not going away.  So how can we build an evaluation model that is time and cost effective, objective, and connects to improvements in professional practice?

Multiple data points.

In this approach, you put something at the center of the system.  In many cases this would be traditional administrator observation, but it could easily be a Tripod style student survey, or a National Board portfolio, for example.  Then you admit other data into the conversation for confirmation.

These data points and what they reveal are described in a range of books and research papers, notably Everyone at the Table and Getting Teacher Evaluation Right.

We know that no one data point is a silver bullet that provides a complete, valid and reliable picture of professional practice.  Professional practice is a complex and sophisticated enterprise that must be viewed through a variety of lenses.  In this scenario, observations, student achievement, surveys, artifacts, portfolios, etc., talk to each other and become mutually reinforcing.

There is one other key piece – you need a research-based rubric, which everyone accepts and understands, to provide a basis for professional conversation, and a roadmap for improving practice.  In our district we recently agreed upon using Danielson’s Framework (Great Lakes TURN Regional Conference Nov. 3-4, 2011).

It is important to understand that a rubric is not in and of itself an evaluation system.  Rather, it provides the language to talk about practice, and you build the evaluation system around that language.

Within the rubric, “anchor components” are individual components in each of the four domains that drive the other components of that domain.  These anchor components are different for new and experienced teachers.   Examination of practice within the anchor component provides reasonable assurance that things are OK in the other components of that domain.

This simple idea has two important implications:  first, it provides a way to differentiate evaluation for the career stage of the educator by looking first at key areas of practice.  Second, it streamlines the process – by focusing an administrator’s attention, it reduces the data that needs to be looked at.  One need only look at the full spectrum of components in a domain if an issue is detected in the anchor component.
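
As a rough illustration only, the anchor-first review described above could be sketched in Python. Everything specific here is an assumption made for the example: the domain and component labels are placeholders, the choice of which component is the anchor is invented, and the 1 to 4 scale with a cutoff of 3 is hypothetical rather than taken from any district's system.

```python
def review_plan(anchor_scores, components_by_domain, cutoff=3):
    """Decide which components an evaluator looks at in each domain.

    anchor_scores maps a domain to (anchor_component, score).
    If the anchor meets the cutoff, only the anchor is reviewed;
    otherwise every component in that domain is reviewed.
    The cutoff value is a hypothetical choice for this sketch.
    """
    plan = {}
    for domain, (anchor, score) in anchor_scores.items():
        if score >= cutoff:
            plan[domain] = [anchor]
        else:
            plan[domain] = list(components_by_domain[domain])
    return plan


# Placeholder labels and scores, invented for illustration.
components_by_domain = {
    "domain_1": ["1a", "1b", "1c"],
    "domain_2": ["2a", "2b", "2c"],
}
anchor_scores = {
    "domain_1": ("1b", 3),  # anchor looks fine
    "domain_2": ("2a", 2),  # anchor raises a concern
}

plan = review_plan(anchor_scores, components_by_domain)
print(plan)
```

In this made-up case the evaluator reviews one component in the first domain and all three in the second, which is where the time savings would come from.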

By using multiple data points, a research-based rubric and anchor components, it is possible to create teacher evaluation that is streamlined, accurate, and useful for planning professional growth.   If you can take some of the stress out of the experience, educators will naturally embrace a good rubric and internalize it.  Why?  Because teachers spend a huge amount of time with their students, and if they are more successful in this endeavor their lives will be better in very concrete ways.  When educators take ownership of the profession, it reduces the need for elaborate teacher evaluation systems because the work is embedded in practice.  A virtuous cycle ensues.

Then the trick is how to connect this with professional development, but that’s a subject for another blog!

What improvements in teacher evaluation would help you in your work?

Steve Owens is a National Board Certified music teacher who teaches P-6 general music, strings, band and chorus in Calais, VT and Sharon, VT.  He holds a second endorsement in technology integration and level 3 certification in Orff-Schulwerk (an approach to general music teaching), and has attended the Orff Institut in Salzburg, Austria. Steve participated in the VIVA NEA Idea Exchange on Time in School.

 

How to Create Meaningful Assessments that Actually Inform Teacher Practice

By Elizabeth Tarbutton

Teacher evaluation is a hot topic in education these days, but has anyone stopped to ask what the purpose of it all is? I think most evaluators would say that the purpose is to grow better educators to create meaningful change in schools.  In order to effect these changes, evaluators collect a lot of data on students and teachers.  I would like to think that these data are commonly used to have a meaningful, actionable impact on student achievement. Unfortunately, many states, districts, and schools lack protocols as to how data should be used.  As a result, data are often misunderstood and used as an autopsy rather than as a tool for improvement.  I served for three years as a data coach, while also taking on the responsibilities of classroom teaching.  I helped my peers figure out what data meant and how to use it to improve student achievement.  If meaningful data protocols were more widely employed, educators would be able to improve their instruction and have a significant impact on student learning.

Subjective Data
Subjective data come in many forms during teacher evaluation: teacher observations; informal formative assessment; student surveys; school culture; etc.

In my experience these data are most useful when protocols for the generation and analyses of these data include the following elements:

  • The intent for subjective data collection is clear
  • The evidence collected has a purpose that ties back to the intent for data collection
  • Instruments used for data collection are intentional and thoughtful (e.g., technology is used because it enhances data collection, not just because it is novel)
  • There is training and discussion as to what the evidence means for all players
  • Time is built in to reflect on data
  • Meaningful goals can be created out of data
  • Action plans are created to enact goals
  • Action plans are reflected on and amended, as necessary 

Objective Data
Objective data most commonly come in the form of student assessment data.  As a data coach, the most overwhelming feedback I received was how meaningful and transformative it was for educators to finally understand what assessment data meant and how they could leverage that data to differentiate instruction in their classrooms.  The scary thing about this feedback is that, for years, educators administered assessments, but never understood or used the assessment results.  Empowering educators as to what data mean allows them to use assessments as a tool to improve the classroom experience and learning of their students.

In 2011 I received student data from the state test on two of my incoming students (we get state assessment data on our students after they start the new school year in a new class). Bryan’s score improved 650 points from the year before, while Austin’s score decreased by 95 points.  Bryan went from ‘low unsatisfactory’ to ‘low unsatisfactory’ (his score was significantly low the previous year), while Austin stayed ‘mid Advanced’.  According to the Colorado Growth Model, Bryan had inadequate growth, while Austin had adequate growth.  Perplexed, I looked into why this was the case and learned that the statistics applied to students in the Colorado “Growth” Model are ranking statistics: the model should truly be called the Colorado “Rank” Model.  This exemplifies that data analysis needs to be appropriate and meaningful.
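
To see how a ranking statistic can produce this kind of result, here is a deliberately simplified sketch in Python. It is not the state's actual calculation: it just ranks a student's score change against peers who started from a similar score. The peer numbers are invented for illustration, and the function name `growth_percentile` is hypothetical.

```python
from bisect import bisect_left


def growth_percentile(student_gain, peer_gains):
    """Percent of peers whose gain was strictly smaller than the student's.

    A simplified stand-in for a rank-based growth measure; the real
    model is more involved than this.
    """
    ranked = sorted(peer_gains)
    below = bisect_left(ranked, student_gain)  # count of peers with a smaller gain
    return 100 * below / len(ranked)


# Invented gains for peers who started from a very low score.
low_start_peer_gains = [700, 720, 750, 800, 820, 850, 900, 950, 1000, 1100]
# Invented gains for peers who started from a high score.
high_start_peer_gains = [-300, -250, -200, -180, -150, -120, -100, -90, -50, 10]

bryan = growth_percentile(650, low_start_peer_gains)
austin = growth_percentile(-95, high_start_peer_gains)

print(bryan)   # 0.0: a 650-point gain still ranks below every invented peer
print(austin)  # 70.0: a 95-point drop still outranks most invented peers
```

With these made-up peers, a large gain lands at the bottom of its group and a small loss lands near the top of its group, because the statistic reports relative position rather than points gained.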

After having successfully coached educators in interpreting and using data to inform their instruction, I have seen test scores increase by as much as 55% in one year.  What I have learned is that protocols need to be in place for creating assessments to generate meaningful data and to reflect on assessment data to inform instruction.  These are the key elements for successful data protocols.

Protocols for Creating Meaningful Assessments should include these elements:

  • Assessments should be designed to assess specific student learning
  • Evidence of student learning should be mutually determined when creating the assessment
  • Grading rubrics should be written so that student mastery is easily identifiable via key elements of performance
  • Rubrics should highlight key advances from one level of mastery to the next such that it is easy to identify methods of differentiation to promote student improvement
  • Assessment should be timely and administered in a way that educators and students can act on results
  • Assessment should take minimal time out of classroom instruction, and should ultimately enhance instruction

Protocols for Reflecting on Assessment Data should include these elements:

  • Educators and administrators should be trained as to what assessment data mean
  • Data should be analyzed/processed in a meaningful, appropriate manner
  • Educators should be given time to analyze assessment data using common procedures
  • Educators should be given time to collaboratively reflect on assessment data
  • Educators should be given time to plan a “response to data action plan” for their students
  • Students should be given ownership of their data by:
    • Including them in analyzing data
    • Guiding them in creating, reflecting on, and amending goals as a result of their assessment data
    • Making them aware of their resultant learning plan and giving them action items to enact that plan and reach their goals
  • Parents should be included in the data conversations
    • Parents should be informed as to what assessments their student is given and the purpose of each assessment
    • Parents should receive student data and be trained as to what their student’s data mean
    • Parents should be informed as to educational decisions being made regarding their student as a result of their assessment data

When all players are brought to the table, data is used to diagnose mechanisms to improve the student learning experience. When data is understandable and meaningful, the mounds of data collected during educator evaluation can drive meaningful change in the education profession.

 

Elizabeth Tarbutton is a middle school math teacher at Hill Middle School in Denver, CO. She participated in the VIVA CEA Idea Exchange: Ensuring an Effective and Supportive Teacher Licensure and Renewal System in Colorado.

Driving Lessons: Putting the Data-Driven Map in Perspective

By Kathleen Sullivan

Data is defining the self-worth of our children, the value of a dedicated, compassionate, caring teacher, and the marketability of our homes. Data has proven to be invaluable as a tool to identify weak spots in curriculum and also as a way to identify students in need of academic intervention. But with the focus on data, something else happened. Education leaders, administrators, and teachers stopped talking about students as individuals; instead we began to hold data meetings and we started to refer to students simply as “above grade level”, “at grade level”, “progressing, but below grade level”, or “needs improvement”. At the same time, new students’ test scores became the first thing we checked, to see how they would affect overall data for the upcoming testing season.

Michelle Rhee of StudentsFirst, an aggressive education reform organization, appears to believe the only way to measure student and teacher success is through test scores. StudentsFirst recently released a report grading states on how they are working to elevate the teaching profession, empower parents, spend wisely, and govern well. Florida and Louisiana were at the top of the list. The problem is that the initiatives being promoted by StudentsFirst sound great in theory, but education reform goes well beyond test scores and data.

We need an education reality check. I recently “liked” a Facebook posting that read “I Care More About the Person My Students Become Than The Scores On The Tests They Take”. This doesn’t mean I don’t care about test scores and data. It does mean that society needs people who have integrity and character. Test scores are important as a way of measuring what students are learning. Does it measure smart? What does smart mean? Does it strictly mean a high test score? Personally, I think data and test scores are part of the puzzle. Students can explain a concept but often can’t write it. Students can demonstrate a concept by creating a project but they may not be able to read a word or understand a word on a standardized test and lose points.

We need to broaden the way we think about and use data so we can make sure we’re giving each student what they need to succeed. Some students need extra academic supports to increase their capacity to learn. Students with learning, physical, and emotional disorders also need special supports.

If we invest in supporting our children academically and emotionally, we will be investing in children who can not only answer questions correctly but also face challenges and seek solutions. Let’s figure out how to measure those skills too.

Kathleen Sullivan teaches 5th grade science at a public school in Malden, Massachusetts.

Why Education Reform Needs Data

Public education is a fragile yet critical resource and we have to do more to strengthen our public schools.  People are willing to acknowledge that too many students are in schools that don’t give them an adequate chance to learn.  Teachers know what it takes to be effective and administrators are working hard to get the necessary resources into their schools.

And yet. And yet.

There’s real concern about whether we can deliver to all students. I’ve been thinking a lot about this gulf between our effort and the nagging doubts about the ability to deliver success.

Why is it there? I think it’s because we’re afraid of the numbers. I can understand the inclination.  I was never big on numbers. In third grade, 8 X 2 frustrated and defeated me.

Nationally, numbers have gotten a bad name in education. Rather than seeing them as helpful, we see them as punitive.

But the numbers hold the key to translating our aspiration for public schools into a success story. Data, even standardized test data, is an important tool for teachers. The numbers can tell us which concepts our students have mastered and which ones need more work. Teachers need to know that. Parents need to know that and, yes, the taxpayers who fund schools need to know that too.

So let’s have an honest conversation about assessments, tests and all. Rather than an all or nothing question of to test or not to test, let’s start talking about how various measures can be put together to give us a multi-faceted picture of the complex work of teaching and learning.  We need to be bold enough to be honest about what tests measure, what they can’t measure and what other data we can use to fill the gaps.

At VIVA Teachers, we know that teachers can drive this conversation. Put away the anti-testing rhetoric and the blame-the-teachers vitriol. And let the numbers help us find the right answer.

What’s a good example of how to use test data effectively?