Four R’s of Effective Evaluation

This article previously appeared in the November 1993 issue of The Teaching Professor, where it was excerpted and reprinted with permission from The Center for Teaching Effectiveness Newsletter at the University of Texas at Austin.

Because so much depends upon the evaluation of a student’s learning and the resulting grade, it is in everyone’s interest to try to make the evaluation system as free from irrelevant errors as possible. Borrowing from the evaluation literature, I propose the four R’s of evaluation—Relevant, Reliable, Recognizable, Realistic—as ways to ensure the quality of our evaluation systems.

Relevant

In the jargon this is known as the validity of an evaluation method. This means that any activity used to evaluate a student’s learning must be an accurate reflection of the skill or concept which is being tested. What are the characteristics of a relevant evaluation?

Oddly enough, one characteristic that might seem very mundane is that the evaluation activity must appear related to the course content (known in the jargon as face validity). A common student complaint is that tests are not related to the course content or what was presented in class. Although we know that what we assign is directly related to the course, the students often don’t see the connection. And, student impressions aside, the more obvious the connection, the higher the probability that we really have a valid evaluation activity.

A second characteristic of relevant evaluations is that they are derived directly from the objectives (known in the jargon as content validity). The most obvious way to achieve this is to follow the objectives as closely as possible in selecting activities.

If your objective is that the students will be able to select the appropriate statistic for analyzing a given set of data, the evaluation should provide them with a data set and have them select the analysis. It could take many forms, from a multiple-choice item to a problem in which students carry out the analysis themselves, and all of these alternatives represent relevant tests of that objective.

Another characteristic of a relevant evaluation is how well performance on that evaluation predicts performance on other closely related skills, either at the same time (concurrent validity) or in the future (predictive validity). If the skill you are supposedly testing should be highly correlated with some other skill you are also testing, chart the students’ performances on each and see whether they follow the same pattern.

To use a simplified example, the ability to add two single-digit numbers is a precursor to, and therefore highly correlated with, the ability to add two two-digit numbers. Students who do poorly on the former should therefore not be able to do well on the latter. If they do, then one of the two tests is not measuring what it is supposed to be measuring and is therefore not a relevant test of the addition skill we are trying to evaluate.

Reliable

The second aspect of an evaluation activity is how reliably, or consistently, it measures whatever it measures without being unduly affected by the situation. A student’s grade should not hang on a single performance or on the mood of the person making the judgment. Of course, no system is perfectly reliable; none will produce exactly the same evaluation of a performance every time. The goal is to eliminate as many sources of error as possible.

The three biggest sources of error in reliably evaluating a student are poor communication of expectations, lack of consistent criteria for judgment, and lack of sufficient information.

Poor communication of expectations means that poor student performance may be the result of the student’s failure to correctly interpret the task requirements. In written exams this usually is caused by ambiguous questions, unclear instructions, corrections given verbally during the test, and so on. In each case, a bad grade is the result of the student not understanding the question. The student may in fact know the material.

Lack of consistent criteria for judgment means that, if the same performance were to be judged a second time by the same grader, or if another grader evaluated it, it might not receive the same grade because the basis for judging was unclear. The clearer the criteria for judging a student’s performance, the more reliable the evaluation becomes.

For example, one real strength of multiple-choice tests is that the grading is very reliable. Either the students marked the correct answer or they didn’t; very little is left to the judgment of the grader. On the other hand, essay tests are notoriously unreliable unless the instructor takes pains to make the criteria explicit and keeps checking to make sure he or she is not straying too far from the preset criteria.
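One rough way to check this kind of grading consistency is to have two graders score the same set of essays against the same rubric and look at how far apart they land. The scores and the five-point threshold below are hypothetical illustration values:

```python
# Consistency check sketch: two graders score the same five essays.
# Large disagreements flag criteria that may need to be made more explicit.
# All scores and the 5-point threshold are hypothetical.
grader_a = [85, 72, 90, 60, 78]
grader_b = [83, 75, 88, 70, 77]

# Mean absolute difference between the two graders' scores.
diffs = [abs(a - b) for a, b in zip(grader_a, grader_b)]
mean_diff = sum(diffs) / len(diffs)
print(f"mean disagreement: {mean_diff:.1f} points")

# Flag individual essays where the graders diverge sharply.
for i, d in enumerate(diffs, start=1):
    if d > 5:
        print(f"essay {i}: graders differ by {d} points -- revisit the criteria")
```

In this illustration the graders agree within a few points on most essays, but one essay draws scores ten points apart, exactly the kind of divergence that signals an unclear criterion rather than an ambiguous performance.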

Lack of sufficient information is the third source of error in evaluating students, not just in terms of the amount of information, but also in terms of variety of information sources. Not everyone excels in every format. Using only one format may introduce a source of bias for or against some students and lower the reliability of an evaluation.

Recognizable

Our third R is the need for the evaluation system to be recognizable to the students. By this we mean that students should be aware of how they will be evaluated and their class activities should prepare them for those evaluations. Testing should not be a game of “Guess what I’m going to ask you.”

Students don’t mind “hard” tests as long as there are no surprises and they can recognize the relationship of the test to the course. Some instructors may criticize this as “teaching to the test,” but in reality the test should be the best statement of the course expectations and therefore should mirror the teaching. Furthermore, few courses are taught at such a low level that tests are verbatim transcripts of the class or text; rather, they are interpretations or new examples of the class or text material.

Realistic

All of the above activities require work, on the part of either the students or the teacher. So, to avoid burning out either, the final R is that the evaluation system should be realistic: the amount of information obtained is balanced by the amount of work required. Too often we forget that our students are taking three to four other courses along with ours.

What is realistic? Unfortunately, no one can give a blanket answer to that question. I can say that several smaller assignments tend to be more valuable than one large assignment. Alternatively, if a large assignment is called for, spreading it out across the semester and requiring components to be handed in periodically is a good technique, both pedagogically and administratively.