Using End-of-Course Ratings to Improve Instruction

Credit: iStock.com/themacx

Editor’s note: The following article is part of a resource collection called It’s Worth Discussing, in which we feature research articles that are especially suitable for personal reflection and group discussion with your colleagues.

Why this article is worth discussing: For those interested in using course evaluation results to improve teaching, this article offers a set of evidence-based recommendations—clearly described and supported with multiple references. The review focuses exclusively on using end-of-course evaluation results for improvement purposes. It covers features of evaluations that generate good data, interpretation of the results, and development of action plans. It recognizes but does not consider evaluations’ use in the promotion and tenure process. By contrast, most reviews are more broadly based and not as pragmatic. The article is also worth discussing because research results indicate that end-of-course ratings tend to remain stable over time, which means that collecting them regularly does not automatically improve teaching. End-of-course ratings can increase instructional effectiveness, but only when teachers act on them; this article proposes a logical, sensible way of achieving that goal.

The article

Boysen, G. A. (2016). Using student evaluation to improve teaching: Evidence-based recommendations. Scholarship of Teaching and Learning in Psychology, 2(4), 273–284. https://doi.org/10.1037/stl0000069

A synopsis

The four steps in this improvement process start with the instrument itself. First, it needs to be reliable and valid. In other words, it must measure what it’s supposed to measure. Second, the response rate needs to be adequate to ensure the data’s integrity. If there are 100 students in the course and only 10 complete the evaluation, that’s not a representative sample. The author discusses a variety of ways faculty can improve response rates. Third, good improvement decisions depend on a systematic analysis of the results: “In order for teachers to improve based on student evaluations, they must avoid haphazard interpretations based on simple heuristics” (p. 278). This need for careful review applies to quantitative as well as qualitative feedback. Finally, teachers need to set goals for improvement, and the evidence-based recommendation is to do that in consultation with a peer or an instructional expert.

During a discussion of or reflection on feedback from students, it’s important to note that the research on student evaluations is voluminous, with studies reporting a wide range of results. The literature can be cherry-picked to support any number of foregone conclusions. This review primarily relies on meta-analyses—those big reviews of research—that identify trends. It cites lots of individual studies as examples but does not make recommendations based on isolated explorations. It also cites examples that refute the trends.

Key quotations and discussion questions

1. Using reliable and valid instruments

Validity involves how the instrument defines good teaching and whether the dimensions of teaching that the individual items identify can be connected to learning. Reliability involves empirical questions about consistency: whether students interpret the items in the same way and whether the instrument produces stable results.

“Many colleges, rather than using standardized measures with known reliability and validity, create their own student evaluation measures by haphazardly selecting survey questions with face validity” [ones that “look like” they’ll measure, in this case, teaching effectiveness] (p. 275).

“Teachers seeking more trustworthy feedback can select a standardized survey to administer for professional development purposes” (p. 275). Note: the article references three such instruments, and the instruments themselves appear in the cited sources.

“Just as students need specific feedback on their performance in order to learn, teachers need specific, multidimensional feedback on their pedagogical skills if they seek to improve. Single items [“Overall, rate the quality of this instructor”] cannot provide such feedback” (p. 276).

“The perspective of students matters” (p. 274). This is how the author responds to arguments that students aren’t qualified to evaluate teaching or that their satisfaction with the course and instructor doesn’t matter. He also establishes the validity of ratings by listing six indicators of teaching quality that student evaluations predict. These include teachers’ self-evaluations, the ratings of trained observers, alumni ratings, student predictions of their own learning, objective measures of student achievement, and ratings of the same instructor in other courses (see p. 274).

  • Is the instrument your institution uses to evaluate instruction valid and reliable? Was it empirically developed and tested? If that’s unknown, it might be interesting to compare and contrast your instrument with one of the valid and reliable instruments referenced in the article (see p. 275).
  • If you were to consider the items on the instrument an operational definition of good teaching, what instructional behaviors would be part of how the instrument defines good teaching?
  • Has student evaluation data informed your efforts to improve? If it has, how regularly does it do so? Is there anything that prevents the data from being useful? What?

2. Getting a good response rate

Most institutions have moved to online evaluations, which have lowered response rates and raised concerns about who’s completing the evaluations and the fairness of their assessments.

“From a psychometric perspective, low response rates increase measurement error, which impedes the ability to make decisions from the data” (p. 277). Sampling theory proposes that a 3 percent margin of error requires a 97 percent response rate in a class of 20, a 93 percent response rate in a class of 50, and an 87 percent response rate in a class of 100. A 10 percent margin of error for the same class sizes requires response rates of 58 percent, 35 percent, and 21 percent, respectively. There is not yet agreement as to an appropriate margin of error for course evaluations. (See the discussion on p. 277.)
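For readers curious about where figures like these come from, the snippet below is a minimal sketch (not from the article) of a standard finite-population margin-of-error calculation, assuming simple random sampling, maximum variability (p = 0.5), and a 95 percent confidence level. The table the article cites rests on its own source’s assumptions, so treat this as a ballpark illustration rather than a reproduction of those exact figures.

```python
import math

def margin_of_error(class_size, responses, z=1.96, p=0.5):
    """Approximate margin of error when treating respondents as a simple
    random sample of the class.

    Assumptions (not from the article): 95% confidence (z = 1.96),
    maximum variability (p = 0.5), finite-population correction.
    """
    standard_error = math.sqrt(p * (1 - p) / responses)
    fpc = math.sqrt((class_size - responses) / (class_size - 1))
    return z * standard_error * fpc

# The synopsis's example: only 10 of 100 students respond.
print(round(margin_of_error(100, 10), 2))   # ~0.30, i.e., about +/- 30 percentage points
# A high response rate shrinks the error considerably.
print(round(margin_of_error(100, 87), 2))   # ~0.04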

As for the reduced response rates with online evaluations,

“response rates have varied between studies, but it is safe to assume that at least 20% fewer students will complete online versus a face-to-face student evaluation survey” (p. 276).

“Online evaluations do not appear to be dominated by students who earned low grades and who, on average, tend to give lower evaluations of their teachers” (p. 276).

“There is no reason to settle for a low response rate because teachers have a wide variety of techniques at their disposal to increase participation” (p. 277). The author suggests that the prevalence of electronic devices makes it possible to complete online evaluations during class. He also recommends explaining to students why the feedback matters, sending repeated reminders, and offering incentives.

  • How much lower than in-person response rates are online ones in your courses and at your institution? Are these data being collected?
  • Why don’t students do course evaluations? Is there anything teachers can do about those reasons?
  • Do you think there are ethical issues involved in giving students “credit” (usually trivial amounts—1 percent or less of the course grade) for completing course evaluations? If so, what are the issues?
  • If completing the course evaluation “counts,” does that erode the quality of the feedback students provide?

3. Figuring out what the results mean

Interpreting course evaluation feedback isn’t always easy. Sometimes the results conflict. Sometimes the ratings change just a little bit. Occasionally, it doesn’t look like anything needs to improve. And every now and then a student offers a blistering critique of the course and instructor. There’s a need to look at rating data systematically and objectively.

“Student evaluation results represent scientific data, but the research suggests that faculty readily interpret that data without reference to established statistical principles” (p. 278).

“Because of the error that is inherent in any psychological measurement, student evaluations are not precise representations of teaching effectiveness” (p. 278).

“Just as researchers would never make conclusions about the results of a study based on raw means, teachers should not try to make pedagogical improvements based on unsystematic comparisons of raw student evaluation means” (p. 279).
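To make that point concrete, here is a minimal sketch (not from the article) of the kind of statistical context a raw mean lacks: a rough confidence interval around a mean rating, computed from hypothetical ratings on a 1–5 scale. With small samples, the intervals are often wide enough that a semester-to-semester change in the raw mean is indistinguishable from noise.

```python
import math

def rating_confidence_interval(ratings, z=1.96):
    """Rough 95% confidence interval for a mean course rating.

    Hypothetical illustration only: z = 1.96 stands in for the t critical
    value, which would be more appropriate at small sample sizes.
    """
    n = len(ratings)
    mean = sum(ratings) / n
    # Sample standard deviation (n - 1 in the denominator).
    variance = sum((r - mean) ** 2 for r in ratings) / (n - 1)
    margin = z * math.sqrt(variance / n)
    return mean - margin, mean + margin

# Hypothetical ratings (1-5 scale) from two offerings of the same course.
fall = [4, 5, 3, 4, 4, 5, 2, 4, 5, 4]      # mean 4.0
spring = [4, 4, 5, 5, 3, 4, 5, 4, 4, 5]    # mean 4.3

print(rating_confidence_interval(fall))    # roughly (3.4, 4.6)
print(rating_confidence_interval(spring))  # roughly (3.9, 4.7); the intervals overlap
```

Because the two intervals overlap substantially, the 0.3-point rise in the raw mean is, by itself, weak evidence of improvement.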

“[Student] comments are presented as an unorganized mass in student evaluation reports, and this leads teachers to review and utilize them in a similarly unorganized way” (p. 279).

  • Is a lack of statistical knowledge (standard deviations, confidence intervals, etc.) the reason these principles aren’t applied, or are the emotional overtones of the whole evaluation process the issue?
  • Say there’s an indication in one set of student ratings that the exams aren’t covering what’s being focused on in class. Would you make changes according to that feedback or wait and see whether the same assessment is made next semester? Asked another way, how regularly must an issue be identified before you decide it needs to be addressed?
  • What are some useful, time-efficient ways to organize student comments?
  • Do you pay more attention to negative comments than to positive ones? Why?
  • Is it okay to ignore some student comments? Which ones?

4. Acting on the results

The information derived from course evaluations accomplishes nothing unless it’s acted on.

“Several longitudinal investigations have followed trends in student evaluations among the same group of teachers across multiple years, and the results indicate that evaluations remain stable despite the multiple rounds of feedback received by teachers” (p. 279).

“Meta-analysis indicates that teachers should discuss their evaluation results with a peer or instructional expert” (p. 280).

“Teachers should set goals for improvement” (p. 280).

“In general, improvement of teaching includes steps that are typical of all types of behavior modification—evaluating the current behavior, determining what needs to be altered, and acting on a specific plan for change” (p. 280).

  • How would you explain the stability in student ratings? What makes current evaluation processes ineffective for improvement purposes?
  • Do faculty at your institution share evaluation results and consult with appropriate others to determine what they mean and what action they might require?
  • How do you cultivate the objectivity needed to look at ratings and comments systematically and without getting emotionally sidetracked?
  • Can you improve teaching effectiveness without setting goals? Are there benefits associated with goal setting?

For further discussion

Golding, C., & Adam, L. (2016). Evaluate to improve: Useful approaches to student evaluation. Assessment & Evaluation in Higher Education, 41(1), 1–14. https://doi.org/10.1080/02602938.2014.976810

Hodges, L. C., & Stanton, K. (2007). Translating comments on student evaluations into the language of learning. Innovative Higher Education, 31, 279–286. https://doi.org/10.1007/s10755-006-9027-3

