Improving the Quality of Machine-Gradable Questions

Credit: iStock.com/Radila Radilova

This article is reprinted from The Best of the 2020 Teaching Professor Conference.


Tests provide one measure of our students’ learning according to the standards of the instructor and the field. But tests also affect our students socially, emotionally, and financially and influence their science-minded identities for years to come. We owe it to students to create fair tests with transparent expectations, clear guidelines for studying, and questions that adhere to scholarly practices. Our goal at the 2020 Teaching Professor Conference was to provide a road map for improving the quality of machine-gradable test items. We emphasized crafting learning objectives and aligning them with test items, and we provided an item-writing checklist to help instructors critique and improve their test items.

Write clear, measurable learning objectives

Learning objectives connect teaching, learning, and assessment. They identify instructors’ expectations, help instructors align their tests, and help guide students’ learning.

Consider backward design

Backward design begins with a consideration of learning objectives: what students should know, understand, and be able to do at the completion of the course. With clear learning objectives, instructors can substantiate claims about students’ learning through assessments and plan instruction to help students reach mastery. Throughout the process, formative and summative assessment helps uncover gaps in student understanding and informs future instruction.

Figure 1. Process of backward design. Adapted from Wiggins, G., & McTighe, J. (1998). Understanding by design. Association for Supervision and Curriculum Development.

Components of an effective learning objective

Effective learning objectives are behavioral, measurable, and attainable. They should reflect national standards and expectations in the field and include both core concepts and competencies (Rodriguez & Albano, 2017). When crafting learning objectives, focus on the behavioral and measurable criteria, as it can be difficult to standardize what counts as attainable.

The two major components of a well-articulated learning objective are the expected performance (cognitive task defined by an action verb that communicates what students are expected to do) and the content (what students will work with). When writing learning objectives, it is often easiest to think first of the content area and then consider what you expect students to be able to do with that content.

Components of a Learning Objective
Specify the performance: what students will do, expressed with an action verb.
Specify the content: what students will work with.
Example: Draw a simple line diagram showing a segment of DNA from a gene and its RNA transcript, indicate which DNA strand is the template and the direction of transcription, and label (+/-) the polarities of all DNA and RNA strands.

Bloom for performance

Consider Bloom’s levels of thinking skills and their associated verbs when writing the performance aspect of a learning objective (Figure 2). Remember that the verb alone doesn’t dictate the expected thinking skill; that is determined by the action or performance that students demonstrate with the content.

Figure 2. Bloom’s levels of thinking skills. From Taxonomies of learning (n.d.). Retrieved from https://bokcenter.harvard.edu/taxonomies-learning

Align assessment with the learning objectives

Be mindful that learning objectives guide student learning. Be specific and provide context so that students know how you will ask them to demonstrate their knowledge.

Figure 3a. Example of an item misaligned with a learning objective

Figure 3b. Example of an item aligned with a learning objective

Consider the two learning objectives in Figures 3a and 3b from a student’s perspective. The objective in Figure 3a mentions discussing cancer and tumor suppressor genes without much context. A student challenged with the corresponding assessment item might justifiably think, “I didn’t know I was supposed to focus on one tumor suppressor gene, p53! This teacher is so tricky.” The small changes in the wording of the learning objective in Figure 3b improve interpretation and thus effectiveness.

Critique and write effective test items

Test items should measure students’ mastery of objectives and inform future instruction. Well-written tests remove both obstacles that confuse knowledgeable students and clues that support uninformed guessing (Albano et al., 2020).

Consider accessibility and inclusion

Measuring learning should be free from bias, and tests should be accessible so that all students can succeed. Several studies have documented performance gaps on cognitively demanding multiple-choice items associated with gender, socioeconomic status, and test anxiety (Ballen et al., 2017; Wright et al., 2016). This is not to say that we should stop testing students with multiple-choice questions or avoid putting cognitively demanding questions on exams, but rather that we should be aware that some groups of students might need more practice with these types of skills.

Instructors should also remember that students differ in their past experiences in ways that might make some test items difficult for them. Tests should not reinforce cultural stereotypes, and instructors should be cognizant of bias when writing test scenarios. They should construct tests that minimize excessive or unnecessary reading and be aware of physical disabilities that may hinder some students’ success on the test. Finally, they might consider ways to alleviate the stress of stereotype threat when delivering test items (Steele, 1997).

Using selected-response items

The focus of our workshop was on selected-response items that are machine gradable. These item types allow broad content coverage, are ubiquitous (especially in large classroom formats), and are easy to grade. Learning objectives that ask students to analyze conditions and phenomena, apply concepts and principles in new situations, and solve problems can be measured very well with selected-response items. High-quality selected-response items are, however, challenging and time-consuming to construct: Farley (1989) estimated that an experienced practitioner will take an hour to write one “good” multiple-choice question.

Types of selected-response items

Some types of selected-response items are more effective than others. Most instructors are familiar with conventional multiple choice. Decades of research have shown that discrimination of student knowledge or understanding of the content (or both) is not significantly increased by offering four or five answer options rather than only three (Rodriguez, 2005). Problems have been identified with standard true-false items, and multiple true-false items are now recommended instead. Multiple true-false items let instructors set up a context or situation in which students evaluate discrete areas of content, allowing instructors to determine more specifically what students do or do not understand and better informing future instruction (Hubbard et al., 2017). Other recommended item types include alternate-choice and matching items.

Item construction

In writing test items, it is important to consider content, question formatting, a well-written stem, and well-written answer options. Figure 4 outlines the general structure of selected-response items.

Figure 4. Selected-response item structure

Instructors are encouraged to use the item-writing checklist (Figure 5) to increase the quality of their test items. High-quality items yield more reliable information about student learning. This provides data that informs instruction and allows for continuous improvement and revision, bringing instructors full circle in the backward design process.

Figure 5. Item-writing checklist, reproduced with permission from Rodriguez, M., & Albano, A. (2017, p. 53)

Next steps

Improving assessment efforts should be an ongoing process for all instructors. The recently formed Advancing Assessment Skills in BIOlogy Network, or ASK BIO network, is designing and sponsoring faculty development workshops to support life science instructors as they learn to write machine-gradable assessment questions that align with learning outcomes inspired by the Vision and Change Report (AAAS, 2011) and present a high level of cognitive challenge. In addition, the ASK BIO network (Dr. Heather Seitz, Program Director, Johnson County Community College, hseitz@jccc.edu) will support continued interaction among workshop attendees who work in geographic proximity as they continue to improve the quality of their assessments.

References

American Association for the Advancement of Science. (2011). Vision and change in undergraduate biology education: A call to action. American Association for the Advancement of Science. https://live-visionandchange.pantheonsite.io/wp-content/uploads/2013/11/aaas-VISchange-web1113.pdf

Albano, A. D., Brickman, M., Csikari, M., Julian, D., Orr, R. B., & Rodriguez, M. C. (2020). Integrating testing and learning. HHMI BioInteractive.

Ballen, C. J., Salehi, S., & Cotner, S. (2017). Exams disadvantage women in introductory biology. PLoS ONE, 12(10). https://doi.org/10.1371/journal.pone.0186419

Farley, J. K. (1989). The multiple-choice test: Writing the questions. Nurse Educator, 14(6), 10–12. https://doi.org/10.1097/00006223-198911000-00003

Hubbard, J. K., Potts, M. A., & Couch, B. A. (2017). How question types reveal student thinking: An experimental comparison of multiple-true-false and free-response formats. CBE—Life Sciences Education, 16(2). https://doi.org/10.1187/cbe.16-12-0339

Rodriguez, M. (2005). Three options are optimal for multiple-choice items: A meta-analysis of 80 years of research. Educational Measurement: Issues and Practice, 24(2), 3–13. https://doi.org/10.1111/j.1745-3992.2005.00006.x

Rodriguez, M., & Albano, A. (2017). The college instructor’s guide to writing test items: Measuring student learning. Routledge.

Steele, C. M. (1997). A threat in the air: How stereotypes shape intellectual identity and performance. American Psychologist, 52(6), 613–629. https://doi.org/10.1037/0003-066X.52.6.613

Wright, C. D., Eddy, S. L., Wenderoth, M. P., Abshire, E., Blankenbiller, M., & Brownell, S. E. (2016). Cognitive difficulty and format of exams predicts gender and socioeconomic gaps in exam performance of students in introductory biology courses. CBE—Life Sciences Education, 15(2). https://doi.org/10.1187/cbe.15-12-0246


Rebecca B. Orr, PhD, is a professor of biology at Collin College in Plano, Texas, where she teaches introductory biology for science majors. Orr has a passion for investigating strategies that result in more effective learning and retention and is a certified Team-Based Learning Collaborative Trainer Consultant. She is a coauthor of Campbell Biology and Campbell Biology in Focus and a Howard Hughes Medical Institute (HHMI) BioInteractive Ambassador.

Peggy Brickman, PhD, is a Josiah Meigs Distinguished Teaching Professor at the University of Georgia who annually teaches introductory biology to approximately 600 undergraduates. Brickman conducts research on learning in the college STEM classroom and has developed several instruments to measure gains in scientific literacy skills and motivation to learn science. Her current research aims to examine the role of group interactions in promoting learning in large enrollment courses.

One Response

  1. I particularly appreciated your comment that instructors “should construct tests that minimize excessive or unnecessary reading.” This is an issue whenever a new textbook is adopted. I noticed a test quality challenge recently when my department made the well-meaning leap to an open-source textbook. While the new content was as good as the old, the old test bank did not match. Each author used a slightly different voice to describe even basic concepts. If multiple-choice quiz questions don’t match the textbook in voice, wording, or definitions, this can unfairly confuse students and drive them to “Google” the answers. This is a serious problem with online teaching. Students who have done the required reading deserve quiz questions that test concepts using the same wording. In fairness to students, whenever a new textbook is adopted, a new test bank should be written to match.

