Do student evaluations measure teaching effectiveness? [Part 2]

March 30, 2014

This topic is too good to let go without a reprise; this post is based in part on a final post that I made to the LinkedIn discussion I mentioned earlier.

The point has been made repeatedly and correctly that feedback has to be multidimensional, reflecting different aspects of the teaching. Almost every survey tries to assess these different aspects. But the fewer occasions on which such feedback is obtained, the more likely it is that the individual measures will be so correlated as to amount to one “general satisfaction” factor that swamps any attempt to interpret them individually. This is almost guaranteed with a one-shot survey. So if you’re really interested in feedback about specific aspects, the measures have to be almost real-time.
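
As a purely illustrative sketch of that collapse (the item names, sample size, and noise level below are my own assumptions, not data from any real survey), the snippet shows what happens when a single underlying attitude drives every item on a one-shot survey: the first principal component of the item correlations soaks up most of the variance, and the individual measures carry little separate information.

```python
import numpy as np

rng = np.random.default_rng(0)
n_students = 200

# One latent "overall satisfaction" attitude drives every item,
# plus a little item-specific noise -- the one-shot-survey situation.
satisfaction = rng.normal(size=n_students)
items = ["clarity", "pace", "materials", "engagement", "fairness"]
ratings = np.column_stack(
    [satisfaction + 0.4 * rng.normal(size=n_students) for _ in items]
)

# Eigenvalues of the correlation matrix: if the first one dominates,
# the five items are effectively measuring one thing.
corr = np.corrcoef(ratings, rowvar=False)
eigenvalues = np.linalg.eigvalsh(corr)[::-1]
share = eigenvalues / eigenvalues.sum()
print("inter-item correlations (first row):", corr[0].round(2))
print("variance explained by first factor: ", round(share[0], 2))
```

With frequent, near-real-time measurement, the item-specific signals at least have a chance to separate from the overall mood of the moment.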

I once experimented with an in-class real-time feedback system. I gave every student three colored cards. They were to display their green card if they were following the lecture/discussion, their yellow card if they were hesitant, and their red card if they were confused. After an initial learning period, the system worked surprisingly well for a time. Of course, the students soon learned that if they were going to display a red card they had better have a good question ready, since I would probably call on them. Eventually, this proved to be the downfall of the system. But it was proof of a sort that real-time feedback is possible.

A few professional conferences have experimented with using a real-time Twitter feed to provide feedback to speakers, with some success. One meeting even projected the feed on a screen behind the speaker, so that the feedback was visible to the audience but not to the speaker. This was less than successful; it soon became more of a blood sport for the audience than an evaluation for the speaker. Processing real-time feedback is complicated. But it is certainly an ideal that we ought to consider in methodological terms.

Part of the problem is that providing feedback isn't costless or time-neutral. It requires an investment of energy by the participants, energy that would otherwise be devoted to the learning itself. If you're tweeting about a presentation, you're taking time away from attending to the speaker in order to process your own words. Of course, this also applies to note-taking, or even to reflective listening. The difference might be that when taking notes, you aren't likely to be much concerned with formulating your words as a bon mot that will impress others, as tweeters often are, so less energy is diverted.

I once tried another experiment in which I handed out, ahead of time, a very thorough set of notes that I'd prepared on what I was going to say. I hoped that this might divert some of the attention normally devoted to note-taking into actual listening. What I noticed almost immediately, of course, was that many students sat there carefully annotating my handout with notes of their own. Obviously, their notes represented for them a sort of value-added over and above what I was actually saying. It would have been interesting to review some of the extra notes that people were adding; that would have been good real-time feedback.

Pam Ey has asked an interesting question about the possible application of these methodological notes to workplace training, where the material covered may be less abstract and more practical. Speaking purely from a methodological viewpoint, it would depend a lot on the duration of the training. The basic problem is one of selective recall and what one might call “editing the past” – that is, the wholly normal reinterpretation of past events in light of subsequent experience. The shorter the interval between the event and its evaluation, the more accurate the recall of the event will be. So if you’re interested in whether things are being communicated on the spot, a short-interval evaluation might be useful.

However – and here’s the catch – that on-the-spot evaluation may not prove meaningful in the longer term. An end-of-class survey will yield measures averaged over the course, weighted more heavily toward recent events and toward longer-term events that are linked to recent ones. So if you are interested in the take-away evaluation as of the end of class, a survey at that point may be useful. Perhaps more useful still would be an evaluation six months or a year later, trying to assess what is recalled and what has actually proved useful. Of course, this is equally affected by the selective weighted recall and editing-the-past phenomena. But interpreted carefully, with allowance for these effects, it could be useful.
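
To make that weighting concrete, here is a small sketch with entirely hypothetical numbers (the weekly ratings and the decay rate are invented for illustration): an end-of-class rating behaves less like a plain average over the course and more like a recency-weighted one.

```python
import numpy as np

# Hypothetical week-by-week impressions of a course (weeks 1..8).
weekly_ratings = np.array([4.5, 4.0, 3.5, 3.0, 3.0, 2.5, 4.5, 5.0])

# Assumed recency weighting: each week counts 'decay' times as much
# as the week after it, so the final weeks dominate.
decay = 0.7
weeks_ago = np.arange(len(weekly_ratings))[::-1]  # last week is 0 weeks ago
weights = decay ** weeks_ago
weights /= weights.sum()

print(f"plain average over the course:          {weekly_ratings.mean():.2f}")
print(f"recency-weighted 'end-of-class' rating: {weekly_ratings @ weights:.2f}")
```

A follow-up survey six months or a year later would, in effect, apply yet another weighting, this time favoring whatever turned out to be useful in practice.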

It’s been suggested at times that we ought to apply the same procedure to academic feedback; that is, ask students at the end of their program to evaluate the retrospective utility of previous courses. This is more appealing than practical; the time factor and the variety of stimuli that would go into such comments would render them largely uninterpretable. I’ve tried to apply this idea in exit surveys, and gotten little but mush for data. In the workplace context, the accessibility of respondents and the narrower range of things about which recall is asked may make this approach more practical.

I’ve got one more post ready about this topic, but I’ll hold it for a day or two. Stay tuned.