Exploring Causality (Part 8)

By | August 23, 2013

We ended the last part of this commentary by noting that many, if not most, behavioral science model-builders seem to be afraid of making predictions that might prove them wrong. The only analysts who actually revel in predictions seem to be the sabermetricians (sports statisticians), perhaps because they can demonstrate the real value of their analyses (cf. Moneyball). Among political analysts, for example, only Nate Silver shares a similar outlook, and he comes directly out of the sabermetric tradition. It’s interesting that he has just moved from the New York Times to ESPN; apparently accurate modeling doesn’t quite fit in with the Times’ editorial tradition.

It might be helpful at this point to distinguish between two different meanings of the term “prediction”. On the one hand, there is the more common-sense usage of the term – that is, to describe in advance of some event what the shape of that event might be. This is the sense in which the term is used by sabermetricians and political analysts alike. This is prediction in the process sense – what is going to happen next? On the other hand, there is the structural or variance form of prediction, used by statisticians and model builders. Prediction in this sense is essentially equivalent to “amount of variance in some variable accounted for” by virtue of knowing the value of another variable. The independent or antecedent variable is termed the “predictor”; the dependent or consequent variable is termed the “criterion.” There is plenty of room for confusion between these two senses of the term, particularly when a political sociologist describes a model in which several demographic characteristics of a population are used to “predict” voting results. Another analyst might observe somewhat acidly that it’s easy to make such a prediction, since the election has already passed; the prediction might have had more value had it been made ahead of time. Both points of view are correct; they simply reflect differing uses of the term.
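To make the variance sense of the term concrete, here is a minimal sketch in Python. All of the data and variable names are invented for illustration: two hypothetical demographic “predictors” are regressed on a “criterion” (vote share) that has already been observed, and the model’s “prediction” is simply the proportion of criterion variance accounted for, the familiar R².

```python
import numpy as np

# Hypothetical illustration of prediction in the variance sense:
# "predicting" an outcome that has already happened. All data invented.
rng = np.random.default_rng(0)

n = 500
median_income = rng.normal(50, 12, n)   # predictor: a demographic variable
pct_college = rng.normal(30, 8, n)      # predictor: another demographic
# Criterion: vote share, driven partly by the predictors plus noise.
vote_share = 40 + 0.3 * median_income + 0.4 * pct_college + rng.normal(0, 5, n)

# Ordinary least squares: regress the criterion on the predictors.
X = np.column_stack([np.ones(n), median_income, pct_college])
beta, *_ = np.linalg.lstsq(X, vote_share, rcond=None)
fitted = X @ beta

# R^2: the proportion of variance in the criterion "accounted for"
# by knowing the predictors -- prediction in the structural sense.
ss_res = np.sum((vote_share - fitted) ** 2)
ss_tot = np.sum((vote_share - vote_share.mean()) ** 2)
print(f"Variance accounted for (R^2): {1 - ss_res / ss_tot:.2f}")
```

Note that nothing here forecasts anything; the model simply summarizes how much of the already-observed variation the predictors explain.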

What’s the value of “predicting” something that has already happened? Well, the model builder might well respond that by using data from previous election cycles, s/he is defining a set of relationships and coefficients of effects among variables that can in turn be applied to prediction in the process sense during the next election cycle. In fact, most political as well as sports modeling is based on just this approach. The whole field of “data mining” is based on an extension of this approach, in which very large sets of data are divided into two parts, with one part used to estimate the model and the second part subsequently used to test or confirm the model. Political analysts divide their data by time period (last election/this election); data miners simply divide the data into two parts randomly. In both cases, however, one is essentially using all of the data rather than a sample drawn from a larger population, and any inferential statistical procedures, such as regression, used in these analyses need to be reinterpreted in that light.
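Here is a minimal sketch of that random division, again with wholly invented data: the model is estimated on one half and then confirmed on the held-out half by seeing how well the estimated coefficients carry over.

```python
import numpy as np

# Hypothetical sketch of the data-mining split described above:
# estimate the model on one random half, confirm it on the other.
rng = np.random.default_rng(1)

n = 1000
x = rng.normal(0, 1, (n, 2))
y = 1.5 + x @ np.array([2.0, -1.0]) + rng.normal(0, 1, n)

# Randomly divide the cases into an estimation half and a confirmation half.
idx = rng.permutation(n)
train, test = idx[: n // 2], idx[n // 2:]

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)

def r2(y_true, y_hat):
    """Proportion of variance accounted for."""
    return 1 - np.sum((y_true - y_hat) ** 2) / np.sum((y_true - y_true.mean()) ** 2)

# How well do coefficients from the first half predict, in the process
# sense, outcomes in the half the model never saw?
print(f"R^2 on estimation half:   {r2(y[train], X[train] @ beta):.2f}")
print(f"R^2 on confirmation half: {r2(y[test], X[test] @ beta):.2f}")
```

If the model holds up on the half it never saw, that is at least some evidence that it will hold up on the next wave of data.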

By contrast, most of the modeling efforts conducted by psychologists and organizational analysts in particular are based not on population data but on data from samples. Earlier in this series, we discussed sampling and the varieties of samples that can be drawn. We also noted that the kind of sampling done places limits on the degree of generalizability that we can expect for our analytical findings. The closer our sample approximates a true random sample from a defined population, the more we expect to be able to generalize our findings to that population. If there are various degrees of compromise to the randomness of the sample, our generalizability is accordingly limited. That does not stop many analysts from pushing the boundaries of generalizability just a little farther than they really ought to be extended. After all, arguments for generalizability apart from the one provided by a true random sample are essentially arguments based on credibility – that is, does it make sense that this relationship would hold in the population?
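As a rough illustration of what compromised randomness can do, here is a hypothetical simulation: the same regression is run on a true random sample and on a convenience sample in which only units scoring high on the variables of interest volunteer. Every detail here is invented.

```python
import numpy as np

# Hypothetical simulation: a true random sample recovers the population
# relationship; a self-selected convenience sample distorts it.
rng = np.random.default_rng(2)

N = 100_000                          # a defined population
x = rng.normal(0, 1, N)
y = 0.5 * x + rng.normal(0, 1, N)    # true population slope: 0.5

def slope(xs, ys):
    """OLS slope of ys on xs."""
    return np.cov(xs, ys)[0, 1] / np.var(xs, ddof=1)

# True random sample: every population member equally likely to be drawn.
random_idx = rng.choice(N, 500, replace=False)

# Convenience sample: only units above a threshold on x + y "volunteer"
# (think: studying only high performers), compromising randomness.
volunteers = np.where(x + y > 1.5)[0]
conv_idx = rng.choice(volunteers, 500, replace=False)

print(f"Population slope:         {slope(x, y):.2f}")
print(f"Random-sample slope:      {slope(x[random_idx], y[random_idx]):.2f}")
print(f"Convenience-sample slope: {slope(x[conv_idx], y[conv_idx]):.2f}")
```

The random sample’s estimate clusters around the population value; the convenience sample’s does not, which is exactly why generalizing from it rests on a credibility argument rather than a statistical one.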

The unpleasant little secret that we try to keep hidden is that we frequently let our colleagues get away with pushing the bounds of generalizability for a model, for two reasons – neither of them particularly admirable. First, we expect them in turn to let us get away with pushing the bounds of generalizability for our models; and second, we all know that it really doesn’t make much difference one way or another. Taken together, these two reasons suggest that a good deal of what passes for analysis and model building in our field is essentially artificial, held together by a kind of conspiracy among practitioners of the art not to look too deeply into the results but to take them essentially at face value.

This is a rather discouraging point at which to break off this discussion. However, it appears that we’re going to need at least one more part to wrap up this commentary; let’s hope the conclusion takes a more positive turn.