Exploring causality (Part 3 of several)

July 27, 2013

The statistical tools we wave so proudly in the air were created and validated on measurements in the physical and biological sciences, where standardization of phenomena is both possible and universal, and where the presence of a physical instance of something defines it and allows it to be pinned down and measured. But in the behavioral sciences, it’s often not so easy to determine what it is we are in fact analyzing, since phenomena are often defined only by their measurements. Intelligence is what is measured by intelligence tests; attitudes are what are measured by attitude scales. Even behavior is hard to categorize: is that man over there a terrorist or a freedom fighter? Or just a common criminal? The same number of people may be dead in any event, but if we are to interpret the behavior, which is what behavioral science is all about, we need to be able to apply labels. And it is at this point of labeling that we have to face the question of just how we define the meaning of our data and how much confidence we have that it will stand up to the assumptions required by the statistical tests we propose to apply.

The process of translating constructs into measurable variables is called “operational definition”, or “operationalization”. Sometimes the correspondence between the theoretical constructs and the operational variables is quite close. For example, if you are looking at how humans gain weight as they get older, you probably will be using a construct called Age, and another called Weight (if you can manage to study this issue without Age and Weight, please let me know how). These can be operationalized (that is, turned into variables) as “AGE” and “WEIGHT” by recording the number of years that have passed since each person was born, and measuring his or her poundage on a scale. (These variables could be called “Ralph” and “Petunia”, for that matter; the names don’t have any relevance beyond helping you remember what they mean.) This gives you two numbers, just like the ones the physicists use.
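For the programmatically inclined, here is a minimal sketch of that mapping in Python. The record layout, the field names `born` and `scale_reading_lb`, and the participant are all invented for illustration; the only point is that each construct gets an explicit, repeatable rule for producing a number.

```python
from datetime import date


def age_in_years(born: date, on: date) -> int:
    """Completed years between the birth date and the measurement date."""
    # Subtract one year if this year's birthday has not happened yet.
    had_birthday = (on.month, on.day) >= (born.month, born.day)
    return on.year - born.year - (0 if had_birthday else 1)


def operationalize(person: dict, measured_on: date) -> dict:
    """Map the constructs Age and Weight onto the variables AGE and WEIGHT."""
    return {
        "AGE": age_in_years(person["born"], measured_on),
        "WEIGHT": person["scale_reading_lb"],  # pounds, as read off the scale
    }


# Hypothetical participant, invented for this example.
participant = {"born": date(1970, 3, 15), "scale_reading_lb": 182.5}
row = operationalize(participant, date(2013, 7, 27))
```

Even in this easy case the rule embeds choices (completed years rather than fractional years, pounds rather than kilograms), which is why an operational definition has to be written down rather than assumed.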

Sometimes the correspondence between the phenomena and the numbers is more tenuous. For example, you might have a theory that the performance of a top management team is a function of the members’ ability to communicate with each other. Performance can be measured by profitability, among other things. But you may be unable or unwilling to measure this capacity for communication directly, or maybe the team won’t let you ask the kind of sensitive questions that you know would be needed for a real measurement. So instead, you measure the number of years that they have worked together, on the assumption that over time they learn to read and understand each other, and thus communicate better. Now you have turned your phenomena into numbers, and all the bounties of statistics become available to you. To test the theory, you correlate profitability with the number of years a team has worked together, and maybe it turns out to be a nice large (i.e., publishable) value. Of course, your published article will stress the connection you’ve found between performance and communication, not profitability and time.
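To make the point concrete, here is a small sketch of that correlation using only the Python standard library. The six teams and their numbers are made up for illustration, and the variable names are hypothetical; nothing here comes from a real study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and root sums of squared deviations.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cross / (spread_x * spread_y)


# Hypothetical data: one entry per top management team.
years_together = [1, 2, 3, 5, 8, 10]               # stand-in for "communication"
profit_margin_pct = [2.0, 3.5, 3.0, 6.0, 7.5, 9.0]  # stand-in for "performance"

r = pearson_r(years_together, profit_margin_pct)
print(f"r = {r:.2f}")
```

Whatever value comes out, it describes years together and profit margin. The arithmetic has no way of checking the assumption that years together stands in for communication; that link lives entirely in the operational definition.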

The Operational Definition is the detailed description of how a concept or variable will be measured and how values will be assigned. Suppose that we’re studying criminal behavior. One operational definition of prior criminal behavior may use reported arrests for felony offenses based on an FBI fingerprint search, while another operational definition may involve self-reported criminal history obtained by response to a short list of questions on a standardized questionnaire.  Both can legitimately be argued to be appropriate, yet they will probably yield very different empirical results. In any case, what you are doing here is “mapping” your theoretical constructs onto some measurable real-world phenomenon. Here’s a visual that summarizes this process:


The situation becomes even more complicated when we introduce the use of multi-item scales or indices as composite measures of complex constructs. The more complicated the construct around which the proposed measure is organized, the less likely it is that any single item can adequately represent it. And since most of the models behavioral scientists like feature fairly complex constructs, the use of scaling and other forms of composite variable analysis has become almost universal among researchers studying phenomena involving attitudes and behaviors. Scales are most typically constructed by taking a set of similar items with similar response categories, with each item presumed to represent some part of the total construct, and then either averaging or summing the individual values to produce a single composite value. When Rensis Likert first introduced, back in 1932, the composite scale that has come to be associated with his name, he could hardly have foreseen all of the variations on the theme that would follow, some legitimate and some suspect.
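The summing-or-averaging step described above can be sketched in a few lines of Python. The function name, the four-item scale, and the 1-to-5 response range are assumptions made for this example, not features of any particular published instrument.

```python
from statistics import mean


def composite_score(item_responses, method="mean", scale_min=1, scale_max=5):
    """Collapse one respondent's item responses into a single composite value.

    Assumes every item uses the same response categories (scale_min..scale_max)
    and that all items are scored in the same direction.
    """
    if not item_responses:
        raise ValueError("at least one item response is required")
    for value in item_responses:
        if not scale_min <= value <= scale_max:
            raise ValueError(f"response {value} is outside {scale_min}..{scale_max}")
    if method == "sum":
        return sum(item_responses)
    if method == "mean":
        return mean(item_responses)
    raise ValueError("method must be 'sum' or 'mean'")


# Hypothetical respondent answering four items on a 1-to-5 agreement scale.
respondent = [4, 5, 3, 4]
summed = composite_score(respondent, method="sum")
averaged = composite_score(respondent, method="mean")
```

Summing and averaging rank respondents identically when everyone answers every item; they part ways as soon as some answers are missing, which is one of the small decisions that an operational definition has to spell out.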

Like anything else in statistics, scaling has an underlying mathematical rationale based on a set of assumptions about the nature of the data — assumptions that are easy to overlook and are often dealt with rather lightly. The ubiquity of scaling in business and management research has both desensitized researchers to its finer points and at the same time rendered scales more suspect than the researchers would like. There’s certainly nothing wrong with scaling — it’s critical to constructing operational definitions that really mean something in many studies. But this makes it all the more important to do it right. Whether we adopt and use scales created and tested by others or engage in the even more daunting exercise of trying to create our own scales, we’d better be sure we know what we are doing.

Part 4 of this extended discussion on causal inference is here.