The teaching of research and analytical procedures in the social and behavioral sciences is generally plagued with misunderstandings. In particular, we seem to have assigned quasi-magical qualities to numbers and to the methods we have learned for manipulating them – “physics envy” is alive and well on my side of the methodological house. But quantification does not provide a recipe for automatic truth generation.

Just because we can assign numbers to various levels of a phenomenon does not mean that the phenomenon is really quantitative, or that those numbers carry the information that genuinely quantitative measurements would convey. It’s obviously an error to take, say, data that code a person’s hair color into a category (1=blond, 2=brunette, 3=redhead, etc.) and use them to compute “average hair color” (“1.8=dark with blond streaks”?). But of course a program like SPSS will cheerfully calculate it for you, without asking questions (and you might be amazed at the number of students who are perfectly comfortable presenting a finding like this in their homework).
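The point can be made concrete in a few lines of Python. The data below are invented purely for illustration; the arithmetic is exactly what a stats package will do for you, no questions asked:

```python
from collections import Counter
from statistics import mean

# Hypothetical nominal codes for ten respondents:
# 1=blond, 2=brunette, 3=redhead
hair_codes = [1, 1, 1, 2, 2, 2, 2, 2, 2, 3]

# Any stats package will compute this without complaint,
# but "1.8" is not a hair color: the arithmetic is valid,
# the interpretation is empty.
print(mean(hair_codes))  # 1.8

# For nominal data, the only defensible summaries are
# frequencies and the mode:
mode_code = Counter(hair_codes).most_common(1)[0][0]
print(mode_code)  # 2 (brunette)
```

The software has no way of knowing that the codes are labels rather than quantities; that knowledge has to come from the analyst.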

Less obvious are the errors perpetrated in Rensis Likert’s name by those who build “scales” with presumed numerical properties out of a series of numbers attached to response categories that lack those properties. The idea is presumably that if you add up enough bad numbers, at some point they will be transformed into good numbers. It may well be true that if you take a whole series of variables with wild distributions and add them together to form a new variable, the result will probably have a more “normal” distribution, as the different departures from normality cancel each other out. But this doesn’t mean that the variable has somehow been magically transformed into something mathematically meaningful.
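A quick simulation illustrates the arithmetic half of this claim – summing skewed items really does smooth out the distribution – while underlining that nothing about the measurement level has changed. The item distribution below is invented for the demonstration:

```python
import random
from statistics import mean

random.seed(42)

def skewness(xs):
    """Sample skewness: third central moment over variance**1.5."""
    m = mean(xs)
    n = len(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

# A hypothetical 5-point item with a badly skewed response distribution:
# most respondents pick "1", very few pick "4" or "5".
def skewed_item():
    return random.choices([1, 2, 3, 4, 5], weights=[70, 15, 10, 3, 2])[0]

n_respondents = 5000
single_item = [skewed_item() for _ in range(n_respondents)]

# A 10-item "scale" built by summing ten such items per respondent.
summed_scale = [sum(skewed_item() for _ in range(10))
                for _ in range(n_respondents)]

# The sum is far less skewed than any single item -- it looks more
# "normal". But that is arithmetic, not measurement: the underlying
# responses are exactly as ordinal as they were before.
print(round(skewness(single_item), 2))
print(round(skewness(summed_scale), 2))
```

The distributional smoothing is real; the transformation into interval-level measurement is not.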

Statistics can be a very valuable tool for protecting the analyst against certain forms of inferential error, particularly the tendency to attribute causality where none exists. But it will never create meaning in data where no meaning exists.

Recently, an online question was posed about “exploratory factor analysis” (EFA). An analyst was wondering how to use this technique to make sense of answers to some 63 items that he had generated as proposed items for a scale to measure some concept. Clearly, he’d had some exposure to the technique during a research course, but without much context about what it does or how it is used. This is not uncommon in research training curricula that segregate “statistics” and “research methods” into separate courses, often taught by different people. My preference has always been to teach the two subjects interlinked in the same courses; it’s not a perfect approach, but at least it introduces statistical procedures in the context of the sorts of numbers and problems where they actually work.

To begin with, a 63-item scale to measure any concept is ridiculous overkill. It indicates that the concept isn’t well understood and that the analyst is simply throwing items at it, hoping that some will stick. This is a significant misuse of quantitative methods. The analyst is apparently under the illusion that EFA will provide some form of exact statistical solution to his problem. While he will certainly receive a slew of numbers upon feeding his data into something like SPSS, it’s unlikely that he will be much enlightened.

Beyond that, if he has 63 items that are plausibly related to his concept, he is probably dealing with an idea general enough to have meaningful sub-dimensions that might be separately explored. Clearly, that’s what he hopes his EFA will resolve for him. But defining and interpreting sub-dimensions is much more a theoretical and conceptual task than a mathematical one. EFA may indeed identify possible groupings of his items into sub-scales, but depending on the options chosen, he’s likely to have several possible patterns to choose from – none uniquely correct. He’s likely to find himself playing the game of “Name That Factor”, trying to figure out what a specific set of items has in common that could justify treating them as measures of a meaningful sub-dimension. Sometimes this works; more often, it’s simply a recipe for frustration and muddy thinking.

One reason for this is that any EFA solution will be specific only to that data set. Principal components analysis – the most widely used algorithm – partitions the variance of the specific data set, providing a solution that works there but with no assurance that another data set collected on another group of participants wouldn’t generate a different solution. And with 63 items to start with, no matter what options are chosen, he’s pretty much guaranteed to find one or two large factors and then a string of maybe 10 or 12 small factors all accounting for relatively small amounts of variance. One can try to build theory on this foundation, but it’s pretty weak.
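That eigenvalue pattern is easy to reproduce. The sketch below uses entirely hypothetical data – 300 simulated respondents answering 63 items, with only two genuine latent dimensions built in – and runs a principal components decomposition on the correlation matrix. It yields a couple of large components followed by a long tail of small ones, just as described:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 300 respondents, 63 items.
# Only two real latent dimensions drive any of the items;
# the remaining 33 items are pure noise.
n, k = 300, 63
latent = rng.normal(size=(n, 2))
loadings = np.zeros((2, k))
loadings[0, :15] = 0.7    # items 1-15 load on factor 1
loadings[1, 15:30] = 0.7  # items 16-30 load on factor 2
data = latent @ loadings + rng.normal(size=(n, k))

# Principal components = eigenvalues of the correlation matrix,
# sorted from largest to smallest.
corr = np.corrcoef(data, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]

# A couple of big components, then a long string of small ones --
# and on a fresh sample the small ones would shuffle around.
print(np.round(eigvals[:5], 2))
```

With 63 real items the tail is messier still, because the small factors reflect sample-specific noise that another group of participants would partition differently.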

He would be better advised to consider his items carefully for a clearer conceptual definition of his variable (or variables). Upon examination, it may turn out that he really does have some well-specified sub-dimensions – some concepts work that way. To cite a well-traveled example, consider “worker satisfaction”. It’s pretty well established that this concept actually consists of a set of smaller constructs reflecting satisfaction with particular aspects of work – activity, conditions, co-workers, and so on. Empirical exploration bears this out – but it also shows that these sub-dimensions are not only weakly related to each other but generally massively unstable in a test/retest sense. So their overall utility remains debatable.

It’s highly unlikely that any one data set will provide a theoretical breakthrough. If the theory isn’t well specified, then he’s just on a fishing expedition, and while he might hook something tasty, he’s much more likely to come up with some form of unidentifiable fish stew that he later winds up apologizing for in his “limitations” section. So while it’s not wholly wrong to try EFA, such techniques are no substitute for careful thought and appropriate theory. Structure is ultimately imposed by the analyst, in any event – hopefully in collaboration with the data. As always, treat your data as you would wish to be treated, if you were a datum©.