Responsible data analysis as storytelling

March 12, 2015

Sometimes an online conversation can begin around one topic and segue into others, often considerably more profound than the original. Case in point: a recent LinkedIn discussion regarding how variables measured on different kinds of scales might be combined into overall indices most effectively. Backing up from this fairly specific issue raises some fundamental questions about how data analysis is conducted and presented. Since I have laid a certain claim to data analysis expertise, I thought it might be helpful to describe my overall framework for the process.

Essentially, I believe that social and organizational data acquire their value through their contribution to the construction of a story. The story is the original model for human learning, and is probably still the most effective means to encourage learning. Each of us has our own story, representing the cumulative meaning of our life as we understand it today. Stories are inherently dynamic, and change as new information is incorporated and new understandings developed. Systematic social and organizational research can be an effective way of understanding the story at a social and organizational level at a given point in time.

Different kinds of contexts develop different kinds of stories. In an article many years ago, I suggested that there were two widely differing models for organizational research that mapped well onto two rather different literary genres: the novel versus the soap opera. This was of course in the days when there were still soap operas; today, the equivalent would perhaps be the running podcast. The basic distinction is between research that attempts to sum up a situation in terms of distinct findings and research that reflects the underlying dynamic of continuous change. Larry Mohr once defined these two approaches as “variance models” versus “process models”. Whatever terminology one adopts, it is clear that these are two rather different kinds of inquiry.

The point here is not to emphasize the difference between these two approaches, but rather their commonality. In both cases, the aim of the research is to construct a story. One kind of story is not necessarily always better than the other; both can be instructive under the right circumstances. What matters is remembering that the outcome of research should be a story, even if that story is incomplete or ambiguous.

Extending metaphors is always hazardous, but can be suggestive. Within this framework, the data can be understood as the characters in the story. Good narrative characters always represent something essentially human that is larger than themselves. Likewise, good data represent idealized or abstract components of the real world. Their meaning is generally acquired in context. Good data analysis queries the data in a careful and systematic fashion that allows the story to emerge. Treated properly and with respect, good data will tell you the operative story. Data can, of course, be waterboarded into telling you what you want to hear, regardless of the truth, and far too much research follows this model. But responsible analysis listens to the data carefully. The individual voice of any one datum may be small, but cumulatively the data can roar.

Statistical and numerical calculations are used by the analyst as a means toward developing the story; they do not constitute the story itself. Scales and indices are, like the data themselves, abstract representations of the fundamental ideas in the story. Just as it’s easier to describe a business tale by talking about how “the firm” does something rather than discussing the particular behavior of all the individuals who make up “the firm”, so it’s easier to describe the interactions of abstractions like “motivation” and “productivity” than to try to explain how each individual variable or measurement is related to all the others.
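To make the idea of an index concrete, here is a minimal sketch of one common way to combine variables measured on different scales: convert each to standard (z) scores and average them. This is only an illustration under stated assumptions, not the method the LinkedIn discussion settled on; the variable names and numbers are hypothetical, and equal weighting is an analyst's choice that needs to be justified in terms of the story.

```python
from statistics import mean, pstdev


def zscores(values):
    """Rescale a list of numbers to mean 0 and standard deviation 1."""
    m = mean(values)
    s = pstdev(values)  # population standard deviation
    if s == 0:
        # A variable with no variation carries no information for the index.
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]


def composite_index(columns):
    """Average the z-scores of several variables, respondent by respondent.

    `columns` maps a variable name to its list of values; all lists must
    describe the same respondents in the same order.
    """
    standardized = [zscores(col) for col in columns.values()]
    return [mean(row) for row in zip(*standardized)]


# Hypothetical data: a 1-to-5 rating and a count of hours, for five people.
responses = {
    "satisfaction_1_to_5": [1, 2, 3, 4, 5],
    "hours_of_training": [10, 20, 30, 40, 50],
}
index = composite_index(responses)
```

The resulting numbers are unit-free: they say only where each respondent sits relative to the others, which is exactly why the analyst still has to translate them back into the ideas they stand for.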

Calculated scales and indices can be very useful to the analyst, as long as s/he understands their meaning, how they were assembled, and the underlying data from which they are derived. But research clients are unlikely to understand how the manipulation of numbers represents the manipulation of ideas. Too often (and I admit to having done this myself on occasion) we simply dump the data in some semi-digested form on the client’s table and leave it to them to figure out the story. This is not only inappropriate but fundamentally unhelpful. For those untrained in its manipulation, mathematics is as likely to obscure the overall story as to clarify it. Thus, whenever numbers and statistics are used, the analyst must take further responsibility for interpreting them in terms of the underlying story.

I’ve occasionally been accused of being overly metaphysical in my approach to data and data analysis. I admit to personalization; my business card contains my slogan, “Treat your data as you would wish to be treated, if you were a datum.” Data are living things, since they are properties of living things, and therefore deserve the same ethical protections we extend to other living things. But the data aren’t the whole story, any more than history is the collection of all the behavior of individual humans. In both cases, the whole is more than the sum of its parts. Each level of analysis displays collective effects not immediately apparent from the effects at lower levels of aggregation. Ultimately, it is the responsibility of the analyst to assemble all these meanings into the overall story, in a form accessible to the client who sought the story originally. It’s a fundamental error to confuse the numbers with the story they tell.