Respecting the data

By | January 20, 2013

In today’s posting on The Conversation, a very interesting Australian blog discussing current issues, there is an article by Michael Brown entitled “Faking waves: how the NRA and pro-gun Americans abuse Australian crime stats.” Essentially, his point is that statistics on the Australian experience with gun control, implemented in the late 1990s in response to a couple of tragic mass shootings and subsequently strengthened, are being systematically misrepresented by certain groups in the current American debate over possible firearms restrictions. Specifically, he argues that many anti-gun-control groups are using selected small pieces of the Australian experience to suggest that gun control there has led to increases in crime. In fact, legitimate analysis of the Australian data strongly suggests that gun-control efforts there have substantially reduced firearm-related homicides and, in particular, firearm-related suicides. In addition, there have been no mass shootings in Australia since the 1996 Port Arthur massacre that provoked the firearms restrictions.

Brown suggests that the NRA and its allies have in some cases cherry-picked the data, selecting small subsamples and short time periods rather than reflecting the overall data. In particular, their ads suggesting that crime has increased in Australia since gun control was implemented rely on such small samples rather than on the experience of the country as a whole. In fact, across Australia, robberies involving firearms have declined from over 1,500 per year in the 1990s to 1,100 per year recently. Some of the recent NRA ads have simply made up numbers, on the assumption that nobody will ever check them, or have tried to suggest that Australian gun control has led to increases in other crimes such as sexual assault, which is simply nonsense. The overall point is that there is a real Australian experience with major-league gun control, with a variety of consequences that are captured in the data. A responsible analysis of these data suggests that, overall, gun control in Australia has substantially reduced the damage done to society by firearms without substantially substituting other kinds of damage.

While this experience may not be directly transferable as a policy conclusion to the United States, it is important that, at the very least, the Australian data be respected. Like all good data, they tell a story, and it is critically important that the story be told correctly. Picking and choosing small pieces of data that run counter to the overall story, or worse, simply inventing numbers to convey a contrary story, shows contempt for the data and, by extension, for the people whose experience the data summarize. As I have observed on other occasions, and will continue to preach in subsequent posts, the basic rule of data is “treat your data as you would wish to be treated, if you were a datum.” Data are in fact living things, insofar as they represent the behavior of other living things, and deserve the same respect accorded to living things generally. Mis-analysis and misrepresentation of data ought to be considered crimes against living things, not just fiddling with numbers.

The recent American election showed very clearly that when data tell a story, that story will be revealed as true, regardless of efforts to twist it by disrespecting the data. The key to good election prediction, such as that made by Nate Silver for The New York Times, is the development of effective models that draw together widely disparate pieces of data. Since data want to tell stories, one test of the quality of data is whether they tell a consistent story. In this context, a model can be seen as a sort of meta-story: a story about the stories told by the elements going into it. When those stories are coherent, the model can become an effective explanatory commentary on the narrative as a whole.

All Silver really did was assume that the best possible predictor of how people were going to vote was how they said they were going to vote when asked in an honest way. Opinion sampling is a fairly precise exercise if you really want good answers; of course, it's also quite possible to work it the other way and gather data that will support any answer you want. And as this cycle has proved, there's no shortage of analysts ready to do just that, although I suspect they may be in somewhat less demand in the future, now that it's abundantly clear that gathering bad data just doesn't help. Since data are living things, they will tell you the real story if you ask the right way. But you have to respect the data, and that in turn comes from a fundamental respect for your respondents. The approach taken by other political modelers, most spectacularly the University of Colorado professors, was a more old-school one, in which you assume that your respondents aren't really cognitive agents but merely pieces pushed around your playing field by anonymous economic and social forces. I believe that the superior performance of Silver's and related models stems from their greater respect for their respondents, and thus for their data.

The burden of my argument here is that treating your data respectfully and honestly, and seeking the overall story they tell rather than coercing small pieces of the data to tell some other story, is the best approach and will, in the long run, reward the analyst more effectively. Coercing data into misrepresenting reality is eminently possible, particularly since most readers, even fairly well-educated ones, are not schooled in data analysis and are thus prone to accept at face value whatever someone waving numbers around is saying. But reality cannot be changed simply by asserting that it has changed. No amount of data misrepresentation will change the fundamental reality that heavy-duty gun control in Australia has substantially reduced firearm-related deaths without substantially increasing crime. That is the overall story told by the data, and it deserves to be respected as a living conclusion. Data misrepresentation has not yet been recognized as a major crime against humanity, but we ought to move in that direction.

I will have more to say in subsequent postings about this question of data as living things; stay tuned.