Discrimination, representation, and ecological fallacies

February 6, 2014

Back in November 2013, Alice Marwick published an article in Wired Online entitled “Silicon Valley Isn’t a Meritocracy. And It’s Dangerous to Hero-Worship Entrepreneurs”. In it, she basically claimed that women and minorities were being systematically cut out of the Silicon Valley elite by assorted Powers That Be. It’s an interesting article, and as might be suspected, it provoked a lot of online debate pro and con. Excluding the expected amount of “Discrimination!” “’Tain’t so!” hoo-haw, there has been some serious discussion of whether the thesis is true or not. The major argument for its truth seems to be the obvious fact that there aren’t a lot of women or non-Asian minorities among the group generally identifiable as the SV elite.

I had to weigh in here, because as you, my Loyal Readers, know, I care a lot about the use and misuse of data in social policy making. The remainder of this post is a slightly reworked version of a comment I posted to that group regarding the evidence adduced to support the discrimination argument; I thought that its points about “ecological fallacies” were important enough to warrant its (properly attributed) recycling. The presenting issue was whether the SV elite constituted a “sample of America”. Here’s how I responded:

The whole issue of “sampling” is a red herring of sorts. Sampling as such matters only in statistics, where the aim is to make inferences from a small group to a population from which the group was selected. In theory, the strongest sample is one selected randomly from the population; the appropriate size of the sample is determined through the process of “power analysis” in which a number of factors are traded off, including size, the kind of statistical test to be made, and the levels of both Type I and Type II errors deemed to be acceptable. In practice, most samples drawn from large populations are “weighted” in several ways, with some sub-groups over-represented in terms of percentages and others under-represented (e.g., “likely voters” are generally over-represented in political polls). Using appropriate weights allows relatively small samples to be extremely effective in terms of reflecting population behavior (e.g., the best political polls can routinely predict the results of national elections from samples of 1500-2000, if it’s close to the election).
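To see why samples of 1500-2000 can be enough, here is a minimal sketch of the textbook margin-of-error calculation for a proportion. It assumes a simple random sample and the normal approximation; real polls are weighted, which usually widens the interval somewhat, so treat these figures as a rough lower bound rather than what any particular pollster reports. The function name `margin_of_error` is my own, purely for illustration.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(n, p=0.5, confidence=0.95):
    """Half-width of the normal-approximation confidence interval
    for a proportion p estimated from a simple random sample of size n.

    Assumption: unweighted simple random sampling. p=0.5 is the worst
    case (largest variance), which is the usual convention for polls.
    """
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p * (1 - p) / n)


for n in (1500, 2000):
    print(f"n={n}: +/- {100 * margin_of_error(n):.1f} points")
```

Under these assumptions the margin comes out at roughly 2.5 points for 1500 respondents and about 2.2 points for 2000. Note that the population size never enters the formula, which is why a couple of thousand people can stand in for a whole country.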

Most of the stupidity evident in political polling revolves around the weights to be used. It’s pretty easy to conduct a poll that will predict that your guy is going to win by a landslide if you weight the opinions of members of your own political party a lot more than those of your opponents. The danger is then that you may begin to believe your own fraudulent figures; it seldom works out well when that happens, as the Republicans found out in 2012. Honest polling, as opposed to polling as “strategic symbolic politics”, is pretty scientific. Facts are, as they say, stubborn things.
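The effect of the weights is easy to show with a toy calculation. Everything in the sketch below is invented for illustration: the three groups, their support rates, and both sets of weights are hypothetical numbers, not data from any actual poll.

```python
def weighted_support(support, shares):
    """Overall support for a candidate, given each group's support rate
    and the share of the electorate each group is assumed to make up.

    Shares are normalised, so they need not sum exactly to 1.
    """
    total = sum(shares.values())
    return sum(support[g] * shares[g] for g in support) / total


# Hypothetical support for "your guy" within each group of respondents.
support = {"own_party": 0.90, "other_party": 0.08, "independent": 0.45}

# Hypothetical weights: one set meant to match the real electorate,
# one that quietly assumes your own party will turn out in force.
honest_shares = {"own_party": 0.33, "other_party": 0.35, "independent": 0.32}
wishful_shares = {"own_party": 0.45, "other_party": 0.25, "independent": 0.30}

print(f"honest weights:  {100 * weighted_support(support, honest_shares):.1f}%")
print(f"wishful weights: {100 * weighted_support(support, wishful_shares):.1f}%")
```

With these made-up inputs, the same answers from the same respondents yield about 47% under one weighting and 56% under the other: a narrow loss becomes a comfortable win without a single opinion changing.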

All this is completely apart from the use of “sample” in a descriptive sense, as a representation of the population at large. Obviously, Silicon Valley in particular and Santa Clara County and indeed the SF Bay region in general are not “typical” of the country or the world in terms of ethnic and most demographic characteristics (with the exception of the distribution of gender, which is pretty uniform across the world except where it’s been deliberately tampered with by law or custom, e.g., China and India). Expecting that any given sub-group of a population will more or less exactly mirror the distribution of any or all population characteristics is an example of what’s called in statistics an “ecological fallacy”, that is, reasoning from group-level figures to conclusions about particular individuals or sub-groups (e.g., “Since the average male can bench-press 200 lbs and the average female can only bench-press 100 lbs, any man is stronger than any woman – so we’ll exclude Jane from consideration to be a firefighter, despite the fact that she can bench-press 400 lbs.”). Obviously stupid when put like that, but all too frighteningly real in many social and organizational settings.
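The bench-press example can be made concrete with a small sketch. The group means (200 and 100 lbs) are the ones used in the example; the standard deviations and the assumption that strength is normally distributed are mine, chosen only to illustrate the point.

```python
from math import sqrt
from statistics import NormalDist

# Group means come from the example in the text; the spreads (50 and 30 lbs)
# and the normal shape are invented for illustration.
men = NormalDist(mu=200, sigma=50)
women = NormalDist(mu=100, sigma=30)

# The difference (woman - man) of two independent normal variables is normal,
# with the means subtracted and the variances added.
diff = NormalDist(
    mu=women.mean - men.mean,
    sigma=sqrt(women.variance + men.variance),
)

# Chance that a randomly chosen woman out-lifts a randomly chosen man.
p_woman_stronger = 1 - diff.cdf(0)

# Share of men that Jane, at 400 lbs, out-lifts under the same assumptions.
jane_beats_share_of_men = men.cdf(400)

print(f"P(random woman > random man): {p_woman_stronger:.3f}")
print(f"Share of men Jane out-lifts:  {jane_beats_share_of_men:.5f}")
```

Even with a 100 lb gap between the averages, these made-up spreads put a randomly chosen woman ahead of a randomly chosen man about 4% of the time, and Jane ahead of nearly every man. The averages are true; the conclusion “any man is stronger than any woman” does not follow from them.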

The point is simply that “sampling” as a statistical issue or even as a descriptive issue really has very little to do with the basic issue in the article, which is the undisputed point that there aren’t a lot of women or minority group members (with the obvious exception of Asians) among the elite of Silicon Valley. The real question is why this is so. The answer would matter only if there were some reason to suspect that socio-economic forces are being deliberately or even conspiratorially rigged to create and maintain it, and that as a consequence a lot of women and non-Asian minorities with highly meritorious ideas (or at least as meritorious as those possessed by those who do become part of the elite) are being excluded from the opportunities to bring their ideas to market.

There is no clear overall evidence that this is the case; simply pointing out the “under-representation” of certain demographic groups among the Silicon Valley elite is not evidence thereof, although there are undoubtedly examples where demographics had negative consequences for specific individuals. Bankers and venture capitalists are as prone to committing ecological fallacies as the next guys, and they may well believe that since, say, black men aren’t a large subset of the SV elite, the next black man who comes through the door doesn’t have a marketable idea on his tablet. There are research studies that could be designed to determine just how much of this goes on, and what the costs might be, but to my knowledge, they have never been done. If markets were truly efficient, then this behavior ought to have been gradually discouraged as firms that didn’t succumb to the fallacy and truly evaluated the black man’s ideas on their merit benefited from his accomplishments. But as other research has shown, markets aren’t in fact very efficient at all in this respect, and any market forces toward rationality can very easily be swamped by socio-economic stupidity without anyone really noticing. So it remains an open question as to how much of the “under-representation” is really due to such socio-economic stupidity and how much, if any, is due to the under-represented not actually having meritorious ideas that ought to be funded.

In short, the issue might be important, but the article doesn’t clearly establish that it is. There are ways that the issue could be clarified, but there doesn’t seem to be much interest in them. It’s a lot easier for both sides to huff and puff and play for cheap political points. Give us all a break, and either answer the question or don’t.

Well, that’s an argument that’s intended to perhaps spare some more innocent data from waterboarding. But what do you think?