The reason I post about cultural anthropology now and then isn’t that I want to argue or discuss with cultural anthropologists. Rather, I want to aid in spreading the message the discipline should be extirpated from the academy, just as Creationists have been extirpated from biology – Razib Khan
There is a long history of work that claims the mantle of science while pushing essentialist theories of racial disposition and intelligence. Historically, racialist theories were built on a population typology that could be ranked along some set of criteria. Today you can find modern armchair scientists hard at work behind their keyboards, using programs like ADMIXTURE to build new population typologies, which can be ranked along some set of criteria. See this nice article in the Annals of Human Genetics for an overview of the latter in terms of the former.
It isn’t hard to conflate population with race if you try, so I will let Khan explain how it is done:
The problem here is the word “race.” It has a whole lot of baggage. So many biologists prudently shift to “population” or “ethnic group.” I don’t much care either way. Let’s just put the semantic sugar to the side.
What Khan dismisses as so much “semantic sugar” is a notoriously arbitrary category, which varies widely across historical periods and cultural settings. For example, in the 1790 US census, a person was counted under one of the following classifications:
1) free White males 16 and over
2) free White males under 16
3) free White females
4) all other free persons
5) slaves
By 1890, these classifications had changed to:
1) white
2) black
3) mulatto
4) quadroon
5) octoroon
6) Chinese
7) Japanese
8) Indian
But why should Khan care either way?
Khan hangs his hat on the tight fit between computational tools, big data sets, and a tiny bit of mangled theory he borrows from population genetics. The last few years have seen an explosion of both freely available genetic data and computational tools for statistically examining that data. Essentially, this is big data for genomic information. And it is a powerful and useful tool in the right hands, up to a point. The skill, as in all research, lies in knowing where that point is and in having the discipline not to pass it.
But, as Nassim Taleb cogently points out:
big data means anyone can find fake statistical relationships, since the spurious rises to the surface. This is because in large data sets, large deviations are vastly more attributable to variance (or noise) than to information (or signal). It’s a property of sampling: In real life there is no cherry-picking, but on the researcher’s computer, there is.
. . .
Another issue with big data is the distinction between real life and libraries. Because of excess data as compared to real signals, someone looking at history from the vantage point of a library will necessarily find many more spurious relationships than one who sees matters in the making; he will be duped by more epiphenomena.
My point here is that there is a difference in kind between the knowledge produced by “discovering” associations (note: not necessarily correlations) in big data sets and the knowledge produced in the field or laboratory. The shorthand for this difference has always been that correlation is not causation, but one should never forget that the consequences of mistaking the two can be stark.
This is related to Taleb’s other point, the difference between “matters in the making” and the library. Latour, rephrasing Kaplan’s sentiment of 30 years prior, famously termed this disconnect the “Janus Face” of science. In the making, whether in the field or at the lab bench, science is an exercise in patience and frustration. You very quickly learn that nature is anything but uniform and smooth. As I mentioned in the first post, nature can be made uniform in a test tube and miracles can be performed, but only for short periods of time and with great effort.
However, for desk jockeys like Khan, who sit safely ensconced behind their keyboards where they face neither uncertainty nor doubt, the data they encounter has already been made uniform. Like all big data, genomic data must go through a few analytic cleansing steps before it can be analyzed. This paper gives a nice overview of the process and perils of cleaning data. But just how often is the cleansing of data actually reported?
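As a rough illustration of what “cleansing” can mean for genotype data, here is a minimal Python sketch that applies two common kinds of quality-control filter, a per-SNP call-rate threshold and a minor-allele-frequency threshold, to a simulated toy matrix. The simulated data, the function names (`call_rate`, `minor_allele_freq`, `qc_filter`) and the threshold values are hypothetical choices for illustration only; they are not the procedure described in the paper mentioned above or in any particular study.

```python
import random

# Hypothetical toy data: rows are SNPs, columns are individuals.
# Genotypes are coded 0/1/2 (copies of the alternate allele); None marks a missing call.
random.seed(1)
N_INDIVIDUALS = 100
N_SNPS = 500

def simulate_snp():
    # Illustrative assumptions: allele frequency uniform on [0, 0.5],
    # each genotype drawn as two independent allele draws (Hardy-Weinberg proportions),
    # and a per-SNP missingness rate picked from a small set of values.
    p = random.uniform(0.0, 0.5)
    missing_rate = random.choice([0.0, 0.01, 0.02, 0.10])
    row = []
    for _ in range(N_INDIVIDUALS):
        if random.random() < missing_rate:
            row.append(None)
        else:
            row.append(int(random.random() < p) + int(random.random() < p))
    return row

def call_rate(row):
    # Fraction of individuals with a non-missing genotype at this SNP.
    return sum(g is not None for g in row) / len(row)

def minor_allele_freq(row):
    # Frequency of the rarer allele among the non-missing calls.
    called = [g for g in row if g is not None]
    if not called:
        return 0.0
    alt = sum(called) / (2 * len(called))
    return min(alt, 1 - alt)

def qc_filter(rows, min_call_rate=0.95, min_maf=0.01):
    # Keep only SNPs that pass both thresholds; everything else silently disappears.
    return [r for r in rows
            if call_rate(r) >= min_call_rate and minor_allele_freq(r) >= min_maf]

genotypes = [simulate_snp() for _ in range(N_SNPS)]
kept = qc_filter(genotypes)

# The same raw data yields a different "clean" data set for each choice of thresholds.
for min_cr, min_maf in [(0.90, 0.0), (0.95, 0.01), (0.99, 0.05)]:
    n_kept = len(qc_filter(genotypes, min_cr, min_maf))
    print(f"call rate >= {min_cr:.2f}, MAF >= {min_maf:.2f}: kept {n_kept} of {N_SNPS} SNPs")
```

The loop at the end is the relevant part: each threshold setting produces a different “uniform” data set from the same raw material, and someone who only ever downloads the filtered file never sees which choice was made.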
Back to Taleb:
And speaking of genetics, why haven’t we found much of significance in the dozen or so years since we’ve decoded the human genome?
Well, if I generate (by simulation) a set of 200 variables — completely random and totally unrelated to each other — with about 1,000 data points for each, then it would be near impossible not to find in it a certain number of “significant” correlations of sorts. But these correlations would be entirely spurious. And while there are techniques to control the cherry-picking (such as the Bonferroni adjustment), they don’t catch the culprits — much as regulation didn’t stop insiders from gaming the system. You can’t really police researchers, particularly when they are free agents toying with the large data available on the web.
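Taleb’s thought experiment is easy to run. The Python sketch below (standard library only) generates 200 independent Gaussian variables with 1,000 observations each, tests every pair for a nonzero Pearson correlation, and counts how many pairs pass an uncorrected p < 0.05 threshold versus the Bonferroni-adjusted threshold of 0.05 divided by the number of tests. The seed, the Gaussian distribution, and the Fisher z approximation used for the p-values are illustrative assumptions, not details given by Taleb.

```python
import math
import operator
import random

# N_VARS mutually independent noise variables, N_POINTS observations each.
# The seed and ALPHA are illustrative choices.
random.seed(42)
N_VARS = 200
N_POINTS = 1000
ALPHA = 0.05

def standardize(xs):
    # Rescale to mean 0 and (population) standard deviation 1, so that the
    # Pearson correlation of two standardized series is just their mean product.
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / sd for x in xs]

def p_value(r, n):
    # Two-sided p-value for H0: rho = 0 via the Fisher z-transform,
    # a normal approximation that is accurate for n this large.
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

data = [standardize([random.gauss(0, 1) for _ in range(N_POINTS)]) for _ in range(N_VARS)]

p_values = []
for i in range(N_VARS):
    for j in range(i + 1, N_VARS):
        r = sum(map(operator.mul, data[i], data[j])) / N_POINTS
        p_values.append(p_value(r, N_POINTS))

m = len(p_values)  # number of pairwise tests: 200 * 199 / 2 = 19,900
naive_hits = sum(p < ALPHA for p in p_values)
bonferroni_hits = sum(p < ALPHA / m for p in p_values)

print(f"{m} tests, {naive_hits} 'significant' at p < {ALPHA}")
print(f"{bonferroni_hits} survive the Bonferroni threshold {ALPHA / m:.2e}")
```

Because every variable is pure noise, every “significant” pair is a false positive. With 19,900 tests, roughly 5% of them, on the order of 1,000 pairs, should clear the uncorrected threshold on a typical run, while the Bonferroni threshold is expected to let through fewer than one on average. The correction only helps if the researcher actually applies it and counts every test that was run, which is exactly what Taleb says cannot be policed.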
As I mentioned earlier, there is a long history of armchair scientists like Razib Khan, Charles Murray, and Arthur Jensen attempting to extract from population genetics answers to questions it cannot, and never will be able to, answer meaningfully. It should come as no surprise that the answers they “discover”, as Taleb implies, never fail to reinforce their whiggish starting assumptions.
The question I am left with after this back and forth with Khan is: Why do the publishers of Discover (a magazine of science?) pay this guy to represent science to the public?
A question for the publishers of Discover magazine: Do you consider this science?
Because of the occupational constraints of Ashkenazi Jews, and their narrow ecological niche as an non-agricultural minority, the development of a religious specialist class whose stock and trade was extensive commentary and interpretation of law is not entirely surprising. But it is also totally parasitic upon the genuine productivity of a society. The reality is that for a society to flourish you do not need thousands of ethical rules to follow. Like many investment bankers and “patent troll” attorneys the great rabbis of yore many have had fast processing units, but they did not utilize them toward productive ends.
Please note that the emphasis is Khan’s own.