Does PCA Have Politics? – Ethnography.com

This morning, armchair scientist and noted fan of this blog, Razib Khan, decided it would be prudent to write about race. His post is offered as a corrective, of sorts, to Ta-Nehisi Coates.

The Coates article is wonderful. He takes a historical look at how race has been deployed over the last 150 years. Along the way, he makes all the good points that can be made with the census, and some others as well. It is a nice reminder that, far from being fixed, race is a potently flexible concept which can be and has been used to classify (or cluster) humans based on any number of arbitrary factors. That is, he gives the classically anthropological argument that arbitrary classifications are taken up as naturalized in the support of explicitly political designs.

Which brings me around to Khan.

After first giving a brief history of the world as told through the clustering of genes in patterns – a la Cavalli-Sforza – Khan turns to the power of his beloved PCA:

When you take multiple dimensions and transpose the data geometrically you quickly see population structure fall out of the data set.

As if by magic, unsullied by the subjective whims of human judgment, PCA objectively does the work of racial classification. Khan eventually draws the following conclusion about race:

So there you have it. An underlying biological reality which is a reflection of deep history. It may not be real or factual, but it is consistent and coherent. Then there are innate faculties which lead us toward categorization of humans into various kinds, for deeply adaptive purposes. Finally, there are historically contingent events which warp our perception of categories so as to fit into power relations in a straightforward sense.

And here I agree with Khan. What he does is neither real nor factual, but it is consistent and internally coherent. For Khan, race is a biological reality, but historically contingent events conspire to warp our perceptions of this uncomfortable fact.

Steve Hsu, for his part, offers this muddled attempt to use race as a fixed concept without coming off as using race as a fixed concept. Needless to say, it doesn’t add up:

Now plot the genome of each human as a point on our lattice. Not surprisingly, there are readily identifiable clusters of points, corresponding to traditional continental ethnic groups: Europeans, Africans, Asians, Native Americans, etc. (See, for example, Risch et al., Am. J. Hum. Genet. 76:268–275, 2005.) Of course, we can get into endless arguments about how we define European or Asian, and of course there is substructure within the clusters, but it is rather obvious that there are identifiable groupings, and as the Risch study shows, they correspond very well to self-identified notions of race. ….

This leads us to two very different possibilities in human genetic variation:

Hypothesis 1: (the PC mantra) The only group differences that exist between the clusters (races) are innocuous and superficial, for example related to skin color, hair color, body type, etc.
Hypothesis 2: (the dangerous one) Group differences exist which might affect important (let us say, deep rather than superficial) and measurable characteristics, such as cognitive abilities, personality, athletic prowess, etc. …

The predominant view among social scientists is that H1 is obviously correct and H2 obviously false. However, this is mainly wishful thinking. Official statements by the American Sociological Association and the American Anthropological Association even endorse the view that race is not a valid biological concept, which is clearly incorrect.

As scientists, we don’t know whether H1 or H2 is correct, but given the revolution in biotechnology, we will eventually. Let me reiterate, before someone labels me a racist: we don’t know with high confidence whether H1 or H2 is correct.

Finally, it is important to note that group differences are statistical in nature and do not imply anything definitive about a particular individual. Rather than rely on the scientifically unsupported claim that we are all equal, it would be better to emphasize that we all have inalienable human rights regardless of our abilities or genetic makeup.

Hsu’s logic is wrong on several counts here, but I will discuss the two points that are particularly glaring.

The first is simply the conflation of clustering imposed by PCA (which I will get to later) with the reified category of race. He constantly confuses the two. This is particularly evident when he chides the AAA for noting that race is not a valid biological concept and then points to PCA results as evidence that race is a biological reality.

Second, his attempt to assert legal equality is belied by his other attempts to police access to public institutions based on IQ scores. Hsu’s conceptions of inalienable rights would appear to be taken directly from Plato’s Republic.

Hsu is also wrong in implying that work on race within anthropology has been stagnant. In a recent (2009) paper titled “How Race Becomes Biology: Embodiment of Social Inequality,” Gravlee puts forth a powerful and subtle account of how social inequalities become reified under the rubric of race.

Of interest in the recent back and forth on this blog is Gravlee’s argument about the abuse of PCA in genetics:

Yet some researchers still defend race as a useful framework for describing human genetic variation – and for identifying genetic influences on racial differences in disease (Risch et al., 2002; Gonzalez Burchard et al., 2003; Bamshad et al., 2004). The defense of race relies on two related lines of evidence: 1) studies of worldwide genetic variation show that individuals from the same continent reliably cluster together (Rosenberg et al., 2002; Bamshad et al., 2003; Shriver et al., 2004; Rosenberg et al., 2005), and 2) in the United States, “self-identified race/ethnicity” is a useful proxy for genetic differentiation between groups that vary in continental ancestry (Tang et al., 2005)…..

First, the claim that recent genetic studies “have recapitulated the classical definition of races” (Risch et al., 2002, p 3) misrepresents the purpose of cluster analysis, which is to detect pattern in a given dataset, not determine the essential number of subdivisions in our species. An example of this error is the common interpretation of Rosenberg et al. (2002) as evidence that humans are divided into five genetic clusters (e.g., Bamshad et al., 2004; Mountain and Risch, 2004; Leroi, 2005; Tang et al., 2005). Evidence that humans can be divided into five clusters does not mean they are naturally divided, as the classical definition of race would suggest. In fact, the number of clusters necessary to describe global genetic variation has been inconsistent; some studies report five (Rosenberg et al., 2002) and others seven (Corander et al., 2004; Li et al., 2008). Even when the number of clusters is consistent, their boundaries and composition are not [compare Corander et al., (2004) and Li et al., (2008)], and finer substructures are obscured.

Gravlee goes on to offer three further points of rebuttal, all equally powerful. But Gravlee’s argument about clustering leads us to another classic anthropological point: drawing boundaries, whether through language or mathematics, is political work. Further, what Gravlee argues about PCA holds true of all statistical techniques.
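Gravlee’s point that a clustering routine returns however many groups it is asked for can be made concrete with a toy sketch. Everything in it is an assumption made for illustration: the data are 100 hypothetical, evenly spaced values (a smooth gradient with no gaps, not real genetic data), and a bare-bones one-dimensional k-means stands in for clustering methods generally. It is not the method used in any of the studies cited above, and the names kmeans_1d, boundaries, and cline are invented here.

```python
def kmeans_1d(points, k, iterations=100):
    """Plain Lloyd-style k-means on 1-D data with deterministic starting centers."""
    lo, hi = min(points), max(points)
    # Assumption: start the centers evenly spread over the data range.
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    groups = []
    for _ in range(iterations):
        # Assign every point to its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: abs(p - centers[j]))
            groups[nearest].append(p)
        # Move each center to the mean of its group (unchanged if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


def boundaries(centers):
    """Cut points implied by the centers: midpoints between neighbouring centers."""
    ordered = sorted(centers)
    return [round((a + b) / 2, 2) for a, b in zip(ordered, ordered[1:])]


# Hypothetical data: 100 evenly spaced values, a smooth gradient with no gaps.
cline = [i / 99 for i in range(100)]

for k in (2, 3, 5, 7):
    centers, groups = kmeans_1d(cline, k)
    print(k, [len(g) for g in groups], boundaries(centers))
```

Whatever value of k the analyst supplies, the routine hands back that many tidy, non-empty groups, and the cut points land in different places each time. Nothing in the output says which k is the right one; that choice is made before the mathematics starts.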

A cursory glance at the historic malleability of racial categories from any census, or a look at Ta-Nehisi Coates’s article, will demonstrate this point. Race is undeniably a social category that carries real consequences for those caught on the wrong side of the classificatory scheme. How one chooses to classify is a political act and no amount of technical mediation can change that.

Gravlee, Clarence C. 2009 How Race Becomes Biology: Embodiment of Social Inequality. American Journal of Physical Anthropology 139(1): 47–57.

Michael Scroggins