Does PCA Have Politics?

This morning, armchair scientist and noted fan of this blog, Razib Khan, decided it would be prudent to write about race. It comes by way of Khan issuing a corrective, of sorts, to Ta-Nehisi Coates.

The Coates article is wonderful. He takes a historical look at how race has been deployed over the last 150 years. Along the way, he makes all the good points that can be made with the census, and some others as well. It is a nice reminder that far from being fixed, race is a potently flexible concept which can be and has been used to classify (or cluster) humans based on any number of arbitrary factors. That is, he gives the classically anthropological argument that arbitrary classifications are taken up as naturalized in support of explicitly political designs.

Which brings me around to Khan.

After first giving a brief history of the world as told through the clustering of genes in patterns – a la Cavalli-Sforza – Khan turns to the power of his beloved PCA:

When you take multiple dimensions and transpose the data geometrically you quickly see population structure fall out of the data set.

As if by magic, unsullied by the subjective whims of human judgment, PCA objectively does the work of racial classification. Khan eventually draws the following conclusion about race:

So there you have it. An underlying biological reality which is a reflection of deep history. It may not be real or factual, but it is consistent and coherent. Then there are innate faculties which lead us toward categorization of humans into various kinds, for deeply adaptive purposes. Finally, there are historically contingent events which warp our perception of categories so as to fit into power relations in a straightforward sense.

And here I agree with Khan. What he does is neither real nor factual, but it is consistent and internally coherent. For Khan, race is a biological reality, but historically contingent events conspire to warp our perceptions of this uncomfortable fact.

Steve Hsu, for his part, offers this muddled attempt to use race as a fixed concept without coming off as using race as a fixed concept. Needless to say, it doesn’t add up:

Now plot the genome of each human as a point on our lattice. Not surprisingly, there are readily identifiable clusters of points, corresponding to traditional continental ethnic groups: Europeans, Africans, Asians, Native Americans, etc. (See, for example, Risch et al., Am. J. Hum. Genet. 76:268–275, 2005.) Of course, we can get into endless arguments about how we define European or Asian, and of course there is substructure within the clusters, but it is rather obvious that there are identifiable groupings, and as the Risch study shows, they correspond very well to self-identified notions of race. ….

This leads us to two very different possibilities in human genetic variation:

Hypothesis 1: (the PC mantra) The only group differences that exist between the clusters (races) are innocuous and superficial, for example related to skin color, hair color, body type, etc.
Hypothesis 2: (the dangerous one) Group differences exist which might affect important (let us say, deep rather than superficial) and measurable characteristics, such as cognitive abilities, personality, athletic prowess, etc. …

The predominant view among social scientists is that H1 is obviously correct and H2 obviously false. However, this is mainly wishful thinking. Official statements by the American Sociological Association and the American Anthropological Association even endorse the view that race is not a valid biological concept, which is clearly incorrect.

As scientists, we don’t know whether H1 or H2 is correct, but given the revolution in biotechnology, we will eventually. Let me reiterate, before someone labels me a racist: we don’t know with high confidence whether H1 or H2 is correct.

Finally, it is important to note that group differences are statistical in nature and do not imply anything definitive about a particular individual. Rather than rely on the scientifically unsupported claim that we are all equal, it would be better to emphasize that we all have inalienable human rights regardless of our abilities or genetic makeup.

Hsu’s logic is wrong on several counts here, but I will discuss the two points that are particularly glaring.

The first is simply the conflation of the clustering imposed by PCA (which I will get to later) with the reified category of race. He constantly confuses the two. This is particularly evident when he chides the AAA for noting that race is not a valid biological concept and then points to PCA results as evidence that race is a biological reality.

Second, his attempt to assert legal equality is belied by his other attempts to police access to public institutions based on IQ scores. Hsu’s conceptions of inalienable rights would appear to be taken directly from Plato’s Republic.   

Hsu is also wrong in implying that work on race within anthropology has been stagnant. In a recent (2009) paper titled “How Race Becomes Biology: Embodiment of Social Inequality,” Gravlee puts forth a powerful and subtle account of how social inequalities become reified under the rubric of race.

Of interest in the recent back and forth on this blog is Gravlee’s argument about the abuse of PCA in genetics:

Yet some researchers still defend race as a useful framework for describing human genetic variation—and for identifying genetic influences on racial differences in disease (Risch et al., 2002; Gonzalez Burchard et al., 2003; Bamshad et al., 2004). The defense of race relies on two related lines of evidence: 1) studies of worldwide genetic variation show that individuals from the same continent reliably cluster together (Rosenberg et al., 2002; Bamshad et al., 2003; Shriver et al., 2004; Rosenberg et al., 2005), and 2) in the United States, “self-identified race/ethnicity” is a useful proxy for genetic differentiation between groups that vary in continental ancestry (Tang et al., 2005)…..

First, the claim that recent genetic studies “have recapitulated the classical definition of races” (Risch et al., 2002, p 3) misrepresents the purpose of cluster analysis, which is to detect pattern in a given dataset, not determine the essential number of subdivisions in our species. An example of this error is the common interpretation of Rosenberg et al. (2002) as evidence that humans are divided into five genetic clusters (e.g., Bamshad et al., 2004; Mountain and Risch, 2004; Leroi, 2005; Tang et al., 2005). Evidence that humans can be divided into five clusters does not mean they are naturally divided, as the classical definition of race would suggest. In fact, the number of clusters necessary to describe global genetic variation has been inconsistent; some studies report five (Rosenberg et al., 2002) and others seven (Corander et al., 2004; Li et al., 2008). Even when the number of clusters is consistent, their boundaries and composition are not [compare Corander et al., (2004) and Li et al., (2008)], and finer substructures are obscured.


Gravlee goes on to offer three further points of rebuttal, all equally powerful. But Gravlee’s argument about clustering points us towards another classic anthropological point: drawing boundaries, whether through language or mathematics, is political work. Further, what Gravlee argues about PCA holds true of all statistical techniques.
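
For readers who want to see the point about cluster counts in miniature, here is a rough sketch, not a reconstruction of any study cited above. Everything in it is an assumption made for illustration: the data are synthetic (300 made-up “individuals” spread along one continuous gradient, with no groups built in), the helper names `pca_project` and `kmeans_labels` are hypothetical, and plain k-means stands in for the more elaborate clustering procedures used in the genetics literature. PCA picks up the gradient; the number of groups comes from whoever sets `k`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (an assumption, not real genotypes): individuals
# sit along a single continuous gradient with no built-in groups.
n_individuals, n_features = 300, 50
position = rng.uniform(0.0, 1.0, size=n_individuals)
loadings = rng.normal(size=n_features)
noise = rng.normal(scale=0.5, size=(n_individuals, n_features))
data = np.outer(position, loadings) + noise


def pca_project(X, n_components=2):
    """Project the rows of X onto the top principal components via SVD."""
    centered = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def kmeans_labels(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; k is chosen by the analyst, not by the data."""
    local_rng = np.random.default_rng(seed)
    start = local_rng.choice(len(points), size=k, replace=False)
    centers = points[start]  # fancy indexing returns a copy
    for _ in range(n_iter):
        distances = np.linalg.norm(
            points[:, None, :] - centers[None, :, :], axis=2
        )
        labels = distances.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old center if a group is empty
                centers[j] = members.mean(axis=0)
    return labels


coords = pca_project(data)

# Ask for different numbers of groups on the very same projection.
group_counts = {}
for k in (2, 3, 5, 7):
    labels = kmeans_labels(coords, k)
    group_counts[k] = len(np.unique(labels))
    print(f"asked for {k} groups, found {group_counts[k]} non-empty groups")
```

The procedure hands back a partition for every `k` it is given, even though the data were generated without any groups at all. Which of those partitions gets reported, and what the groups get called, is decided outside the mathematics.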

A cursory glance at the historic malleability of racial categories from any census, or a look at Ta-Nehisi Coates’s article, will demonstrate this point. Race is undeniably a social category that carries real consequences for those caught on the wrong side of the classificatory scheme. How one chooses to classify is a political act and no amount of technical mediation can change that.


Gravlee, Clarence C. 2009 How Race Becomes Biology: Embodiment of Social Inequality. American Journal of Physical Anthropology 139(1): 47–57.

10 thoughts on “Does PCA Have Politics?”

  1. if you don’t like dividing people up then how do you get the human family tree? you’re saying my dad may as well be more related to you than to me? not dividing people up into closely related groups would make all descriptions of humans meaningless.
    give an example of how you would map homo sapien ancestry and how it would branch off. i’m assuming you wouldn’t put everyone on the same branch, right?

  2. Ta-Nehisi Coates also posted a similar (and more detailed) argument about the “biology and race” connection at The Atlantic. The relevant summary is here:

    …”Andrew [Sullivan] writes that liberals should stop saying ‘truly stupid things like race has no biological element.’ I agree. Race clearly has a biological element — because we have awarded it one.”

    So indeed race does have biological factors–which in turn are given meaning in the social sphere. There are also a lot of other biological characteristics which are often not given explicit meaning–hair color, nose shape, left-handedness, height, all come to mind as biological categories which are not highlighted on the US Census form.

  3. My great-grandparents were born as literate farmers, and had lots of kids. One of them worked in a steel fabrication. Before that it was farmers all the way back in the US and Europe. Given that something like 90% of all the people in North America and Europe were farmers in 1800, this isn’t particularly surprising.

    Three of my four grandparents were born in rural areas, and two went on to graduate from college (unusual in the 1920s). One grandfather though was a longshoreman, the educated one sold cheese. Both of my parents had college degrees–they met while in college, which is a social strategy used by the middle classes to preserve endogamy, and social climb even today. Pretty typical I would guess for an upper middle class family in the US today. They passed on their social capital to me–but I really can’t figure out why the social capital they had would have come from their grandparents’ origins in rural America and Europe.

    Andrew Sullivan wrote the following about IQ testing and g scores, which I find convincing:

    For my part, I’ve come to doubt the existence of something called “g” or general intelligence, as the research has gathered over the years. I believe IQ is an artificial construct created to predict how well a random person is likely to do in an advanced post-industrial society.

  4. so it’s still ok to accept a hunter-gatherer society’s definition of “smart” (being a good hunter) but ours isn’t valid? (not to mention a high IQ person would more quickly figure out how to hunt well than a low one and also that we are smart enough to have made hunting obsolete.)

  5. I guess it’s like Einstein said–it’s all relative.

    As for the hunter-gatherer analogy–I would make a lousy hunter no matter how smart I was since I am really near sighted. This is one reason I’m perfectly happy living in a world where abstract reasoning is also valued.

  6. So then me saying “it’s not relative” is also correct because it’s all relative! Yay!

  7. @dad. The nature of “cultural relativism” is an old philosophical conundrum discussed in almost every anthropology class, some would say ad nauseam. But the only thing worse than discussing the conundrum, is throwing up your hands and saying that the problem does not exist!

  8. @ dad
    @ Tony

    Working out your assumptions in public is a far more scientific way to operate than leaving them unexamined in the code base of your favorite software package.
