Depictions of Human Genetic Relationships: Looking at Li et al.’s colored belt diagram
April 21, 2011
Following up on a long series of posts exploring depictions of human genetic relationships, let me consider the colored belt diagram and 2-dimensional PCA (principal component analysis) of Li et al. (2008), http://www.sciencemag.org/content/319/5866/1100.full
The 7 colors seem to suggest that there are 7 races corresponding reasonably well to classical views of race. However, the color coding comes from a genealogical tree onto which the colors have been superimposed, and the colors are assigned by reference to classical or geographic views of human groups. (That is, you get out what you put in.)
If, instead, we divided human genetic variation according to branching distance from the external reference population (namely, chimpanzees) and asked for 7 groups, they would be San, Mbuti Pygmies, Biaka Pygmies, Bantu, Mandenka, Yoruba, and the rest of the human population in the world all put together. (Is this right? The assumption here is that time since diverging from a common ancestor is, in molecular clock fashion, reflected in overall genetic divergence.) That would give us a colored belt with 6 narrow bands of color on the left and one solid band stretching out the rest of the way to the right with a few stripelets from the six African groups appearing in that solid band.
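A minimal sketch of the grouping procedure just described, under loudly stated assumptions: the tree is a hypothetical ladder that simply encodes the branching order guessed in that paragraph (San first, then Mbuti, Biaka, Bantu, Mandenka, Yoruba), the node ages are made up (only their order matters), and the "rest of the world" clade is represented by four stand-in populations. The procedure repeatedly splits whichever remaining clade branched off earliest until k groups exist. The names `Node`, `build_ladder`, and `cut_into_groups` are hypothetical, not taken from Li et al.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str = ""        # population label; empty for internal nodes
    age: float = 0.0      # hypothetical time since this split (leaves = 0)
    children: list = field(default_factory=list)

    def leaves(self):
        """Return the population labels under this node."""
        if not self.children:
            return [self.name]
        return [leaf for child in self.children for leaf in child.leaves()]


def build_ladder(early_branches, rest_clade):
    """Hypothetical ladder tree: early_branches[0] splits off first (oldest), and so on."""
    node, age = rest_clade, rest_clade.age
    for name in reversed(early_branches):
        age += 1.0  # made-up ages; only their order matters here
        node = Node(age=age, children=[Node(name=name), node])
    return node


def cut_into_groups(root, k):
    """Open the oldest remaining split until there are k groups."""
    heap = [(-root.age, 0, root)]  # max-heap on age; the unique counter breaks ties
    counter = 1
    while len(heap) < k:
        neg_age, tag, node = heapq.heappop(heap)
        if not node.children:      # oldest remaining item is a leaf: nothing left to split
            heapq.heappush(heap, (neg_age, tag, node))
            break
        for child in node.children:
            heapq.heappush(heap, (-child.age, counter, child))
            counter += 1
    # report groups in the order they were created
    return [sorted(node.leaves()) for _, _, node in sorted(heap, key=lambda e: e[1])]


# Stand-in populations for "the rest of the world" (hypothetical choice)
rest = Node(age=4.0, children=[Node(name=n) for n in ("Mozabite", "French", "Han", "Papuan")])
tree = build_ladder(["San", "Mbuti", "Biaka", "Bantu", "Mandenka", "Yoruba"], rest)
for group in cut_into_groups(tree, 7):
    print(group)
```

With k = 7 this prints the six African populations as single-population groups, followed by ['French', 'Han', 'Mozabite', 'Papuan'] as the seventh. Feeding in a ladder with more or fewer early-branching groups changes which seven groups come out, which is the sampling sensitivity taken up next.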
Now someone could sample more in the African groups and produce a tree that, say, splits off the San, the Mbuti Pygmies, and the Biaka Pygmies and then, to make the 7th group, lumps together the rest of the human population in the world. Or someone could sample fewer African groups and end up with, say, three African groups, Mozabite, Bedouin, Palestinians, and the rest of the human population in the world all put together as the 7 human groups.
This sensitivity to sampling raises the question: Is there a basis for subdivision that isn’t susceptible to who is sampled more or less, and that doesn’t depend on a color coding that largely defines the groups before the analysis is done? Perhaps the PCA in figure S3B is what’s needed: it uses the full set of information about genetic variation in a sample of 900+ individuals to spread the sampled populations across two dimensions.
That diagram suggests (to my eye) 4 subdivisions of human genetic diversity: African (red); Europe + Middle East + Central/South Asia (all in one group: brown, green, and light blue); East Asia + America + some Central/South Asia (gold, purple); and Oceania (deep blue). Does that bring us back to Africans as one race, albeit with Europeans subsumed in a large mix? That is, does it undermine a reading of the Campbell-Tishkoff diagram that suggests that, if there were 4 races, 3 would be African and one would be African plus the rest of the world put together?
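Figure S3B is a PCA plot. As a rough sketch of what such a plot computes (not a reconstruction of Li et al.’s pipeline, whose software and settings I am not assuming), here is plain PCA on simulated genotypes: center each SNP, build the individual-by-individual similarity matrix, and take its top eigenvectors by power iteration. The two toy populations, their allele frequencies, and all function names are hypothetical. Real analyses typically also rescale each SNP by its expected variance, a step omitted here.

```python
import random


def simulate_two_populations(n_per_pop=20, n_snps=200, seed=1):
    """Toy genotypes (0/1/2 allele copies); population B's frequencies are A's plus 0.3."""
    rng = random.Random(seed)
    freq_a = [rng.uniform(0.1, 0.5) for _ in range(n_snps)]
    freq_b = [p + 0.3 for p in freq_a]

    def draw(freqs):
        return [(rng.random() < p) + (rng.random() < p) for p in freqs]

    return [draw(freq_a) for _ in range(n_per_pop)] + [draw(freq_b) for _ in range(n_per_pop)]


def center_columns(G):
    """Subtract each SNP's mean genotype so PCA describes variation around the mean."""
    n = len(G)
    means = [sum(col) / n for col in zip(*G)]
    return [[g - m for g, m in zip(row, means)] for row in G]


def top_eigenpair(A, iters=300, seed=0):
    """Power iteration for the largest eigenvalue/eigenvector of a symmetric PSD matrix."""
    rng = random.Random(seed)
    v = [rng.random() - 0.5 for _ in range(len(A))]
    for _ in range(iters):
        w = [sum(a * x for a, x in zip(row, v)) for row in A]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            return 0.0, v
        v = [x / norm for x in w]
    Av = [sum(a * x for a, x in zip(row, v)) for row in A]
    return sum(x * y for x, y in zip(v, Av)), v


def pca_scores(G, k=2):
    """Return k principal-component scores per individual (rows of G)."""
    X = center_columns(G)
    # n x n Gram matrix: similarity between individuals summed over all SNPs
    A = [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]
    columns = []
    for i in range(k):
        lam, v = top_eigenpair(A, seed=i)
        columns.append([max(lam, 0.0) ** 0.5 * x for x in v])  # score = sqrt(eigenvalue) * eigenvector
        # deflate so the next pass finds the next component
        A = [[A[r][c] - lam * v[r] * v[c] for c in range(len(v))] for r in range(len(v))]
    return [list(row) for row in zip(*columns)]


scores = pca_scores(simulate_two_populations())
pc1_a = [s[0] for s in scores[:20]]
pc1_b = [s[0] for s in scores[20:]]
print(max(pc1_a) < min(pc1_b) or max(pc1_b) < min(pc1_a))
```

The last line prints True when the first component places the two simulated populations on opposite sides with no overlap, which with these settings (a 0.3 allele-frequency shift at every one of 200 SNPs) it should.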
At the same time, keep in mind that, in Li et al.’s analysis, 89% of variation is within populations, 2% is among populations within groups, and 9% is among groups. (This is consistent with results dating back to Lewontin (1972) and reaffirmed by subsequent work, such as Hofer et al. (2009): on average, for any genetic locus, roughly 5/6 of the variation is within a population, 1/12 is among populations within a region, and only 1/12 occurs among regions.) What such variation means is that it is difficult, on the basis of a random selection of genetic loci, to assign an individual to one branch or the other. We need to say difficult, not impossible: such assignments are merely subject to more errors than correct calls.
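To make the three-level split concrete, here is a minimal sketch of a nested sums-of-squares partition of 0/1 allele indicators at a single biallelic locus. It is a simplified cousin of AMOVA-style variance partitions; I am not claiming it is the estimator Li et al. used, and the allele counts (four populations of 100 allele copies in two groups) are made up. For comparison, the round numbers quoted above are 5/6 ≈ 83% and 1/12 ≈ 8%.

```python
from statistics import mean


def partition_variation(groups):
    """Nested sums-of-squares split for one biallelic locus.

    groups maps group name -> {population name -> list of 0/1 allele indicators}.
    Returns fractions (within populations, among populations within groups, among groups).
    """
    everyone = [x for pops in groups.values() for alleles in pops.values() for x in alleles]
    grand = mean(everyone)
    ss_within = ss_among_pops = ss_among_groups = 0.0
    for pops in groups.values():
        group_alleles = [x for alleles in pops.values() for x in alleles]
        group_mean = mean(group_alleles)
        ss_among_groups += len(group_alleles) * (group_mean - grand) ** 2
        for alleles in pops.values():
            pop_mean = mean(alleles)
            ss_among_pops += len(alleles) * (pop_mean - group_mean) ** 2
            ss_within += sum((x - pop_mean) ** 2 for x in alleles)
    total = ss_within + ss_among_pops + ss_among_groups
    return ss_within / total, ss_among_pops / total, ss_among_groups / total


def alleles(freq, n=100):
    """Hypothetical sample of n allele copies, round(freq * n) of which carry the allele."""
    k = round(freq * n)
    return [1] * k + [0] * (n - k)


toy = {
    "group 1": {"pop A": alleles(0.5), "pop B": alleles(0.6)},
    "group 2": {"pop C": alleles(0.7), "pop D": alleles(0.8)},
}
print([round(f, 3) for f in partition_variation(toy)])  # [0.945, 0.011, 0.044]
```

Even with allele frequencies ranging from 0.5 to 0.8 across the four toy populations, most of the variation at this locus sits within populations, which is the pattern the percentages above describe.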
The question I have (and plan to email the authors to see if they can help with) is why the predominance of within-population variation isn’t evident in the PCA plots. Those plots make it look as if within-group variation is somewhat smaller than among-group variation. Perhaps this is because the PCA uses many loci and, contra what I just said above, combining a random selection of loci does allow us to discriminate among something like the classical groups? (Remember, however, that people of recent mixed ancestry tend to be eliminated as subjects in these studies.) Perhaps it is because the PCA is biased toward the particular loci that allow us to trace ancestry. Obviously, I need to understand more about what the methods are doing.
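One way to probe the first possibility is a small simulation. The sketch below swaps in a likelihood-ratio classifier (it is not PCA and not Li et al.’s method) and applies it to two hypothetical populations whose allele frequencies are 0.45 and 0.55 at every locus, with known frequencies and no admixture. At any single locus only about 1% of the allele variation lies between these two populations. All names and numbers here are assumptions for illustration.

```python
import math
import random


def draw_genotype(freqs, rng):
    """One diploid individual: 0, 1, or 2 copies of the allele at each locus."""
    return [(rng.random() < p) + (rng.random() < p) for p in freqs]


def log_likelihood(genotype, freqs):
    """Binomial log-likelihood, omitting the C(2, g) term, which cancels when comparing populations."""
    return sum(g * math.log(p) + (2 - g) * math.log(1 - p) for g, p in zip(genotype, freqs))


def misassignment_rate(n_loci, n_per_pop=200, seed=0):
    """Fraction of simulated individuals assigned to the wrong one of two populations."""
    rng = random.Random(seed)
    freqs_a = [0.45] * n_loci   # hypothetical: populations differ by 0.10 at every locus
    freqs_b = [0.55] * n_loci
    errors = 0.0
    for own, other in ((freqs_a, freqs_b), (freqs_b, freqs_a)):
        for _ in range(n_per_pop):
            g = draw_genotype(own, rng)
            ll_own, ll_other = log_likelihood(g, own), log_likelihood(g, other)
            if ll_other > ll_own:
                errors += 1
            elif ll_other == ll_own:
                errors += 0.5  # exact tie (e.g. a heterozygote at a single locus): a coin flip
    return errors / (2 * n_per_pop)


for n_loci in (1, 10, 100, 500):
    print(n_loci, misassignment_rate(n_loci))
```

With one locus the expected misassignment rate is 0.45, close to a coin flip; as loci are added, the small per-locus differences accumulate and the rate should fall toward zero by a few hundred loci. That is the sense in which "difficult" per locus need not mean "difficult" for many loci combined, though it says nothing about the second possibility, a bias toward ancestry-informative loci.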
Campbell, M. C. and S. A. Tishkoff (2010) The Evolution of Human Genetic and Phenotypic Variation in Africa, Current Biology 20: R166–R173.
Hofer, T. et al. (2009) Large Allele Frequency Differences between Human Continental Groups are more Likely to have Occurred by Drift During Range Expansions than by Selection, Annals of Human Genetics 73 (1): 95–108.
Lewontin, R. C. (1972) The Apportionment of Human Diversity, Evolutionary Biology 6: 381–398.
Li, J. et al. (2008) Worldwide Human Relationships Inferred from Genome-Wide Patterns of Variation, Science 319: 1100–1104.