Depictions of Human Genetic Relationships: Looking at Li et al.’s colored belt diagram

Following up on a long series of posts exploring depictions of human genetic relationships, let me consider the colored belt diagram and 2-dimensional PCA (principal component analysis) of Li et al. (2008), http://www.sciencemag.org/content/319/5866/1100.full

The 7 colors seem to suggest that there are 7 races that correspond reasonably well to classical views of races.  However, the colors are superimposed on a genealogical tree and are assigned by reference to classical or geographic views of human groups (see below).  (That is, you get out what you put in.)

If, instead, we divided human genetic variation according to branching distance from the external reference population (namely, chimpanzees) and asked for 7 groups, they would be San, Mbuti Pygmies, Biaka Pygmies, Bantu, Mandenka, Yoruba, and the rest of the human population in the world all put together.  (Is this right?  The assumption here is that time since diverging from a common ancestor is, in molecular clock fashion, reflected in overall genetic divergence.)  That would give us a colored belt with 6 narrow bands of color on the left and one solid band stretching out the rest of the way to the right with a few stripelets from the six African groups appearing in that solid band.

Now someone could sample more in the African groups and produce a tree that, say, splits the San, the Mbuti Pygmies, and the Biaka Pygmies and then, to make the 7th group, lumps together the rest of the world’s human population.  Or someone could sample fewer African groups and end up with, say, three African groups, Mozabite, Bedouin, Palestinians, and the rest of the world’s human population lumped together as the 7 human groups.

This sensitivity to sampling leads to the question: Is there a basis for subdivision that isn’t susceptible to who is sampled more or less and doesn’t depend on color coding that mostly defines the groups before doing the analysis?  Perhaps the PCA in figure S3B is what’s needed—it uses the full set of information about genetic variation in a sample of 900+ individuals to spread the sampled populations across two dimensions.

That diagram suggests (to my eye) 4 subdivisions of human genetic diversity — African (red), Europe + Middle East + Central/South Asia (all in one group, brown, green, and light blue), E Asia + America + some CS Asia (gold, purple), Oceania (deep blue).  Does that bring us back to Africans as one race, albeit with Europeans subsumed in a large mix?  That is, does it undermine a reading of the Campbell-Tishkoff diagram that suggests that, if there were 4 races, 3 would be African and one would be African plus the rest of the world put together?

At the same time, keep in mind that, in Li et al.’s analysis, 89% of variation is within populations, 2% is among populations within groups, and 9% is among groups.  (This affirms results that date back to Lewontin 1972 and have been confirmed by subsequent work, such as Hofer et al. 2009: on average, for any genetic locus, roughly 5/6 of the variation is within a population, 1/12 is among populations within a region, and only 1/12 occurs among regions.)  What such variation means is that it is difficult, on the basis of a random selection of genetic loci, to assign an individual to one branch or the other.  We need to say difficult, not impossible: such assignment is merely subject to more errors than correct assignments.
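To make the idea of "variation among populations" at a single locus concrete, here is a minimal sketch using the standard heterozygosity-based definition of F_ST for a biallelic locus, F_ST = (H_T - H_S) / H_T. This is an illustration only and not necessarily the procedure Li et al. used; the function name and the allele frequencies 0.3 and 0.7 are hypothetical, chosen so that the among-population share comes out near 1/6.

```python
def fst_biallelic(freqs):
    """F_ST at one biallelic locus, given one allele frequency per population.

    Populations are weighted equally (an assumption of this sketch).
    """
    n = len(freqs)
    # H_S: average expected heterozygosity within populations.
    h_s = sum(2 * p * (1 - p) for p in freqs) / n
    # H_T: expected heterozygosity if all populations were pooled.
    p_bar = sum(freqs) / n
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        raise ValueError("locus is not variable in the pooled sample")
    return (h_t - h_s) / h_t


# Hypothetical frequencies: the allele is at 30% in one population, 70% in the other.
fst = fst_biallelic([0.3, 0.7])
print(f"among populations: {fst:.1%}, within populations: {1 - fst:.1%}")
```

Even a frequency difference as large as 30% versus 70% leaves most of the variation at that locus within populations, which is why a single locus is a poor guide to group membership.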

The question I have (and plan to email the authors about, to see if they can help) is why this isn’t evident in the PCA plots.  Those plots make it look as if within-group variation is somewhat less than among-group variation.  Perhaps this is because the PCA uses many loci and, contra what I just said above, a combination of a random selection of loci does allow us to discriminate among something like the classical groups?  (Remember, however, that people of recent mixed ancestry tend to be eliminated as subjects in these studies.)  Perhaps it is because the PCA is biased towards the particular loci that allow us to trace ancestry.  Obviously, I need to understand more about what the methods are doing.
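The "many loci" possibility can be explored with a toy simulation. The sketch below is not Li et al.'s data or method: it assumes two hypothetical populations, independent loci, and an allele-frequency difference of only 0.1 at every locus (so each locus on its own has very little variation among populations). All function names and parameter values are made up for illustration. It compares assigning individuals from one locus at a time with splitting them on the first principal component of all loci together.

```python
import numpy as np


def simulate(n_per_pop=100, n_loci=2000, delta=0.1, seed=0):
    """Simulate 0/1/2 genotypes for two hypothetical populations.

    At every locus the two allele frequencies differ by only `delta`.
    """
    rng = np.random.default_rng(seed)
    base = rng.uniform(0.3, 0.7, size=n_loci)    # shared background frequency
    sign = rng.choice([-1.0, 1.0], size=n_loci)  # which population is higher
    p_a = base + sign * delta / 2
    p_b = base - sign * delta / 2
    geno_a = rng.binomial(2, p_a, size=(n_per_pop, n_loci))
    geno_b = rng.binomial(2, p_b, size=(n_per_pop, n_loci))
    genotypes = np.vstack([geno_a, geno_b]).astype(float)
    labels = np.repeat([0, 1], n_per_pop)        # 0 = population A, 1 = B
    return genotypes, labels, p_a, p_b


def single_locus_accuracy(genotypes, labels, p_a, p_b):
    """Mean accuracy when each individual is assigned using one locus only."""
    midpoint = p_a + p_b               # average of the two expected genotypes
    above = genotypes > midpoint       # shape: individuals x loci
    guess_a = np.where(p_a > p_b, above, ~above)
    is_a = (labels == 0)[:, None]
    return float((guess_a == is_a).mean())


def pca_accuracy(genotypes, labels):
    """Accuracy when individuals are split by the sign of their PC1 score."""
    centered = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    pc1 = u[:, 0] * s[0]
    acc = float(((pc1 > 0).astype(int) == labels).mean())
    return max(acc, 1.0 - acc)         # the sign of a PC is arbitrary


genotypes, labels, p_a, p_b = simulate()
print("single-locus accuracy:", round(single_locus_accuracy(genotypes, labels, p_a, p_b), 3))
print("PC1 accuracy:", round(pca_accuracy(genotypes, labels), 3))
```

Under these assumptions, assignment from any one locus is only slightly better than a coin flip, while the first principal component of the pooled loci separates the two simulated populations almost completely. Whether the same holds for the real data, where loci are correlated and populations are not cleanly bounded, is a separate question.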

References

Campbell, M.C. and S. A. Tishkoff (2010) The Evolution of Human Genetic and Phenotypic Variation in Africa, Current Biology 20, R166–R173

Hofer, T. et al. (2009) Large Allele Frequency Differences between Human Continental Groups are more Likely to have Occurred by Drift during Range Expansions than by Selection, Annals of Human Genetics 73 (1): 95–108.

Lewontin, R. C. (1972) The apportionment of human diversity. Evolutionary Biology 6, 381–398.

Li, J. et al. (2008) Worldwide Human Relationships Inferred from Genome-Wide Patterns of Variation, Science 319: 1100–1104.

About Peter J. Taylor
Peter Taylor is a Professor at the University of Massachusetts Boston, where he teaches and directs undergraduate and graduate programs on critical thinking, reflective practice, and science-in-society. His research and writing focus on the complexity of environmental and health sciences in their social context, including Unruly Complexity: Ecology, Interpretation, Engagement (U. Chicago Press, 2005). Taking Yourself Seriously: Processes of Research & Engagement (with J. Szteiter) appeared early in 2012 (http://bit.ly/TYS2012).

2 Responses to Depictions of Human Genetic Relationships: Looking at Li et al.’s colored belt diagram

  1. Chuck says:

    “Perhaps this is because the PCA uses many loci and, contra what I just said above, a combination of a random selection of loci does allow us to discriminate among something like the classical groups?”

    Wasn’t this the point made by Witherspoon (2007) and Edwards (2003)? Hence: “Lewontin’s Fallacy.”

    (Edwards, 2003. Human genetic diversity: Lewontin’s fallacy; Witherspoon et al., 2007. Genetic similarities within and between human populations)
    ……
    Speaking of which — I’ve been trying to get a definitive answer to the question below. Maybe you know:

    In “Human genome diversity: frequently asked questions,” Barbujani and Colonna (2010) state:
    “The remaining 85% represents the average difference between members of the same population. One way to envisage these figures is to say that the expected genetic difference between unrelated individuals from distant continents exceeds by 15% the expected difference between members of the same community.”

    According to them, the between individual, between population variance is only 15% (and is identical to the between population variance). This contradicts Sarich and Miele’s (2004) point (in The Reality of Human Difference):
    “First is the 15 percent that is interpopulational. The other 85 percent will then split half and half (42.5 percent) between the intra- and interindividual within-population comparisons. The increase in variability in between-population comparisons is thus 15 percent against the 42.5 percent that is between-individual within-population. Thus, 15/42.5 = 32.5 percent a much more impressive and, more important, more legitimate value than 15 percent.”

    According to Sarich and Miele, the 15%, in the context of between-individual differences, fails to take into account the within-individual differences (i.e., between-individual, between-population variance is not identical to between-population variance, since a portion of the within-population variance is within individuals).

    While the whole issue is somewhat scholastic given other findings [1] (Fst implies little about heritable phenotypic differences), either Barbujani and Colonna’s statement or Sarich and Miele’s is correct. Which is it?

    I fail to see how the former could be — if the within individual variance is >0%, which it is, the between individual variance within populations must be 15% — but as I don’t have a good grasp of how Fst is measured, I’m uncertain.

    [1] Long and Kittles, 2003. Human Genetic Diversity and the Nonexistence of Biological Races

    • Chuck says:

      The last three sentences should read:

      Either Barbujani and Colonna’s statement or Sarich and Miele’s is correct. Which is it? I fail to see how the former could be: if the within-individual variance is >0%, which it is, the between-individual variance within populations must be <85%. Using Barbujani and Colonna’s formulation, we would have “unrelated individuals from distant continents exceeds by 15% (of the total diversity) the expected difference of ~45% (of the total diversity) between members of the same community”. But as I don’t have a good grasp of how Fst is measured, I’m uncertain.
