Depictions of Human Genetic Relationships: Looking more at Li et al.’s PCA plot

A recent post finished by wondering how we reconcile the figures for within-population versus among-population variation with the 2-dimensional PCA (principal component analysis) plot of Li et al. (2008), http://www.sciencemag.org/content/319/5866/1100.full  In Li et al.’s analysis, 89% of variation is within populations, 2% is among populations within groups, and 9% is among groups.  What such a partition means is that it is difficult, on the basis of a random selection of genetic loci, to assign an individual to one branch of the tree of human ancestry or another.  However, the PCA plots (such as the one below) make it look as if within-group variation is somewhat less than among-group variation.

Here are my current explorations:

1. Scale the diagram so that variation on the plot is proportional to variation accounted for by each of the first two principal components.  (I also rotated it because I like the convention of the major axis of variation being left to right.)

2.  Consider, as a thought experiment, groups separated on the first axis.  That gives 3 subdivisions of human genetic diversity: Africa (red); Europe + Middle East + most Central/South Asia + Oceania (all in one group; brown, green, light blue, and deep blue); and E Asia + America + some CS Asia (gold, purple).  Then choose some position on the second PC, as if the variation in that direction were all the variation not accounted for by the grouping (when it is actually only 3/5 of it).  There’s a lot of overlap among the three groups at any position with PC2 > 0.2.  This shows how a randomly selected gene (or combination of genes) not captured by the first axis won’t be a reliable basis for separating groups; members of two or more groups will share that gene.

3.  Now consider groups separated by using both PC axes and imagine choosing a gene (or combination of genes) along the direction of the remaining variation (20% of the total).   Again, a randomly selected gene (or combination of genes) not captured by the first two axes won’t be a reliable basis for separating groups; members of two or more groups will share that gene.

4.  Granted, a randomly selected gene (or combination of genes) captured by the first two axes will do OK in separating groups.  However, if the trait we are concerned with involves many genes (not to mention environmental factors interacting over a developmental sequence), we will expect it to be difficult to link differences between any two individuals in different groups to genetic differences.

5.  Of course, IF the genes that do allow us to separate groups (either in an ancestry method or using PCA) had been the focus of natural selection in divergent environments, then the separation on the ancestry tree or PCA plot would mean something.  Is there any evidence for that?  Indeed, what would be required to establish evidence for that?
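One way to play with points 1 to 3 numerically is a small simulation. The sketch below does not use Li et al.’s data or method: the three groups, the number of loci, the Balding-Nichols model for group allele frequencies, and the differentiation parameter `fst = 0.1` are all assumptions chosen for illustration, and the function names are hypothetical. It compares the share of variation that lies within groups across all loci with the share that lies within groups in the plane of the first two principal components, and it keeps the PC scores scaled so that the spread along each axis is proportional to the variation that axis accounts for (the scaling described in point 1).

```python
import numpy as np


def simulate_genotypes(n_per_group=100, n_loci=2000, n_groups=3, fst=0.1, seed=0):
    """Simulate 0/1/2 genotype counts for a few weakly differentiated groups.

    Assumption: group allele frequencies follow a Balding-Nichols model
    around a shared ancestral frequency. This is an illustrative choice,
    not the model behind Li et al.'s analysis.
    """
    rng = np.random.default_rng(seed)
    ancestral = rng.uniform(0.1, 0.9, size=n_loci)
    a = ancestral * (1 - fst) / fst
    b = (1 - ancestral) * (1 - fst) / fst
    freqs = rng.beta(a, b, size=(n_groups, n_loci))
    labels = np.repeat(np.arange(n_groups), n_per_group)
    genotypes = rng.binomial(2, freqs[labels]).astype(float)
    return genotypes, labels


def within_group_share(values, labels):
    """Fraction of the total sum of squares that lies within groups."""
    values = np.asarray(values, dtype=float)
    if values.ndim == 1:
        values = values[:, None]
    total_ss = ((values - values.mean(axis=0)) ** 2).sum()
    within_ss = 0.0
    for g in np.unique(labels):
        block = values[labels == g]
        within_ss += ((block - block.mean(axis=0)) ** 2).sum()
    return within_ss / total_ss


def pca_scores(genotypes, n_components=2):
    """PCA via SVD of the column-centered genotype matrix.

    Scores are U * S, so the variance along each axis is proportional
    to the fraction of variation that component accounts for.
    """
    centered = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    explained = s ** 2 / (s ** 2).sum()
    scores = u[:, :n_components] * s[:n_components]
    return scores, explained[:n_components]


genotypes, labels = simulate_genotypes()
scores, explained = pca_scores(genotypes)

share_all_loci = within_group_share(genotypes, labels)
share_pc_plane = within_group_share(scores, labels)

print(f"within-group share, all loci:      {share_all_loci:.2f}")
print(f"within-group share, PC1-PC2 plane: {share_pc_plane:.2f}")
print(f"variation accounted for by PC1, PC2: {explained.round(3)}")
```

In this simulated setting most of the variation across all loci is within groups, yet the first two PCs are exactly the directions along which the groups differ most, so a plot of those two axes displays mostly among-group variation. The simulated percentages are not meant to match the 89/2/9 partition or the PC fractions in Li et al.’s figure. If the scores are plotted, using an equal aspect ratio on the two axes preserves the proportional scaling.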

References

Li, J. et al. (2008). Worldwide Human Relationships Inferred from Genome-Wide Patterns of Variation. Science 319: 1100-1104.

2 thoughts on “Depictions of Human Genetic Relationships: Looking more at Li et al.’s PCA plot”

  1. Chuck

    I don’t quite understand the question. There clearly has been recent positive selection and differentiation between populations [1,2]. You seem to be asking: if you just look at regions under positive selection, do you still get the same clusters? I don’t see why it matters. For the medical meaningfulness of continental ancestries? When it comes to interesting between-population heritable differences, the coherence of continental clusters doesn’t really matter.

    [1] Herráez, et al., 2009. Genetic variation and recent positive selection in worldwide human populations: evidence from nearly 1 million SNPs.
    [2] Wu and Zhang, 2011. Different level of population differentiation among human genes

  2. Chuck

    “Now consider groups separated by using both PC axes and imagine choosing a gene (or combination of genes) along the direction of the remaining variation (20% of the total). Again, a randomly selected gene (or combination of genes) not captured by the first two axes won’t be a reliable basis for separating groups; members of two or more groups will share that gene.”

    If you toss in a third PC axis, you get an image like figure 2 in Tishkoff et al., 2009. The genetic structure and history of Africans and African Americans. The space of genetic variation has more than 3 axes; were you to look at the clusters in 3+n-dimensional space, the clusters would become more discernible. Steve Hsu over at “Information Processing” has a number of posts on this. See for example: Hsu, 2007. “Metric on the space of genomes and the scientific basis for race.” Information Processing. Jan 4.
