Exploration 5: Superimposing genetic variation on the ancestry diagram
Compared with the original Tishkoff tree of human ancestry, the 2-D depiction in the previous post greatly improved the degree to which the distance between groups was proportional to the time since the groups shared a common ancestor. (As already noted, we could adjust the depiction if we had a more refined analysis giving us data on different speeds of divergence from the common ancestor down different branches.) The 2-D depiction cannot, however, eliminate spurious appearances of similarity. Even if we put that objection aside for a moment, we need to note that the 2-D depiction still omits the genetic variation around the mid-point of any branch.
Two features of the original Tishkoff ancestry diagram give us a whiff of variation around the mid-point of the branches: 1. the relative thickness of the branches (the thick trunk at the top indicates more genetic variation in the ancestral group than the thin tips of the branches at the base); and 2. the density of the color of the branches (the deepest blue indicates more genetic variation than a lighter-shaded branch). (Tishkoff and collaborators suggest that the migration out of Africa brought with it only a small subset of the genetic variation in the African ancestral branch from which it broke off. The original population migrating out of Africa was, it seems, quite small.)
Although variation around the group’s midpoint is suggested by the preceding two features, the Tishkoff ancestry diagram does not in any way convey the fact that, on average, for any genetic locus roughly 85% of the variation is within a population, 8% is among populations within a region, and only 6% occurs among regions (using oft-cited figures from Lewontin 1973, subject to later refinement, but not, to my knowledge, qualitative revision). To convey this, we can add “aprons” around the mid-points of the 2-D depiction. In the following diagram aprons are added around groups A and H only.
The aprons are the same size because I can make the key point without exploring the available data to calibrate the apron size to match the different degrees of genetic variation within the groups at the ends of the branches. That point is that ancestry trees show the genetic mid-points of branches and thus mostly hide the large amount of genetic variation not captured by the branching pattern. Such variation makes it difficult, on the basis of a random selection of genetic loci, to assign an individual to one branch or the other. Difficult but not impossible; merely subject to more errors than correct assignments. I say "random selection" because clearly there must be some genetic differences that are specific to a branch in order for us to be able to trace ancestry patterns at all. If there are mutations that are very common in some people and rare in others, a tree can be made that captures the most likely branching pattern (i.e., one that assumes the fewest reversions, that is, a mutation in one direction followed by a mutation back to the original condition) even if most genes vary in ways that bear no sign of that branching.
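To make the idea of "the branching pattern that assumes the fewest changes" concrete, here is a minimal sketch in Python. It uses Fitch's parsimony count, a standard textbook method that I am swapping in for illustration; it is not necessarily the method behind the Tishkoff tree. The four groups, the three loci, and the function names `fitch` and `tree_score` are all hypothetical.

```python
# Fitch's parsimony count for binary characters on a rooted binary tree.
# Trees are nested tuples of group names. All names and data are hypothetical.

def fitch(tree, states):
    """Return (possible states at this node, minimum number of changes below it)."""
    if isinstance(tree, str):
        # A tip of the tree: its observed state, and no changes yet.
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # The two sub-branches can agree on a state: no extra change needed.
        return shared, left_cost + right_cost
    # They cannot agree: at least one change must have happened here.
    return left_set | right_set, left_cost + right_cost + 1


def tree_score(tree, characters):
    """Total minimum number of changes the tree needs across all loci."""
    return sum(fitch(tree, states)[1] for states in characters)


# 1 = a variant is common in the group, 0 = it is rare (made-up data).
characters = [
    {"A": 1, "B": 1, "C": 0, "D": 0},  # supports grouping A with B
    {"A": 1, "B": 1, "C": 0, "D": 0},  # supports grouping A with B
    {"A": 1, "B": 0, "C": 1, "D": 0},  # conflicts with the first two loci
]

tree_ab_cd = (("A", "B"), ("C", "D"))
tree_ac_bd = (("A", "C"), ("B", "D"))

score_ab_cd = tree_score(tree_ab_cd, characters)
score_ac_bd = tree_score(tree_ac_bd, characters)

print("A+B | C+D needs", score_ab_cd, "changes")  # 4
print("A+C | B+D needs", score_ac_bd, "changes")  # 5
```

In this toy case the tree that pairs A with B needs fewer changes and would be preferred, even though one of the three loci varies in a way that cuts across that branching.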
Now, the 2-D fan diagram is far from perfect and I used a back-of-the-envelope way of determining the size of the aprons, but the combination of the 2-D fan and aprons holds some promise for allowing simultaneously for similarity, diversity, and ancestry, the original question motivating this series of posts. The next posts explore 2-D depictions further.
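For readers who want to see where a figure like "roughly 85% within a population" can come from, here is a minimal sketch in Python for a single locus with two variants. It is an illustration under stated assumptions, not the calculation used for the aprons above: it measures variation by expected heterozygosity, whereas Lewontin used a Shannon information measure, and the variant frequencies (0.3 and 0.7) and the function names are hypothetical.

```python
# Share of variation at one two-variant locus that lies within groups.
# Assumptions: equal-sized groups, heterozygosity as the measure of variation.

def heterozygosity(p):
    """Expected heterozygosity when one variant has frequency p."""
    return 2.0 * p * (1.0 - p)


def within_group_share(freqs):
    """Fraction of total heterozygosity found within groups, on average."""
    mean_p = sum(freqs) / len(freqs)
    total = heterozygosity(mean_p)  # all groups pooled together
    if total == 0:
        raise ValueError("locus does not vary in the pooled population")
    within = sum(heterozygosity(p) for p in freqs) / len(freqs)
    return within / total


# Hypothetical locus: the variant is at 30% in one group and 70% in the other.
share = within_group_share([0.3, 0.7])
print(round(share, 2))  # 0.84
```

Even with a variant that is more than twice as common in one group as in the other, about 84% of the variation at this made-up locus sits within the groups. A per-group version of the same quantity is one candidate for calibrating apron sizes.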
Lewontin, R. C. (1973). “The apportionment of human diversity.” Evolutionary Biology 6: 381-397.