Tag Archives: diagram

Science and maps: Discussion with Katy Börner (Day 5 of Learning road trip, morning)

The “Places & Spaces: Mapping Science” collection at Indiana University (http://scimaps.org/maps/browse/) includes many aesthetically pleasing, information-filled figures, maps, and other graphics.  I arranged to talk with the director of Places & Spaces, Katy Börner, so as to explore the tension between maps as representations of reality (which Places & Spaces exemplified) and maps as devices that show the way, providing a guide for further inquiry or action.  I let her know that I was interested in how new angles of representation (as were evident in Places & Spaces) can inform (or not) the ephemeral, pragmatic maps my students make during their research and writing.

What happened:

Katy Börner described how her background as an engineer led her to be interested in tools.  Tools her group has developed for analysis of large data sets allow us to explore when, where, what, and with whom (e.g., in what networks).  These tools can be put in the hands of policy-makers, funders, and research agencies (e.g., so that NSF can identify good reviewers).  One resource is a database of scholarship (mostly biomedical) that has 25 million records (http://sdb.cns.iu.edu).  Such a database affords lots of room for exploration, which is a key aspect of her course on information visualization.  Another emphasis is on focus (“high-level resolution”) and context (which is characteristic of Places & Spaces).  By implication, Börner’s response to the tension I identified is to say there is lots of room for exploration once one has access to a super-large database and some tools (presented in tutorials that accompany http://sdb.cns.iu.edu).


Depictions of human genetic relationships: Exploration 8

Exploration 8: Deeper messages in the conventional ancestral tree of human groups

The original Tishkoff diagram of human ancestry is certainly easier to read than the reticulating web of exploration 6, let alone the web overlaid with aprons in exploration 7.  We could try to remedy this by helping audiences to become familiar with the graphic conventions and by using technology like the slide show to display the branching and replacement of ancestral aprons with those of their descendants.  In this concluding post of the series I argue that it is important for all to work on being able to read reticulating webs because of an undesirable message built into the simpler branching diagram.

To expose this message, consider the horizontal links in the Tishkoff diagram, which represent gene flow between branches, that is, admixture.  The branching pattern can be extracted from the genetic data only because these flows are not so large as to obscure the genetic mutations or other differences that arose over time after each branching.  Indeed, to ensure that this is the case, some studies of human genetic variation involve data from the special subset of people who live in the same place as, say, all their great-grandparents.  The reticulating web with aprons likewise relies on a branching pattern that can be discerned despite the potentially confounding effects of gene flow.  Still, the aprons remind us of variation around the mid-point of each group—variation that may well have been enlarged by gene flow.

Now, there are some branching patterns that are subject to minimal or no gene flow, namely branching of species or higher taxa (taxonomic groups) from ancestral taxa.   We are all familiar with such evolutionary trees.  The first example is for the classes of vertebrates; the second is for liverwort species.


Source: http://www.biology.duke.edu/bryology/LiToL/LeafyII.html

Our familiarity with these trees invites us to think, even if subconsciously, about human genetic ancestry as if the branches were like separate species.  There is a long history of scientific arguments that human races are separate species, or that the branches of the human tree achieved human status at different rates.  As Desmond and Moore (2009) have shown in Darwin’s sacred cause: How a hatred of slavery shaped Darwin’s views on human evolution, the debate was especially heated during Darwin’s adult life.  Darwin’s view of descent from a single common ancestor was a minority view, discredited to some extent by its association with literal interpretation of the Bible’s account of Adam and Eve, but more so by its association with anti-slavery movements.  Yet the debate did not disappear with the 19th century.  Carleton Coon, a physical anthropologist who died in 1981 after a long career as a professor at Harvard and the University of Pennsylvania, wrote in 1962 that Homo erectus evolved into Homo sapiens five separate times “as each subspecies, living in its own territory, passed a critical threshold from a more brutal to a more sapient state” (http://en.wikipedia.org/wiki/Carleton_S._Coon#Polygenism).  The multiregional hypothesis is a more recent variant.

Ideas about multiple origins for humans are not the only way that biology can be invoked to explain or even justify a hierarchy of human races.  However, to the extent that we want to distance ourselves from such views, it can only help to do the work to depict genetic relationships among humans in ways that allow for similarity, diversity, and admixture at the same time as we depict ancestry.

Depictions of human genetic relationships: Exploration 7

Exploration 7: Superimposing genetic variation on the ancestry diagram from a simulation

The following picture comes from the same random simulation used in the previous post to generate directions of branching and the distances of each branch from its most recent common ancestor.  The two dimensions stand for the genetic variation of the whole set of populations.  This time aprons are drawn around the midpoints of the groups A to R at the bottom of the ancestry tree (but not around their ancestors).  This shows that the variation of the original population (which would extend about 20% past the largest circle) is reduced after the branchings that have brought us to the present, but there is still great overlap between most groups.  In particular, the descendants A and B of the group AB, which branched off early, show variation that subsumes that in the rest of the groups.

A careful viewer might notice, however, that there are some circles that do not overlap at all, as if to say these groups share no genetic variation.  This is an artifact of my deciding to reduce the variation at each branching enough so that not all the circles would extend beyond the web.  In doing so, I realized that I was increasing the ratio of between-group to average within-group variation well beyond what we find in the actual human world.
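
The artifact described above can be reproduced with a small sketch.  This is not the simulation used for the picture; it is a minimal stand-in that assumes each group splits in two, each descendant moves a fixed distance in a random direction, and the apron radius shrinks by a constant factor at every branching.  The function names and the parameters `step` and `shrink` are hypothetical.

```python
import math
import random

def branch(center, radius, depth, rng, step=1.0, shrink=0.8):
    """Recursively split a group into two descendants placed in 2-D.

    Each descendant moves `step` away from its ancestor in a random
    direction, and its apron radius shrinks by the factor `shrink`.
    Returns a list of (center, radius) pairs for the tips.
    """
    if depth == 0:
        return [(center, radius)]
    tips = []
    for _ in range(2):
        angle = rng.uniform(0, 2 * math.pi)
        child = (center[0] + step * math.cos(angle),
                 center[1] + step * math.sin(angle))
        tips.extend(branch(child, radius * shrink, depth - 1, rng, step, shrink))
    return tips

def overlaps(tip_a, tip_b):
    """True if two circular aprons share any area."""
    (ca, ra), (cb, rb) = tip_a, tip_b
    return math.dist(ca, cb) < ra + rb

def disjoint_pairs(tips):
    """Count pairs of tips whose aprons do not overlap at all."""
    return sum(
        not overlaps(tips[i], tips[j])
        for i in range(len(tips)) for j in range(i + 1, len(tips))
    )

rng = random.Random(0)  # fixed seed so the run is repeatable
for shrink in (0.9, 0.6):
    tips = branch((0.0, 0.0), 3.0, depth=3, rng=rng, shrink=shrink)
    print(shrink, len(tips), disjoint_pairs(tips))
```

Shrinking the aprons more sharply at each branching leaves less room for the circles to overlap, which is the inflated ratio of between-group to within-group variation that the paragraph above warns about.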

Depictions of human genetic relationships: Exploration 5

Exploration 5: Superimposing genetic variation on the ancestry diagram

The 2-D depiction in the previous post greatly improved (when compared with the original Tishkoff tree of human ancestry) the degree to which the distance between groups was proportional to the time since the groups shared a common ancestor.  (As already noted, we could adjust the depiction if we had a more refined analysis giving us data on different speeds of divergence from the common ancestor down different branches.)  The 2-D depiction cannot, however, eliminate spurious appearances of similarity.  Even if we put that objection aside for a moment, we need to note that the 2-D depiction still omits the genetic variation around the mid-point of any branch.

Two features of the original Tishkoff ancestry diagram give us a whiff of variation around the mid-point of the branches: 1. the relative thickness of the branches (the thick trunk at the top indicates more genetic variation in the ancestral group than the thin tips of the branches at the base); and 2. the density of the color of the branches (the deepest blue indicates more genetic variation than a lighter-shaded branch).  (Tishkoff and collaborators suggest that the migration out of Africa brought with it only a small subset of the genetic variation in the African ancestral branch from which it broke off.  The original population migrating out of Africa was, it seems, quite small.)

Although variation around the group’s midpoint is suggested by the preceding two features, the Tishkoff ancestry diagram does not in any way convey the fact that, on average, for any genetic locus roughly 85% of the variation is within a population, 7% is among populations within a region, and only 6% occurs among regions (using oft-cited figures from Lewontin 1973, subject to later refinement, but not, to my knowledge, qualitative revision).  To convey this, we can add “aprons” around the mid-points of the 2-D depiction.  In the following diagram aprons are added around groups A and H only.

The aprons are the same size because I can make the key point without exploring the available data to calibrate the apron size to match the different degrees of genetic variation within the groups at the ends of the branches.  That point is that ancestry trees show the genetic mid-points of branches and thus mostly hide the large amount of genetic variation not captured by the branching pattern.  Such variation makes it difficult, on the basis of a random selection of genetic loci, to assign an individual to one branch or the other.  Difficult but not impossible; merely subject to more errors than to correct assignments.  I say random selection because clearly there must be some genetic differences that are specific to a branch in order for us to be able to trace ancestry patterns at all.  If there are mutations that are very common in some people and rare in others, a tree can be made that captures the most likely branching pattern (i.e., one that assumes the fewest reversions, that is, a mutation in one direction followed by a mutation back to the original condition) even if most genes vary in ways that bear no sign of that branching.

Now, the 2-D fan diagram is far from perfect and I used a back-of-the-envelope way of determining the size of the aprons, but the combination of the 2-D fan and aprons holds some promise for allowing simultaneously for similarity, diversity, and ancestry, which was the original question motivating this series of posts.  The next posts explore 2-D depictions further.
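
To make the apportionment figures cited above more concrete, here is a minimal sketch of how variation at a single locus can be split into a within-population share, an among-populations-within-region share, and an among-regions share.  It uses Shannon diversity with equal weighting of populations and regions, which is an assumption of this sketch; the allele frequencies are made up and the function names are hypothetical.

```python
from math import log2

def shannon(freqs):
    """Shannon diversity (in bits) of a list of allele frequencies."""
    return -sum(p * log2(p) for p in freqs if p > 0)

def mean_freqs(freq_lists):
    """Unweighted average of several allele-frequency lists."""
    n = len(freq_lists)
    return [sum(col) / n for col in zip(*freq_lists)]

def apportion(regions):
    """Split total diversity at one locus into three shares.

    regions: dict mapping a region name to a list of populations, each
    population being a list of allele frequencies that sums to 1.
    Returns (within populations, among populations within regions,
    among regions) as fractions of the total diversity.
    """
    region_means = [mean_freqs(pops) for pops in regions.values()]
    h_total = shannon(mean_freqs(region_means))
    h_region = sum(shannon(m) for m in region_means) / len(region_means)
    h_pop = sum(
        sum(shannon(p) for p in pops) / len(pops) for pops in regions.values()
    ) / len(regions)
    return (
        h_pop / h_total,
        (h_region - h_pop) / h_total,
        (h_total - h_region) / h_total,
    )

# Made-up frequencies for a two-allele locus; not real data.
example = {
    "region 1": [[0.60, 0.40], [0.50, 0.50]],
    "region 2": [[0.40, 0.60], [0.30, 0.70]],
}
within, among_pops, among_regions = apportion(example)
print(f"{within:.1%} {among_pops:.1%} {among_regions:.1%}")
```

Even with allele frequencies that differ visibly between the two invented regions, most of the diversity in this toy example sits within populations, which is the pattern the aprons are meant to convey.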


Lewontin, R. C. (1973). “The apportionment of human diversity.” Evolutionary Biology 6: 381-397.

Depictions of human genetic relationships: Exploration 4

Exploration 4: Arranging the groups on the ancestry tree in two dimensions so that distance reflects the time since branching (somewhat)

The previous post depicted the diagram of human ancestry from Tishkoff and collaborators in a way that made the distance between any pair of branches proportional to the time since they split in two from their most recent common ancestor.  This does not mean, however, that the distance between every pair of the 18 groups at the base of the diagram is equal to the time since their common ancestor.  By spreading out the tree into a fan, we can do much better on this last count.  (Recall: NR stands for the ancestor of groups N through R, where the groups are as labeled across the bottom of the Tishkoff diagram of human ancestry.)

This 2-D depiction eliminates the crossing over that made the tree in the previous post difficult to read.  However, it is far from perfect.  For example, C and N end up close even though their common ancestor was almost as distant in time as could be.  A sophisticated algorithm might arrive at a better fan than the first attempt above, but we could never get around the fact that at each branching point, the branches could be flipped (e.g., E or F could be made close to D instead of G).  These limitations notwithstanding, the next post employs the 2-D fan to depict something about the genetics omitted so far in these posts.

Depictions of human genetic relationships: Exploration 3

Exploration 3: Arranging the groups on the ancestry tree so that distance reflects (to some extent) the time since branching

For the previous post I was able to use a (virtual) mobile to depict the diagram of human ancestry from Tishkoff and collaborators down to the level of 4 branches, i.e., AB, CC, DM, and NR.  Moreover, I was able to make the distance between any pair of branches, i.e., (DM,NR), (CC,DR), (AB,CR) proportional to the time since they diverged.  In the figure below this distance relationship holds for all pairs of branches in the full tree.   (Of course, a more refined analysis might allow for different speeds of divergence from the common ancestor down different branches, but this could also be depicted in this same form.)

As in the previous post, the distance relationship between members of a pair does not mean that the distance between every pair of the 18 groups at the base of the mobile—there are 153 such pairings—is equal to the time since their common ancestor.  Indeed, the crossing over makes this obvious.  It’s easy to see that, for example, the closeness of H and L is not because they share a recent common ancestor.  The use of the mobile in the previous post suggests the next step in exploration, namely, one in which crossing over is eliminated by spreading out the groups in two dimensions.
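
The figure of 153 pairings is the number of ways of choosing 2 groups from 18.  The sketch below checks that count and shows how the time since the most recent common ancestor could be tabulated for every pair.  The small tree and its split times are hypothetical, chosen only for illustration, and are not taken from the Tishkoff diagram.

```python
from itertools import combinations
from math import comb

def leaves(tree):
    """List leaf labels of a tree given as a label or (split_time, left, right)."""
    if isinstance(tree, str):
        return [tree]
    _, left, right = tree
    return leaves(left) + leaves(right)

def divergence_times(tree):
    """Map each unordered pair of leaves to the time of their most recent split."""
    if isinstance(tree, str):
        return {}
    split_time, left, right = tree
    times = {}
    times.update(divergence_times(left))
    times.update(divergence_times(right))
    # Leaves on opposite sides of this branching last shared an ancestor here.
    for a in leaves(left):
        for b in leaves(right):
            times[frozenset((a, b))] = split_time
    return times

# Hypothetical tree; split times are in thousands of years, for illustration only.
tree = (150, (60, "A", "B"), (100, "C", (40, "D", "E")))
times = divergence_times(tree)

print(comb(18, 2))  # number of pairings among 18 groups
for a, b in combinations(leaves(tree), 2):
    print(a, b, times[frozenset((a, b))])
```

In this toy tree, B and C sit next to each other along the base even though their common ancestor is the oldest in the tree, the same kind of mismatch noted above for H and L.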

Depictions of human genetic relationships: Exploration 2

Exploration 2: Arranging the groups on the ancestry tree so that distance reflects (to some extent) the time since branching

The order of the 18 current human groups at the base of the diagram of human ancestry from Tishkoff and collaborators is only 1 of 65536 possible orderings that preserve the same sequence of branchings from a common ancestral group 150,000 years ago.  The image that comes to mind is of a mobile with each pair of branches able to revolve around the position of its most recent common ancestor, which will itself be moving as it revolves with another branch around its common ancestor.  I found a wonderful website for building mobiles, http://www.nga.gov/kids/zone/mobile.htm, and was able to use it to replicate the Tishkoff diagram down to the level of 4 branches, i.e., AB, CC, DM, and NR.

In this mobile, I was able to make the distance between any pair of branches, i.e., (DM,NR), (CC,DR), (AB,CR) proportional to the time since they diverged.  This differs from the original Tishkoff diagram, which has, for example, A, B, and C close together at the bottom even though the common ancestor of A and B, i.e., AB, branched off 150,000 years ago from the ancestor of C, i.e., CC.  This is plenty of time for genetic divergence to occur.

The distance relationship between members of a pair does not mean that the distance between every pair of the four groups at the base of the mobile is equal to the time since their common ancestor.  Mostly, the mobile serves to remind us “that no lessons should be drawn from the order along the bottom of a branching diagram that is not already contained in the sequence of branches above” (see previous post).

The mobile software allows one to view the mobile from above as well as from the side.  In the following snapshots from above, the red ball is AB, the lime one is CC, and the blue and purple are DM and NR.  (The relative size of the balls has no significance.)

I haven’t constructed a mobile for the whole ancestry tree, but the next post extends the feature of the mobile that had the distance between any pair of branches proportional to the time since they diverged.
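
The count of possible orderings mentioned at the start of this post can be checked with a short sketch.  It assumes a strictly binary tree in which each branching can be flipped independently, as on a mobile; the nested-tuple tree and the function names are hypothetical.  Under that assumption 18 groups give 17 branchings and 2^17 = 131072 left-to-right orders, or 65536 if an order and its mirror image are counted once, which is one way to arrive at the figure quoted.

```python
from itertools import product

def leaf_orders(tree):
    """Return every left-to-right leaf order obtainable by flipping branchings.

    A tree is either a leaf label (str) or a pair (left, right) of subtrees.
    """
    if isinstance(tree, str):
        return [(tree,)]
    left, right = tree
    orders = []
    for a, b in product(leaf_orders(left), leaf_orders(right)):
        orders.append(a + b)  # branches as drawn
        orders.append(b + a)  # branches flipped at this branching
    return orders

def count_orders(n_leaves, merge_mirror_images=False):
    """Count orders for a strictly binary tree with n_leaves tips."""
    flips = n_leaves - 1  # a binary tree with n tips has n - 1 branchings
    total = 2 ** flips
    return total // 2 if merge_mirror_images else total

# Hypothetical four-group tree, loosely echoing AB, CC, DM, and NR.
small_tree = ("AB", ("CC", ("DM", "NR")))
orders = leaf_orders(small_tree)
print(len(orders), count_orders(4))
print(count_orders(18))
print(count_orders(18, merge_mirror_images=True))
```

Enumerating the orders is only practical for small trees; for the full diagram the closed-form count is enough to show how arbitrary any single ordering along the base is.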