Tag Archives: genetic variation

Depictions of human genetic relationships

Science writer Nicholas Wade and philosopher Neven Sesardic, among others, have argued that Rosenberg et al.’s division of human genetic diversity into reasonably distinct clusters (depicted as bands of color in their diagrams) shows that human racial divisions have a biological basis after all.  Some lines of critical inquiry that I would recommend:

Depictions of Human Genetic Relationships: Looking more at Li et al’s PCA plot

A recent post finished by wondering how to reconcile the figures for variation within versus among populations with the 2-dimensional PCA (principal component analysis) of Li et al. (2008) (http://www.sciencemag.org/content/319/5866/1100.full).  In Li et al.’s analysis, 89% of variation is within populations, 2% is among populations within groups, and 9% is among groups.  What such variation means is that it is difficult, on the basis of a random selection of genetic loci, to assign an individual to one branch of the tree of human ancestry or another.  However, the PCA plots (such as the one below) make it look as if within-group variation is somewhat less than among-group variation.

Here are my current explorations:

1. Scale the diagram so that variation on the plot is proportional to variation accounted for by each of the first two principal components.  (I also rotated it because I like the convention of the major axis of variation being left to right.)

2.  Consider, as a thought experiment, groups separated on the first axis.  That is, 3 subdivisions of human genetic diversity — African (red); Europe + Middle East + most Central/South Asia + Oceania (all in one group, brown, green, light blue, and deep blue); and E Asia + America + some CS Asia (gold, purple).  Then choose some place on the second PC as if the variation in that direction were all the variation not accounted for by the grouping (rather than actually only 3/5 of it).  There’s a lot of overlap among the three groups for any position with PC2 > 0.2.   This shows how a randomly selected gene (or combination of genes) not captured by the first axis won’t be a reliable basis for separating groups; members of two or more groups will share that gene.

3.  Now consider groups separated by using both PC axes and imagine choosing a gene (or combination of genes) along the direction of the remaining variation (20% of the total).   Again, a randomly selected gene (or combination of genes) not captured by the first two axes won’t be a reliable basis for separating groups; members of two or more groups will share that gene.

4.  Granted, a randomly selected gene (or combination of genes) captured by the first two axes will do OK in separating groups.  However, if the trait we are concerned with involves many genes (not to mention environmental factors interacting over a developmental sequence), we will expect it to be difficult to link differences between any two individuals in different groups to genetic differences.

5.  Of course, IF the genes that do allow us to separate groups (either in an ancestry method or using PCA) had been the focus of natural selection in divergent environments, then the separation on the ancestry tree or PCA plot would mean something.  Is there any evidence for that?  Indeed, what would be required to establish evidence for that?
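The proportions quoted in points 2 and 3 (the second PC being 3/5 of what the first axis leaves over, and 20% remaining after both axes) pin down the share of each axis. Here is a small sketch of that arithmetic, together with the rescaling described in point 1. It assumes that "proportional to variation" means axis length proportional to the square root of the variance share (that is, to standard deviation), which the list above does not specify. The function name is hypothetical.

```python
from fractions import Fraction
import math

def implied_pc_shares(left_after_two, pc2_share_of_rest):
    """Solve pc1 + pc2 = 1 - left_after_two and pc2 = pc2_share_of_rest * (1 - pc1)."""
    pc1 = (1 - left_after_two - pc2_share_of_rest) / (1 - pc2_share_of_rest)
    pc2 = 1 - left_after_two - pc1
    return pc1, pc2

# Inputs taken from the text: 20% left after two axes, PC2 = 3/5 of what PC1 leaves.
pc1, pc2 = implied_pc_shares(Fraction(1, 5), Fraction(3, 5))
print(pc1, pc2)  # 1/2 3/10

# Assumed convention: axis length tracks standard deviation, so the PC2 axis
# is drawn sqrt(pc2 / pc1) times as long as the PC1 axis.
axis_ratio = math.sqrt(pc2 / pc1)
print(round(axis_ratio, 3))  # 0.775
```

On this reading, the first axis carries half of the total variation and the second about three tenths. These shares are inferred from the figures quoted in the list, not read off Li et al.’s paper.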

References

Li, J. et al. (2008) Worldwide Human Relationships Inferred from Genome-Wide Patterns of Variation, Science 319: 1100-1104

Depictions of Human Genetic Relationships: Looking at Li et al’s colored belt diagram

Following up on a long series of posts exploring depictions of human genetic relationships, let me consider the colored belt diagram and 2-dimensional PCA (principal component analysis) of Li et al. (2008), http://www.sciencemag.org/content/319/5866/1100.full

The 7 colors seem to suggest that there are 7 races that correspond reasonably to classical views of races.  However, the color coding comes from a genealogical tree on which the colors are superimposed and are given by reference to classical or geographic views of human groups (see below).   (That is, you get out what you put in.)

If, instead, we divided human genetic variation according to branching distance from the external reference population (namely, chimpanzees) and asked for 7 groups, they would be San, Mbuti Pygmies, Biaka Pygmies, Bantu, Mandenka, Yoruba, and the rest of the human population in the world all put together.  (Is this right?  The assumption here is that time since diverging from a common ancestor is, in molecular clock fashion, reflected in overall genetic divergence.)  That would give us a colored belt with 6 narrow bands of color on the left and one solid band stretching out the rest of the way to the right with a few stripelets from the six African groups appearing in that solid band.

Now someone could sample more in the African groups and produce a tree that, say, splits the San, the Mbuti Pygmies, and the Biaka Pygmies and then, to make the 7th group, lumps together the rest of the human population in the world.   Or someone could sample fewer African groups and end up with, say, three African groups, Mozabite, Bedouin, Palestinians, and the rest of the human population in the world all put together as the 7 human groups.

This sensitivity to sampling leads to the question: Is there a basis for subdivision that isn’t susceptible to who is sampled more or less and doesn’t depend on color coding that mostly defines the groups before doing the analysis?  Perhaps the PCA in figure S3B is what’s needed—it uses the full set of information about genetic variation in a sample of 900+ individuals to spread the sampled populations across two dimensions.

That diagram suggests (to my eye) 4 subdivisions of human genetic diversity — African (red), Europe + Middle East + Central/South Asia (all in one group, brown, green, and light blue), E Asia + America + some CS Asia (gold, purple), Oceania (deep blue).  Does that bring us back to Africans as one race, albeit with Europeans subsumed in a large mix?  That is, does it undermine a reading of the Campbell-Tishkoff diagram that suggests that, if there were 4 races, 3 would be African and one would be African plus the rest of the world put together?

At the same time, keep in mind that, in Li et al.’s analysis, 89% of variation is within populations, 2% is among populations within groups, and 9% is among groups.  (This affirms results that date back to Lewontin 1972 and have been supported by subsequent work, such as Hofer et al. 2009: on average, for any genetic locus, roughly 5/6 of the variation is within a population, 1/12 is among populations within a region, and only 1/12 occurs among regions.)  What such variation means is that it is difficult, on the basis of a random selection of genetic loci, to assign an individual to one branch or another.  We need to say difficult, not impossible; such assignment is merely subject to more errors than correct results.

The question I have (and plan to email the authors to see if they can help) is why this isn’t evident in the PCA plots.  Those plots make it look like within groups variation is somewhat less than among groups variation.  Perhaps this is because the PCA uses many loci and, contra what I just said above, a combination of a random selection of loci does allow us to discriminate among something like the classical groups?  (Remember, however, that people of recent mixed ancestry tend to be eliminated as subjects in these studies.)  Perhaps it is because the PCA is biased towards the particular loci that allow us to trace ancestry.   Obviously, I need to understand more about what the methods are doing.
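One of the possibilities raised above, that combining many loci sharpens discrimination even when each locus alone is nearly uninformative, can be illustrated with a toy calculation. The sketch below is not Li et al.’s method. It assumes just two equal-sized groups, normally distributed scores, statistically independent loci, and the same among-group share of variance at every locus (9%, borrowed from the figure quoted above). Real loci are correlated, and real samples tend to exclude people of recent mixed ancestry, so the numbers are indicative only.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def misassignment_rate(among_fraction, n_loci=1):
    """Error rate when assigning an individual to the nearer of two group means.

    among_fraction: share of per-locus variance that lies between the two groups.
    n_loci: number of independent loci whose scores are added together.
    """
    # Half the gap between group means, in within-group standard deviations,
    # is sqrt(f / (1 - f)) for one locus and grows with sqrt(n_loci).
    separation = math.sqrt(n_loci * among_fraction / (1.0 - among_fraction))
    return normal_cdf(-separation)

for n in (1, 10, 100):
    print(n, round(misassignment_rate(0.09, n), 4))
```

Under these assumptions the single-locus error rate is a little under 40%, and it falls below 1% by 100 loci. Whether anything like this holds for real, correlated loci is exactly what remains to be checked.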

References

Campbell, M.C. and S. A. Tishkoff (2010) The Evolution of Human Genetic and Phenotypic Variation in Africa, Current Biology 20, R166–R173

Hofer, T. et al. (2009) Large Allele Frequency Differences between Human Continental Groups are more Likely to have Occurred by Drift During Range Expansions than by Selection, Annals of Human Genetics 73 (1): 95–108.

Lewontin, R. C. (1972) The apportionment of human diversity. Evolutionary Biology 6, 381–398.

Li, J. et al. (2008) Worldwide Human Relationships Inferred from Genome-Wide Patterns of Variation, Science 319: 1100-1104

Depictions of human genetic relationships: Exploration 1

Exploration 1: Rearranging the horizontal sequence of a tree diagram

(Continuing from the previous post, we consider alternative depictions of human genetic variation keeping in mind the question, “Can any depiction of genetic relationships among humans allow simultaneously for similarity, diversity, ancestry, and admixture?”)

The diagram of human ancestry from Tishkoff and collaborators branches out like an upside-down tree from a common ancestral group into 18 groups today.  (The diagram shows some cross-links that indicate gene-flow between populations, but we will ignore these for the time being.)

(Source: Michael C. Campbell and Sarah A. Tishkoff, 2010, “The Evolution of Human Genetic and Phenotypic Variation in Africa,” Current Biology 20, R166–R173.  Letters at the bottom added for the purposes of referring to the groups in this series of blog posts.)

Below we see the tree for the first three forks, where AR is short for a group that includes all the ancestors of groups A through R; NR for all the ancestors of groups N through R; etc.

Now, the branches at any fork can be flipped so the next diagram conveys the same information about ancestry and branching.

Notice that the second variant does not convey the impression that the branch that is ancestral to the non-Africans, i.e., NR, is more different from the branches ancestral to the African groups, i.e., AB, CC, DM, than these branches are from each other.  Although the lineage that ended up at CC (the ancestor of group C) branched off earlier than the lineage leading to NR, there is nothing in the ancestry diagram that says it should be more similar genetically to AB than to NR.

If we exclude diagrams with crossing over of branches, such as the one below, there are four distinct reorderings of the four branches that preserve the sequence of the branchings.  There are 2 to the power 16 = 65536 reorderings of the full set of the 18 current groups.  The point is not that we need to find one correct ordering from among such a large set.  The lesson is that no lessons should be drawn from the order along the bottom of a branching diagram that are not already contained in the sequence of branches above.  (In this light, diagrams with crossing over should be excluded because they suggest that the two branches at a fork are further away from each other than from one of the earlier branches, which goes against the information contained in the sequence of branches.)
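The counts in the paragraph above can be checked by brute force. This is a minimal sketch, assuming that every fork is a two-way split and that a left-to-right order and its mirror image count as the same picture. That is one reading of how the figure of four arises, since the text does not spell out the counting rule. The nested-tuple encoding and the function names are illustrative only.

```python
from itertools import product

# The first three forks as described in the text: AR splits into AB and the rest,
# which splits into CC and the rest, which splits into DM and NR.
TREE = ("AB", ("CC", ("DM", "NR")))

def orderings(tree):
    """Return every left-to-right tip order obtainable by flipping forks."""
    if isinstance(tree, str):
        return [(tree,)]
    left, right = tree
    result = []
    for l, r in product(orderings(left), orderings(right)):
        result.append(l + r)  # fork drawn as given
        result.append(r + l)  # fork flipped
    return result

def up_to_mirror(orders):
    """Treat an order and its left-right mirror image as one picture."""
    return {min(o, o[::-1]) for o in orders}

all_orders = orderings(TREE)
distinct = up_to_mirror(all_orders)
print(len(all_orders), len(distinct))  # 8 4
```

Under the same assumptions, a fully bifurcating tree with 18 tips has 17 forks, so it has 2^17 / 2 = 2^16 = 65536 orders, which matches the figure quoted above. If the actual tree contains any multi-way splits, the count would differ.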

It is not easy, however, to convince one’s brain not to give significance to these horizontal positions.  This cognitive weakness gives rise to the explorations in the next posts.

Race and Biology: A Case Study Of Curriculum Development II

GivItAGo: …here’s my thinking —
1. Race is such an important issue in shaping culture, psychology, economics in the United States. So we have to address it whenever it can help to do so.
2. Reciprocally, there are many cultural, psychological, economic, and other facets to how people’s understanding and actions with respect to race are shaped. And the facets differ from person to person. So let’s think of the task of addressing race as one of helping students assemble a tool box from which they can draw when faced with race.
3. In our biology curriculum, we can help students assemble tools that relate to the facets of race where biology is involved, or, at least, is invoked.

(Continuing from previous post…)
(G. Changes “lessons” to “tools” on flip chart.)
Studio: This sounds OK, but it seems a bit laissez faire. Wouldn’t it be better to work out a coherent analysis of race in our society — even better, a program for students to develop their ability to address racism in their work and lives?
G: You might be right, but can we switch roles for a moment? Do you really think you’re going to be able to do that in a biology course?
S: Good point.
G: Moreover, do you really think we’d be able to convey that full-blown social analysis or curriculum on race to biology teachers?
S: You might be able to present that, but that wouldn’t mean these teachers would take it up and use it.
Jokero: (popping up) No need to worry about Presenting an analysis — that’s the One P Program — old hat! Remember the 2Ps: “Perform to Provoke!” (sits down)
G: Sceptico, you said something right on just now. The goal should be for students to take things away and use them. Thus the image of tools for a tool box.
S: Do you know how people take up tools and use them — what makes this happen?
G: No, I don’t know. But let’s work out some tools first and then it’ll be easier to think about that.
S: OK. Why don’t you start with factual tools, given that I’m sceptical this category can be usefully separated from that of conceptual tools.
G: OK. Imagine I’m a teacher (puts on Pedago name tag). And I have students who know little about biology… (Studio comes on stage and addresses Pedago.)
Studio: People differ in skin color and that has a biological basis, why not other characteristics, such as intelligence?
G: Biologists have studied many enzymes that come in different forms, that is, you might have a different form from me and the difference would be coded in our genes. And biologist Richard Lewontin says they found that 85% of the variation among humans occurs among people within their own group, such as races, leaving only a fraction of variation among races.
S: (interrupts) Is that the exact fact he cites?
G: No — I plan to track down the info and to get good illustrations.
Studio: What does it mean 85% of the variation is within groups?
G: (Draws on fresh page of flip chart a scatter of dots and Xs) Imagine these points are the individuals in the human species and their position represents their differences. (Makes two greatly overlapping circles around them, and marks the center of each circle). The circles are two groups and the centers are the average of each group. The difference between the centers, the averages, is swamped by the scatter around the centers for each group.
S: (interrupts) This sounds very conceptual. What’s more you’ll need to go more slowly to get my students comfortable with these ideas.
G: OK. Another way of thinking about this is to say, if I gave you a point and you didn’t know what shape it was, would you be able to assign it correctly to the dots or the Xs? With much more variation within the group compared to between it, you’d be wrong a lot of the time.
Studio: But what if we looked at lots of enzymes at the same time? Wouldn’t we be able to improve our rate of correct assignments?
G: That’s a very good question. Lewontin doesn’t talk about this. Let’s add that to the list of factual tools students might ask for. I suspect the answer is “no,” because of the amount of interbreeding there has been between people whose ancestors came from different continents.
Studio: How much interbreeding has there been? What’s the average fraction of European genetic ancestry in African-Americans?
G: You should also ask: What’s the average African genetic ancestry in people who don’t identify themselves as African-Americans or as hyphen-Americans at all?
Studio: OK. What’s the average African genetic ancestry in people who don’t identify themselves as African-Americans or as hyphen-Americans at all?
G: And what’s the range of these fractions among different people? — I’ll have to get answers to these questions and add them to our factual tool kit.
Studio: But once you do, I have a new question for you: If the overall picture is of genetic overlap among races, does that rule out there being specific genes that differ more distinctly? If there weren’t, how could we tell races apart at all?
G: No, it doesn’t rule it out. Sickle cell genotypes are much more common in African-Americans. How common? — Let’s add that to our list, and also try to find out what other genes that’s the case for. (does so) But there’s no reason to link these kinds of genes with something socially significant such as intelligence or behavior more generally.
Studio: Why not? Dog breeds differ in appearance and also in behavior.
G: The example of dogs often gets brought up by students, so here’s another place where more facts would be valuable. How does genetic variation within dog breeds compare with variation among the averages for the breeds? (adds to flip chart)
S: Why don’t you proceed as if you had that information and the answer is more or less the same as Lewontin gave for humans.
G: OK. Try this, Studio: “Biologists have found that 75% of the variation among dogs occurs within breeds, leaving only a fraction of variation among breeds.”
Studio: That only shows that the genetic facts you’re giving us aren’t an adequate way of looking at the biology of dog breeds — we all know how breeds differ.
G: And if the genetic facts don’t seal the argument for dog breeds, why should they for human races? Hmmm. Sceptico, can I try another version of the facts?
S: Go ahead.
G: Try this instead, Studio: “Unlike humans, dog breeds can be distinguished genetically. But they can all interbreed, and will if allowed.”
Studio: Indeed. I think this is what has happened with humans. So races overlap more than they did in the past — racial categories are not as meaningful as they used to be. But, couldn’t there remain an average difference among races that corresponds to genetic differences?
G: You mean so that difference in test scores among races wasn’t simply a result of how races are brought up, educated, and treated in this society?
Studio: Yes.
G: Here’s where we need some conceptual lessons about what it means to partition variation into different sources — genetic and non-genetic. And about what among group differences do and do not mean for any individual. After all, on average men are taller than women, but there are some women taller than a majority of men. Focusing on the average difference contributes, unfortunately, to the cultural norm about men being taller than their female partners. This norm reduces the range of potential partners for tall women.
Studio: Tall heterosexual women.
G: OK.
G: And if the issue is not height, but average test score differences, it’s even more important not to use average differences to stereotype the range of individuals.
S: But that is done a lot in our educational system. Understanding why this happens is another reason why I’m sceptical of teaching race in a biology class.
G: But this is where historical case studies come in. Let’s look at the recurring attempts to make race a biological issue and see how biology was debunked that was earlier accepted as established knowledge. Then ask students to consider that the same could be true for current science.
S: That seems too much for my non-bio-major students.
G: Au contraire — I suggest it’ll make it easier for them to get engaged.
S: I doubt that you have convincing evidence to back that assertion up.
Studio: Enough from you two. It’s time to hear from the “students,” that is, the teachers in your audience. Didn’t you say you were going to go through the dialog a second time to allow the audience to call time and question what we say?
G: So I did. Let’s give that a go.
S: Do you really think that’ll work with these students?
J: Didn’t someone famous once say: “The first time a tragedy, the second time a farce”?
Prepared June 99 at a BioQUEST workshop, as a result of interaction with Steve Fifield, Raquell Holmes, and Joel Hagen.

Race and Biology: A Case Study Of Curriculum Development

GivItAGo: Today’s topic is the teaching of race to biology students…

    (GivItAGo [=”we should try it and see how it goes” in Australian English] stands center stage behind a lectern; Sceptico sits in a chair to the side watching; Jokero sits at the back of the audience. All have their names on big name tags.)

Sceptico (stands up): Time Out. What are you doing, standing behind a podium and lecturing? This is a BioQUEST workshop; the audience should be involved in Posing the problem, Problem-solving, and Persuading others of the value of their approach.
(GivItAGo listens thoughtfully and during the following exchange continues to ponder the problem.)
Jokero (pops up and calls out from the back): Oh, that’s what the 3Ps are — I thought it was Perform to Provoke.
Sceptico: That’s only two Ps.
Jokero: Are you sure?
Sceptico: Yes, I’m sure.
Jokero: Oh well, I never had classes on numbers. That’s content. My teachers were only interested in process. (sits down)
GivItAGo (moving in front of the lectern): OK, let’s give this a try. We have prepared a script and we want to perform it. But, after we’ve run through it once, let’s start again and allow the audience to question what we say and suggest alternatives.
(Sceptico indicates sceptical assent [think of Clinton’s pursed lips, but without the smirk] and sits down.)
G: Today we’re going to explore the teaching of race to biology students.
S: Why teach about race? — don’t you aspire to a race blind society?
G: I do want a prejudice-free society, but we’re not nearly there yet. Race is still very important in US society, whether or not people think it should be. Turning a blind eye to it is not going to make discrimination go away.
S: OK, but still I don’t think you should teach race in a biology class. It sends the message that race is based in the facts of biology. Look at the history of biology being used to justify exploitation of one social group by another, and often to justify the extermination of the subordinate group.
G: Good point, Sceptico. Let’s put “facts about biology” and “race and historical case studies” on the list of things we should teach about biology and race.
S: History in a biology class!?
J: (pops up) Didn’t someone famous once say: “Nothing in biology makes sense except in the light of history”? (sits down)
G: I’m not sure how it will work, but let’s give it a go. (Walks over to flip chart and writes down “factual lessons” and “historical case studies.”)
S: Are you proposing that there are facts that stand for themselves?
G: Good point. Let’s add “conceptual lessons.” (does so)
S: I don’t think the distinction factual vs. conceptual is very helpful for thinking about how to teach this material.
G: You might be right, but here’s my thinking —
1. Race is such an important issue in shaping culture, psychology, economics in the United States. So we have to address it whenever it can help to do so.
2. Reciprocally, there are many cultural, psychological, economic, and other facets to how people’s understanding and actions with respect to race are shaped. And the facets differ from person to person. So let’s think of the task of addressing race as one of helping students assemble a tool box from which they can draw when faced with race.
3. In our biology curriculum, we can help students assemble tools that relate to the facets of race where biology is involved, or, at least, is invoked. (Changes “lessons” to “tools” on flip chart.)
(to be continued)

Prepared June 99 at a BioQUEST workshop, as a result of interaction with Steve Fifield, Raquell Holmes, and Joel Hagen.