Molecular Evolution:Molecular evolution of reproductive genes in primates
Introduction
We have tried various statistical methods of testing for the presence of selection in a gene. For this, we have chosen two genes, SEMG2 (Semenogelin II) and KLK3 (Kallikrein 3), both involved in male reproduction in primates.
The lab practical was also a demonstration of the effects of sperm competition. SEMG2 and KLK3 are locked in an evolutionary arms race in polyandrous populations: SEMG2 is involved in the coagulation of semen into a mating plug shortly after ejaculation, which prevents fertilisation by the sperm of subsequent males and possibly also prevents semen backflow, increasing the likelihood of fertilisation. KLK3, on the other hand, works to dissolve such plugs. Any small improvement in the expression of either of these genes can give the carrying male a significant advantage over his competitors. Such improvements will therefore become fixed quickly in a population, and we should expect positive selection in these genes. According to research, the rate of these improvements is positively correlated with the degree of female promiscuity.
The level of polyandry in primates can be summarised as in this table:
| Species | Mating system | Mean number of male partners per periovulatory period |
|---|---|---|
| chimpanzee | multimale/multifemale | 8 |
| macaque | multimale/multifemale | 3 |
| human | various | 1 or 2 |
| red guenon | various | 1 or 2 (?) |
| orangutan | dispersed | 1 |
| gibbon | monogamous | 1 |
| gorilla | polygamous | 1 |
Methods
Pairwise DN/DS. Using DnaSP, we calculated the ratio of nonsynonymous to synonymous substitutions (DN/DS). This ratio gives an indication of the kind of selection that has acted on the gene, in the following way:
| DN/DS | interpretation |
|---|---|
| < 1 | negative selection |
| = 1 | neutral evolution |
| > 1 | positive selection |
McDonald-Kreitman test. We also used DnaSP to perform a McDonald-Kreitman test.
Tajima's test of neutrality. We complemented the MK test with Tajima's test of neutrality.
Phylogenetic trees. In MEGA, we constructed phylogenetic trees using mutation data from both genes in several humans, as well as a chimpanzee as the outgroup.
Maximum Likelihood. Finally, we performed Maximum-Likelihood calculations in PAML, changing the configuration file for every run. The four configurations we used were:
| configuration | model | NSites | description |
|---|---|---|---|
| c00 | one ratio | one ratio | Constrain the model to only accepting one DN/DS ratio for all sites in the gene as well as in all branches of the tree |
| c01 | one ratio | neutral | Constrain all branches to the same evolution, but allow sites to have either DN/DS = 0 or DN/DS = 1 |
| c02 | one ratio | selection | Constrain all branches to the same evolution, but allow sites to have one of three different rates: DN/DS = 0, DN/DS = 1 and DN/DS > 1 |
| c10 | branch-specific ratios | one ratio | Estimate specific DN/DS ratios for each branch in the tree, but constrain all sites in the gene to only one DN/DS ratio |
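The four configurations above could be expressed as settings of the two codeml control-file options `model` and `NSsites`. The sketch below is an assumption inferred from the table's descriptions, not a copy of the control files used in the practical; the dictionary name `CONFIGS` and the helper `ctl_lines` are hypothetical, and all other control-file entries (sequence file, tree file and so on) are left out.

```python
# Assumed mapping from configuration name to codeml options.
# "model" and "NSsites" are codeml control-file options; the values chosen
# here are inferred from the descriptions in the table, not from the real files.
CONFIGS = {
    "c00": {"model": 0, "NSsites": 0},  # one ratio for all branches and all sites
    "c01": {"model": 0, "NSsites": 1},  # one ratio for branches, neutral site model
    "c02": {"model": 0, "NSsites": 2},  # one ratio for branches, selection site model
    "c10": {"model": 1, "NSsites": 0},  # one ratio per branch, one ratio for sites
}


def ctl_lines(name):
    """Return the control-file lines that change between runs."""
    return "\n".join(f"{key} = {value}" for key, value in CONFIGS[name].items())


for name in CONFIGS:
    print(f"# {name}")
    print(ctl_lines(name))
```

Keeping the per-run differences in one place makes it easier to see that only two options change between runs.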
Results
Pairwise DN/DS
The following values resulted from running the analyses:
| gene | DN | DS | DN/DS |
|---|---|---|---|
| SEMG2 | 0.0244 | 0.0284 | 0.8591 |
| KLK3 | 0.0160 | 0.0505 | 0.3168 |
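As a quick check, the ratios can be recomputed from the tabulated DN and DS values and classified according to the interpretation table in the Methods. This is only a sketch: the function name `classify` is our own, and because the inputs are the rounded values from the table, the last digit may differ slightly from the DnaSP output.

```python
def classify(ratio):
    """Map a DN/DS ratio to the interpretation used in the Methods table."""
    if ratio < 1:
        return "negative selection"
    if ratio > 1:
        return "positive selection"
    return "neutral"


# (DN, DS) pairs copied from the results table (rounded values).
estimates = {"SEMG2": (0.0244, 0.0284), "KLK3": (0.0160, 0.0505)}

for gene, (dn, ds) in estimates.items():
    ratio = dn / ds
    print(f"{gene}: DN/DS = {ratio:.3f} ({classify(ratio)})")
```

A point estimate below 1 does not by itself rule out positive selection at individual sites, which is why the site models in the Maximum Likelihood section are needed.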
McDonald-Kreitman test
For the SEMG2 gene:
| substitutions | fixed | polymorphic |
|---|---|---|
| synonymous | 6 | 2 |
| nonsynonymous | 18 | 3 |
| nonsyn/syn | 3 | 1.5 |
with a P-value of 0.50715 (not significant) from Fisher's exact test.
For the KLK3 gene:
| substitutions | fixed | polymorphic |
|---|---|---|
| synonymous | 2 | 5 |
| nonsynonymous | 3 | 0 |
| nonsyn/syn | 1.5 | 0 |
with a P-value of 0.166667 (not significant).
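The P-value for the KLK3 table can be reproduced with a small two-sided Fisher's exact test. The sketch below uses one common definition of the two-sided P-value (the summed probability of all tables that are no more likely than the observed one); this is an assumption on our part, other software may define it differently, and it is not necessarily how DnaSP computes it. The function name `fisher_exact_two_sided` is our own.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum over every table that is no more likely than the observed one.
    return sum(
        p for p in map(prob, range(lowest, highest + 1))
        if p <= p_obs * (1 + 1e-7)
    )


# KLK3: rows are synonymous / nonsynonymous, columns are fixed / polymorphic.
p_klk3 = fisher_exact_two_sided(2, 5, 3, 0)
print(f"KLK3 two-sided P = {p_klk3:.6f}")
```

For the KLK3 counts this gives 0.166667, matching the value reported above.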
Tajima's test of neutrality
Tajima's neutrality test gave this result:
| gene | Tajima's D | p-value |
|---|---|---|
| SEMG2 | -1.29379 | > 0.10 |
| KLK3 | -0.10602 | > 0.10 |
That is, neither of the Tajima's D values was significant.
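To show where the sign of Tajima's D comes from, the statistic can be computed from the sample size, the number of segregating sites and the mean number of pairwise differences. The sketch below follows the standard formula; the input values are hypothetical and are not the data from this practical, and the function name `tajimas_d` is our own.

```python
from math import sqrt


def tajimas_d(n, s, pi):
    """Tajima's D from sample size n, segregating sites s and
    mean number of pairwise differences pi."""
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Numerator: pairwise diversity minus the estimate based on segregating sites.
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical example: 10 sequences, 10 segregating sites.
print(tajimas_d(n=10, s=10, pi=2.0))  # few pairwise differences: D is negative
print(tajimas_d(n=10, s=10, pi=5.0))  # many pairwise differences: D is positive
```

D is negative when the pairwise diversity is low relative to the number of segregating sites, that is, when most variants are rare.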
Phylogenetic trees
| Image:SEMG2 phylogeny.png |
Phylogenetic tree from SEMG2 data.
| Image:KLK3 phylogeny.png |
Phylogenetic tree from KLK3 data.
Maximum Likelihood
These were the log-likelihood values from the different configurations:
| gene | model00 | model01 | model02 |
|---|---|---|---|
| SEMG2 | -2926.66 | -2927.40 | -2919.38 |
| KLK3 | -937.09 | -927.09 | -921.83 |
Calculating the differences between adjacent models, and applying the following formula:
| Image:Likelihood formula.png |
we obtain the following LR values:
| gene | model00 → model01 | model01 → model02 |
|---|---|---|
| SEMG2 | 1.49 | -16.05 |
| KLK3 | -20.00 | -10.52 |
and, from the run of model10, these branch-specific DN/DS ratios:
| Image:ML tree.png |
Phylogenetic tree from model10 ML data. The branch labels are DN/DS ratios.
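The LR values for the model01 → model02 comparison can be recomputed from the log-likelihood table. The sketch below uses the common convention LR = 2 (ln L of the complex model − ln L of the simple model), which gives positive values when the complex model fits better; the LR table above shows the same magnitudes with the opposite sign, and small differences in the last digit come from the rounded ln L values. Comparing the statistic against a chi-square distribution with 2 degrees of freedom is an assumption (that the selection model adds two free parameters to the neutral model), and the function names are our own.

```python
from math import exp

# Log-likelihoods copied from the results table.
LNL = {
    "SEMG2": {"model00": -2926.66, "model01": -2927.40, "model02": -2919.38},
    "KLK3": {"model00": -937.09, "model01": -927.09, "model02": -921.83},
}


def lr_statistic(lnl_simple, lnl_complex):
    """Likelihood-ratio statistic, positive when the complex model fits better."""
    return 2 * (lnl_complex - lnl_simple)


def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square distribution with 2 degrees
    of freedom (assumed here; closed form exp(-x / 2))."""
    return exp(-x / 2) if x > 0 else 1.0


for gene, lnl in LNL.items():
    lr = lr_statistic(lnl["model01"], lnl["model02"])
    print(f"{gene}: LR = {lr:.2f}, P = {chi2_sf_df2(lr):.4f}")
```

Under these assumptions both comparisons exceed the 5% critical value of 5.99, so the selection model fits significantly better than the neutral model for both genes.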
Discussion
Pairwise DN/DS. Both DN/DS values from the pairwise ratio test are below 1, which in general indicates negative selection. However, the ratio for SEMG2 is 0.86, and with an uncertainty of 0.23 (the average DN/DS when comparing all human and chimpanzee genes), we cannot rule out neutrality or positive selection for SEMG2.
McDonald-Kreitman test. The results of the McDonald-Kreitman tests were not significant, so we cannot draw any firm conclusions from them. Since the nonsyn/syn ratios are higher for fixed mutations than for polymorphic sites, we might suspect positive selection, however.
Tajima's test of neutrality. The values from Tajima's neutrality test were also not significant. But since the values are negative, if we were to interpret them, we would conclude that one of the many explanatory events for negative Tajima's D values has taken place: a selective sweep or a bottleneck, or the presence of slightly deleterious alleles in a gene.
Phylogenetic trees. The sequences in the KLK3 phylogeny show wider dispersion and more sub-branching than those in the SEMG2 phylogeny. This suggests that the SEMG2 gene has undergone a recent bottleneck or selective sweep.
Summarising so far, the results hint at positive selection in both genes, although none of these tests is statistically significant.
Maximum Likelihood. The ML results show that three of the four changes from a simpler model to a more complex one raise ln L markedly, which motivates the change. The exception is SEMG2 from model00 to model01, where ln L is slightly lower under model01 (a difference of 0.74). For both genes the selection model (model02) has the highest ln L, so a model where the DN/DS ratios differ from site to site is best suited to explain the differences in the sequences. The DN/DS ratios on the branches of the accompanying phylogenetic tree are generally higher for species with more promiscuous females. This is consistent with research suggesting that male-male competition is stronger in polyandrous populations.
