Molecular Evolution:Estimating linkage disequilibrium with population SNP data

From Carls wiki



Introduction

We genotyped several known SNP sites in two species, dog (Canis lupus familiaris) and flycatcher (Ficedula), and used the genotypes to quantify the linkage disequilibrium between those sites in each species.

Because recombination is less likely to separate loci that lie close together on a chromosome, adjacent loci are inherited together more often than distant ones. After a period of strong selection, or a drastic drop in population size (a so-called "population bottleneck"), the remaining individuals show reduced variation. Favoured alleles persist together with the SNP alleles adjacent to them, so particular combinations of alleles at nearby loci become over-represented.

Linkage disequilibrium (LD) is a deviation of the frequencies of allele combinations (haplotypes) from those expected by chance. Specifically, if locus A has the variants a and A, and locus B has the variants b and B, a chromosome may carry any of the combinations ab, Ab, aB or AB. Under linkage equilibrium, the expected frequency of each combination is the product of the frequencies of its variants; for example, $f_{aB} = f_a f_B$.
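To make the comparison concrete, the following Python sketch computes observed and expected frequencies for the four haplotypes from a set of haplotype counts. The counts are hypothetical, not data from this lab, and the sketch assumes phased haplotypes are available directly; with unphased genotype data such as ours, Haploview has to estimate the haplotype frequencies instead.

```python
# Minimal sketch: observed vs. expected haplotype frequencies for two
# biallelic loci. The counts below are hypothetical, for illustration only,
# and are assumed to be phased haplotypes (not unphased genotypes).

HAPLOTYPES = ("AB", "Ab", "aB", "ab")


def expected_vs_observed(hap_counts):
    """Map each haplotype to (observed frequency, frequency expected under
    linkage equilibrium, i.e. the product of its allele frequencies)."""
    n = sum(hap_counts[h] for h in HAPLOTYPES)
    f_A = (hap_counts["AB"] + hap_counts["Ab"]) / n  # frequency of allele A
    f_B = (hap_counts["AB"] + hap_counts["aB"]) / n  # frequency of allele B
    f = {"A": f_A, "a": 1.0 - f_A, "B": f_B, "b": 1.0 - f_B}
    return {h: (hap_counts[h] / n, f[h[0]] * f[h[1]]) for h in HAPLOTYPES}


counts = {"AB": 40, "Ab": 10, "aB": 10, "ab": 40}  # hypothetical sample of 100 chromosomes
for hap, (obs, exp) in expected_vs_observed(counts).items():
    print(f"{hap}: observed {obs:.2f}, expected {exp:.2f}, difference {obs - exp:+.2f}")
```

With these counts, AB and ab are each over-represented by 0.15, while Ab and aB are each under-represented by 0.15. The four deviations always share the same magnitude, which is why a single number, $D_{AB}$ in the Estimation of LD section, is enough to summarise them.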

The results agree with our expectations: LD in flycatcher is low, while LD in dog is high. This fits the demographic history of the two species: dogs have passed through two recent population bottlenecks, whereas flycatchers have undergone no such event recently.

Methods

Lab part

PCR amplification. Each group was instructed to amplify one marker either in all flycatchers or in all dogs. Our group chose a marker in flycatcher, FLYSNP2. We prepared a master mix, with the following ingredients:

Reagent          Stock conc.   Final conc. (flycatcher)   Volume per reaction (55 µl)   Volume for 30 reactions
Buffer Gold      10 x          1 x                        5.5 µl                        165 µl
MgCl2            25 mM         2.5 mM                     5.5 µl                        165 µl
Forward primer   10 µM         0.2 µM                     1.1 µl                        33 µl
Reverse primer   10 µM         0.2 µM                     1.1 µl                        33 µl
dNTP             20 mM         0.2 mM                     0.55 µl                       16.5 µl
AmpliTaqGold     5 U/µl        0.025 U/µl                 0.275 µl                      8.25 µl
ddH2O            -             up to 55 µl                35.975 µl                     1050 µl
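The per-reaction volumes follow from $C_1 V_1 = C_2 V_2$: the volume of stock needed is the final concentration times the reaction volume, divided by the stock concentration. A short Python sketch of that arithmetic, scaled to 30 reactions, is below. It covers only the reagents that have both a stock and a final concentration; the ddH2O volume is not computed.

```python
# Minimal sketch of the master-mix arithmetic using C1*V1 = C2*V2.
# Values are copied from the master-mix table; units cancel within each row
# (x, mM, µM or U/µl), so only the ratio final/stock matters.

REACTION_VOLUME_UL = 55.0  # total volume per reaction (µl), from the table header
N_REACTIONS = 30           # number of reactions the master mix is scaled to

# (reagent, stock concentration, final concentration)
REAGENTS = [
    ("Buffer Gold", 10.0, 1.0),       # 10 x -> 1 x
    ("MgCl2", 25.0, 2.5),             # mM
    ("Forward primer", 10.0, 0.2),    # µM
    ("Reverse primer", 10.0, 0.2),    # µM
    ("dNTP", 20.0, 0.2),              # mM
    ("AmpliTaqGold", 5.0, 0.025),     # U/µl
]


def per_reaction_volume(stock, final, total=REACTION_VOLUME_UL):
    """Volume of stock (µl) giving concentration `final` in `total` µl: V1 = C2*V2/C1."""
    return final * total / stock


for name, stock, final in REAGENTS:
    v = per_reaction_volume(stock, final)
    print(f"{name:<15} {v:6.3f} ul per reaction, {v * N_REACTIONS:7.2f} ul for {N_REACTIONS} reactions")
```

Each computed per-reaction volume (5.5, 5.5, 1.1, 1.1, 0.55 and 0.275 µl) matches the table, and multiplying by 30 reproduces the 30-reaction column for these reagents.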

Electrophoresis. We cast a gel and loaded the samples from each PCR purification, mixed with loading buffer, into the wells. We then ran the electrophoresis for a little over 30 minutes at 175 V, stained the gel in an ethidium bromide bath and photographed it with a UV camera. The photo can be found in the Results section.

Pyrosequencing. Following the "PSQ™ 96 Sample Preparation Guidelines for SNP Analysis", we prepared samples and loaded them into the pyrosequencer. While the machine worked away on our samples, we observed the progress on a nearby computer.

Computer part

Estimation of LD. Using Haploview, and combining our sequencing results with those of the other groups, we obtained quantitative LD values as well as visualisations of them. We used the LD plot and Four Gamete Rule functions.

Results

Gel electrophoresis

Image:Gel electrophoresis fly2.png

UV photo of the gel electrophoresis. A few wells show faint or no traces of DNA, notably three in the lower middle and one in the upper left.


Pyrosequencing

The results of the pyrosequencing can be seen below. Orange signifies that a manual inspection is required, while red signifies that the sample couldn't be sequenced. Note that the failures in m2 and m23 correspond to gaps in the UV photo.

     1              2              3
A    m1   A/A       m12  G/A       m22  G/G
B    m2   G/A       m13  G/A       m23  G/A
C    m4   G/G       m14  A/A       m24  G/G
D    m5   G/A       m16  A/A       m25  G/A
E    m7   A/A       m17  G/A       m26  A/A
F    m8   A/A       m18  G/A       m28  A/A
G    m10  G/A       m20  G/A       m29  A/A
H    m11  A/A       m21  G/A       m30  G/G
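The allele frequencies that enter the LD formulas can be read straight off this plate. The Python sketch below tallies the FLYSNP2 calls; it takes every call as listed, including m2 and m23, and does not apply the orange/red quality flags. Haploview performs this counting itself, so the sketch is only illustrative.

```python
from collections import Counter

# FLYSNP2 genotype calls, transcribed from the plate layout above.
# Every call is taken at face value, including m2 and m23.
GENOTYPES = {
    "m1": "A/A", "m12": "G/A", "m22": "G/G",
    "m2": "G/A", "m13": "G/A", "m23": "G/A",
    "m4": "G/G", "m14": "A/A", "m24": "G/G",
    "m5": "G/A", "m16": "A/A", "m25": "G/A",
    "m7": "A/A", "m17": "G/A", "m26": "A/A",
    "m8": "A/A", "m18": "G/A", "m28": "A/A",
    "m10": "G/A", "m20": "G/A", "m29": "A/A",
    "m11": "A/A", "m21": "G/A", "m30": "G/G",
}


def allele_counts(calls):
    """Count alleles; each diploid call such as 'G/A' contributes two alleles."""
    counts = Counter()
    for call in calls:
        counts.update(call.split("/"))
    return counts


genotype_counts = Counter(GENOTYPES.values())
counts = allele_counts(GENOTYPES.values())
n_alleles = sum(counts.values())

print(dict(genotype_counts))
for allele in sorted(counts):
    print(f"f({allele}) = {counts[allele]}/{n_alleles} = {counts[allele] / n_alleles:.3f}")
```

Taking the calls as listed, this gives 9 A/A, 11 G/A and 4 G/G individuals, and allele frequencies f(A) = 29/48 ≈ 0.604 and f(G) = 19/48 ≈ 0.396.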

Estimation of LD

Haploview quantifies linkage disequilibrium with two measures, D' and r². These are defined as

$$D' = \frac{|D_{AB}|}{D_{max}}, \qquad r^2 = \frac{D_{AB}^2}{f_A f_a f_B f_b}$$

where $D_{AB} = f_{AB} - f_A f_B$, and $D_{max} = \min(f_a f_B, f_A f_b)$ if $D_{AB} \ge 0$, or $D_{max} = \min(f_A f_B, f_a f_b)$ if $D_{AB} < 0$. Here $f_{AB}$ and $f_{ab}$ are haplotype frequencies, and $f_A$, $f_B$, $f_a$, $f_b$ are allele frequencies.
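The definitions above translate directly into code. This Python sketch (the function name `ld_measures` is our own, not part of Haploview) computes $D_{AB}$, D' and r² from the AB haplotype frequency and the two allele frequencies. It assumes $f_{AB}$ is already known; Haploview estimates haplotype frequencies from the unphased genotypes, which the sketch does not attempt.

```python
def ld_measures(f_AB, f_A, f_B):
    """Return (D_AB, D', r^2) for two biallelic loci, using
    D_AB = f_AB - f_A*f_B, D' = |D_AB| / D_max and
    r^2 = D_AB^2 / (f_A f_a f_B f_b)."""
    if not (0.0 < f_A < 1.0 and 0.0 < f_B < 1.0):
        raise ValueError("both loci must be polymorphic")
    f_a, f_b = 1.0 - f_A, 1.0 - f_B
    D = f_AB - f_A * f_B
    # D_max depends on the sign of D, as in the definition above
    if D >= 0:
        D_max = min(f_a * f_B, f_A * f_b)
    else:
        D_max = min(f_A * f_B, f_a * f_b)
    d_prime = abs(D) / D_max
    r2 = D ** 2 / (f_A * f_a * f_B * f_b)
    return D, d_prime, r2


for f_AB, f_A, f_B in [(0.4, 0.5, 0.5), (0.3, 0.3, 0.6)]:
    D, d_prime, r2 = ld_measures(f_AB, f_A, f_B)
    print(f"D = {D:.3f}, D' = {d_prime:.3f}, r^2 = {r2:.3f}")
# D = 0.150, D' = 0.600, r^2 = 0.360
# D = 0.120, D' = 1.000, r^2 = 0.286
```

The second call describes a pair in which the Ab haplotype is absent ($f_{Ab} = f_A - f_{AB} = 0$): D' reaches 1 while r² stays at about 0.29, because r² also depends on the allele frequencies. This is the same pattern as marker01 and marker03 in the dog table (D' = 1.000, r² = 0.342).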

These are the LD values calculated for flycatcher.

Marker 1   Marker 2   D'      r²
marker01   marker02   0.012   0.000
marker01   marker03   0.135   0.018
marker02   marker03   0.072   0.002

D' and r² values for flycatcher.

In an LD plot, the above table looks like this:

Image:LD plot flycatcher.png

LD plot of flycatcher.

And here are the LD values for dog.

Marker 1   Marker 2   D'      r²
marker01   marker02   1.000   1.000
marker01   marker03   1.000   0.342
marker01   marker04   0.583   0.223
marker01   marker05   0.621   0.300
marker02   marker03   1.000   0.342
marker02   marker04   0.583   0.223
marker02   marker05   0.621   0.300
marker03   marker04   0.736   0.282
marker03   marker05   1.000   0.439
marker04   marker05   1.000   0.844

D' and r² values for dog.

In the corresponding LD plot, the haplotype blocks determined by the Four Gamete Rule are indicated by black lines.

Image:LD plot dog.png

LD plot of dog.
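The idea behind the Four Gamete Rule can be sketched in a few lines of Python. For each pair of markers, we count how many of the four possible two-marker haplotypes occur at or above a frequency cutoff; if all four occur, recombination (or recurrent mutation) is inferred between the markers and a new block is started. This is a simplified greedy version under stated assumptions, not Haploview's exact implementation: the haplotypes are assumed to be phased, the 1% cutoff is an assumed default, and the example haplotypes are hypothetical.

```python
from collections import Counter


def gametes_observed(haplotypes, i, j, cutoff):
    """Number of distinct two-marker haplotypes at markers i and j whose
    frequency in the sample is at least `cutoff`."""
    n = len(haplotypes)
    counts = Counter((h[i], h[j]) for h in haplotypes)
    return sum(1 for c in counts.values() if c / n >= cutoff)


def four_gamete_blocks(haplotypes, cutoff=0.01):
    """Greedy left-to-right partition of markers into blocks.

    A marker joins the current block only if it shows fewer than four
    gametes with every marker already in the block; otherwise a new block
    starts at that marker. Returns (first, last) marker indices per block.
    The 0.01 default cutoff is an assumption of this sketch.
    """
    n_markers = len(haplotypes[0])
    blocks = []
    start = 0
    for j in range(1, n_markers):
        if any(gametes_observed(haplotypes, k, j, cutoff) == 4 for k in range(start, j)):
            blocks.append((start, j - 1))
            start = j
    blocks.append((start, n_markers - 1))
    return blocks


# Hypothetical phased haplotypes over five biallelic markers (alleles 0/1).
haps = ["00000", "11100", "00011", "11111", "00000", "11111"]
print(four_gamete_blocks(haps))  # [(0, 2), (3, 4)]
```

In the toy data, markers 0 to 2 always carry matching alleles, as do markers 3 and 4, but all four combinations occur between markers of the two groups, so the sketch returns two blocks.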

Discussion

Linkage disequilibrium measures. Table data and LD plots agree: there is no detectable linkage disequilibrium among the sequenced flycatchers (D' ≤ 0.135 and r² ≤ 0.018 for every marker pair), but substantial LD among the dogs.

Demographic measures. According to the lab introduction, the flycatcher population is distributed over large parts of northeastern Europe, with extensive gene flow. No recent selective sweep or bottleneck is known to have occurred in the species, which accounts for the lack of LD. In contrast, the modern dog has gone through two recent population bottlenecks, one during domestication and one during breed formation, so the observed linkage disequilibrium is to be expected.

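Why two blocks in dog? If linkage disequilibrium is present in dog, why do we observe two haplotype blocks rather than one? One possible explanation is a recombination hotspot between the two blocks: the Four Gamete Rule places a block boundary where all four two-marker haplotypes occur, which is the signature of historical recombination between the markers.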