Molecular Evolution: From wolverine faeces to individual genotypes

From Carls wiki

Introduction

Using microsatellite genotyping at 11 loci in the wolverine genome, we attempted to determine how many distinct individuals were represented among 27 piles of wolverine faeces, after first extracting DNA from the piles. We also estimated the total size of the population and analysed the family relationships between the individuals.

Methods

Lab part

DNA extraction. Through an 18-step process, we extracted DNA from one of the faeces samples.

PCR amplification. We then amplified our extracted DNA. We prepared a master mix, with the following ingredients:

Component    Stock conc.            Final conc.   PCR mix (1 rx)   50 rx
PCR buffer   10 x, 15 mM MgCl2      1 x           1 µl             50 µl
MgCl2        25 mM                  1.5 mM        0 µl             0 µl
dNTP         20 mM                  0.2 mM        0.1 µl           5 µl
Primer 1     10 µM                  0.32 µM       0.32 µl          16 µl
Primer 2     10 µM                  0.32 µM       0.32 µl          16 µl
BSA          20 mg/ml               0.1 mg/ml     0.05 µl          2.5 µl
Hotstar      5 U/µl                 0.025 U/µl    0.05 µl          2.5 µl
ddH2O                                             6.16 µl          308 µl
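The per-reaction volumes in the table follow from the dilution relation (stock concentration × volume added = final concentration × reaction volume). The sketch below recomputes them. It assumes a 10 µl reaction volume, of which 8 µl is master mix; neither figure is stated in the protocol, both are inferred from the buffer row (1 µl of 10 x giving 1 x) and from the sum of the per-reaction column. The extra MgCl2 row is left out because no MgCl2 is added beyond what the buffer contains.

```python
# Assumed values (not stated in the source, inferred from the table):
# 10 µl total reaction volume, of which 8 µl is master mix.
REACTION_UL = 10.0
MIX_UL = 8.0
N_REACTIONS = 50

# reagent -> (stock concentration, final concentration), same unit within a row
reagents = {
    "PCR buffer": (10.0, 1.0),   # x
    "dNTP": (20.0, 0.2),         # mM
    "Primer 1": (10.0, 0.32),    # µM
    "Primer 2": (10.0, 0.32),    # µM
    "BSA": (20.0, 0.1),          # mg/ml
    "Hotstar": (5.0, 0.025),     # U/µl
}


def per_reaction_volumes(reagents, reaction_ul, mix_ul):
    """Volume of each reagent per reaction; water fills the mix up to mix_ul."""
    volumes = {
        name: final / stock * reaction_ul
        for name, (stock, final) in reagents.items()
    }
    volumes["ddH2O"] = mix_ul - sum(volumes.values())
    return volumes


volumes = per_reaction_volumes(reagents, REACTION_UL, MIX_UL)
for name, vol in volumes.items():
    print(f"{name:<12}{vol:>6.2f} µl{vol * N_REACTIONS:>8.1f} µl")
```

Run as written, this gives 6.16 µl of water per reaction and 308 µl for 50 reactions.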

Sequencing. The samples were then centrifuged and denatured, and run through a MegaBACE™ system.

Computer part

Microsatellite scoring. With Genetic Profiler, we were able to identify the positions of the peaks from the sequenced data. All groups collected the data in a common spreadsheet.

Data analysis. We first eliminated duplicate samples from the same individual, using the Excel Microsatellite Toolkit complemented by manual inspection.
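The idea behind duplicate elimination can be sketched as grouping samples by identical multilocus genotype. This is not how the Excel Microsatellite Toolkit is implemented, only an illustration of the principle; the three-locus genotypes and allele sizes below are invented, and the function names are hypothetical.

```python
from collections import defaultdict


def normalise(genotype):
    """Sort the two alleles at each locus so that (152, 148) equals (148, 152)."""
    return tuple(tuple(sorted(locus)) for locus in genotype)


def group_duplicates(samples):
    """Group sample ids that share exactly the same multilocus genotype."""
    groups = defaultdict(list)
    for sample_id, genotype in samples.items():
        groups[normalise(genotype)].append(sample_id)
    return [sorted(ids) for ids in groups.values()]


# Hypothetical genotypes (allele sizes in base pairs), invented for illustration.
samples = {
    1: [(148, 152), (201, 201), (96, 104)],
    2: [(148, 150), (201, 205), (96, 96)],
    3: [(152, 148), (201, 201), (104, 96)],  # same as sample 1, alleles swapped
}

groups = group_duplicates(samples)
print(groups)
```

Exact matching like this does not tolerate scoring errors or missing alleles, which is one reason manual inspection is still needed alongside an automatic comparison.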

Population size estimation. After that, we made use of a web service, Specrich, to estimate the actual population size from the number of samples we had observed.

Parentage analysis. Lastly, we compared this year's wolverines with four individuals from earlier seasons (again with Microsatellite Toolkit) to see whether they were still present, and then used Cervus to establish parentage relations between the old individuals and the new ones.

Results

Data analysis

The following samples turned out to be duplicates and hence most likely from the same individual:

Sample   Duplicates
1        13
10       14, 15, 23, 25, 26, 27
17       20
5        6, 18
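The duplicate table can be condensed into capture frequencies, that is, how many individuals were sampled once, twice, and so on. The sketch below does this under one loud assumption: that all 27 samples produced a usable genotype and that every sample not listed in the table is a separate individual. The source does not confirm this, so the counts are illustrative.

```python
from collections import Counter

TOTAL_SAMPLES = 27  # number of faeces samples, from the introduction

# First sample of each group -> its duplicates, copied from the table above.
duplicates = {
    1: [13],
    10: [14, 15, 23, 25, 26, 27],
    17: [20],
    5: [6, 18],
}


def capture_frequencies(duplicates, total_samples):
    """Return (number of individuals, {times sampled: number of individuals}).

    Assumes every sample not mentioned in `duplicates` is a unique individual.
    """
    group_sizes = [1 + len(dups) for dups in duplicates.values()]
    singletons = total_samples - sum(group_sizes)
    freqs = Counter(group_sizes)
    freqs[1] += singletons
    return len(group_sizes) + singletons, dict(sorted(freqs.items()))


individuals, freqs = capture_frequencies(duplicates, TOTAL_SAMPLES)
print(individuals, freqs)
```

Under that assumption the table implies 17 distinct individuals, 13 of which were sampled only once.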

Population size estimation

Since the input numbers for the Specrich form are so easily reproducible from the duplication results, I didn't write them down. Afterwards, I tried several sets of input numbers, and I always got a much higher population estimate than the one we obtained when we ran the form during the lab, which was 33.

Additionally, since only slightly different inputs yield wildly different outputs, I have more or less lost faith in Specrich as a reliable population estimator.

Here is one example of the results:

K   N(JK)   SE(N(JK))   T(K)     P(K)
1   42       5.4772     3.8925   0.0001
2   55       9.4868     2.7255   0.0064
3   67      13.9284     1.9447   0.0518
4   79      19.4936     1.4562   0.1453
5   92      27.2029     0.0000   1.0000

INTERPOLATED N                 66.5209
STD ERROR OF INTERPOLATED N    13.7451
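To get a feel for how wide the uncertainty around this estimate is, the interpolated N and its standard error can be turned into a rough interval. The sketch assumes a normal approximation (estimate ± 1.96 standard errors); Specrich itself may construct its intervals differently, so this is only an order-of-magnitude check.

```python
# Values copied from the output above.
N_INTERPOLATED = 66.5209
SE_INTERPOLATED = 13.7451

# Normal quantile for a two-sided 95 % interval (assumed approximation).
Z_95 = 1.96


def normal_interval(estimate, std_error, z=Z_95):
    """Return (lower, upper) bounds of estimate ± z standard errors."""
    half_width = z * std_error
    return estimate - half_width, estimate + half_width


low, high = normal_interval(N_INTERPOLATED, SE_INTERPOLATED)
print(f"approx. 95 % interval: {low:.1f} to {high:.1f}")
```

The interval spans roughly 40 to 93 animals, which illustrates how imprecise the estimate is.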

Parentage analysis

Comparing the old individuals against the current population reveals that the old wolverines are indeed still in the area:

This year's id   Old individual's id   Sex
1                102                   male
10               101                   male
17               106                   female
5                108                   female

The Cervus data was contradictory: the strong conclusions from the output files are that individual 101 is both the father of individual 2 and the mother of individual 16. The latter is impossible, since individual 101 has been sexed as a male in earlier seasons.
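A contradiction of this kind can be caught automatically by checking each assigned parental role against the known sex of the candidate parent. The sketch below does not read Cervus output files; the assignments are typed in by hand from the description above, and the function name is hypothetical.

```python
# Sexes of the old individuals, taken from the table above.
known_sex = {102: "male", 101: "male", 106: "female", 108: "female"}

# (parent id, offspring id, assigned role), as described in the text.
assignments = [
    (101, 2, "father"),
    (101, 16, "mother"),
]

EXPECTED_SEX = {"father": "male", "mother": "female"}


def sex_conflicts(assignments, known_sex):
    """Return the assignments whose role contradicts the parent's known sex."""
    return [
        (parent, offspring, role)
        for parent, offspring, role in assignments
        if parent in known_sex and known_sex[parent] != EXPECTED_SEX[role]
    ]


conflicts = sex_conflicts(assignments, known_sex)
print(conflicts)
```

Only the assignment of individual 101 as mother of individual 16 is flagged; the paternity of individual 2 is consistent with the recorded sex.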

Discussion

Reliability of the results. While the duplicate identifications (both within the latest year and between the latest year and those before) went well and produced reliable results, the population estimation feels unreliable and the parentage analysis is contradictory. Other groups seem to have been getting varying results as well. Since we all started with the same spreadsheet data, the differences are most likely due to manual errors.

Spreadsheet editing. Preparing the spreadsheets by hand was easily the dullest and most time-consuming part of the lab, if not of all four labs. It also felt like a waste of resources, since it can be automated fairly straightforwardly. In particular, Windows and Excel felt like blunt tools for that kind of semi-automatic number crunching, after courses involving script programming in Linux. A simple program to run through and process the big table would have been a big help, and would have made the lab feel less tedious.
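As a rough idea of what such a program could look like, the sketch below reads a CSV export of the spreadsheet, puts the two alleles at each locus in a fixed order, and lists the samples with missing data. The column layout (one row per sample, two columns per locus, 0 for a failed amplification), the locus names and all values are hypothetical; the real spreadsheet may be organised differently.

```python
import csv
import io

# Hypothetical spreadsheet export: one row per sample, two allele columns per
# locus, 0 marking a failed amplification. Locus names and values are invented.
CSV_TEXT = """sample,LocA_1,LocA_2,LocB_1,LocB_2
1,148,152,201,201
2,150,148,0,0
3,152,148,205,201
"""


def read_genotypes(text, missing="0"):
    """Return {sample id: {locus: sorted allele pair, or None if missing}}."""
    reader = csv.DictReader(io.StringIO(text))
    loci = sorted(
        {name.rsplit("_", 1)[0] for name in reader.fieldnames if name != "sample"}
    )
    table = {}
    for row in reader:
        genotype = {}
        for locus in loci:
            a, b = row[f"{locus}_1"], row[f"{locus}_2"]
            if missing in (a, b):
                genotype[locus] = None
            else:
                genotype[locus] = tuple(sorted((int(a), int(b))))
        table[int(row["sample"])] = genotype
    return table


genotypes = read_genotypes(CSV_TEXT)
incomplete = [s for s, g in genotypes.items() if None in g.values()]
print(incomplete)
```

With the table in this form, later steps such as comparing genotypes between samples become a few lines each instead of repeated manual editing.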