Localizing the Causal Mutation in Selective Sweeps in the Human Genome

Date: 07/17/2009, 13:30 to 14:30 (America/New York)

Pardis Sabeti, Assistant Professor at Harvard University in the Center for Systems Biology and the Department of Organismic and Evolutionary Biology, will present a seminar at MSSM on July 17, 2009 at 1:30 PM.

Location: Annenberg 20-01

Recent whole-genome surveys for natural selection in humans have identified large genomic regions showing signals of selection, typically megabases in size. In previous work, we demonstrated that a heuristic approach combining different measures of selection could narrow these regions to just a few candidate variants (Sabeti et al. 2007). We have now formalized this approach into a composite likelihood score (Nielsen et al. 2005) that can rapidly scan the whole genome and localize these signals of selection to narrow regions, often containing only a handful of variants. Using simulated full-sequence data for 1,500 regions of 1 Mb, each with ~10,000 polymorphisms, we show that the top-scoring marker is the causal variant 1/3 of the time, and that the causal variant is among the top 20 markers more than 85% of the time, provided the selected allele has a frequency higher than 30%. We also applied our test to haplotype map data for three selected regions in which the causal variant is known: SLC24A5, EDAR, and LCT. Each region contained over 1,000 polymorphisms. In all three cases, the causal variant is among the top ten hits, and for SLC24A5 and EDAR, the causal variant is the only top-scoring variant with evidence of function.
Applying our method to the whole genome, we have found several intriguing and novel candidates for selection. For example, we localized a signal of selection in Asian populations at 10q21.1 to PCDH15, a cadherin involved in the development of inner ear hair cells; mutations in this gene cause Usher syndrome, a leading cause of deafness. The second and third highest-scoring variants in this region correspond to a mutation in a transcription-factor binding site in the promoter and a non-synonymous mutation in a cadherin domain, respectively. Another signal in Asian populations localizes to ADAT1, which is involved in pre-mRNA editing of nuclear transcripts through site-specific adenosine modification. It is located in a cluster of three tRNA-specific genes that are thought to be co-evolving. We are currently following up these and other top candidates on a case-by-case basis to elucidate the potential functional basis of adaptation.
Given the remarkable power to detect the causal variant with high specificity, as confirmed by our simulations, we predict that by mining whole-genome resequencing data, it will be possible to identify the precise nucleotide changes that are responsible for hundreds of selective sweeps.
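
The idea of combining several selection tests into one composite score, and then asking how often the causal variant lands among the top-ranked markers, can be illustrated with a toy sketch. This is not the speaker's implementation. The test names, the Gaussian score distributions, and every function below are hypothetical, chosen only for illustration. The toy model also treats tests and variants as independent, ignoring the linkage between neighboring variants that real data would show.

```python
import math
import random

# ASSUMPTION: hypothetical per-test score distributions as (mean, sd),
# standing in for distributions one might estimate from simulations.
# The test names are placeholders, not the tests used in the talk.
CAUSAL = {"haplotype": (2.0, 1.0), "frequency": (1.5, 1.0), "differentiation": (1.0, 1.0)}
NEUTRAL = {"haplotype": (0.0, 1.0), "frequency": (0.0, 1.0), "differentiation": (0.0, 1.0)}


def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def composite_score(scores):
    """Sum per-test log-likelihood ratios (causal vs. neutral).

    Summing the logs corresponds to multiplying the per-test likelihood
    ratios, i.e. treating the tests as if they were independent.
    """
    return sum(
        gaussian_logpdf(value, *CAUSAL[test]) - gaussian_logpdf(value, *NEUTRAL[test])
        for test, value in scores.items()
    )


def rank_variants(region):
    """Return variant indices ordered from highest to lowest composite score."""
    return sorted(range(len(region)), key=lambda i: composite_score(region[i]), reverse=True)


def simulate_region(n_variants, causal_index, rng):
    """Draw toy test scores for one region containing a single causal variant."""
    region = []
    for i in range(n_variants):
        params = CAUSAL if i == causal_index else NEUTRAL
        region.append({test: rng.gauss(*params[test]) for test in params})
    return region


def top_k_rate(n_regions, n_variants, k, seed=0):
    """Fraction of simulated regions whose causal variant ranks in the top k."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_regions):
        causal_index = rng.randrange(n_variants)
        region = simulate_region(n_variants, causal_index, rng)
        if causal_index in rank_variants(region)[:k]:
            hits += 1
    return hits / n_regions


if __name__ == "__main__":
    for k in (1, 20):
        print(f"causal variant in top {k}: {top_k_rate(200, 1000, k):.2f}")
```

The rates this prints depend entirely on the made-up distributions above and should not be compared with the figures quoted in the abstract. The sketch only shows the shape of the evaluation: score every variant, rank within the region, and count how often the known causal variant falls within the top k.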