Linkage Disequilibrium Mapping

Introduction

During the period from 1998 to 2003 I coauthored five research papers and one review article on the topic of gene mapping using population linkage disequilibrium (LD) of linked genetic markers. The possibility of using LD for mapping genes was discussed at least as early as 1986 (Lander and Botstein, 1986). The 1990s initiated a golden era for disease gene mapping with the advent of numerous inexpensive genetic markers and the capability to amplify and sequence gene regions without cloning by using PCR and automated Sanger sequencing. Candidate regions of chromosomes were typically identified by linkage analysis on pedigrees; however, the resolution (how precisely the location of the disease mutation can be resolved) was limited by the number of meioses (opportunities for recombination). The basic principle underlying LD mapping is that the number of recombination events represented in a sample of individuals from a population, whether with (cases) or without (controls) a specific genetic disease, will be much greater than in any particular pedigree. LD analysis therefore provides an opportunity for fine-scale mapping, particularly in more homogeneous populations, such as Finland, that may have undergone recent population bottlenecks and/or isolated expansions. A classic example was the mapping of the gene (DTDST) mutated in the recessive disorder diastrophic dysplasia in the Finnish population, which was facilitated by strong LD in the region of the mutation (Hästbacka et al 1992).
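The decay of LD that underlies this fine-scale resolution can be illustrated with a small calculation. The sketch below is not taken from any of the papers listed here. It assumes a deliberately simplified model in which every disease chromosome descends independently from a single ancestral mutant chromosome, with no marker mutation, so that the probability that a marker at recombination fraction theta has never been separated from the mutation after g generations is (1 - theta)^g. The function names and numerical values are hypothetical.

```python
def prob_ancestral_retained(theta: float, generations: int) -> float:
    """Probability that no recombination has separated marker and mutation.

    Assumes independent descent of each chromosome (a simplification).
    """
    return (1.0 - theta) ** generations


def theta_from_retained(p_retained: float, generations: int) -> float:
    """Invert the relation above to get a rough estimate of theta."""
    return 1.0 - p_retained ** (1.0 / generations)


if __name__ == "__main__":
    # Hypothetical mutation age of 100 generations.
    for theta in (0.0001, 0.001, 0.01):
        retained = prob_ancestral_retained(theta, 100)
        print(f"theta={theta}: P(ancestral haplotype retained)={retained:.4f}")
```

Under these assumptions, markers very close to the mutation remain almost completely associated with it, while association falls off quickly with distance, which is what allows the location to be narrowed down.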

There was a flurry of interest in the development of parametric statistical methods for LD mapping during the 1990s, including early methods using composite likelihoods, or other approximations. For example, Kaplan et al (1995), Terwilliger (1995), and Xiong and Guo (1997) pioneered these approaches. The early history is reviewed in Rannala and Slatkin (2000). The first paper that I published with Monty Slatkin (Rannala and Slatkin, 1998) on LD mapping focused on developing a maximum likelihood estimator for the disease location using a single linked marker that undergoes both mutation and recombination. Subsequent papers with postdoc Jeff Reeve allowed recombination between one or more markers and used a Bayesian approach to generate the posterior density of disease mutation location. These methods share many similarities with methods for estimating allele (mutation) age and some jointly estimate both the age of a disease mutation and the location. It should be noted that the use of LD for mapping Mendelian mutations in humans is now essentially obsolete. The availability of inexpensive whole genome sequencing has eliminated the need for a precise estimate of disease mutation location. However, LD is still useful for estimating allele (mutation) ages in populations.

The papers


[1] B. Rannala and M. Slatkin. 1998. Likelihood analysis of disequilibrium gene mapping and related problems. American Journal of Human Genetics 62: 459-473.

The method developed in this paper uses a birth-death (BD) process approximation for the intra-allelic genealogy and considers a single linked marker locus with recurrent mutation between only two allele types and recombination on the interval between the allele locus and the marker. A maximum likelihood estimator of the recombination rate (the mutation location in map units), for a specified mutation rate and mutation age, is developed using Monte Carlo simulations. This method is most appropriate for markers with few alleles (low mutation rates), such as single nucleotide polymorphisms (SNPs), that are sufficiently far from the allele locus that multiple recombination events occur over the intra-allelic genealogy.
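The idea of a likelihood estimator of the recombination rate can be illustrated with a much simpler model than the one in the paper. The sketch below is an assumption-laden toy, not the published method: it ignores marker mutation and the BD genealogy, treats the disease chromosomes as independent, and assumes that a chromosome which has recombined carries the ancestral marker allele with its population frequency q (with 0 < q < 1). The likelihood is then binomial and is maximized by a grid search rather than by Monte Carlo simulation. All names and data values are hypothetical.

```python
import math


def retain_prob(theta, age, q):
    """P(a disease chromosome carries the ancestral marker allele).

    Either no recombination occurred in `age` generations, or one did and
    the new allele is ancestral by chance with population frequency q.
    """
    no_rec = (1.0 - theta) ** age
    return no_rec + (1.0 - no_rec) * q


def log_likelihood(theta, k, n, age, q):
    """Binomial log-likelihood (constant term dropped) for k of n chromosomes."""
    p = retain_prob(theta, age, q)
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def mle_theta(k, n, age, q, grid_size=10000, theta_max=0.05):
    """Grid-search maximum likelihood estimate of theta on (0, theta_max]."""
    grid = [theta_max * (i + 1) / grid_size for i in range(grid_size)]
    return max(grid, key=lambda th: log_likelihood(th, k, n, age, q))


if __name__ == "__main__":
    # Hypothetical data: 80 of 100 disease chromosomes carry the ancestral
    # allele, the mutation is 100 generations old, and q = 0.2.
    print(mle_theta(k=80, n=100, age=100, q=0.2))
```

Note that in this toy model theta and the age enter the likelihood only through (1 - theta)^age, which is why the age must be specified in order to estimate theta from a single marker.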


[2] B. Rannala and M. Slatkin. 1998. Linkage Disequilibrium Mapping and Parkinson’s Disease. Science 280: 175a.

A technical comment applying LD mapping methods to assess the evidence that G209A is a causative mutation of Parkinson’s disease. The LD among patients in the Greek population appeared to provide strong evidence for a causative effect in this case (Polymeropoulos et al 1997) but the causal role was disputed. The mutation (also called A53T) causes a particular form of early-onset Parkinson’s disease and was first identified in families of Greek and Italian descent – more recently it has been identified in a Korean family as well, although it is not a cause of most Parkinson’s cases worldwide.


[3] B. Rannala and J.P. Reeve. 2001. High-Resolution Multipoint Linkage-Disequilibrium Mapping in the Context of a Human Genome Sequence. American Journal of Human Genetics 69: 159-178.

The methods developed in this paper extend LD mapping to multiple linked markers using Bayesian Markov chain Monte Carlo (MCMC) algorithms and incorporate a Bayesian prior distribution for the disease location that may use prior information from an annotated human genome sequence.
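How a prior on location combines with multipoint marker data in an MCMC analysis can be sketched with a small random-walk Metropolis sampler. This is only an illustration under strong assumptions and is not the algorithm of the paper: markers are treated as independent (a composite likelihood), the mutation age is taken as known, the probability of no recombination over a map distance d (in Morgans) is approximated by exp(-age * d), and the prior is a hypothetical piecewise-constant density that gives extra weight to annotated gene intervals. All names, intervals, and data values are invented for the example.

```python
import math
import random


def log_lik(x, markers, n, age):
    """Composite log-likelihood of location x; markers = [(position, k, q), ...]."""
    total = 0.0
    for pos, k, q in markers:
        no_rec = math.exp(-age * abs(x - pos))
        p = no_rec + (1.0 - no_rec) * q
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0) when x sits on a marker
        total += k * math.log(p) + (n - k) * math.log(1.0 - p)
    return total


def log_prior(x, lo, hi, genes, gene_weight):
    """Unnormalized prior: zero outside [lo, hi], up-weighted inside gene intervals."""
    if not lo <= x <= hi:
        return -math.inf
    in_gene = any(a <= x <= b for a, b in genes)
    return math.log(gene_weight if in_gene else 1.0)


def metropolis(markers, n, age, lo, hi, genes, gene_weight=5.0,
               steps=20000, step_sd=0.0005, seed=1):
    """Random-walk Metropolis sampler for the posterior of the location x."""
    rng = random.Random(seed)
    x = 0.5 * (lo + hi)
    cur = log_lik(x, markers, n, age) + log_prior(x, lo, hi, genes, gene_weight)
    samples = []
    for _ in range(steps):
        prop = x + rng.gauss(0.0, step_sd)
        lp = log_prior(prop, lo, hi, genes, gene_weight)
        if lp > -math.inf:
            new = log_lik(prop, markers, n, age) + lp
            # Accept with probability min(1, posterior ratio).
            if rng.random() < math.exp(min(0.0, new - cur)):
                x, cur = prop, new
        samples.append(x)
    return samples


# Hypothetical data: (map position in Morgans, ancestral-allele count, q).
MARKERS = [(0.005, 137, 0.2), (0.008, 171, 0.2), (0.012, 171, 0.2), (0.015, 137, 0.2)]
GENES = [(0.009, 0.011)]  # hypothetical annotated gene interval

if __name__ == "__main__":
    draws = metropolis(MARKERS, n=200, age=100, lo=0.0, hi=0.02, genes=GENES)
    kept = draws[2000:]  # discard burn-in
    print(sum(kept) / len(kept))
```

Because the Metropolis acceptance step uses only ratios of posterior densities, the prior does not need to be normalized, which makes it easy to substitute any weighting derived from annotation.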


[4] J.P. Reeve and B. Rannala. 2002. DMLE+: Bayesian linkage disequilibrium gene mapping. Bioinformatics 18: 894-895.

This is a note describing the DMLE+ software package for LD mapping and allele age estimation. The program is currently available at dmle.org.


[5] B. Rannala and J.P. Reeve. 2003. Joint Bayesian estimation of mutation location and age using linkage disequilibrium. Pacific Symposium on Biocomputing 8: 526-534.

The method developed in this paper allows allele age and mutation location to be jointly inferred, with multiple linked genetic markers surrounding an allele and undergoing recombination (but assuming no mutation). It is assumed that a linkage map of the distances between markers (in units of cM) is available but the actual position of the allele (mutation) may be unknown. A Bayesian algorithm is developed that allows the joint posterior density of the allele position and allele age to be estimated. In other words, the program provides a fine-scale map of the position as well as the age of an allele (usually the allele is associated with a phenotype, often a simple Mendelian disease). Two examples are presented. The first uses sample data for 23 restriction fragment length polymorphism (RFLP) markers spanning a total distance of 1.8 Mb from Europeans carrying the most common mutation causing cystic fibrosis (CF). The second analyzes a sample from the Finnish population carrying the mutation causing the disease diastrophic dysplasia (DTD), typed for 2 RFLP and 3 microsatellite markers spanning a total distance of 20 kb.
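The notion of a joint posterior density of position and age can be made concrete with a small grid calculation. The sketch below is not the algorithm of the paper or of DMLE+. It assumes independent disease chromosomes and independent markers, no marker mutation, flat priors over both grids, and the approximation exp(-age * d) for the probability of no recombination over a map distance d in Morgans. Names and data values are hypothetical.

```python
import math


def log_lik(x, age, markers, n):
    """Composite log-likelihood; markers = [(position, k, q), ...]."""
    total = 0.0
    for pos, k, q in markers:
        no_rec = math.exp(-age * abs(x - pos))
        p = no_rec + (1.0 - no_rec) * q
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0) at marker positions
        total += k * math.log(p) + (n - k) * math.log(1.0 - p)
    return total


def joint_posterior(markers, n, x_grid, age_grid):
    """Return {(x, age): posterior mass} under flat priors on both grids."""
    logp = {(x, t): log_lik(x, t, markers, n) for x in x_grid for t in age_grid}
    top = max(logp.values())  # subtract the maximum for numerical stability
    weights = {key: math.exp(v - top) for key, v in logp.items()}
    norm = sum(weights.values())
    return {key: w / norm for key, w in weights.items()}


def posterior_means(post):
    """Marginal posterior means of position and age."""
    mean_x = sum(x * w for (x, _), w in post.items())
    mean_age = sum(t * w for (_, t), w in post.items())
    return mean_x, mean_age


# Hypothetical data: (map position in Morgans, ancestral-allele count, q).
MARKERS = [(0.005, 137, 0.2), (0.008, 171, 0.2), (0.012, 171, 0.2), (0.015, 137, 0.2)]
X_GRID = [i * 0.0001 for i in range(201)]  # 0 to 0.02 Morgans
AGE_GRID = list(range(20, 301, 2))         # 20 to 300 generations

if __name__ == "__main__":
    post = joint_posterior(MARKERS, 200, X_GRID, AGE_GRID)
    print(posterior_means(post))
```

In this simplified setting, markers on both sides of the mutation are what make position and age separately estimable: with a single marker only the product of age and distance is constrained.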

Review articles


[6] B. Rannala and M. Slatkin. 2000. Methods for Multipoint Disease Mapping Using Linkage Disequilibrium. Genetic Epidemiology 19(Suppl 1): S71-S77.