Recombination Rate Inference
Introduction
Between 2008 and 2026 I authored or coauthored eight papers on inferring recombination rates and on inferring phylogeny in the presence of recombination using population genomic data. Recombination is a fundamental meiotic process in sexually reproducing organisms that generates genetic variation and shapes patterns of linkage disequilibrium (LD) across the genome. Recombination rates vary dramatically along the genome, with much of the recombination concentrated in narrow regions known as recombination hotspots. Understanding this variation is important for two reasons: it is of intrinsic biological interest, and recombination rates influence the design and interpretation of genome-wide association studies and other population genomic analyses.
My work on recombination rate inference began as a collaboration with Ying Wang, then a PhD student in my group. We developed Bayesian coalescent-based methods to estimate fine-scale recombination rates and identify hotspots from population genomic data [1,2]. This work was distinctive because it used a Bayesian full-likelihood Markov chain Monte Carlo (MCMC) approach. Most methods at the time relied on marginal- or composite-likelihood approximations instead. In addition to crossing over, genetic material may be exchanged during meiosis by gene conversion, a distinct mechanism involving short tracts of DNA. Badri Padhukasahasram, a postdoc in my lab, extended our methods to jointly estimate rates of crossing over and gene conversion [3,4]. Ying Wang later developed methods for identifying recombination hotspots shared between species [5], which we applied in a comparative study of human and chimpanzee genomes. These full-likelihood methods are computationally expensive: they require long MCMC runs, which limits the number of SNPs that can be analyzed across a genome. In a collaboration with Chi Zhang, a visiting student in my lab, and Hasan Alhaddad, a student in Leslie Lyons's lab, we characterized recombination hotspots in the domestic cat [6]. More recently, I have been interested in the effects of recombination on phylogenetic inference [7]. Together with Yuttapong Thawornwattana and Ziheng Yang, I have also examined the robustness of Bayesian inference of gene flow to intragenic recombination and natural selection [8].
The papers
[1] Y. Wang and B. Rannala. 2008. Bayesian inference of fine-scale recombination rates using population genomic data. Philosophical Transactions of the Royal Society B 363: 3921-3930.
This paper developed a full-likelihood Bayesian MCMC method for estimating fine-scale recombination rates from population samples. The genealogy of a sample of chromosomes is modeled using the marginal genealogies at individual SNPs, which are related to one another through an ancestral recombination graph. Using simulated data, we compared the method with two existing composite-likelihood methods. We also applied it to two human population datasets for which sperm-typing results were available. Our estimates were consistent with those from sperm crossover analysis, supporting the validity of the full-likelihood approach.
[2] Y. Wang and B. Rannala. 2009. Population genomic inference of recombination rates and hotspots. Proceedings of the National Academy of Sciences 106: 6215-6219.
This paper extended our Bayesian full-likelihood method to estimate background recombination rates and identify hotspots on a genome-wide scale. The probability model was inspired by patterns of recombination observed in sperm-typing studies. For larger regions, the method splits the data into subintervals and analyzes them in parallel, with parameters shared across subintervals. The method infers posterior probabilities and Bayes factors for recombination hotspots along chromosomes. Simulations showed that it can accurately estimate recombination rate variation across genomic regions. It can also distinguish clusters of hotspots even when weaker hotspots are present. We applied the method to SNP data from the HLA region, MS32, and chromosome 19.
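The sketch below is a rough illustration of how a hotspot Bayes factor for one interval can be obtained from MCMC output. It uses the standard identity that a Bayes factor equals the posterior odds divided by the prior odds. It assumes that the sampler records, at each iteration, whether a hotspot covers the interval, and that the prior probability of a hotspot there is known. The function name and its inputs are hypothetical and are not the interface of the published software.

```python
def hotspot_bayes_factor(indicators, prior_prob):
    """Estimate the Bayes factor for a hotspot in one interval (illustrative sketch).

    indicators -- sequence of 0/1 values, one per MCMC sample, recording whether
                  a hotspot covered the interval in that sample (hypothetical input)
    prior_prob -- prior probability that a hotspot covers the interval
    """
    if not 0.0 < prior_prob < 1.0:
        raise ValueError("prior_prob must lie strictly between 0 and 1")
    n = len(indicators)
    if n == 0:
        raise ValueError("need at least one MCMC sample")
    # Posterior probability estimated as the fraction of samples with a hotspot.
    post_prob = sum(indicators) / n
    if post_prob == 1.0:
        # Every sample contained a hotspot; the Monte Carlo estimate is unbounded.
        return float("inf")
    posterior_odds = post_prob / (1.0 - post_prob)
    prior_odds = prior_prob / (1.0 - prior_prob)
    # Bayes factor = posterior odds / prior odds.
    return posterior_odds / prior_odds
```

Suppose 900 of 1,000 samples contain a hotspot and the prior probability is 0.1. The posterior odds are then 9 and the prior odds are 1/9, giving a Bayes factor of about 81. In practice the resolution of such an estimate is limited by the number of MCMC samples, which is one reason long runs are needed.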
[3] B. Padhukasahasram and B. Rannala. 2011. Bayesian population genomic inference of crossing over and gene conversion. Genetics 189: 607-619.
This paper developed a coalescent-based full-likelihood MCMC method for jointly estimating crossing-over rates, gene-conversion rates, and mean tract lengths from population genomic data under a Bayesian framework. We verified the correctness of the implementation using simulations. We also extended the method to models with variable gene-conversion and crossing-over rates, demonstrating its ability to identify recombination hotspots. We applied it to sequences from the telomeric regions of the X chromosome of Drosophila melanogaster. The results indicated that gene conversion occurs more frequently than crossing over in the su-w and su-s gene sequences, with mean tract lengths estimated at approximately 70 bp and 430 bp, respectively.
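One intuition for why the two processes can be distinguished is that they break down associations between sites differently as distance increases. A commonly used approximation assumes conversion tracts with an exponential length distribution of mean L. Under that assumption, the per-generation rate at which two sites d bp apart are separated is c·d + 2gL(1 − e^(−d/L)), where c is the per-bp crossing-over rate and g is the per-bp rate at which conversion tracts are initiated. The sketch below evaluates that expression. It is an illustration under these assumptions rather than the likelihood model of [3]. The rates are hypothetical; only the tract lengths of 70 and 430 bp come from the estimates above.

```python
import math


def pair_separation_rate(d, crossover_rate, conversion_rate, mean_tract):
    """Approximate rate at which two sites d bp apart are separated by
    crossing over or gene conversion (assumed model, see lead-in).

    d               -- distance between the two sites, in bp
    crossover_rate  -- crossing-over rate per bp per generation (c)
    conversion_rate -- gene-conversion initiation rate per bp per generation (g)
    mean_tract      -- mean conversion tract length in bp (L); tract lengths
                       are assumed to be exponentially distributed
    """
    if d < 0 or mean_tract <= 0:
        raise ValueError("need d >= 0 and mean_tract > 0")
    # Crossing over: the separation rate grows linearly with distance.
    crossover_part = crossover_rate * d
    # Gene conversion: a tract separates the sites only if it covers one site
    # but not the other, so this term saturates at 2gL when d >> L.
    conversion_part = 2.0 * conversion_rate * mean_tract * (1.0 - math.exp(-d / mean_tract))
    return crossover_part + conversion_part


# Hypothetical rates: conversion tracts initiate 5x as often as crossovers.
c, g = 1e-8, 5e-8
for mean_tract in (70, 430):
    for d in (10, 100, 1000, 10000):
        total = pair_separation_rate(d, c, g, mean_tract)
        conversion_only = pair_separation_rate(d, 0.0, g, mean_tract)
        print(f"L={mean_tract:>3} bp, d={d:>5} bp: "
              f"gene-conversion share = {conversion_only / total:.2f}")
```

With these hypothetical values, gene conversion accounts for most of the separation between closely spaced sites. Crossing over dominates at distances much longer than the tract length. This distance-dependent pattern is what allows population data on SNPs at many spacings to carry information about both processes.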
[4] B. Padhukasahasram and B. Rannala. 2013. Meiotic gene-conversion rate and tract length variation in the human genome. European Journal of Human Genetics 21: 1-8.
This paper extended our methods for estimating gene conversion to the human genome. It provides estimates of meiotic gene-conversion rates and of tract length variation across genomic regions. The analysis helped clarify the relative contributions of crossing over and gene conversion to genetic variation in human populations.
[5] Y. Wang and B. Rannala. 2014. Bayesian inference of shared recombination hotspots between humans and chimpanzees. Genetics 198: 1621-1628.
This paper developed a Bayesian method for testing whether recombination hotspots are shared between related species. Previous studies had suggested that the locations of recombination hotspots are not conserved between humans and chimpanzees. We reanalyzed data from resequencing projects and computed Bayes factors for shared hotspots between humans and chimpanzees across aligned regions of chromosome 21. Previous comparative studies of the beta-globin and HLA regions found no overlapping hotspots. By contrast, our results showed high Bayes factors for shared hotspots at locations within both regions, and the estimated locations overlapped those obtained from sperm-typing studies in humans.
[6] H. Alhaddad, C. Zhang, B. Rannala, and L.A. Lyons. 2016. A glance at recombination hotspots in the domestic cat. PLoS ONE 11: e0148710.
This paper applied our Bayesian recombination inference methods (implemented in the program inferRho) to characterize recombination rate variation and hotspots in the domestic cat, Felis silvestris catus. We analyzed SNP data from twenty-two East Asian feral cats across ten chromosomal regions. Four decisive recombination hotspots were identified on cat chromosomes A2, D1, and E2. No correlation was detected between GC content and hotspot location. The hotspots contained L2 LINE elements as well as MIR and tRNA-Lys SINE elements.
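The label "decisive" presumably refers to Jeffreys' scale for interpreting Bayes factors, under which a Bayes factor above 100 counts as decisive evidence. The sketch below maps a Bayes factor onto that scale, using Jeffreys' category boundaries at powers of √10. Whether [6] used exactly these cut-offs is an assumption, and the function name is hypothetical.

```python
# Jeffreys' scale (assumed here): evidence categories for a Bayes factor in
# favour of the alternative hypothesis (a hotspot), with upper boundaries at
# powers of sqrt(10).
JEFFREYS_SCALE = [
    (10 ** 0.5, "barely worth mentioning"),
    (10 ** 1.0, "substantial"),
    (10 ** 1.5, "strong"),
    (10 ** 2.0, "very strong"),
]


def jeffreys_category(bf):
    """Return the Jeffreys evidence category for Bayes factor bf (sketch)."""
    if bf < 1.0:
        # A Bayes factor below 1 favours the null hypothesis (no hotspot).
        return "favours no hotspot"
    for upper, label in JEFFREYS_SCALE:
        if bf < upper:
            return label
    return "decisive"
```

For example, `jeffreys_category(250.0)` returns `"decisive"`, whereas a Bayes factor of 20 falls in the "strong" category. Reporting only decisive hotspots is a conservative choice that trades some sensitivity for fewer false positives.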
[7] B. Rannala. 2025. Recombination and phylogenetic inference. Evolutionary Journal of the Linnean Society 4: kzaf016.
This paper examines the effects of recombination on phylogenetic tree inference. It compares two primary approaches. Concatenation treats all loci as sharing a single gene tree. Species tree methods assume that each locus has its own gene tree with no intralocus recombination. The paper evaluates three strategies for dealing with recombination: assessing statistical robustness when recombination is ignored, identifying and removing recombinant regions, and using methods that accommodate gene trees that vary along sequences. The analysis suggests that recombination is likely to be more detrimental to concatenation methods. It has relatively little impact on the topology or divergence time estimates of species tree methods. Recombination detection may therefore be unnecessary for species tree analysis, and removing recombinant loci could actually introduce bias.
[8] Y. Thawornwattana, B. Rannala, and Z. Yang. 2026. On the robustness of Bayesian inference of gene flow to intragenic recombination and natural selection. Molecular Biology and Evolution 43: msaf327.
This paper uses simulation to examine the false positive rates of a Bayesian test of gene flow under the multispecies coalescent model. The factors considered include recombination, natural selection, the timing of species divergence, and whether gene flow involves sister or non-sister lineages. The test has very low false positive rates in most scenarios. However, tests for gene flow between sister lineages may have high false positive rates when species divergence is very recent and recombination rates are very high. At low recombination rates the test is robust to various types of selection. Even so, prolonged balancing selection can produce spurious signals of gene flow between sister lineages. Detection of gene flow between non-sister lineages remains robust across all the recombination rates and divergence times examined.