New statistical framework corrects for recurrent mutation in large-scale allele frequency analysis
A bioRxiv preprint from Cold Spring Harbor Laboratory introduces the single mutation frequency spectrum, a revised approach to analysing rare allele data that accounts for identical-by-state variants arising from recurrent mutation events.
As whole-genome and whole-exome sequencing datasets grow to encompass hundreds of thousands of samples, they routinely surface alleles at frequencies too low to have been captured in earlier studies. A preprint posted on bioRxiv proposes that conventional site frequency spectrum (SFS) analysis breaks down at these very low frequencies. The reason is that some rare alleles that appear identical are not descended from a shared ancestral mutation: they arose independently at the same site, a phenomenon known as recurrent mutation.
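For readers unfamiliar with the SFS, the following is a minimal sketch of how an unfolded spectrum is tabulated from a matrix of derived-allele calls. The function name `site_frequency_spectrum` and the toy matrix are illustrative assumptions, not code or data from the preprint.

```python
import numpy as np

def site_frequency_spectrum(genotypes: np.ndarray) -> np.ndarray:
    """Unfolded SFS: entry k is the number of sites whose derived allele
    is carried by exactly k of the sampled chromosomes.

    genotypes: (n_sites, n_chromosomes) array of 0/1 derived-allele calls.
    """
    n_chromosomes = genotypes.shape[1]
    derived_counts = genotypes.sum(axis=1)  # carriers per site
    return np.bincount(derived_counts, minlength=n_chromosomes + 1)

# Hypothetical toy data: 5 sites scored in 6 chromosomes.
toy = np.array([
    [1, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 0, 0, 0],
])
sfs = site_frequency_spectrum(toy)
print(sfs.tolist())
```

Note that this tabulation groups alleles purely by state: it cannot tell whether the two carriers at a doubleton site inherited one mutation or carry two independent ones.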
To address this, the authors define the single mutation frequency spectrum (SMFS), which restricts analysis to alleles that are identical by descent from a single mutational event rather than identical merely by state. Because the standard SFS depends strongly on the underlying mutation rate once recurrent mutations are present, the authors argue that using it without correction introduces systematic biases into downstream demographic inference and tests of selection.
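The distinction between identity by state and identity by descent can be illustrated with a toy tabulation. This is only a sketch under strong assumptions: the origin of each allele copy is taken as known (which real data do not provide), independent events at a site are assumed to produce the same derived allele on different chromosomes, and the `events` list and function names are hypothetical. It does not reproduce the preprint's method.

```python
from collections import Counter, defaultdict

# Hypothetical toy input: one entry per mutational event, given as
# (site_id, number of sampled chromosomes descended from that event).
# Sites "s1" and "s3" each received two independent (recurrent) mutations.
events = [("s1", 1), ("s1", 1), ("s2", 2), ("s3", 1), ("s3", 2), ("s4", 1)]

def spectrum_by_descent(events):
    """Count each mutational event separately (the by-descent view)."""
    return Counter(k for _, k in events)

def spectrum_by_state(events):
    """Pool all events at a site, as a sequencing study would observe them."""
    carriers_per_site = defaultdict(int)
    for site, k in events:
        carriers_per_site[site] += k
    return Counter(carriers_per_site.values())

by_descent = spectrum_by_descent(events)
by_state = spectrum_by_state(events)
print("by descent:", dict(by_descent))
print("by state:  ", dict(by_state))
```

In this toy case the by-descent tabulation contains four singleton events, while the by-state tabulation shows only one singleton site, because recurrent hits are pooled into higher-count classes. The more often sites are hit repeatedly, the more the two spectra diverge, which is consistent with the article's point that the uncorrected SFS depends on the mutation rate.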
The work is a methods contribution aimed at population geneticists who work with large biobank-scale datasets or ultra-rare variant catalogues. The preprint has not yet undergone peer review. If the framework is validated and adopted, it could improve the accuracy of demographic modelling, mutation rate estimation, and tests for natural selection applied to the rare end of the allele frequency spectrum.
Sources
Read the original reporting — these are the public sources this summary draws from.
- Primary source (preprint): bioRxiv (Cold Spring Harbor Laboratory) · 2026-05-31 · Accounting for recurrent mutation in the frequency spectrum of rare alleles