Preprint: retaining close relatives in biobanks improves estimates of recent effective population size
A benchmarking study challenges the standard practice of removing related individuals before demographic inference, showing that close relatives carry useful signal about recent population history.
The conventional approach in population genetics is to exclude closely related individuals before performing demographic inference, on the grounds that their shared recent ancestry violates the assumption that sampled individuals are independent. A preprint on bioRxiv questions this practice, focusing on its cost for estimating recent effective population size (Ne).
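The relative-removal step described above is often implemented as a pruning of a relatedness graph: keep dropping individuals until no related pair remains. The following is a minimal sketch, assuming a precomputed list of related pairs (for example, pairs above some kinship threshold). The function name `greedy_prune` and the "drop whoever has the most remaining relatives" heuristic are hypothetical, illustrative choices. They are not the procedure used in the preprint or in any particular QC tool.

```python
from collections import defaultdict


def greedy_prune(related_pairs):
    """Return the set of individual IDs to drop so that no related pair remains.

    Hypothetical greedy heuristic: repeatedly drop the individual with the
    most remaining relatives (ties broken by the larger ID, for determinism).
    """
    # Build an undirected relatedness graph from the list of pairs.
    neighbours = defaultdict(set)
    for a, b in related_pairs:
        if a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)

    dropped = set()
    # Continue while at least one related pair (edge) is left.
    while any(neighbours.values()):
        worst = max(neighbours, key=lambda x: (len(neighbours[x]), x))
        # Remove the chosen individual and all edges touching them.
        for other in neighbours.pop(worst):
            neighbours[other].discard(worst)
        dropped.add(worst)
    return dropped


# A parent and two children, all mutually related.
family = [("P", "C1"), ("P", "C2"), ("C1", "C2")]
print(sorted(greedy_prune(family)))
```

For this three-person family the sketch drops two of the three individuals. This illustrates why pruning removes a growing share of a cohort as families become more common in the sample.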
The authors benchmarked two widely used methods, IBDNe and HapNe-IBD, which infer recent Ne from the sharing of identity-by-descent (IBD) segments. Biobank-scale datasets now routinely contain hundreds of thousands to millions of participants, so close relatives are increasingly common. Removing them discards a meaningful fraction of the data. The analysis suggests that it may also distort inferences about the most recent generations of a population's history, because close relatives share the long, recent IBD segments that carry information about those generations.
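The link between close relatives and recent IBD can be made concrete with a textbook approximation. Suppose two individuals share a common ancestor g generations back, so that 2g meioses separate them. If crossovers are treated as a Poisson process (the Haldane map), a shared IBD segment has an exponentially distributed length with mean 100/(2g) cM. The sketch below assumes that simplified model. It is not the inference model implemented in IBDNe or HapNe-IBD, and the helper names are hypothetical.

```python
import math


def expected_ibd_length_cm(g: int) -> float:
    """Mean IBD segment length in cM for a common ancestor g generations back.

    Assumes 2g meioses and crossovers as a Poisson process (Haldane model),
    so lengths are exponential with rate 2g per Morgan.
    """
    return 100.0 / (2 * g)


def prob_longer_than(length_cm: float, g: int) -> float:
    """P(segment length > length_cm) under the same exponential model."""
    return math.exp(-2 * g * length_cm / 100.0)


# Recent ancestors (small g) leave long segments; distant ones leave short ones.
for g in (2, 5, 20, 50):
    print(g, expected_ibd_length_cm(g), prob_longer_than(10.0, g))
```

Under this approximation, first cousins (common ancestors two generations back) share segments averaging 25 cM, while an ancestor 50 generations back leaves segments averaging 1 cM. This is why pairs of close relatives contribute disproportionately to the long-segment signal about the most recent generations.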
The study has implications for how population and statistical geneticists design quality-control pipelines for large cohorts. It is also relevant to pedigree-aware analyses, in which relative pairs are a feature rather than a nuisance. This work is a preprint and has not yet been peer-reviewed.
Sources
Read the original reporting — these are the public sources this summary draws from.
- Primary source: Preprint, bioRxiv (Cold Spring Harbor Laboratory), 2026-06-19. From Nuisance to Signal: Leveraging Close Relatives in Biobank-Scale Demographic Inference