Preprint quantifies real-world risk of genome re-identification using polygenic phenotype prediction
Researchers developed a probabilistic framework to assess how accurately polygenic predictions of observable traits could be used to re-identify an anonymised genome, finding the practical risk lower than some prior estimates suggested.
A preprint posted to bioRxiv on 10 June 2026 addresses a longstanding question in genomic data governance: to what extent can an anonymised genome be re-identified by comparing the observable traits predicted from its polygenic scores with the known characteristics of candidate individuals? This class of attack, known as phenotypic tracing, has been discussed extensively in the genomic privacy literature, but prior studies have been criticised for overstating its practical feasibility.
The authors developed a probabilistic framework to quantify re-identification risk under realistic conditions, drawing on improvements in polygenic score accuracy made possible by increasingly large GWAS cohorts. The analysis aims to give a calibrated assessment of the threat rather than a worst-case theoretical bound. That distinction matters for how genomic data repositories design consent frameworks, data access tiers, and de-identification standards.
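To make the idea of phenotypic tracing concrete, here is a minimal toy sketch in Python. It is not the preprint's framework, whose details are not described in this summary. It assumes standardised traits, independent traits, a uniform prior over a candidate pool that is known to contain the genome's source, and a simple Gaussian model in which a polygenic prediction explaining a fraction `r2` of trait variance leaves residual variance `1 - r2`. The function name `match_posterior` and all numbers are hypothetical.

```python
import math


def log_normal_pdf(x, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def match_posterior(predicted, r2, candidates):
    """Posterior probability that each candidate is the source of the genome.

    predicted  -- polygenic trait predictions for the genome (standardised units)
    r2         -- assumed fraction of variance each prediction explains (0 <= r2 < 1)
    candidates -- observed trait vectors, one per candidate (standardised units)

    Toy assumptions: uniform prior, independent traits, source is in the pool.
    """
    log_weights = []
    for observed in candidates:
        total = 0.0
        for g, r, y in zip(predicted, r2, observed):
            # Likelihood ratio: trait given the prediction vs. population at large.
            total += log_normal_pdf(y, g, 1.0 - r) - log_normal_pdf(y, 0.0, 1.0)
        log_weights.append(total)
    # Normalise in a numerically stable way.
    peak = max(log_weights)
    weights = [math.exp(w - peak) for w in log_weights]
    norm = sum(weights)
    return [w / norm for w in weights]


# Hypothetical example: two traits, three candidates.
predicted = [0.5, -0.3]
r2 = [0.4, 0.1]
candidates = [[1.2, -0.5], [-0.8, 0.9], [0.1, 0.0]]
print(match_posterior(predicted, r2, candidates))
```

In this toy model, low `r2` values keep the posterior close to uniform across the pool, which gives one intuition for why practical re-identification risk depends on prediction accuracy and on the size of the candidate pool.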
The findings are relevant to researchers working in biobank governance, data-sharing policy, and genomic informatics, as well as to institutions responsible for designing ethical frameworks for large-scale genomic datasets. The preprint has not completed peer review. No personal genomic data from identifiable individuals was used in the publicly described analysis.
Sources
Read the original reporting — these are the public sources this summary draws from.
- Primary source · Preprint · bioRxiv (Cold Spring Harbor Laboratory) · 2026-06-10 · Evaluating anonymized genome re-identification using polygenic predictions and its implications for data privacy