Polygenic Models — Additive Effects, PRS, LDpred, PRS-CS

Short version. A polygenic risk score (PRS) is the genome-wide sum of risk-allele dosages weighted by GWAS effect sizes. It is the operational descendant of Fisher's 1918 infinitesimal model. Its theoretical accuracy is capped by the trait's narrow-sense heritability, and its empirical accuracy degrades sharply when it is applied to people whose ancestry differs from that of the discovery cohort. None of the framing on this page is clinical: PRS results in research and education are summary statistics, not medical recommendations.

Fisher's infinitesimal model

Fisher's 1918 paper in the Transactions of the Royal Society of Edinburgh (52:399) set out a model in which a continuous phenotype is the sum of the additive contributions of a (notionally infinite) collection of independently segregating Mendelian loci, each with a small effect on the trait, plus an environmental term. The central limit theorem then makes the population distribution of phenotypes approximately Gaussian. Fisher also showed that the phenotypic variance decomposes as V_P = V_A + V_D + V_I + V_E (additive, dominance, epistatic, environmental), and that the resemblance between relatives follows from these components. When all genetic variance is additive and relatives share no environment, the expected phenotypic correlation is ½h² for parent and offspring, ½h² for full sibs, ¼h² for half sibs, and h² for monozygotic twins, where h² = V_A / V_P; the coefficients 0.5, 0.5, 0.25 and 1.0 are the corresponding additive genetic relationships.

The infinitesimal label reflects the limit of large locus number with small per-locus effect. Real genetic architectures fall on a continuum: at one end, single-locus Mendelian disorders; at the other, height (estimated to involve thousands of common variants) and behavioural traits whose architecture is even more diffuse. The polygenic synthesis does not require all traits to lie at the infinitesimal limit, only that the sum of small contributions across many loci is the right framing for the trait of interest. Lynch & Walsh (1998) Genetics and Analysis of Quantitative Traits develops the underlying mathematics in textbook form and remains the canonical reference for the variance-decomposition machinery.

Additive (a), dominance (d), and epistatic effects

For a single biallelic locus with alleles A₁ and A₂, three genotypic values (mean phenotypes of each genotype) are defined, all measured from the midpoint between the two homozygotes: A₁A₁ has value +a, the heterozygote A₁A₂ has value d, and A₂A₂ has value −a. The additive effect a is half the difference between the two homozygote values; the dominance deviation d is the deviation of the heterozygote from the homozygote midpoint. When d = 0 the locus is purely additive; d = a is full dominance of A₁; d = −a is full dominance of A₂; d > a is overdominance.
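The a/d notation connects to the variance components through a standard textbook result (set out, for example, in Lynch & Walsh 1998): with A₁ at frequency p, A₂ at frequency q = 1 − p and Hardy-Weinberg genotype frequencies, the average effect of an allele substitution is α = a + d(q − p), V_A = 2pqα² and V_D = (2pqd)². The sketch below, with hypothetical function names, computes both and checks that they add up to the variance of the three genotypic values computed directly.

```python
def additive_and_dominance_variance(p, a, d):
    """Return (V_A, V_D) for one biallelic locus under Hardy-Weinberg proportions.

    p is the frequency of A1; genotypic values are +a (A1A1), d (A1A2), -a (A2A2),
    measured from the homozygote midpoint.
    """
    q = 1.0 - p
    alpha = a + d * (q - p)          # average effect of an allele substitution
    v_a = 2 * p * q * alpha ** 2     # additive genetic variance
    v_d = (2 * p * q * d) ** 2       # dominance variance
    return v_a, v_d


def genotypic_variance(p, a, d):
    """Brute-force variance of the three genotypic values, used to check V_A + V_D."""
    q = 1.0 - p
    freqs = [p * p, 2 * p * q, q * q]   # A1A1, A1A2, A2A2
    values = [a, d, -a]
    mean = sum(f * v for f, v in zip(freqs, values))
    return sum(f * (v - mean) ** 2 for f, v in zip(freqs, values))


# Purely additive, fully dominant, and partially dominant toy loci
for p, a, d in [(0.5, 1.0, 0.0), (0.5, 1.0, 1.0), (0.2, 1.0, 0.5)]:
    v_a, v_d = additive_and_dominance_variance(p, a, d)
    print(p, a, d, round(v_a, 4), round(v_d, 4), round(genotypic_variance(p, a, d), 4))
```

Even the fully dominant locus contributes additive variance here, because α stays non-zero.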

Across loci, two further sources of non-additivity matter. Epistasis is the dependence of one locus's effect on the genotype at another locus — the genetic equivalent of an interaction term in regression. Linkage creates non-independence of allele transmission within a chromosomal region; under linkage disequilibrium, the variance contributions of nearby loci cannot be cleanly separated. The variance components are conventionally written V_A (additive), V_D (dominance), V_I (epistatic interaction), and the broad-sense genetic variance V_G = V_A + V_D + V_I.

Narrow-sense vs broad-sense heritability

Two heritability statistics matter. The narrow-sense heritability h² = V_A / V_P is the proportion of phenotypic variance attributable to additive genetic variance, and is the quantity that determines parent-offspring resemblance and the response of a trait to selection. The broad-sense heritability H² = V_G / V_P is the proportion attributable to all genetic variance and is the quantity recovered (in principle) from monozygotic-twin concordance. h² ≤ H² by definition.

The distinction matters operationally for polygenic risk scoring: the theoretical ceiling on a PRS's predictive R² is the narrow-sense heritability h², because GWAS effect-size estimates are additive coefficients fitted in a linear model. A trait with h² = 0.4 has a theoretical PRS R² ceiling of 0.4 in the discovery population. In practice a PRS achieves only a fraction of this: finite GWAS sample sizes make effect estimates noisy, genotyped SNPs tag causal variants imperfectly (so the heritability captured by common SNPs is below h²), and the LD reference panel used to build the score never matches the target cohort exactly.
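One widely used way to see why a PRS falls short of its ceiling is the deterministic approximation of Daetwyler et al. (2008). It assumes the trait is controlled by M_e independent, equally small effects estimated from a GWAS of size N, and gives an expected R² of h² / (1 + M_e / (N·h²)). The sketch below evaluates it; the value M_e = 60,000 and the function name are illustrative assumptions, not estimates for any real trait.

```python
def expected_prs_r2(h2, n_gwas, m_eff):
    """Approximate expected PRS R^2 under a Daetwyler-style deterministic model.

    h2     : heritability available to the score (the ceiling)
    n_gwas : GWAS discovery sample size
    m_eff  : effective number of independent, equally small effects (an assumption)
    """
    return h2 / (1.0 + m_eff / (n_gwas * h2))


for n in (10_000, 100_000, 150_000, 1_000_000, 10_000_000):
    print(f"N = {n:>10,}: expected R^2 = {expected_prs_r2(0.4, n, 60_000):.3f}")
```

The curve rises steeply while N·h² is small relative to M_e and flattens as it approaches h², one way to see why gains from larger GWAS are sublinear.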

Polygenic risk scores: definition and construction

A polygenic risk score for an individual is the sum across SNPs of the dose of the risk allele (0, 1, or 2 copies) weighted by the SNP's effect size estimate from a GWAS:

PRS_i = ∑_j β_j · X_ij

where β_j is the effect size at SNP j (typically the log-odds-ratio for a binary trait or the per-allele effect for a continuous trait), and X_ij ∈ {0, 1, 2} is the dosage of the risk allele in individual i. The score is a scalar number that locates the individual on a population distribution of polygenic burden. Three families of construction methods dominate the literature.

Clumping and thresholding (C+T; also called pruning and thresholding, P+T). Select SNPs whose GWAS p-value passes a chosen threshold; LD-clump them so that only the most significant SNP in each correlated group is kept; compute the effect-size-weighted sum of dosages over the clumped, threshold-passing variants only. Simple, transparent, and robust, but less efficient at extracting signal than the Bayesian methods.
LDpred. Vilhjálmsson et al. 2015 introduced LDpred, a Bayesian shrinkage method that replaces each SNP's marginal effect estimate with its posterior mean under a point-normal mixture prior, accounting for LD estimated from a reference panel. LDpred-funct, LDpred2, and related variants extend the method.
PRS-CS. Ge et al. 2019 introduced PRS-CS, which places a continuous-shrinkage prior on effect sizes and is empirically robust across genetic architectures. Choi, Mak & O'Reilly 2020 (Nat Protoc 15:2759) give methodological detail and are a good practical guide to both LDpred and PRS-CS.
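To make the first two recipes concrete, here is a standard-library sketch of each building block under stated simplifications; the function names and inputs are hypothetical. `clump_and_threshold` is a greedy C+T selection that reads pairwise LD r² from a dictionary and ignores the physical-distance window that tools such as PLINK also apply. `point_normal_posterior_mean` is the single-SNP posterior mean under LDpred's point-normal prior when SNPs are assumed unlinked, using standardized effects whose marginal estimates have sampling variance 1/N. It illustrates the shrinkage idea behind LDpred, not LDpred itself, which also conditions on LD from a reference panel. PRS-CS's continuous-shrinkage prior is not sketched here.

```python
from statistics import NormalDist


def clump_and_threshold(snps, p_threshold, r2, r2_max=0.1):
    """Greedy clumping and p-value thresholding (a simplified C+T sketch).

    snps : list of (snp_id, p_value) pairs from the GWAS
    r2   : dict mapping frozenset({id_a, id_b}) to LD r^2 from a reference panel;
           pairs that are absent are treated as unlinked (r^2 = 0)
    Returns the retained index SNPs, most significant first.
    """
    kept = []
    for snp_id, p in sorted(snps, key=lambda s: s[1]):   # most significant first
        if p > p_threshold:
            break                                         # all remaining SNPs fail too
        # keep this SNP only if it is not in LD with a more significant index SNP already kept
        if all(r2.get(frozenset((snp_id, k)), 0.0) <= r2_max for k in kept):
            kept.append(snp_id)
    return kept


def point_normal_posterior_mean(beta_hat, n, h2, m, p_causal):
    """Posterior mean of one standardized SNP effect under a point-normal prior, ignoring LD.

    Prior: with probability p_causal, beta ~ N(0, h2 / (m * p_causal)); otherwise beta = 0.
    Likelihood: beta_hat ~ N(beta, 1 / n) for a GWAS of n individuals.
    """
    slab_var = h2 / (m * p_causal)    # prior variance of a causal effect
    noise_var = 1.0 / n               # sampling variance of the marginal estimate
    slab = p_causal * NormalDist(0.0, (slab_var + noise_var) ** 0.5).pdf(beta_hat)
    spike = (1 - p_causal) * NormalDist(0.0, noise_var ** 0.5).pdf(beta_hat)
    prob_causal = slab / (slab + spike)   # posterior probability that the SNP is causal
    # posterior mean = P(causal) * normal-normal shrinkage factor * marginal estimate
    return prob_causal * slab_var / (slab_var + noise_var) * beta_hat


gwas = [("rs1", 1e-10), ("rs2", 1e-9), ("rs3", 1e-6), ("rs4", 0.2)]   # toy summary statistics
ld = {frozenset(("rs1", "rs2")): 0.8}                                  # rs1 and rs2 in strong LD
print(clump_and_threshold(gwas, p_threshold=1e-5, r2=ld))
for b in (0.02, 0.001):
    print(b, point_normal_posterior_mean(b, n=100_000, h2=0.5, m=100_000, p_causal=0.01))
```

Strong signals pass through almost unchanged, while weak ones are shrunk nearly to zero because their posterior probability of being causal is small.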

The PGS Catalog (Lambert et al. 2021, Nat Genet 53:420; PMID 33692568) is the central public registry of published polygenic scores, providing standardised metadata for each score's discovery population, training method, and validation cohort. As of late 2025 it indexes more than four thousand scores across hundreds of phenotypes.

UK Biobank and the canonical demonstrations

The UK Biobank's ~500,000 genotyped, deeply phenotyped participants made large-scale PRS demonstrations practical. Khera et al. 2018 (Nat Genet 50:1219; PMID 30104762) is the canonical paper. It showed that a coronary-artery-disease PRS computed on UK Biobank data identifies 8% of the population at threefold or greater increased risk, comparable in magnitude to the risk carried by monogenic familial-hypercholesterolaemia variants. The same paper extended the analysis to atrial fibrillation, type 2 diabetes, inflammatory bowel disease, and breast cancer.

Subsequent work has refined and qualified these results. PRS performance for any single trait is bounded by the heritability captured by the GWAS, and improvements with additional GWAS sample size are sublinear. PRS results in any setting are summary statistics, not recommendations; on this site and in the Evagene help catalogue PRS framing is research and education, never clinical.

The portability problem

The single most important caveat in the PRS literature is the cross-ancestry portability problem. Martin et al. 2019 (Nat Genet 51:584; PMID 30940768) showed that PRS computed using effect sizes estimated in one ancestry group lose substantial predictive accuracy, often more than half, when applied to a different ancestry group. The causes are well understood: differences in allele frequencies, differences in LD patterns (so a tag SNP's effect estimate no longer reflects the causal variant equally), and differences in the population-specific genetic architecture of the trait.

Because the GWAS literature is overwhelmingly European-ancestry, this means PRS as currently published transfer poorly to non-European populations. Martin and colleagues argued that uncritical deployment of these scores risks exacerbating health disparities. The correct response is investment in non-European GWAS — under way at scale through the All of Us Research Program, the Million Veteran Program, and large East Asian and African biobanks — and in cross-ancestry methods such as PRS-CSx and Bayesian multi-ancestry meta-analysis.

Performance metrics: AUC and R²

For a binary disease phenotype, PRS performance is most often reported as AUC (area under the ROC curve), the probability that a randomly chosen case has a higher PRS than a randomly chosen control. AUC of 0.5 is no better than chance; 1.0 is perfect classification. PRS for highly heritable common diseases in well-powered European GWAS reach AUC in the range 0.65–0.80 in the discovery population. AUC is bounded above by the trait's heritability and the prevalence; the precise ceiling is given by Wray, Yang and colleagues' work on PRS upper limits.
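The probabilistic definition of AUC can be computed directly by comparing every case with every control and counting ties as one half (the Mann-Whitney formulation). The quadratic-time sketch below is for illustration only; the function name and toy scores are hypothetical, and large cohorts would use a rank-based implementation instead.

```python
def auc(case_scores, control_scores):
    """AUC as the probability that a random case outscores a random control (ties count 1/2)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


print(auc([3.0, 4.0, 5.0], [1.0, 2.0, 3.0]))   # toy PRS values for cases and controls
```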

For continuous traits, PRS performance is reported as R² (proportion of phenotypic variance explained). An R² ceiling equal to the narrow-sense heritability is a useful intuition, though the empirical ceiling depends on GWAS sample size, the LD pattern in the validation cohort, and the construction method.

Mendelian vs polygenic inheritance — a plain comparison

The Mendelian inheritance framework treats a phenotype as the consequence of one or two pathogenic alleles at a single locus, with segregation ratios determined by the inheritance pattern (for example, 50% of offspring affected when an affected heterozygote has an unaffected partner in autosomal dominant inheritance, and 25% when both parents are carriers in autosomal recessive inheritance). Pedigrees with this architecture show clean segregation. A polygenic / multifactorial framework treats the same kind of binary phenotype as the consequence of an underlying continuous liability summed across many loci plus environment; pedigrees show familial clustering without single-locus segregation ratios.
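The liability-threshold picture can be simulated in a few lines. The sketch below assumes a standard-normal liability, purely additive genetic variance (so full sibs have liability correlation h²/2), no shared environment, and disease whenever liability exceeds the threshold set by the population prevalence. It estimates the sibling recurrence-risk ratio λ_s, the sib's risk given an affected proband divided by the prevalence. The function name and parameter values are illustrative.

```python
import random
from statistics import NormalDist


def sibling_recurrence_ratio(prevalence, h2, n_pairs=200_000, seed=7):
    """Monte Carlo sibling recurrence-risk ratio under a liability-threshold model.

    Assumptions: standard-normal liability, purely additive genetic variance (full-sib
    liability correlation h2 / 2), no shared environment, disease when liability > T.
    """
    rng = random.Random(seed)
    threshold = NormalDist().inv_cdf(1.0 - prevalence)   # T such that P(liability > T) = prevalence
    rho = h2 / 2.0                                        # full-sib liability correlation
    affected_probands = affected_pairs = 0
    for _ in range(n_pairs):
        shared = rng.gauss(0.0, 1.0)
        # Each sib's liability: sqrt(rho) * shared + sqrt(1 - rho) * own, so Var = 1 and Cov = rho
        sib1 = rho ** 0.5 * shared + (1 - rho) ** 0.5 * rng.gauss(0.0, 1.0)
        sib2 = rho ** 0.5 * shared + (1 - rho) ** 0.5 * rng.gauss(0.0, 1.0)
        if sib1 > threshold:
            affected_probands += 1
            if sib2 > threshold:
                affected_pairs += 1
    # lambda_s = P(sib affected | proband affected) / population prevalence
    return (affected_pairs / affected_probands) / prevalence


print(round(sibling_recurrence_ratio(prevalence=0.05, h2=0.6), 2))
```

With these illustrative inputs λ_s comes out well above 1, showing familial clustering, yet the implied sibling risk (prevalence × λ_s) follows no fixed Mendelian ratio; it shifts continuously with prevalence and h².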

The two frameworks coexist for many real diseases. Breast cancer is a useful illustration: BRCA1 and BRCA2 pathogenic variants are Mendelian (autosomal dominant with reduced penetrance); the residual familial clustering not explained by these is well-described by a polygenic component. Modern integrated models such as BOADICEA combine both. (BOADICEA is licensed by the University of Cambridge and is not bundled in Evagene; the platform exports a CanRisk-formatted pedigree file that the user uploads at canrisk.org for off-platform computation.) For complex diseases without a Mendelian backbone — type 2 diabetes, schizophrenia, hypertension — the polygenic / multifactorial framework is the only one that fits, and Evagene's complex-disease pedigree software implements the Falconer / Carter version of that framework with research- and education-grade framing.

Limits, framing, and what PRS does not do

A PRS is a population-level summary statistic. It locates an individual on a polygenic-burden distribution that has been derived from the discovery cohort. Several limits are widely acknowledged in the literature:

The heritability ceiling. No PRS can predict more variance than h²; current PRS reach a fraction of that ceiling.
The portability problem (Martin et al. 2019) — cross-ancestry transfer loses substantial accuracy.
Cohort-specific calibration. Quantile thresholds derived in the discovery cohort do not transfer cleanly to other populations or to other recruiting strategies.
No causal interpretation. A PRS includes effects of variants in linkage disequilibrium with causal variants; it does not isolate causal biology.
No environmental information. Environment, lifestyle, and clinical context are separate from PRS by construction; the polygenic framework explicitly models V_G and V_E as additive, not interacting (the GxE page covers what happens when this assumption breaks).

The framing on this page, on the rest of the Evagene site, and in the help catalogue treats PRS as a research and education construct. PRS outputs in any of these contexts are illustrative, not recommendations. Decisions about screening, testing, referral, or treatment are matters of professional clinical judgement and not the output of any tool on this platform.
