
Polygenic models: additive effects, non-additive effects, and polygenic risk scores

A polygenic model treats a phenotype as the sum of small contributions from many independent loci. The framework was set out by Fisher 1918 and is the foundation of the modern polygenic risk score. This page covers the additive (a) and dominance (d) effects, narrow- vs broad-sense heritability, the construction of PRS by LDpred / P+T / PRS-CS, the cap on PRS performance set by trait heritability, and the portability problem documented across ancestries. Educational treatment; outputs of any PRS computation in this framework are illustrative for research and education, not clinical.


Short version. A polygenic risk score (PRS) is the genome-wide sum of risk-allele dosages weighted by GWAS effect sizes. It is the operational descendant of Fisher's 1918 infinitesimal model. Its theoretical accuracy is capped by the trait's narrow-sense heritability, and its empirical accuracy degrades sharply when the score is applied to an ancestry group other than that of the discovery cohort. None of the framing on this page is clinical: PRS results in research and education are summary statistics, not medical recommendations.

Fisher's infinitesimal model

Fisher's 1918 paper (Transactions of the Royal Society of Edinburgh 52:399) set out a model in which a continuous phenotype is the sum of the genetic contributions from a (notionally infinite) collection of independently segregating Mendelian loci, each with a small additive effect on the trait, plus an environmental term. Two results follow. First, by the central limit theorem, the population distribution of genetic values, and hence of phenotypes, is approximately Gaussian. Second, provided the components are uncorrelated, the population variance decomposes as VP = VA + VD + VI + VE (additive, dominance, epistatic, environmental). The model also fixes the expected share of additive variance that relatives have in common: 0.5 for parent and offspring, 0.5 for full sibs, 0.25 for half sibs, and 1.0 for monozygotic twins. These coefficients equal the phenotypic correlations only when all variance is additive genetic; otherwise the expected phenotypic correlation is the coefficient multiplied by the narrow-sense heritability VA / VP, ignoring dominance and shared environment.
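The relationship between the relatedness coefficients and the correlations actually observed between relatives can be sketched in a few lines. This is a minimal illustration, assuming purely additive genetic variance, random mating, and no environment shared between relatives; the dictionary and function names are hypothetical.

```python
# Share of additive genetic variance that each pair of relatives has in common.
RELATEDNESS = {
    "parent_offspring": 0.5,
    "full_sib": 0.5,
    "half_sib": 0.25,
    "mz_twin": 1.0,
}


def expected_phenotypic_correlation(pair: str, h2: float) -> float:
    """Expected phenotypic correlation = relatedness coefficient * h2.

    Assumes additive variance only (no dominance, no epistasis) and no
    shared environment, so this is an idealised expectation.
    """
    if not 0.0 <= h2 <= 1.0:
        raise ValueError("h2 must lie in [0, 1]")
    return RELATEDNESS[pair] * h2


if __name__ == "__main__":
    for pair in RELATEDNESS:
        print(pair, expected_phenotypic_correlation(pair, h2=0.8))
```

With h2 = 1 the function returns the bare coefficients; with any lower heritability the expected correlations shrink in proportion.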

The infinitesimal label reflects the limit of large locus number with small per-locus effect. Real genetic architectures fall on a continuum: at one end, single-locus Mendelian disorders; at the other, height (estimated to involve thousands of common variants) and behavioural traits whose architecture is even more diffuse. The polygenic synthesis does not require all traits to lie at the infinitesimal limit, only that the sum of small contributions across many loci is the right framing for the trait of interest. Lynch & Walsh (1998) Genetics and Analysis of Quantitative Traits develops the underlying mathematics in textbook form and remains the canonical reference for the variance-decomposition machinery.

Additive (a), dominance (d), and epistatic effects

For a single biallelic locus with alleles A1 and A2, three genotypic values are defined relative to the midpoint between the two homozygotes: A1A1 has value +a, the heterozygote A1A2 has value d, and A2A2 has value −a. The additive effect a is half the difference between the homozygote values; the dominance deviation d is the departure of the heterozygote from the homozygote midpoint. When d = 0 the locus is purely additive; d = a is complete dominance of A1; d = −a is complete dominance of A2; d > a is overdominance, and d < −a is underdominance.
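The way a and d translate into variance components can be made concrete with the standard single-locus formulas from the textbook treatment. The sketch below assumes Hardy-Weinberg proportions and takes p as the frequency of A1; the function names are illustrative. The second function recomputes the total genotypic variance directly from genotype frequencies as a cross-check.

```python
def single_locus_variances(p: float, a: float, d: float) -> dict:
    """Additive and dominance variance at one biallelic locus.

    p is the frequency of A1; genotypic values are +a (A1A1), d (A1A2),
    and -a (A2A2). Assumes Hardy-Weinberg genotype proportions.
    """
    q = 1.0 - p
    alpha = a + d * (q - p)        # average effect of an allele substitution
    va = 2.0 * p * q * alpha ** 2  # additive variance
    vd = (2.0 * p * q * d) ** 2    # dominance variance
    return {"alpha": alpha, "VA": va, "VD": vd, "VG": va + vd}


def genotypic_variance_direct(p: float, a: float, d: float) -> float:
    """Variance of genotypic values computed straight from genotype frequencies."""
    q = 1.0 - p
    freqs = [p * p, 2.0 * p * q, q * q]
    values = [a, d, -a]
    mean = sum(f * v for f, v in zip(freqs, values))
    return sum(f * (v - mean) ** 2 for f, v in zip(freqs, values))


if __name__ == "__main__":
    parts = single_locus_variances(p=0.3, a=1.0, d=0.5)
    print(parts)
    print(genotypic_variance_direct(p=0.3, a=1.0, d=0.5))
```

Note that a locus with dominance still contributes additive variance: unless p = q, the dominance deviation d enters VA through the average effect alpha.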

Across loci, two further sources of non-additivity matter. Epistasis is the dependence of one locus's effect on the genotype at another locus — the genetic equivalent of an interaction term in regression. Linkage creates non-independence of allele transmission within a chromosomal region; under linkage disequilibrium, the variance contributions of nearby loci cannot be cleanly separated. The variance components are conventionally written VA (additive), VD (dominance), VI (epistatic interaction), and the broad-sense genetic variance VG = VA + VD + VI.

Narrow-sense vs broad-sense heritability

Two heritability statistics matter. The narrow-sense heritability h² = VA / VP is the proportion of phenotypic variance attributable to additive genetic variance; it determines parent-offspring resemblance and the response of a trait to selection. The broad-sense heritability H² = VG / VP is the proportion attributable to all genetic variance; it is the quantity that the correlation between monozygotic twins reared apart estimates in principle. Because VA is one component of VG, h² ≤ H² by definition.
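As a worked example of the two ratios, the following sketch computes both statistics from a set of variance components. The component values in the usage lines are invented purely for illustration.

```python
def heritabilities(va: float, vd: float, vi: float, ve: float) -> tuple:
    """Return (narrow-sense, broad-sense) heritability from variance components."""
    vg = va + vd + vi  # total genetic variance
    vp = vg + ve       # phenotypic variance
    if vp <= 0.0:
        raise ValueError("phenotypic variance must be positive")
    return va / vp, vg / vp


if __name__ == "__main__":
    # Hypothetical components that sum to a phenotypic variance of 1.0.
    narrow, broad = heritabilities(va=0.40, vd=0.10, vi=0.05, ve=0.45)
    print(f"narrow-sense = {narrow:.2f}, broad-sense = {broad:.2f}")
```

The gap between the two values is exactly the share of phenotypic variance due to dominance and epistasis.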

The distinction matters operationally for polygenic risk scoring: the theoretical ceiling on a PRS's predictive R² is the narrow-sense heritability h², because GWAS effect-size estimates are additive coefficients fitted in a linear model and so can capture only additive variance. For scores built from common SNPs the tighter bound is the SNP heritability, which is generally lower than h². A trait with h² = 0.4 has a theoretical PRS R² ceiling of 0.4 in the discovery population; in practice a PRS achieves only a fraction of this because of finite GWAS sample size, allele-frequency shrinkage, and imperfect LD between genotyped SNPs and causal variants.
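The effect of finite GWAS sample size on the achievable R2 can be illustrated with an approximation that is widely used in the prediction literature and usually attributed to Daetwyler and colleagues. It is not part of this page's source material, and the number of effectively independent markers (m_eff) used below is an assumed illustrative figure, not an estimate.

```python
def expected_prs_r2(h2: float, n_gwas: int, m_eff: int) -> float:
    """Approximate out-of-sample R2 of a score with GWAS-estimated weights.

    Uses R2 ~ h2 / (1 + m_eff / (n_gwas * h2)), which assumes an
    infinitesimal architecture, unrelated individuals, and matching
    ancestry between discovery and target samples.
    """
    if not 0.0 < h2 <= 1.0:
        raise ValueError("h2 must lie in (0, 1]")
    return h2 / (1.0 + m_eff / (n_gwas * h2))


if __name__ == "__main__":
    for n in (10_000, 100_000, 1_000_000):
        # m_eff = 60_000 is an assumption made for this illustration only.
        print(n, round(expected_prs_r2(h2=0.4, n_gwas=n, m_eff=60_000), 3))
```

Under this approximation the expected R2 rises with sample size but approaches h2 only asymptotically, which is one way to see why gains from larger GWAS show diminishing returns.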

Polygenic risk scores: definition and construction

A polygenic risk score for an individual is the sum across SNPs of the risk-allele dosage (0, 1, or 2 copies) weighted by the SNP's effect-size estimate from a GWAS:

$$\mathrm{PRS}_i = \sum_j \beta_j X_{ij}$$

where $\beta_j$ is the effect size at SNP $j$ (typically the log-odds-ratio for a binary trait or the per-allele effect for a continuous trait), and $X_{ij} \in \{0, 1, 2\}$ is the dosage of the risk allele in individual $i$. The score is a scalar number that locates the individual on a population distribution of polygenic burden. Three families of construction methods dominate the literature.
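The weighted sum itself is a single matrix-vector product. The sketch below is a minimal illustration using NumPy; the dosage matrix and effect sizes are made-up numbers, and the function name is hypothetical.

```python
import numpy as np


def polygenic_score(dosages, betas) -> np.ndarray:
    """Return one score per individual: the sum over SNPs of beta_j * dosage_ij.

    dosages: array of shape (n_individuals, n_snps), risk-allele counts 0, 1, or 2
    betas:   array of shape (n_snps,), per-allele effect sizes from a GWAS
    """
    dosages = np.asarray(dosages, dtype=float)
    betas = np.asarray(betas, dtype=float)
    if dosages.ndim != 2 or betas.ndim != 1 or dosages.shape[1] != betas.shape[0]:
        raise ValueError("dosages must be (n, m) and betas must be (m,)")
    return dosages @ betas


if __name__ == "__main__":
    # Two individuals, three SNPs; the values are invented for illustration.
    X = [[0, 1, 2],
         [2, 0, 1]]
    beta = [0.10, -0.20, 0.05]
    print(polygenic_score(X, beta))
```

In practice the raw scores are usually standardised against a reference distribution before being interpreted, since the absolute value depends on which SNPs are included.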

  • Clumping and thresholding (P+T, also called pruning and thresholding). Select SNPs by GWAS p-value at a chosen threshold; LD-clump to remove correlated SNPs; sum the effect sizes of the clumped, threshold-passing variants. Simple, transparent, and robust; less efficient at extracting signal than the Bayesian methods.
  • LDpred. Vilhjálmsson et al. 2015 introduced LDpred, a Bayesian shrinkage method that updates each SNP's effect size by a posterior under a point-normal mixture prior, conditioned on an LD reference panel. LDpred-funct, LDpred2, and related variants extend the method.
  • PRS-CS. Ge et al. 2019 introduced PRS-CS, which uses a continuous-shrinkage prior on effect sizes and is empirically robust across architectures. Methodological detail in Choi, Mak & O'Reilly 2020 (Nat Protoc 15:2759), which is a good practical guide for both LDpred and PRS-CS.
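Of the three families, clumping and thresholding is simple enough to sketch directly. The code below is a deliberately simplified illustration, not a reimplementation of any published tool: it measures LD as the squared correlation between dosage columns in the supplied sample, ignores physical distance windows and external reference panels, and uses hypothetical function names and default thresholds.

```python
import numpy as np


def clump_and_threshold(dosages, pvalues, p_threshold=5e-8, r2_threshold=0.1):
    """Return indices of SNPs kept after p-value thresholding and greedy clumping.

    SNPs are visited from most to least significant; a SNP is kept only if its
    squared correlation with every SNP already kept is below r2_threshold.
    Assumes every SNP column varies in the sample (no monomorphic SNPs).
    """
    dosages = np.asarray(dosages, dtype=float)
    pvalues = np.asarray(pvalues, dtype=float)
    # Squared pairwise correlation between SNP columns, used as an LD proxy.
    r2 = np.corrcoef(dosages, rowvar=False) ** 2
    kept = []
    for j in np.argsort(pvalues):
        if pvalues[j] > p_threshold:
            break  # remaining SNPs are all less significant
        if all(r2[j, k] < r2_threshold for k in kept):
            kept.append(int(j))
    return kept


def pt_score(dosages, betas, kept) -> np.ndarray:
    """Sum effect-size-weighted dosages over the retained SNPs only."""
    dosages = np.asarray(dosages, dtype=float)
    betas = np.asarray(betas, dtype=float)
    return dosages[:, kept] @ betas[kept]


if __name__ == "__main__":
    # Six individuals, three SNPs; SNP 1 is a perfect proxy for SNP 0.
    X = np.array([[0, 0, 0], [1, 1, 1], [2, 2, 0],
                  [0, 0, 2], [1, 1, 2], [2, 2, 1]])
    kept = clump_and_threshold(X, pvalues=[1e-10, 1e-9, 1e-8])
    print(kept, pt_score(X, [0.3, 0.3, -0.1], kept))
```

The greedy pass is what keeps the method transparent: each retained SNP stands in for its LD neighbourhood, at the cost of discarding the correlated signal that the Bayesian methods try to reweight instead.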

The PGS Catalog (Lambert et al. 2021, Nat Genet 53:420; PMID 33692568) is the central public registry of published polygenic scores, providing standardised metadata for each score's discovery population, training method, and validation cohort. As of late 2025 it indexes more than four thousand scores across hundreds of phenotypes.

UK Biobank and the canonical demonstrations

The UK Biobank's ~500,000 genotyped, deeply phenotyped participants made large-scale PRS demonstrations practical. Khera et al. 2018 (Nat Genet 50:1219; PMID 30104762) is the canonical paper. It showed that a coronary-artery-disease PRS, evaluated in UK Biobank participants, places roughly 8% of the population at threefold or greater risk relative to the rest of the population, an effect comparable in magnitude to that of monogenic familial-hypercholesterolaemia carriers. The same paper extended the analysis to atrial fibrillation, type 2 diabetes, inflammatory bowel disease, and breast cancer.

Subsequent work has refined and qualified these results. PRS performance for any single trait is bounded by the heritability captured by the GWAS, and improvements with additional GWAS sample size are sublinear. PRS results in any setting are summary statistics, not recommendations; on this site and in the Evagene help catalogue PRS framing is research and education, never clinical.

The portability problem

The single most important caveat in the PRS literature is the cross-ancestry portability problem. Martin et al. 2019 (Nat Genet 51:584; PMID 30940768) showed that PRS computed using effect sizes estimated in one ancestry group lose substantial predictive accuracy, often more than half, when applied to a different ancestry group. The causes are well understood: differences in allele frequencies, differences in LD patterns (so that a tag SNP no longer tracks the causal variant as closely, and its effect estimate no longer transfers), and differences in the population-specific genetic architecture of the trait.

Because the GWAS literature is overwhelmingly European-ancestry, this means PRS as currently published transfer poorly to non-European populations. Martin and colleagues argued that uncritical deployment of these scores risks exacerbating health disparities. The correct response is investment in non-European GWAS — under way at scale through the All of Us Research Program, the Million Veteran Program, and large East Asian and African biobanks — and in cross-ancestry methods such as PRS-CSx and Bayesian multi-ancestry meta-analysis.

Performance metrics: AUC and R²

For a binary disease phenotype, PRS performance is most often reported as AUC (area under the ROC curve), the probability that a randomly chosen case has a higher PRS than a randomly chosen control. An AUC of 0.5 is no better than chance; 1.0 is perfect classification. PRS for highly heritable common diseases in well-powered European GWAS reach AUC in the range 0.65–0.80 in the discovery population. The maximum achievable AUC is not the heritability itself but a function of the trait's heritability on the liability scale and the disease prevalence; the precise ceiling is given by Wray, Yang and colleagues' work on PRS upper limits.
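The probabilistic definition of AUC can be computed directly by comparing every case with every control. The sketch below does exactly that in plain Python; it is quadratic in sample size and meant only to make the definition concrete, and the scores in the usage lines are invented.

```python
def auc(case_scores, control_scores) -> float:
    """Probability that a random case scores higher than a random control.

    Ties count as half a win, which matches the usual rank-based definition.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


if __name__ == "__main__":
    cases = [0.9, 0.4, 0.7]
    controls = [0.1, 0.5, 0.3]
    print(auc(cases, controls))
```

For real cohorts a rank-based implementation gives the same number far more efficiently; the pairwise loop is used here because it mirrors the verbal definition line for line.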

For continuous traits, PRS performance is reported as R² (proportion of phenotypic variance explained). An R² ceiling equal to the narrow-sense heritability is a useful intuition, though the empirical ceiling depends on GWAS sample size, the LD pattern in the validation cohort, and the construction method.
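The heritability ceiling on R2 can be seen in a small simulation. The sketch below generates a purely additive trait and scores it with the true effect sizes, which no real GWAS provides, so the R2 it returns sits near the ceiling rather than at a realistic value. All parameters (sample size, number of SNPs, allele-frequency range, seed) are arbitrary choices for illustration.

```python
import numpy as np


def simulate_prs_r2(n=5000, m=200, h2=0.4, seed=0) -> float:
    """Simulate an additive trait and return the R2 of a score built from true effects."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.1, 0.9, size=m)                        # risk-allele frequencies
    dosages = rng.binomial(2, freqs, size=(n, m)).astype(float)  # independent SNPs
    betas = rng.normal(0.0, 1.0, size=m)                         # true per-allele effects
    score = dosages @ betas
    # Rescale genetic values so they contribute a fraction h2 of phenotypic variance.
    genetic = (score - score.mean()) / score.std() * np.sqrt(h2)
    phenotype = genetic + rng.normal(0.0, np.sqrt(1.0 - h2), size=n)
    return float(np.corrcoef(score, phenotype)[0, 1] ** 2)


if __name__ == "__main__":
    print(round(simulate_prs_r2(), 3))
```

Replacing the true effects with noisy estimates, or simulating correlated SNPs, would pull the R2 below this value, which is the situation every empirical PRS is in.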

Mendelian vs polygenic inheritance — a plain comparison

The Mendelian inheritance framework treats a phenotype as the consequence of one or two pathogenic alleles at a single locus, with segregation ratios determined by the inheritance pattern (50% in autosomal dominant, 25% in autosomal recessive, etc.). Pedigrees with this architecture show clean segregation. A polygenic / multifactorial framework treats the same kind of binary phenotype as the consequence of an underlying continuous liability summed across many loci plus environment; pedigrees show familial clustering without single-locus segregation ratios.
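The liability idea can be written down compactly. In the sketch below, liability is standard normal, disease occurs above a threshold set by the population prevalence, and a standardised polygenic score explains a fraction r2_liability of liability variance. The function names, and the assumption that score and residual liability are jointly normal, are illustrative choices made for this example; the output is a teaching quantity, not a risk estimate for any real person.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def liability_threshold(prevalence: float) -> float:
    """Liability value above which an individual is affected."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    return _STD_NORMAL.inv_cdf(1.0 - prevalence)


def risk_given_score(z: float, r2_liability: float, prevalence: float) -> float:
    """P(liability > threshold | standardised score z) under the threshold model."""
    if not 0.0 <= r2_liability < 1.0:
        raise ValueError("r2_liability must lie in [0, 1)")
    threshold = liability_threshold(prevalence)
    mean = z * r2_liability ** 0.5            # expected liability given the score
    residual_sd = (1.0 - r2_liability) ** 0.5
    return 1.0 - _STD_NORMAL.cdf((threshold - mean) / residual_sd)


if __name__ == "__main__":
    for z in (-2.0, 0.0, 2.0):
        print(z, round(risk_given_score(z, r2_liability=0.1, prevalence=0.05), 4))
```

When r2_liability is zero the function returns the prevalence for every score, which is the threshold model's way of saying that an uninformative score leaves everyone at population risk.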

The two frameworks coexist for many real diseases. Breast cancer is a useful illustration: BRCA1 and BRCA2 pathogenic variants are Mendelian (autosomal dominant with reduced penetrance); the residual familial clustering not explained by these is well-described by a polygenic component. Modern integrated models such as BOADICEA combine both. (BOADICEA is licensed by the University of Cambridge and is not bundled in Evagene; the platform exports a CanRisk-formatted pedigree file that the user uploads at canrisk.org for off-platform computation.) For complex diseases without a Mendelian backbone — type 2 diabetes, schizophrenia, hypertension — the polygenic / multifactorial framework is the only one that fits, and Evagene's complex-disease pedigree software implements the Falconer / Carter version of that framework with research- and education-grade framing.

Limits, framing, and what PRS does not do

A PRS is a population-level summary statistic. It locates an individual on a polygenic-burden distribution that has been derived from the discovery cohort. Several limits are widely acknowledged in the literature:

  • The heritability ceiling. No PRS can predict more variance than h²; current PRS reach a fraction of that ceiling.
  • The portability problem (Martin et al. 2019) — cross-ancestry transfer loses substantial accuracy.
  • Cohort-specific calibration. Quantile thresholds derived in the discovery cohort do not transfer cleanly to other populations or to other recruiting strategies.
  • No causal interpretation. A PRS includes effects of variants in linkage with causal variants; it does not isolate causal biology.
  • No environmental information. Environment, lifestyle, and clinical context are separate from PRS by construction; the polygenic framework explicitly models VG and VE as additive, not interacting (the GxE page covers what happens when this assumption breaks).

The framing on this page, on the rest of the Evagene site, and in the help catalogue treats PRS as a research and education construct. PRS outputs in any of these contexts are illustrative, not recommendations. Decisions about screening, testing, referral, or treatment are matters of professional clinical judgement and not the output of any tool on this platform.
