Population and Evolutionary Genetics

Short version. Population genetics treats a population as a sample from a probability distribution over allele frequencies and asks how that distribution evolves. The mathematical scaffolding was laid by Hardy 1908 and Weinberg 1908 (the equilibrium law), Fisher 1930 and Wright 1931 (the diffusion and drift framework), and Kimura 1968 (the neutral theory). Modern human population genetics layers genome-scale data, coalescent inference, and admixture modelling on top of that scaffolding. The field is where epidemiology of Mendelian disease, polygenic architecture, ancestry inference, and demographic history all meet.

What population genetics is, and why it sits next to clinical genetics

A clinical-genetics question is typically about an individual or a family: what is the inheritance pattern of this disease, what variant is segregating, what is the recurrence risk in this sibship. A population-genetics question is about the distribution of variants across a population: how common is this allele, how strongly is it under selection, what does its frequency tell us about the demographic history of the population that carries it. The two views inform each other. The carrier frequency of a recessive disease allele in a population — a population-genetics quantity — is what makes a recurrence-risk calculation in a particular family possible. The pattern of variants associated with a complex trait in a genome-wide association study is interpretable only against a model of how those variants segregate in the population that was sampled.

The shared substrate is the concept of an allele frequency: the proportion of chromosomes in a population that carry a given allele at a locus. Diploid genotype frequencies follow from allele frequencies under specific assumptions, and departures from those assumptions are themselves informative.

Hardy-Weinberg: the null model

The foundational result of population genetics is the Hardy-Weinberg principle, derived independently by Hardy 1908 (Science 28:49) and Weinberg 1908 (Jahreshefte des Vereins für vaterländische Naturkunde in Württemberg 64:368). For a biallelic locus with allele frequencies p and q (p + q = 1), the expected genotype frequencies under random mating, no selection, no migration, no mutation, and an effectively infinite population are p² (homozygous for the allele at frequency p), 2pq (heterozygous), and q² (homozygous for the allele at frequency q). For an autosomal locus with equal allele frequencies in both sexes, these frequencies are reached after one generation of random mating and are stable thereafter.
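The one-generation claim can be checked numerically. The sketch below is illustrative only: it enumerates every mother-father genotype pair under random mating and applies Mendelian segregation, starting from a hypothetical population with no heterozygotes at all. The function name `next_generation` and the starting frequencies are assumptions made for this example.

```python
from itertools import product

# Probability that a parent of each genotype transmits allele A.
TRANSMIT_A = {"AA": 1.0, "Aa": 0.5, "aa": 0.0}

def next_generation(freqs):
    """Offspring genotype frequencies after one round of random mating.

    `freqs` maps genotype -> frequency and is assumed to be the same
    in mothers and fathers (autosomal locus, no selection or mutation).
    """
    offspring = {"AA": 0.0, "Aa": 0.0, "aa": 0.0}
    for mother, father in product(freqs, repeat=2):
        pair_prob = freqs[mother] * freqs[father]  # random mating
        a_m, a_f = TRANSMIT_A[mother], TRANSMIT_A[father]
        offspring["AA"] += pair_prob * a_m * a_f
        offspring["Aa"] += pair_prob * (a_m * (1 - a_f) + (1 - a_m) * a_f)
        offspring["aa"] += pair_prob * (1 - a_m) * (1 - a_f)
    return offspring

# Hypothetical starting point far from equilibrium: p = 0.6, no heterozygotes.
start = {"AA": 0.6, "Aa": 0.0, "aa": 0.4}
gen1 = next_generation(start)
gen2 = next_generation(gen1)
print(gen1)  # close to p² = 0.36, 2pq = 0.48, q² = 0.16
print(gen2)  # unchanged apart from floating-point rounding
```

The first generation lands on p², 2pq, q² for p = 0.6, and the second generation leaves those proportions where they are.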

Hardy-Weinberg is the null model. Its real value is that departures from it are diagnostic. An excess of homozygotes can indicate population structure (the Wahlund effect), inbreeding, or genotyping artefact. A deficit of homozygotes for a recessive disease allele can indicate lethality before reproduction. A departure seen only in the cases of a case-control study, with controls in equilibrium, can flag selection or a genuine disease association rather than a technical artefact. The chi-square test for Hardy-Weinberg equilibrium is one of the routine quality-control filters applied to any genome-wide genotyping dataset.
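A minimal sketch of the chi-square test for a single biallelic site follows. It uses only the standard library, takes the one-degree-of-freedom p-value from the identity P(χ² > x) = erfc(√(x/2)), and runs on hypothetical genotype counts. The function name `hwe_chi_square` is an assumption for this example; for rare alleles with small expected counts an exact test is often preferred over the chi-square approximation.

```python
import math

def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square goodness-of-fit test against Hardy-Weinberg proportions.

    Returns (chi2, p_value) with 1 degree of freedom
    (3 genotype classes - 1 - 1 estimated allele frequency).
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # allele frequency estimated from the sample
    q = 1 - p
    if p == 0 or q == 0:
        raise ValueError("monomorphic site: the test is undefined")
    observed = (n_AA, n_Aa, n_aa)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # survival function, 1 df
    return chi2, p_value

# Hypothetical counts with an excess of both homozygote classes (p = 0.6).
chi2, p_value = hwe_chi_square(400, 400, 200)
print(round(chi2, 2), p_value)
```

With these counts the expected numbers are 360, 480, and 160, so the statistic is about 27.8 and the site would fail any conventional HWE filter.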

The four forces: mutation, selection, drift, migration

Allele frequencies change between generations under four classical forces.

Mutation introduces new alleles. The per-base, per-generation human germline mutation rate is around 1.2 × 10⁻⁸, calibrated by trio sequencing and ancient-DNA estimates. Mutation is the ultimate source of all variation but is a slow per-locus force compared with selection and drift.
Selection is differential reproductive success of genotypes. Positive selection raises the frequency of beneficial alleles; negative (purifying) selection removes deleterious alleles; balancing selection maintains polymorphism, classically through heterozygote advantage. The selection coefficient (s) measures the relative fitness reduction of a genotype; mutation-selection balance for a recessive disease allele predicts an equilibrium allele frequency of approximately √(µ/s), where µ is the mutation rate to the disease allele.
Genetic drift is the random sampling of alleles from one generation to the next in a finite population. The strength of drift scales inversely with effective population size (Ne). Drift fixes neutral alleles with probability equal to their starting frequency. Wright 1931 (Genetics 16:97) formalised drift as a diffusion process, and the diffusion framework remains the workhorse of theoretical population genetics.
Migration moves alleles between populations and homogenises frequencies between connected demes. The classical island model and stepping-stone model frame migration in tractable form; modern human-genetics work uses richer demographic graphs.
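Two of the quantitative statements above can be checked with a short sketch: the √(µ/s) equilibrium for a recessive deleterious allele, and the claim that a neutral allele fixes with probability equal to its starting frequency. The parameter values (µ = 10⁻⁶ per locus, s = 1, 20 chromosomes, 2,000 replicates) and the function names are illustrative assumptions, not estimates for any real locus.

```python
import random

def mutation_selection_equilibrium(mu, s, q0=0.01, generations=20000):
    """Iterate selection against recessive homozygotes, then mutation.

    Fitnesses are 1, 1, 1 - s for AA, Aa, aa; mutation is one-way
    (A -> a) at rate mu. Returns the allele frequency q after iterating.
    """
    q = q0
    for _ in range(generations):
        q = q * (1 - s * q) / (1 - s * q * q)  # selection step
        q = q + mu * (1 - q)                   # mutation step
    return q

def fixation_fraction(n_chrom, p0, replicates, rng):
    """Fraction of neutral Wright-Fisher replicates ending in fixation."""
    fixed = 0
    for _ in range(replicates):
        count = round(p0 * n_chrom)
        while 0 < count < n_chrom:
            p = count / n_chrom
            # Binomial sampling of the next generation's chromosomes.
            count = sum(rng.random() < p for _ in range(n_chrom))
        if count == n_chrom:
            fixed += 1
    return fixed / replicates

q_hat = mutation_selection_equilibrium(mu=1e-6, s=1.0)
print(q_hat)  # settles near sqrt(1e-6 / 1.0) = 0.001

rng = random.Random(1)  # seeded so the run is repeatable
print(fixation_fraction(n_chrom=20, p0=0.2, replicates=2000, rng=rng))
```

The drift simulation uses a deliberately tiny population so that it runs in about a second; the fixation fraction should sit close to the starting frequency of 0.2, within sampling noise.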

Fisher 1930 in The Genetical Theory of Natural Selection developed the analytical framework for selection in continuous populations and articulated the fundamental theorem of natural selection. Wright 1931 introduced the diffusion approach and the concept of effective population size, and his "shifting balance" theory framed how populations might explore an adaptive landscape. The Fisher-Wright debate — selection-dominated versus drift-dominated views of evolution — runs through twentieth-century theory.

The neutral theory

The single largest reframing of the field came with Kimura 1968 (Nature 217:624), the neutral theory of molecular evolution. Kimura argued that the rate of molecular evolution at most loci is set not by positive selection but by the rate at which neutral mutations drift to fixation, and that most polymorphism segregating in a population is neutral or nearly so. The neutral theory does not deny that selection acts; it proposes that selection acts on a small fraction of variants and that the bulk of molecular variation is governed by mutation, drift, and demographic history. Ohta 1973 extended the framework to nearly neutral variation, where weak selection and drift interact in a way that depends on Ne. The neutral and nearly neutral framework is the modern null model against which selection is detected.
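The central quantitative step in that argument is short enough to write out. The following is the standard textbook form, stated under the assumption of a diploid population of N individuals and a neutral mutation rate µ per gene copy per generation; the symbol k for the substitution rate is introduced here for the example.

```latex
\begin{align*}
\text{new neutral mutations per generation} &= 2N\mu \\
\text{fixation probability of each new mutation} &= \frac{1}{2N} \\
k &= 2N\mu \cdot \frac{1}{2N} = \mu
\end{align*}
```

Population size cancels: under strict neutrality the long-run substitution rate equals the neutral mutation rate, whatever the value of N.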

Three subtopics, one field

The pillar splits into three subtopic pages, each treated in depth.

Allele frequency dynamics

Hardy-Weinberg in detail; the chi-square test for HWE in case-control data; the Wahlund effect; selection (positive, negative, balancing) with selection coefficients and worked examples; mutation-selection balance for recessive disease (equilibrium frequency ≈ √(µ/s)); the Wright-Fisher model; drift in finite populations and effective population size; the coalescent; the neutral and nearly neutral theory; diffusion theory. Worked examples include sickle-cell anaemia and the heterozygote-advantage relationship to malaria, lactase persistence as recent positive selection, and G6PD deficiency.

Demography and population structure

Wright's F-statistics (Fst, Fis, Fit) and what each measures; principal component analysis applied to genetic data; ADMIXTURE and STRUCTURE-style clustering; founder effects with worked examples (Ashkenazi Jewish founder variants in BRCA1 and BRCA2, Finnish heritage diseases, French-Canadian founders); bottlenecks and expansions; Out-of-Africa demographic history and PSMC; admixture in African-American and Latino populations; ancient DNA insights; the diversity of cohorts (UK Biobank, H3Africa, the 1000 Genomes Project).

Population genetics applications

Mendelian-disease epidemiology by population (Tay-Sachs, cystic fibrosis, sickle-cell, the thalassaemias); carrier-screening models (single-condition versus expanded panels, ethnicity-based versus pan-ethnic); pre-conception, prenatal, and newborn timing; the WHO / Wilson and Jungner 1968 criteria for screening programmes; the ELSI of population screening (consent, equity, stigma, insurance protections such as GINA in the United States); the 100,000 Genomes Project and the NHS Genomic Medicine Service. The page is framed as field-education throughout.

How population genetics shows up in pedigree work

Family-history documentation is fundamentally about a single pedigree, but population-genetics quantities sit underneath every recurrence-risk calculation. The carrier frequency in the relevant population sets the prior probability that a partner of an obligate carrier carries the same allele; the population distribution of a polygenic risk score sets the percentile against which a family-specific score is interpreted; the founder-allele structure of certain populations explains why Ashkenazi-ancestry individuals are documented as having distinct prior probabilities for BRCA1 and BRCA2 variants. None of those quantities is an attribute of the family being drawn; they are properties of the population the family is sampled from. That distinction is what makes population-genetics literacy a routine skill for clinical geneticists and counsellors, not a niche interest.
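The carrier-frequency step can be made concrete with a small worked sketch. It assumes Hardy-Weinberg proportions, a single fully penetrant recessive allele, and an illustrative incidence of 1 in 2,500 that is not tied to any particular condition. The function names are hypothetical, and the calculation ignores the small correction for conditioning on the partner being unaffected.

```python
import math

def carrier_frequency(incidence):
    """Heterozygote frequency 2pq implied by a recessive incidence of q²."""
    q = math.sqrt(incidence)
    return 2 * (1 - q) * q

def affected_child_risk(partner_carrier_prob):
    """Risk for a child of an obligate carrier and a partner of unknown status.

    The partner must be a carrier, and then both parents must transmit
    the disease allele (probability 1/2 each).
    """
    return partner_carrier_prob * 0.5 * 0.5

incidence = 1 / 2500                    # illustrative value only
carrier = carrier_frequency(incidence)  # about 0.039, roughly 1 in 25
risk = affected_child_risk(carrier)     # about 0.0098, roughly 1 in 100
print(carrier, risk)
```

The only family-specific input is the obligate-carrier status of one parent; the prior for the partner comes entirely from the population.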

Within Evagene, the documentation pages on hereditary cancer risk assessment, Mendelian inheritance calculation, and the complex-disease tooling all rely on population-genetics inputs — allele frequencies, penetrance estimates derived from population samples, and recurrence priors estimated against population baselines. The illustrative outputs in those pages are educational; the population-genetics layer beneath them is field-standard.

Where to read further

Foundational papers worth reading in the original:

Hardy, G. H. (1908). Mendelian proportions in a mixed population. Science 28:49–50.
Weinberg, W. (1908). Über den Nachweis der Vererbung beim Menschen. Jahreshefte des Vereins für vaterländische Naturkunde in Württemberg 64:368–382.
Fisher, R. A. (1930). The Genetical Theory of Natural Selection. Clarendon Press, Oxford.
Wright, S. (1931). Evolution in Mendelian populations. Genetics 16:97–159.
Kimura, M. (1968). Evolutionary rate at the molecular level. Nature 217:624–626.
Ohta, T. (1973). Slightly deleterious mutant substitutions in evolution. Nature 246:96–98.

Modern textbook references include Gillespie 2004 (Population Genetics: A Concise Guide), Hartl & Clark 2007 (Principles of Population Genetics), and Charlesworth & Charlesworth 2010 (Elements of Evolutionary Genetics). For human-genetics specialisation, Reich 2018 (Who We Are and How We Got Here) covers the ancient-DNA-driven reframing of the past decade, and the flagship papers of the 1000 Genomes Project (Auton et al. 2015, Nature 526:68) and gnomAD (Karczewski et al. 2020, Nature 581:434) describe the modern reference cohorts.

Frequently asked questions

What is the difference between population genetics and evolutionary genetics?

Population genetics is the mathematical theory of allele-frequency change within and between populations under mutation, selection, drift, recombination, and migration. Evolutionary genetics is the same subject extended over longer timescales and integrated with phylogenetics, comparative genomics, and the molecular record of adaptation. The two are continuous; the boundary is conventional rather than substantive.

Why does Hardy-Weinberg matter clinically?

Hardy-Weinberg gives the expected genotype frequencies (p² : 2pq : q²) from allele frequencies under random mating and no selection. It is the null model used in genome-wide quality control and in case-control association testing, and it allows carrier frequencies to be derived from the published incidence of affected homozygotes for a recessive disease.

What is the neutral theory of molecular evolution?

Kimura 1968 proposed that most molecular variation segregating in a population is neutral or nearly so, and that the rate of molecular evolution is set primarily by the rate of neutral mutation and the action of genetic drift, not by positive selection. The neutral and nearly neutral theory is the modern null model against which signals of selection are detected.

Why is population structure important for genetic-association studies?

When cases and controls in a study are drawn from genetically distinct subpopulations, allele-frequency differences between those subpopulations can produce spurious associations with any trait that also differs between the subpopulations. Principal component correction, mixed-model approaches, and within-family designs are routine controls for population stratification.
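A toy calculation shows how the artefact arises. In the sketch below every number is hypothetical: two subpopulations carry an allele at frequencies 0.1 and 0.5, the allele has no effect on disease within either group, and cases are drawn mostly from the second group while controls are drawn mostly from the first. The function names are assumptions for this example.

```python
def pooled_allele_freq(groups):
    """Allele frequency after pooling (n_people, allele_freq) groups."""
    total = sum(n for n, _ in groups)
    return sum(n * f for n, f in groups) / total

def allelic_odds_ratio(f_cases, f_controls):
    """Odds ratio comparing allele frequency in cases with controls."""
    return (f_cases / (1 - f_cases)) / (f_controls / (1 - f_controls))

# (sample size, allele frequency) for subpopulation 1 and subpopulation 2.
# Within each subpopulation, cases and controls share the same frequency.
cases = [(200, 0.1), (800, 0.5)]
controls = [(800, 0.1), (200, 0.5)]

# Stratified view: no association inside either subpopulation.
within = [allelic_odds_ratio(fc, fk)
          for (_, fc), (_, fk) in zip(cases, controls)]

# Pooled view: ancestry imbalance alone creates an apparent effect.
f_cases = pooled_allele_freq(cases)        # 0.42
f_controls = pooled_allele_freq(controls)  # 0.18
pooled = allelic_odds_ratio(f_cases, f_controls)
print(within, round(pooled, 2))
```

The within-group odds ratios are both 1, while the pooled odds ratio is about 3.3. Conditioning on ancestry, whether through principal components, a mixed model, or a within-family design, is what removes the spurious signal.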

Does Evagene do population genetics?

Evagene is a pedigree-modelling platform for academic, research, and educational use. Population-genetics quantities — allele frequencies, carrier frequencies, penetrance estimates — sit underneath the implementations of published risk-model algorithms (Claus 1994, Couch 1997, Frank 2002, Tyrer / Duffy / Cuzick 2004, BayesMendel BRCAPRO / MMRpro / PancPRO, family-history scoring), but Evagene is not a population-genetics analysis platform. The educational pages collected here treat the field as background literacy for users of the platform.