DNA replication and repair: replisome, polymerases, MMR, NER, BER, and double-strand break repair
Every human cell division copies ~6 billion base pairs at fork rates of around 1–2 kb per minute. The fidelity demanded of that copy, ~1 mistake per 10⁹–10¹⁰ nucleotides after proofreading and repair, is the product of a layered system: licensed origins, replicative polymerases with proofreading exonucleases, post-replicative mismatch repair, and damage-specific repair pathways for lesions introduced by metabolism, radiation, and chemicals. This page covers the replisome and the four major repair pathways, and connects each to its associated genetic conditions.
Short version. Replication is licensed in G1 by ORC and the MCM2-7 helicase, fired in S phase, and executed by the replisome — CMG helicase, Pol α-primase, Pol ε on the leading strand, Pol δ on the lagging strand, PCNA, FEN1, and DNA ligase 1. Repair is layered: mismatch repair corrects post-replicative errors (Lynch syndrome biology); nucleotide excision repair removes bulky helix-distorting lesions (xeroderma pigmentosum); base excision repair handles small base lesions; double-strand breaks are repaired by homologous recombination (BRCA1/BRCA2/RAD51) or non-homologous end joining (KU70/80, DNA-PKcs, LIG4).
Origin licensing and firing
Replication initiates at thousands of origins distributed across each chromosome. In late mitosis and through G1, the six-subunit origin recognition complex (ORC1–ORC6) binds DNA at potential origins, recruits CDC6 and CDT1, and loads the inactive MCM2-7 helicase as a head-to-head double hexamer encircling double-stranded DNA. This is licensing — the loading is permitted only when CDK activity is low. At the G1/S transition, the Dbf4-dependent kinase (DDK) and S-phase cyclin-CDK activities phosphorylate MCM subunits and recruit CDC45 and the four-subunit GINS complex to convert each MCM2-7 hexamer into a CMG (CDC45-MCM-GINS) helicase. CMG opens the duplex and translocates 3'→5' on the leading-strand template. Once-per-cell-cycle replication is enforced by the strict temporal separation of licensing (low CDK) and firing (high CDK). Bell and Dutta 2002 remains a canonical review of eukaryotic origin biology.
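The separation of licensing and firing can be sketched as a tiny state machine. This is a teaching toy, not a model of real kinetics: the class and method names are hypothetical, CDK activity is reduced to a single boolean, and DDK is not modelled.

```python
from enum import Enum


class OriginState(Enum):
    UNLICENSED = "unlicensed"
    LICENSED = "licensed"  # MCM2-7 double hexamer loaded, still inactive
    FIRED = "fired"        # converted into active CMG helicases


class Origin:
    """Toy origin: licensing needs low CDK, firing needs high CDK."""

    def __init__(self):
        self.state = OriginState.UNLICENSED

    def try_license(self, cdk_high):
        # Loading of MCM2-7 is only permitted while CDK activity is low.
        if not cdk_high and self.state is OriginState.UNLICENSED:
            self.state = OriginState.LICENSED
            return True
        return False

    def try_fire(self, cdk_high):
        # Firing requires a licensed origin and high CDK activity.
        if cdk_high and self.state is OriginState.LICENSED:
            self.state = OriginState.FIRED
            return True
        return False

    def complete_mitosis(self):
        # Assumption of this sketch: every origin starts the next cycle unlicensed.
        self.state = OriginState.UNLICENSED
```

Because `try_license` succeeds only at low CDK and `try_fire` only at high CDK, a fired origin cannot be re-licensed until `complete_mitosis` resets it, which is the once-per-cycle rule in miniature.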
Origin choice in metazoans is flexible. Most mammalian origins are not strictly sequence-defined — they are populations of potential origins, only a fraction of which fire in any given cell cycle. Replication timing across the genome is highly reproducible: A-compartment chromatin replicates early, B-compartment late, with replication-timing domains coinciding partially with TADs.
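Why thousands of origins are needed follows from arithmetic on the figures quoted on this page (~6 billion base pairs, forks at 1–2 kb per minute). The sketch below is a back-of-envelope calculation only; the 8-hour S phase in the usage note is an illustrative assumption, not a figure from this page, and the function names are hypothetical.

```python
GENOME_BP = 6e9  # ~6 billion bp copied per division (figure from this page)


def single_fork_days(genome_bp, rate_kb_per_min):
    """Days one fork would need to copy the whole genome at a constant rate."""
    minutes = (genome_bp / 1000.0) / rate_kb_per_min
    return minutes / (60 * 24)


def min_forks(genome_bp, rate_kb_per_min, s_phase_hours):
    """Minimum number of forks that must run for all of S phase to finish in time."""
    kb_per_fork = rate_kb_per_min * s_phase_hours * 60
    return (genome_bp / 1000.0) / kb_per_fork
```

At 1 kb per minute, `single_fork_days(GENOME_BP, 1.0)` comes to roughly 4,200 days, more than a decade for a single fork. With the assumed 8-hour S phase and 1.5 kb per minute, `min_forks(GENOME_BP, 1.5, 8.0)` is on the order of 8,000 forks running continuously, a lower bound because real forks do not run for the whole of S phase.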
The replisome
At a fork, the replisome carries: the CMG helicase unwinding the duplex; Pol α-primase synthesising short RNA-DNA hybrid primers (~25–30 nucleotides) on each Okazaki fragment; Pol ε (epsilon) extending the leading strand continuously; Pol δ (delta) extending each lagging-strand Okazaki fragment to the next; PCNA loaded by RFC as the sliding clamp that tethers Pol δ/ε to DNA; FEN1 cleaving the 5' flap as Pol δ reaches the previous fragment; and DNA ligase 1 sealing the resulting nick. Topoisomerase I and II act ahead of and behind the fork to relieve positive supercoiling and decatenate sister chromatids.
The main human replicative, repair, and specialised polymerases:
- Pol α (POLA1 catalytic, plus PRIM1/PRIM2 primase, POLA2 accessory): primer synthesis. Low fidelity; lacks a proofreading exonuclease. Each Okazaki fragment begins with a Pol α primer of ~25–30 nucleotides which is later excised by FEN1.
- Pol δ (POLD1 catalytic, POLD2/3/4 accessory): lagging-strand replication and gap-filling synthesis in repair (mismatch repair, NER, long-patch BER). High fidelity; carries a 3'→5' proofreading exonuclease. Germline POLD1 exonuclease-domain pathogenic variants are associated with polymerase-proofreading-associated polyposis.
- Pol ε (POLE catalytic, POLE2/3/4 accessory): leading-strand replication. High fidelity; 3'→5' proofreading exonuclease. Germline POLE exonuclease-domain pathogenic variants are likewise associated with polymerase-proofreading-associated polyposis.
- Pol β (POLB): short-patch base excision repair. No proofreading. Gap-filling polymerase.
- Pol η, κ, ι, ζ, REV1, ν, θ, μ, λ: translesion synthesis and specialised repair. Pol η (POLH) bypasses UV-induced cyclobutane pyrimidine dimers; loss of Pol η causes the variant form of xeroderma pigmentosum (XP-V).
- Pol γ (POLG): mitochondrial DNA replication. POLG variants are associated with several mitochondrial disorders including Alpers syndrome and progressive external ophthalmoplegia.
Polymerase fidelity is reviewed in Kunkel and Bebenek 2000. Replicative polymerases achieve a base-substitution error rate of ~10⁻⁵ per base; with proofreading, ~10⁻⁷; with mismatch repair downstream, ~10⁻⁹ to 10⁻¹⁰.
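The layered error rates can be turned into expected error counts per division. This is a rough sketch using the order-of-magnitude rates quoted above and the page's round figure of ~6 billion bases. It treats every base as equally error-prone, which is a simplification, and the function names are hypothetical.

```python
GENOME_BASES = 6e9  # round figure used on this page

# Per-base error rates quoted in the text (orders of magnitude only).
POLYMERASE_ALONE = 1e-5
WITH_PROOFREADING = 1e-7
WITH_MMR_WORST = 1e-9
WITH_MMR_BEST = 1e-10


def expected_errors(rate_per_base, bases=GENOME_BASES):
    """Expected number of errors when copying `bases` bases once."""
    return rate_per_base * bases


def fold_improvement(rate_before, rate_after):
    """How many-fold a fidelity layer lowers the error rate."""
    return rate_before / rate_after
```

On these figures, base selection alone would leave about 60,000 errors per division, proofreading cuts that to about 600 (a 100-fold gain), and mismatch repair brings it to roughly 0.6–6 (a further 100- to 1,000-fold gain).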
Lagging-strand mechanics: Okazaki fragments
Because all DNA polymerases extend 5'→3', the lagging strand cannot be copied continuously. Each segment of the lagging strand is initiated by a Pol α primer and extended by Pol δ until it reaches the 5' end of the previous fragment, where Pol δ performs limited strand displacement to create a 5' flap. FEN1 cleaves the flap and DNA ligase 1 seals the resulting nick. The Okazaki fragment is the unit of lagging-strand synthesis; in mammalian cells, fragment length matches roughly one nucleosome (~165 bp), reflecting the integration of replisome and chromatin assembly.
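The discontinuous layout of the lagging strand can be sketched by tiling a stretch of template into fragments, each opening with a primer. This is a coordinate-only toy under stated assumptions: it uses the ~165-nt fragment and ~30-nt primer figures from this page as fixed lengths, it ignores strand direction and strand displacement, and the function name is hypothetical.

```python
def plan_okazaki_fragments(lagging_len, fragment_len=165, primer_len=30):
    """Tile `lagging_len` nucleotides into (start, primer_end, end) fragments.

    Each fragment starts with a Pol alpha primer covering [start, primer_end)
    and is extended by Pol delta over [primer_end, end).
    """
    fragments = []
    start = 0
    while start < lagging_len:
        end = min(start + fragment_len, lagging_len)
        primer_end = min(start + primer_len, end)
        fragments.append((start, primer_end, end))
        start = end
    return fragments
```

For a 500-nt stretch this yields four fragments, the last only 5 nt long. Scaled to ~6 billion nucleotides of lagging-strand synthesis, the same tiling gives on the order of 36 million fragments per division, each needing flap cleavage and ligation.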
Fork stalling, collapse, and restart
Replication forks routinely encounter obstacles: damaged template bases, transcription complexes, R-loops, repetitive sequences, and protein-bound DNA. A stalled fork is initially stabilised by ATR-CHK1 signalling, which suppresses late origin firing, stabilises replication-protein A (RPA) on exposed single-stranded DNA, and allows time for repair. Persistent stalling can lead to fork reversal (regression of the fork into a four-way "chicken-foot" intermediate, a controlled state that can be restarted) or fork collapse (loss of the replisome, often accompanied by a one-ended double-strand break). Restart pathways depend on context: BRCA1/BRCA2/RAD51-mediated homologous recombination is the principal restart route; the FANC pathway processes interstrand crosslinks; Pol η and other translesion polymerases bypass small-footprint lesions in a tolerance pathway that may leave a damaged base for later repair.
Mismatch repair and the Lynch-syndrome pathway
Mismatch repair (MMR) corrects post-replicative errors that escape polymerase proofreading: single base-base mismatches and short insertion-deletion loops at slipped repeats. The pathway is conserved from E. coli MutS/MutL/MutH to the eukaryotic MutSα / MutSβ / MutLα system. In humans:
- MutSα (MSH2-MSH6): recognises base-base mismatches and small insertion-deletion loops (1–2 nucleotides).
- MutSβ (MSH2-MSH3): recognises larger insertion-deletion loops.
- MutLα (MLH1-PMS2): recruited by MutS, has endonuclease activity that nicks the daughter strand to license excision and resynthesis.
- EXO1, RPA, Pol δ, DNA ligase 1: process the nick into an excised tract, resynthesise across the gap, and seal.
Strand discrimination in eukaryotic MMR uses pre-existing nicks on the nascent strand (the most recent replication-fork nick on the lagging strand and a similar signal on the leading strand) rather than the methylation-based GATC discrimination used by E. coli MutH. Mechanism is reviewed by Modrich 2006; biological context by Hsieh and Yamane 2008.
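The division of labour between the two MutS complexes listed above can be written as a small lookup. It is a deliberately crude sketch: the hard cutoff at 2 nucleotides mirrors the list on this page, whereas the real substrate ranges of the two complexes overlap, and the function name and `kind` labels are hypothetical.

```python
def mismatch_sensor(kind, loop_nt=0):
    """Return the MutS complex expected to recognise a replication error.

    kind: "base-base" or "indel-loop"; loop_nt: loop size for indel loops.
    """
    if kind == "base-base":
        return "MutS-alpha (MSH2-MSH6)"
    if kind == "indel-loop":
        if loop_nt < 1:
            raise ValueError("an insertion-deletion loop needs loop_nt >= 1")
        # Simplifying assumption: a sharp boundary at 2 nt, as in the list above.
        if loop_nt <= 2:
            return "MutS-alpha (MSH2-MSH6)"
        return "MutS-beta (MSH2-MSH3)"
    raise ValueError("unknown error type: " + repr(kind))
```

Both branches share MSH2, which is one way to see why loss of MSH2 disables recognition of every class of error in this sketch, while loss of MSH6 or MSH3 alone removes only one branch.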
Heterozygous germline pathogenic variants in MLH1, MSH2, MSH6, or PMS2 (or deletions at the 3' end of EPCAM, which silence the adjacent MSH2 gene in cis) underlie Lynch syndrome. Tumours from MMR-deficient cells exhibit microsatellite instability (slipped-strand insertions and deletions at simple-sequence repeats) and a high tumour mutation burden. Lynch syndrome is associated with elevated lifetime risk of colorectal, endometrial, ovarian, gastric, urinary-tract, pancreatic, brain, hepatobiliary, and small-bowel cancers. The educational Lynch syndrome risk calculator page on this site illustrates how published family-history risk models (MMRpro from the BayesMendel suite; PREMM5; and family-history scoring) can be implemented for research and teaching purposes; outputs are illustrative and not a clinical recommendation.
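Microsatellite instability can be illustrated by comparing repeat-tract lengths between matched normal and tumour sequences. The sketch below is for teaching only: the locus names and sequences in the usage note are invented, the functions are hypothetical, and no clinical threshold for calling a tumour unstable is implied.

```python
def repeat_units(seq, unit):
    """Length, in repeat units, of the longest uninterrupted run of `unit` in `seq`."""
    best = run = 0
    i = 0
    while i <= len(seq) - len(unit):
        if seq.startswith(unit, i):
            run += 1
            best = max(best, run)
            i += len(unit)
        else:
            run = 0
            i += 1
    return best


def unstable_fraction(normal, tumour, units):
    """Fraction of loci whose repeat length differs between normal and tumour.

    normal, tumour: dicts of locus name -> sequence; units: locus name -> repeat unit.
    """
    shifted = sum(
        1
        for locus in normal
        if repeat_units(normal[locus], units[locus])
        != repeat_units(tumour[locus], units[locus])
    )
    return shifted / len(normal)
```

For example, with a made-up dinucleotide locus that shrinks from ten CA units to eight in the tumour and a mononucleotide locus that is unchanged, `unstable_fraction` reports that half of the loci have shifted.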
Nucleotide excision repair and xeroderma pigmentosum
Nucleotide excision repair (NER) removes bulky lesions that distort the double helix: UV-induced cyclobutane pyrimidine dimers and 6-4 photoproducts, bulky chemical adducts (benzo[a]pyrene-DNA, cisplatin), and several oxidatively generated lesions. NER has two sub-pathways that converge on the same downstream machinery:
- Global genome NER: lesion detection by XPC-RAD23B (with UV-DDB / DDB1-DDB2 enhancing detection of UV photoproducts in chromatin).
- Transcription-coupled NER: lesion detection through stalled RNA polymerase II, with CSA and CSB recruiting the downstream machinery; defects produce Cockayne syndrome.
Both routes recruit TFIIH (containing the XPB and XPD helicases), which opens a ~25–30 nt bubble around the lesion. XPA verifies the lesion. RPA coats the undamaged strand. Two structure-specific endonucleases — XPF-ERCC1 (5' incision) and XPG (3' incision) — excise an oligonucleotide containing the lesion. Pol δ or Pol ε with PCNA fills the gap, and DNA ligase 1 or LIG3 seals it.
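The cut-and-patch logic of NER (dual incision, removal of the lesion-containing oligonucleotide, gap filling from the undamaged strand) can be sketched on strings. This is a toy under stated assumptions: the incision offsets are illustrative defaults chosen to give a 27-nt excised fragment, the partner strand is supplied position-aligned to the damaged strand, and the function name is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def nucleotide_excision_repair(damaged, partner, lesion_pos, upstream=20, downstream=6):
    """Excise the lesion-containing oligo and refill the gap from the partner strand.

    damaged: strand carrying the lesion (any non-ACGT character) at lesion_pos.
    partner: undamaged complementary strand, aligned position by position.
    Returns (repaired_strand, excised_oligo).
    """
    if len(damaged) != len(partner):
        raise ValueError("strands must be the same length")
    cut5 = max(0, lesion_pos - upstream)                    # XPF-ERCC1 side
    cut3 = min(len(damaged), lesion_pos + downstream + 1)   # XPG side
    excised = damaged[cut5:cut3]
    # Gap filling: copy the undamaged partner across the excised interval.
    patch = partner[cut5:cut3].translate(COMPLEMENT)
    return damaged[:cut5] + patch + damaged[cut3:], excised
```

Because the patch is templated entirely by the undamaged strand, the original sequence is restored without any information from the damaged bases, which is why the outcome is accurate when the partner strand is intact.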
Germline biallelic loss-of-function variants in XPA, XPB (ERCC3), XPC, XPD (ERCC2), XPE (DDB2), XPF (ERCC4), XPG (ERCC5), or in POLH (XP-V) cause xeroderma pigmentosum. NER is reviewed by Lindahl and Wood 1999.
Base excision repair
Base excision repair (BER) handles small base lesions that do not greatly distort the helix: oxidised bases (8-oxoguanine), deaminated bases (uracil from cytosine deamination), alkylated bases, and abasic sites. The pathway begins with a damage-specific DNA glycosylase (OGG1 for 8-oxoguanine, UNG for uracil, MUTYH for adenine paired with 8-oxoguanine, MPG for 3-methyladenine, several others), which cleaves the N-glycosidic bond and releases the damaged base, leaving an abasic site. AP endonuclease 1 (APE1) cleaves the backbone 5' to the abasic site. Short-patch BER fills a single nucleotide using Pol β and seals with the XRCC1-LIG3 complex; long-patch BER displaces 2–10 nucleotides using Pol δ and PCNA, with FEN1 cleaving the displaced flap and DNA ligase 1 sealing.
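The BER sequence described above can be laid out as an ordered list of steps keyed by lesion. The glycosylase assignments come from this paragraph; the dictionary keys, the function name, and the step wording are hypothetical, and the choice between short-patch and long-patch repair is passed in as a flag rather than modelled.

```python
GLYCOSYLASES = {
    "8-oxoguanine": "OGG1",
    "uracil": "UNG",
    "adenine opposite 8-oxoguanine": "MUTYH",
    "3-methyladenine": "MPG",
}


def ber_steps(lesion, long_patch=False):
    """Ordered BER steps for a lesion, following the description on this page."""
    if lesion not in GLYCOSYLASES:
        raise ValueError("no glycosylase listed for " + repr(lesion))
    steps = [
        GLYCOSYLASES[lesion] + " removes the base, leaving an abasic site",
        "APE1 cleaves the backbone 5' to the abasic site",
    ]
    if long_patch:
        steps += [
            "Pol delta with PCNA displaces 2-10 nucleotides",
            "FEN1 cleaves the displaced flap",
            "DNA ligase 1 seals the nick",
        ]
    else:
        steps += [
            "Pol beta fills a single nucleotide",
            "XRCC1-LIG3 seals the nick",
        ]
    return steps
```

Calling `ber_steps("uracil")` returns the four short-patch steps, and `ber_steps("uracil", long_patch=True)` swaps in the Pol δ, FEN1, and ligase 1 route while keeping the first two steps unchanged.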
Germline biallelic pathogenic variants in MUTYH cause MUTYH-associated polyposis (a recessive colorectal cancer predisposition). MUTYH removes adenine mis-incorporated opposite 8-oxoguanine, an oxidative lesion abundant in high-replication tissues. Jackson and Bartek 2009 review the integrated DNA damage response.
Double-strand break repair: HR vs NHEJ
Double-strand breaks (DSBs) are produced by ionising radiation, replication fork collapse, programmed events at meiosis (SPO11), V(D)J recombination at immunoglobulin loci, and topoisomerase failure. Two principal pathways repair DSBs in mammalian cells.
Homologous recombination (HR). Active in S and G2 phases when a sister chromatid is available as a template. The DNA ends are recognised by the MRE11-RAD50-NBN (MRN) complex, which recruits ATM kinase. CtIP, BRCA1, and EXO1/DNA2 perform 5'→3' end resection to generate long 3' single-stranded overhangs, initially coated by RPA. BRCA2 (via PALB2 and BRCA1) loads the RAD51 recombinase onto the resected ends, displacing RPA. The RAD51 nucleoprotein filament searches for and invades a homologous duplex, capturing the sister chromatid as a template for repair. After D-loop formation and synthesis, the resulting joint molecule is resolved by Holliday-junction processing or synthesis-dependent strand annealing. HR is essentially error-free.
Heterozygous germline pathogenic variants in BRCA1, BRCA2, PALB2, RAD51C, RAD51D, and several other HR components are associated with elevated lifetime risk of breast, ovarian, pancreatic, and prostate cancers. The educational pedigree-modelling pages for BRCAPRO, hereditary cancer risk assessment, and breast cancer family history illustrate how published Mendelian-risk and family-history models can be implemented; outputs are illustrative and for research and teaching only. BOADICEA is licensed by the University of Cambridge and is not bundled in Evagene; the platform exports a `##CanRisk 2.0` pedigree file for upload at canrisk.org when BOADICEA computation is wanted. The Tyrer-Cuzick implementation is an IBIS-style approximation of the published Tyrer/Duffy/Cuzick 2004 algorithm, not the official IBIS Breast Cancer Risk Evaluator binary.
Non-homologous end joining (NHEJ). Active throughout the cell cycle; the dominant DSB-repair pathway in G1 and in non-cycling cells. The KU70-KU80 (XRCC6-XRCC5) heterodimer binds the broken ends and recruits DNA-PKcs (PRKDC) to form the DNA-PK holoenzyme. Artemis (DCLRE1C) processes hairpin-capped or otherwise blocked ends. Pol μ and Pol λ perform any minimal gap-filling. The XRCC4-LIG4-XLF complex ligates the ends. NHEJ can introduce small insertions or deletions at the junction and is the basis for V(D)J recombination at immunoglobulin and T-cell-receptor loci. Germline pathogenic variants in LIG4, NHEJ1 (XLF), DCLRE1C, and PRKDC cause radiosensitive severe combined immunodeficiency phenotypes and the related LIG4 syndrome.
Pathway choice between HR and NHEJ is regulated by cell-cycle phase (CDK activity drives end resection in S/G2), chromatin context, and the antagonistic activities of 53BP1-RIF1-Shieldin (favouring NHEJ) and BRCA1-CtIP (favouring resection and HR).
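The pathway-choice logic can be caricatured as a two-input rule. This is a deliberately deterministic sketch: in real cells NHEJ also repairs many breaks in S and G2, so the function returns only the pathway that this simplified rule favours. The function name and the `resection_blocked` flag (standing in for 53BP1-RIF1-Shieldin prevailing over BRCA1-CtIP) are hypothetical.

```python
def choose_dsb_pathway(phase, resection_blocked=False):
    """Return "HR" or "NHEJ" for a double-strand break under a simplified rule.

    phase: "G0", "G1", "S", or "G2".
    resection_blocked: True when end resection is suppressed.
    """
    phase = phase.upper()
    if phase not in {"G0", "G1", "S", "G2"}:
        raise ValueError("unsupported phase: " + repr(phase))
    # HR needs a sister chromatid (S/G2) and resected ends.
    if phase in {"S", "G2"} and not resection_blocked:
        return "HR"
    # Otherwise ends are joined directly.
    return "NHEJ"
```

The rule captures two points from the text: a G1 or non-cycling cell has no sister chromatid and so defaults to NHEJ, and even in S or G2 a block to resection diverts the break away from HR.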
Why this matters for pedigree-modelling teaching
The repair pathway underlying a cancer-predisposition syndrome dictates the molecular signature of the tumour, the kinds of variants likely to be observed in the family, and the structure of the published risk models that are taught alongside pedigree analysis. Mismatch repair deficiency produces microsatellite instability and underlies the family-history pattern characteristic of Lynch syndrome. Homologous recombination deficiency underlies the breast-and-ovarian-cancer family-history pattern that BRCAPRO and BOADICEA are designed to model. Evagene's pedigree-modelling tools are educational and research tools; outputs from any of the 20 implemented risk models are illustrative and not a clinical recommendation. See also the companion pages on DNA and chromatin organisation and genome structure and variation.
Key references
- Bell SP, Dutta A. DNA replication in eukaryotic cells. Annu Rev Biochem 71:333–374 (2002). PMID 12045100.
- Kunkel TA, Bebenek K. DNA replication fidelity. Annu Rev Biochem 69:497–529 (2000). PMID 10966476.
- Modrich P. Mechanisms in eukaryotic mismatch repair. J Biol Chem 281:30305–30309 (2006). PMID 16905530.
- Hsieh P, Yamane K. DNA mismatch repair: molecular mechanism, cancer, and ageing. Mech Ageing Dev 129:391–407 (2008). PMID 18406447.
- Lindahl T, Wood RD. Quality control by DNA repair. Science 286:1897–1905 (1999). PMID 10583944.
- Jackson SP, Bartek J. The DNA-damage response in human biology and disease. Nature 461:1071–1078 (2009). PMID 19847258.
- OMIM: Lynch syndrome (120435); Xeroderma pigmentosum complementation group A (278700); BRCA1 (113705); BRCA2 (600185).
Frequently asked questions
What licenses replication origins?
ORC, CDC6, and CDT1 load the inactive MCM2-7 helicase as a double hexamer in G1; DDK and CDK activate firing in S phase by recruiting CDC45 and GINS to form CMG.
Which polymerase replicates the leading strand?
DNA polymerase ε (epsilon) is the principal leading-strand polymerase in humans. The lagging strand is synthesised mainly by Pol δ, with each Okazaki fragment primed by Pol α-primase.
What is mismatch repair and how does it relate to Lynch syndrome?
MMR corrects post-replicative base-base mismatches and short insertion-deletion loops; the human pathway uses MutSα (MSH2-MSH6), MutSβ (MSH2-MSH3), and MutLα (MLH1-PMS2). Heterozygous germline pathogenic variants in these genes (or EPCAM) cause Lynch syndrome.
What is the difference between HR and NHEJ?
HR uses a sister chromatid as a template, requires end resection, is restricted to S/G2, and is BRCA1/BRCA2/RAD51-dependent; it is essentially error-free. NHEJ ligates broken ends directly using KU70/80, DNA-PKcs, Artemis, XRCC4, XLF, and LIG4; it operates throughout the cell cycle and can introduce small insertions or deletions.
What is xeroderma pigmentosum?
A rare recessive disorder caused by germline biallelic loss-of-function variants in any of seven NER genes (XPA-XPG) or in POLH (XP-V), characterised by extreme photosensitivity and a high cumulative incidence of UV-induced skin cancers.