Types of mutation
A taxonomy of mutation, from single-base substitutions to whole-chromosome aneuploidy. The page covers point mutations and their coding consequences; indels and frameshifts; trinucleotide repeat expansions and dynamic mutation; structural variants and copy-number variants; somatic and germline mosaicism; de novo mutation and the paternal age effect; and mutational signatures.
Short version. Mutations are conventionally classified by scale (single base, indel, repeat expansion, structural variant, chromosomal) and by origin (germline vs somatic, inherited vs de novo). Point substitutions are common; indels are roughly an order of magnitude rarer per base; structural variants are rare per genome but contribute the most sequence change. The de novo single-nucleotide variant rate in humans is approximately 1.2×10⁻⁸ per site per generation (Kong et al. 2012), and the number of new variants rises by roughly two per additional year of paternal age. Mutational signatures, the substitution-type and trinucleotide-context patterns left by different mutagenic processes, were characterised by Alexandrov et al. 2013.
Point mutations: substitutions
A substitution replaces one nucleotide with another at a single position. Substitutions divide into transitions (purine→purine: A↔G; or pyrimidine→pyrimidine: C↔T) and transversions (purine→pyrimidine and the reverse, e.g. A↔C, A↔T, G↔C, G↔T). Transitions outnumber transversions in most genomes — for the human germline, the ratio is roughly two transitions to one transversion — despite the fact that there are twice as many possible transversion changes (eight) as transition changes (four). The asymmetry reflects mechanism: spontaneous deamination of 5-methylcytosine to thymine produces C→T transitions at CpG dinucleotides at roughly tenfold the background rate, and CpG sites contribute disproportionately to the human germline mutation load.
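The transition/transversion split can be sketched in a few lines of Python. This is an illustrative helper, not part of any platform or library; the function names `substitution_class` and `ti_tv_ratio` are hypothetical. It also confirms the count given above: of the twelve directed single-base changes, four are transitions and eight are transversions.

```python
from itertools import permutations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_class(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= PURINES | PYRIMIDINES:
        raise ValueError(f"not a valid substitution: {ref}>{alt}")
    # A transition stays within one chemical class (purine or pyrimidine).
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def ti_tv_ratio(substitutions) -> float:
    """Transition/transversion ratio for an iterable of (ref, alt) pairs."""
    classes = [substitution_class(ref, alt) for ref, alt in substitutions]
    transversions = classes.count("transversion")
    if transversions == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return classes.count("transition") / transversions


# All twelve directed single-base changes (A>C, A>G, ..., T>G).
ALL_CHANGES = list(permutations("ACGT", 2))
N_TRANSITIONS = sum(substitution_class(r, a) == "transition" for r, a in ALL_CHANGES)
N_TRANSVERSIONS = len(ALL_CHANGES) - N_TRANSITIONS
```

If every change were equally likely, `ti_tv_ratio(ALL_CHANGES)` would be 0.5; the observed human germline ratio of about two is therefore roughly four times what uniform mutation would give.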
The functional consequence of a coding substitution depends on its position in the codon and the genetic code. Synonymous substitutions change a codon to one specifying the same amino acid (often at the third codon position) and were historically treated as silent; because of sequence-context effects on splicing, mRNA stability, and codon-usage-mediated translation rate, a fraction of them are not. Missense substitutions change the encoded amino acid; their consequence depends on the chemistry of the substitution and on which residue it affects. Nonsense substitutions introduce a premature stop codon (TAA, TAG, TGA) and typically trigger nonsense-mediated decay of the transcript. Splice-site substitutions, particularly at the canonical GT donor and AG acceptor dinucleotides that flank nearly every intron, disrupt mRNA splicing and produce exon-skipping or intron-retention transcripts; we treat splice and regulatory variation in detail on the functional consequences of mutation page.
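The codon-level categories can be illustrated with a minimal sketch that looks up both codons in the standard genetic code. The function name `coding_consequence` is hypothetical, and the sketch deliberately ignores splicing, regulatory context, and nonsense-mediated decay; it only compares the encoded amino acids. The "stop-lost" label is an addition for completeness and is not one of the categories discussed in the text.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering
# (first base varies slowest, third base fastest); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
STOP_CODONS = {codon for codon, aa in CODON_TABLE.items() if aa == "*"}


def coding_consequence(ref_codon: str, alt_codon: str) -> str:
    """Classify a single-base change between two DNA codons."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    if ref_codon not in CODON_TABLE or alt_codon not in CODON_TABLE:
        raise ValueError("codons must be three bases drawn from A, C, G, T")
    if sum(a != b for a, b in zip(ref_codon, alt_codon)) != 1:
        raise ValueError("codons must differ at exactly one position")
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop-lost"  # not covered in the text; included for completeness
    return "missense"
```

For example, `coding_consequence("CTT", "CTC")` is synonymous (both encode leucine), `coding_consequence("GAG", "GTG")` is missense (glutamate to valine), and `coding_consequence("CAG", "TAG")` is nonsense.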
Insertions, deletions, and frameshifts
Insertions and deletions (indels) of one or more nucleotides are the next class. A small in-frame indel (a multiple of three) changes the protein by adding or removing amino acids without disrupting the reading frame. A frameshift indel (any indel whose length is not a multiple of three) shifts the codon register from the indel onwards, replacing the C-terminal portion of the protein with a string of unrelated residues until the shifted frame encounters a stop codon — typically within a few dozen residues. Frameshift transcripts are most often degraded by nonsense-mediated decay; where they escape it, the resulting truncated protein is usually non-functional or, in some cases, dominant-negative.
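The multiple-of-three rule is easy to check mechanically. The sketch below uses hypothetical helper names (`indel_frame_effect`, `codons`) and a made-up twelve-base sequence; it classifies an indel by its net length change and shows how deleting a single base re-registers every downstream codon.

```python
def indel_frame_effect(ref: str, alt: str) -> str:
    """Classify an indel as 'in-frame' or 'frameshift' by net length change."""
    delta = len(alt) - len(ref)
    if delta == 0:
        raise ValueError("alleles have equal length; this is not an indel")
    # Python's % returns a non-negative result, so deletions work too.
    return "in-frame" if delta % 3 == 0 else "frameshift"


def codons(seq: str) -> list[str]:
    """Split a coding sequence into complete codons, dropping a trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


# Toy coding sequence (illustrative only): ATG GCC AAG TGA
REFERENCE = "ATGGCCAAGTGA"
# Delete the single G at index 3; every codon after ATG now reads differently.
SHIFTED = REFERENCE[:3] + REFERENCE[4:]
```

Here `codons(REFERENCE)` gives `ATG, GCC, AAG, TGA`, while `codons(SHIFTED)` gives `ATG, CCA, AGT`: the original stop codon is no longer read in frame, which is why the shifted frame runs on until it meets a new stop by chance.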
Indels arise mostly through replication slippage at short tandem repeats (microsatellite slippage, e.g. at mononucleotide A-tracts and CA dinucleotide repeats), through erroneous repair of DNA damage, or through non-homologous end joining of double-strand breaks. Microsatellite slippage in mismatch-repair-deficient cells produces the high indel burden characteristic of the microsatellite-instability-high (MSI-H) class of tumours.
Trinucleotide repeat expansions and dynamic mutation
A small number of human disorders are caused by a class of mutation in which a tandem repeat tract expands across generations and the disease appears or worsens once the tract exceeds a length threshold. The repeats are usually trinucleotides (CAG, CGG, CTG, GAA), and the disorders are heterogeneous in mechanism — the expansion can occupy a coding region (Huntington disease), a 5′ UTR (fragile X syndrome), an intron (Friedreich ataxia, C9orf72), or a 3′ UTR (myotonic dystrophy type 1) — but they share the dynamic-mutation phenomenology of progressive, generation-to-generation expansion and the clinical phenomenon of anticipation: earlier onset and increasing severity in successive generations.
Canonical examples:
- Huntington disease — HTT exon 1 CAG, encoding a polyglutamine tract; healthy < 27, intermediate 27–35, reduced-penetrance 36–39, fully-penetrant ≥ 40 repeats.
- Fragile X syndrome — FMR1 5′ UTR CGG; normal < 45, intermediate 45–54, premutation 55–200, full mutation > 200 repeats with promoter hypermethylation and FMR1 silencing.
- Friedreich ataxia — FXN intron 1 GAA; recessive, normal < 33, full mutation hundreds to over a thousand repeats with FXN transcriptional silencing.
- Myotonic dystrophy type 1 — DMPK 3′ UTR CTG; type 2 — CNBP intron 1 CCTG tetranucleotide. Both cause RNA-level toxicity through nuclear retention of the expanded transcript and sequestration of MBNL splicing factors.
- Spinocerebellar ataxias (SCA1, 2, 3, 6, 7, 17, dentatorubral-pallidoluysian atrophy) — coding CAG polyglutamine expansions in different genes.
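The repeat-length thresholds listed above for HTT and FMR1 can be expressed as a small lookup. This is a teaching sketch only, with hypothetical names (`REPEAT_CLASSES`, `classify_repeat`); it encodes the boundaries exactly as stated on this page and is not a clinical interpretation tool.

```python
from bisect import bisect_right

# For each gene: lower bounds of every class after the first, and the class labels.
# Boundaries are the ones quoted in the list above.
REPEAT_CLASSES = {
    "HTT": ([27, 36, 40],
            ["normal", "intermediate", "reduced penetrance", "full penetrance"]),
    "FMR1": ([45, 55, 201],
             ["normal", "intermediate", "premutation", "full mutation"]),
}


def classify_repeat(gene: str, repeats: int) -> str:
    """Return the repeat-size class for a gene and repeat count."""
    if gene not in REPEAT_CLASSES:
        raise KeyError(f"no thresholds recorded for {gene}")
    if repeats < 0:
        raise ValueError("repeat count cannot be negative")
    bounds, labels = REPEAT_CLASSES[gene]
    # bisect_right counts how many lower bounds the repeat count has reached.
    return labels[bisect_right(bounds, repeats)]
```

For example, `classify_repeat("HTT", 38)` falls in the reduced-penetrance range and `classify_repeat("FMR1", 120)` is a premutation. Storing only lower bounds keeps the classes contiguous, so no repeat count can fall between two categories.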
The mechanism of expansion involves stable secondary structures formed by the repeat tract (hairpins, slipped-strand DNA, R-loops) that mislead replication, repair, and recombination machineries. Mirkin's 2007 review (Nature 447:932) lays out the structural biology of expandable repeats; La Spada and Taylor 2010 (Nature Reviews Genetics 11:247) reviews the disease mechanisms across the dynamic-mutation disorders. Long-read sequencing now resolves these tracts at base-pair resolution, where short-read sequencing collapses them.
Structural variants and copy-number variation
Structural variants (SVs) are changes affecting more than ~50 base pairs: large deletions, large insertions, duplications, inversions, translocations, and complex rearrangements. Copy-number variants (CNVs) are the deletion/duplication subset, which can affect entire genes or multi-gene regions. Although individual SVs are rarer than SNVs per genome, they typically affect many more nucleotides each, and SVs and CNVs collectively account for the majority of nucleotide-level differences between any two human genomes.
Mechanisms of SV formation include non-allelic homologous recombination between segmental duplications (a common cause of recurrent CNVs at known hotspots), non-homologous end joining and microhomology-mediated end joining at double-strand breaks, replication-fork-collapse-mediated mechanisms (FoSTeS, MMBIR), and Alu/L1 retrotransposition. SVs underpin many genomic disorders, from the recurrent 22q11.2 deletion of DiGeorge / velocardiofacial syndrome to the 7q11.23 deletion of Williams-Beuren syndrome and the recurrent NF1 microdeletions in neurofibromatosis type 1.
Aneuploidy, the gain or loss of an entire chromosome, is the largest scale of mutation. Trisomies 21, 18, and 13 are the autosomal aneuploidies viable to live birth at appreciable frequency; sex-chromosome aneuploidies (47,XXY; 47,XYY; 45,X) follow their own clinical patterns. Most aneuploidies arise through meiotic non-disjunction, and their incidence rises sharply with maternal age.
Mosaicism: somatic and germline
A mutation present in a fertilised egg ends up in every cell of the resulting individual; this is the standard germline-transmitted variant. A mutation that arises after fertilisation is mosaic: present in some cells, absent in others, with the proportion (variant allele fraction, VAF) depending on when in development the mutation occurred and which lineage it occurred in.
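The dependence of VAF on developmental timing can be made concrete with a deliberately idealised model. The sketch assumes a heterozygous mutation arising in one cell of an embryo, with every cell present at that moment contributing equally to the tissue later sampled; real development is asymmetric and lineage-restricted, so observed VAFs deviate from this. The function names are hypothetical.

```python
def expected_vaf(cells_at_mutation: int, ploidy: int = 2) -> float:
    """Idealised VAF for a heterozygous mutation arising in one of n cells.

    Assumes all n cells contribute equally to the sampled tissue, so the
    mutant lineage makes up 1/n of cells and carries one mutant allele
    out of `ploidy` copies per cell.
    """
    if cells_at_mutation < 1:
        raise ValueError("there must be at least one cell")
    return 1 / (ploidy * cells_at_mutation)


def mutant_cell_fraction(vaf: float, ploidy: int = 2) -> float:
    """Invert the model: fraction of cells carrying a heterozygous variant."""
    if not 0 <= vaf <= 1:
        raise ValueError("VAF must lie between 0 and 1")
    return min(1.0, vaf * ploidy)
```

Under these assumptions a mutation already present in the zygote gives `expected_vaf(1) == 0.5`, the familiar constitutional heterozygote, while one arising at the eight-cell stage gives `expected_vaf(8) == 0.0625`.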
Somatic mosaicism is universal, because every cell division is an opportunity for replication error. It is the substrate of cancer, of segmental skin disorders (the McCune-Albright pattern of GNAS mutation), of cortical dysplasia syndromes (MTOR, PIK3CA, AKT3 activating mutations confined to brain), and of clonal haematopoiesis of indeterminate potential (CHIP). Germline mosaicism, in which a mutation is confined to the gonadal lineage of an apparently unaffected parent, explains apparent recurrence of an autosomal dominant or X-linked condition within a sibship despite negative parental testing on a blood sample. The recurrence-risk implications, and the published literature on quantifying that risk, are covered on our germline mosaicism calculator page. Conditions where germline mosaicism is empirically frequent include Dravet syndrome (SCN1A), Rett syndrome (MECP2), Duchenne muscular dystrophy (DMD), and osteogenesis imperfecta (COL1A1, COL1A2).
De novo mutation and the paternal age effect
De novo mutations (variants present in a child but absent from the blood-sequenced genomes of both parents) carry disproportionate weight in human genetics: they explain a substantial fraction of severe early-onset disorders that selection prevents from recurring across generations. The Icelandic trio study by Kong et al. 2012 (Nature 488:471) placed the de novo SNV rate at approximately 1.2×10⁻⁸ per site per generation, or about 60–80 new variants per newborn, and demonstrated a strong paternal age effect: the count increased by roughly two SNVs per year of paternal age, while the maternal contribution was approximately constant. The mechanism is the continuing mitotic activity of spermatogonial stem cells across the male reproductive lifespan; oocytes, by contrast, complete their mitotic divisions before birth.
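The step from a per-site rate to a per-newborn count is simple arithmetic, sketched below. The rate and the slope of two SNVs per paternal year come from the text; the haploid genome size of 3.0×10⁹ sites is a round-number assumption made here (the callable portion of the genome in a real study is smaller), and the function names are hypothetical.

```python
RATE_PER_SITE = 1.2e-8          # per site per generation (from the text)
SNVS_PER_PATERNAL_YEAR = 2.0    # approximate slope (from the text)
HAPLOID_GENOME_SITES = 3.0e9    # ASSUMPTION: round haploid genome size


def expected_de_novo_snvs(rate: float = RATE_PER_SITE,
                          haploid_sites: float = HAPLOID_GENOME_SITES) -> float:
    """Expected new SNVs per newborn: rate x sites x 2 (one genome from each parent)."""
    return rate * 2 * haploid_sites


def paternal_age_difference(age_a: float, age_b: float,
                            slope: float = SNVS_PER_PATERNAL_YEAR) -> float:
    """Expected difference in de novo SNV count between fathers aged age_a and age_b."""
    return slope * (age_b - age_a)
```

With these inputs `expected_de_novo_snvs()` comes to about 72, inside the 60–80 range quoted above, and `paternal_age_difference(25, 40)` suggests roughly 30 additional SNVs for a 40-year-old father compared with a 25-year-old one. The linear slope applies to the genome-wide count, not to the selfish-selection disorders discussed next.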
The paternal age effect is most pronounced for a small number of disorders caused by gain-of-function point mutations in genes where the mutated allele confers a selective advantage on the spermatogonial stem-cell lineage that produced it. James Klein's 1972 paper on the paternal-effect class (Journal of Heredity 63:80), and the modern molecular synthesis from the Wilkie laboratory in Oxford, identified FGFR3 (achondroplasia, thanatophoric dysplasia), FGFR2 (Apert syndrome, Crouzon syndrome), and HRAS (Costello syndrome) as the prototypical examples; in these conditions, the paternal-age dependence is exponential rather than linear and the recurrent gain-of-function variants are dramatically over-represented relative to expectation under a neutral model.
Mutational signatures
Different mutagenic processes leave different fingerprints on the genome: characteristic patterns of substitution type and trinucleotide context. Tobacco-smoke carcinogens produce a G>T transversion signature (written C>A in the pyrimidine-referenced convention used by COSMIC); ultraviolet light produces C>T changes and CC>TT tandem substitutions at dipyrimidines; APOBEC cytidine-deaminase activity produces C>T and C>G changes in a TCW context (W = A or T); mismatch-repair deficiency produces a high indel rate; defective homologous recombination (BRCA1 or BRCA2 deficiency) produces a characteristic large-deletion and large-rearrangement signature.
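Substitution type plus trinucleotide context is usually tabulated in 96 channels: six substitution types referenced to the pyrimidine of the base pair, each in sixteen flanking contexts. The sketch below shows one way to map an observed mutation to its channel; the function names and the `5'[ref>alt]3'` label format follow common usage but are written here as an illustration, not taken from any particular library.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def sbs_channel(context: str, alt: str) -> str:
    """Map a substitution to its pyrimidine-referenced trinucleotide channel.

    context: reference trinucleotide centred on the mutated base.
    alt: the new base at the central position.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or len(alt) != 1 or set(context + alt) - set("ACGT"):
        raise ValueError("expected a 3-base context and a 1-base alt from A, C, G, T")
    if context[1] == alt:
        raise ValueError("alt equals the reference base")
    # A purine reference is re-expressed on the opposite strand.
    if context[1] in "AG":
        context, alt = revcomp(context), revcomp(alt)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


# The full catalogue: 6 substitution types x 16 flanking contexts = 96 channels.
ALL_CHANNELS = [
    f"{five}[{ref}>{alt}]{three}"
    for ref, alts in (("C", "AGT"), ("T", "ACG"))
    for alt in alts
    for five, three in product("ACGT", repeat=2)
]
```

For example, `sbs_channel("TCA", "T")` and `sbs_channel("TGA", "A")` both return `T[C>T]A`, an APOBEC-type TCW context, and a G>T change in context AGC is reported as `G[C>A]T`. Counting mutations per channel gives the 96-element vector that signature decomposition works on.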
The systematic decomposition of cancer genomes into a basis set of signatures was published by Alexandrov et al. 2013 (Nature 500:415) using non-negative matrix factorisation on thousands of tumour exomes; their signatures — updated and curated as the COSMIC mutational signatures (SBS, DBS, ID, and CN classes) at cancer.sanger.ac.uk/signatures — are now a standard tool for inferring the mutagenic history of a tumour. The same approach has been applied to germline mutation, where the residual signature reflects spontaneous deamination, oxidative damage, and replication-error processes.
Where Evagene fits
Evagene draws pedigrees and runs implementations of published family-history-based risk-model algorithms for teaching, research, and exploratory use. The platform does not perform sequencing, variant calling, or variant interpretation; it consumes structured family-history information that may include molecular results captured by a clinician or laboratory and presents that information in standard pedigree notation. Where this page touches the platform, it is via the recurrence-risk implications of mosaicism (handled by our germline mosaicism calculator) and via the inheritance patterns produced by recurrent de novo and dynamic-mutation disorders documented across our disease pages.
Sources cited on this page
- Kong A, et al. Rate of de novo mutations and the importance of father's age. Nature 2012;488:471 — PMID 22914163.
- Alexandrov LB, et al. Signatures of mutational processes in human cancer. Nature 2013;500:415 — PMID 23945592.
- Mirkin SM. Expandable DNA repeats and human disease. Nature 2007;447:932 — PMID 17581585.
- La Spada AR, Taylor JP. Repeat expansion disease: progress and puzzles in disease pathogenesis. Nature Reviews Genetics 2010;11:247 — PMID 20177426.
- Klein J. The paternal age effect in achondroplasia. Journal of Heredity 1972;63:80.
- COSMIC mutational signatures — cancer.sanger.ac.uk/signatures (Wellcome Sanger Institute).
- OMIM entries: Huntington disease (143100); fragile X (300624); Friedreich ataxia (229300); myotonic dystrophy 1 (160900); achondroplasia (100800).