Somatic genomics

A tumour is a population of cells under selection. The somatic genome captures the history of that population: the order of driver acquisition, the mutational processes operating in the cell of origin, the clonal architecture, and the molecular features that link the tumour to therapeutic vulnerabilities. This page is an educational reference on clonal evolution, mutational signatures, and the targeted-therapy biomarker landscape.

Short version. Tumour evolution is Darwinian (Nowell 1976): mutations arise stochastically in a clonal lineage, and the variants that confer a selective advantage rise in frequency. Multi-region sequencing reveals subclonal architecture; ctDNA gives a longitudinal liquid-biopsy view; CHIP (Genovese et al. 2014) shows that clonal expansion of haematopoietic precursors is common and ageing-related. The mutational-signature framework (Alexandrov et al. 2013, 2020 PCAWG) decomposes the somatic mutation landscape into reproducible processes (UV, tobacco, APOBEC, HRD, MMR-deficiency, ageing). HRD scoring (Davies et al. 2017), MSI, and TMB are the major aggregate biomarkers; they map to PARP inhibitors (HRD/BRCA), immune-checkpoint inhibitors (MSI-H/dMMR; Le et al. 2015), and oncogene-targeted therapies (HER2, BCR-ABL, BRAF V600E, KRAS G12C).

Clonal evolution: tumours as Darwinian populations

Peter Nowell’s 1976 paper (Science 194:23) reframed the cancer cell as a member of an evolving population. A founder clone acquires a sequence of mutations; each mutation is acted on by selection in the local microenvironment; the population diversifies into subclones; immune pressure, hypoxia, and (eventually) treatment apply additional selection; the surviving clones expand. The Darwinian view supplies the explanation for several otherwise-puzzling clinical observations: intratumoural heterogeneity, primary resistance to a targeted agent at first contact, acquired resistance to that agent on treatment, and the differential metastatic competence of subclones within the same primary tumour.
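The statement that advantageous variants rise in frequency can be made concrete with the standard haploid selection recursion, p' = p(1 + s) / (1 + sp). The sketch below is a deterministic toy model, assuming a single subclone with relative fitness 1 + s competing against a background of fitness 1, with no drift and no further mutation. The function name and parameters are illustrative and are not taken from Nowell's paper.

```python
def subclone_trajectory(p0: float, s: float, generations: int) -> list[float]:
    """Frequency over time of a subclone with fitness 1 + s (background fitness 1)."""
    if not 0.0 <= p0 <= 1.0:
        raise ValueError("p0 must be a frequency between 0 and 1")
    freqs = [p0]
    p = p0
    for _ in range(generations):
        # Mean population fitness is 1 + s * p; the subclone's share is reweighted by it.
        p = p * (1.0 + s) / (1.0 + s * p)
        freqs.append(p)
    return freqs


# Toy values: a subclone starting at 1% with a 10% per-generation advantage.
traj = subclone_trajectory(p0=0.01, s=0.1, generations=100)
print(f"start {traj[0]:.3f}, after 100 generations {traj[-1]:.3f}")
```

With s = 0 the frequency never changes, which is the neutral expectation in a model without drift; any positive s drives the subclone towards fixation, and a change in the environment (for example, treatment) can be modelled as a change in s.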

The framework is now operationalised through multi-region sequencing of resected tumours: independent biopsies from spatially separated regions of a single tumour are sequenced, mutations shared by every region are placed on the trunk of an inferred phylogenetic tree, and region-specific mutations are placed on subclonal branches. Studies including the TRACERx programme in non-small-cell lung cancer (Jamal-Hanjani et al. 2017, NEJM 376:2109) demonstrate that a substantial fraction of mutations, including some drivers, are subclonal and present in only a subset of regions, so a single tissue biopsy can miss mutations confined to regions it did not sample.
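The trunk-versus-branch assignment can be sketched with simple set logic. This is a minimal illustration that assumes mutation calls per region are already available as presence/absence sets; real multi-region studies also model copy number, purity, and cancer-cell fraction, none of which is attempted here. The region names and mutation labels are hypothetical.

```python
def classify_mutations(regions: dict[str, set[str]]) -> dict[str, str]:
    """Label each mutation by how many sampled regions carry it."""
    n_regions = len(regions)
    counts: dict[str, int] = {}
    for mutations in regions.values():
        for mutation in mutations:
            counts[mutation] = counts.get(mutation, 0) + 1

    labels = {}
    for mutation, n in counts.items():
        if n == n_regions:
            labels[mutation] = "trunk"          # seen in every region
        elif n == 1:
            labels[mutation] = "private"        # seen in one region only
        else:
            labels[mutation] = "shared-branch"  # seen in some but not all regions
    return labels


# Hypothetical calls from three regions of one tumour.
calls = {
    "R1": {"mutA", "mutB", "mutC"},
    "R2": {"mutA", "mutB", "mutD"},
    "R3": {"mutA", "mutE"},
}
for mutation, label in sorted(classify_mutations(calls).items()):
    print(mutation, label)
```

Note that with a single region every mutation is labelled "trunk", which is exactly the sampling illusion that multi-region designs are meant to expose.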

Circulating tumour DNA (ctDNA) and liquid biopsies

Tumour cells release fragmented DNA into the bloodstream, where it can be quantified and sequenced. Wan et al. 2017 (Nat Rev Cancer 17:223) reviewed the analytical and applied landscape. ctDNA assays span deep targeted sequencing of cancer-gene panels, methylation profiling for tissue-of-origin inference, and whole-genome low-coverage approaches for copy-number readout. Applications described in the published literature include early detection (multi-cancer early-detection panels), molecular residual disease monitoring after curative-intent surgery, longitudinal tracking of resistance-associated subclones during targeted therapy, and tissue-agnostic biomarker assessment when a fresh tissue biopsy is unavailable. Sensitivity is the principal limitation: low tumour fraction in plasma constrains detection of low-allele-frequency variants and limits the resolution of subclonal architecture compared with tissue-based multi-region sequencing.
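The sensitivity limit can be illustrated with a binomial sampling calculation. The sketch below assumes that each read at a locus independently carries the variant with probability equal to the variant allele fraction, and it ignores sequencing error, duplicate reads, and molecule-level sampling; the function name and the default of two supporting reads are assumptions for illustration.

```python
from math import comb


def detection_probability(depth: int, vaf: float, min_alt_reads: int = 2) -> float:
    """P(at least min_alt_reads variant reads) under Binomial(depth, vaf)."""
    if depth < 0 or not 0.0 <= vaf <= 1.0 or min_alt_reads < 1:
        raise ValueError("invalid depth, vaf, or min_alt_reads")
    # Sum the probabilities of seeing fewer than min_alt_reads variant reads.
    p_below = sum(
        comb(depth, k) * vaf**k * (1.0 - vaf) ** (depth - k)
        for k in range(min(min_alt_reads, depth + 1))
    )
    return max(0.0, 1.0 - p_below)


# Toy comparison: the same 0.1% variant at two sequencing depths.
for depth in (1_000, 10_000):
    p = detection_probability(depth, vaf=0.001)
    print(f"depth {depth}: P(detect) = {p:.3f}")
```

At a fixed allele fraction, detection probability is driven by depth, which is why low tumour fraction in plasma pushes assays towards very deep targeted sequencing.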

Clonal haematopoiesis of indeterminate potential (CHIP)

Genovese et al. 2014 (NEJM 371:2477) and the contemporaneous Jaiswal et al. 2014 paper showed that healthy older adults frequently carry expanded haematopoietic clones bearing somatic mutations in genes recurrently mutated in haematological malignancy, DNMT3A, TET2, ASXL1, and JAK2 in particular. Clones reach detectable variant allele frequencies (2% or more under the usual working definition) without progressing to overt leukaemia, hence “indeterminate potential”. CHIP is age-associated: it is rare in younger adults, reaches a prevalence on the order of 10% in people over about 65 to 70, and continues to rise with age. CHIP carries a small but measurable progression risk to myeloid malignancy and is also associated with cardiovascular disease independent of conventional risk factors. For tumour-genomic analyses CHIP is a confounder: ctDNA assays that sequence only plasma can detect CHIP variants and mistakenly attribute them to a solid tumour. Matched-buffy-coat or matched-germline sequencing is the published mitigation.
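The matched-buffy-coat mitigation amounts to a filter on plasma calls. The sketch below assumes variant calls are available as dictionaries mapping a variant label to its allele fraction in plasma and in white-cell DNA; the function name, the variant labels, and the 0.5% white-cell threshold are hypothetical choices for illustration, not values from the cited papers.

```python
def split_plasma_calls(
    plasma: dict[str, float],
    buffy_coat: dict[str, float],
    min_wbc_vaf: float = 0.005,  # assumed threshold; assay-specific in practice
) -> tuple[dict[str, float], dict[str, float]]:
    """Separate plasma variants into tumour-like and CHIP-like sets."""
    tumour_like, chip_like = {}, {}
    for variant, vaf in plasma.items():
        # A variant also seen in white-cell DNA is more plausibly haematopoietic.
        if buffy_coat.get(variant, 0.0) >= min_wbc_vaf:
            chip_like[variant] = vaf
        else:
            tumour_like[variant] = vaf
    return tumour_like, chip_like


# Toy data: one variant appears in both compartments, one only in plasma.
plasma_calls = {"DNMT3A:R882H": 0.031, "KRAS:G12C": 0.004}
wbc_calls = {"DNMT3A:R882H": 0.029}
tumour, chip = split_plasma_calls(plasma_calls, wbc_calls)
print("tumour-like:", sorted(tumour))
print("CHIP-like:", sorted(chip))
```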

Mutational signatures

The somatic mutation catalogue of a tumour is not random. Different mutational processes — ultraviolet damage, tobacco-smoke metabolites, APOBEC cytidine deamination, defective mismatch repair, defective homologous recombination, age-related cytosine deamination — leave different fingerprints in the trinucleotide context distribution of single-base substitutions, in the spectrum of doublet-base substitutions, and in the indel size distribution. Alexandrov et al. 2013 (Nature 500:415) introduced the framework: non-negative matrix factorisation of the 96-channel SBS spectrum across thousands of tumours yields a small set of reproducible signatures, each interpretable in terms of an underlying mutational process. The Pan-Cancer Analysis of Whole Genomes companion (Alexandrov et al. 2020, Nature 578:94) extended the catalogue to whole-genome data and added doublet-base substitution (DBS) and small-insertion-deletion (ID) signature types. The COSMIC database (cancer.sanger.ac.uk/signatures) is the canonical, regularly-updated public catalogue.
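The 96 channels come from six pyrimidine-referenced substitution classes multiplied by sixteen flanking-base combinations. The sketch below enumerates them and folds a single substitution into its channel, assuming the input is the three-base reference context on the forward strand plus the alternate base. The function names and the bracketed label format are conventions chosen for this illustration; the factorisation step itself is not shown.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]


def sbs96_channels() -> list[str]:
    """All 96 labels, e.g. 'A[C>T]G' (5' base, substitution, 3' base)."""
    return [
        f"{five}[{sub}]{three}"
        for sub in SUBSTITUTIONS
        for five, three in product(BASES, repeat=2)
    ]


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def sbs96_channel(context: str, alt: str) -> str:
    """Map a substitution to its channel, folding purine references to pyrimidines."""
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or len(alt) != 1 or set(context + alt) - set(BASES):
        raise ValueError("context must be 3 bases and alt 1 base, from ACGT")
    if context[1] == alt:
        raise ValueError("alt must differ from the reference base")
    if context[1] in "AG":
        # Report the event on the strand where the mutated base is C or T.
        context, alt = revcomp(context), revcomp(alt)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def sbs96_spectrum(mutations: list[tuple[str, str]]) -> list[int]:
    """Counts per channel, in the order returned by sbs96_channels()."""
    counts = Counter(sbs96_channel(ctx, alt) for ctx, alt in mutations)
    return [counts[ch] for ch in sbs96_channels()]


print(len(sbs96_channels()))
print(sbs96_channel("TGA", "A"))  # G>A in a TGA context, reported on the other strand
```

A matrix whose rows are such 96-element spectra, one per tumour, is the input that a factorisation method decomposes into signatures and per-tumour exposures.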

The signatures whose biology is most reliably interpreted in published research and teaching:

  • SBS1 / SBS5 — clock-like signatures correlating with patient age at diagnosis; SBS1 reflects spontaneous deamination of 5-methylcytosine.
  • SBS2 and SBS13 — APOBEC cytidine deamination, common in breast, bladder, cervical, head-and-neck, and lung cancers; characterised by C>T and C>G substitutions in TpCpN trinucleotide contexts.
  • SBS3 — homologous-recombination deficiency, the BRCA1/2-associated signature; flat across the 96 channels with characteristic indels and large structural-variant features. The basis for HRD scoring (see below).
  • SBS4 — tobacco-smoke-related, with C>A transversions arising from bulky adducts on guanine (the same events read as G>T on the opposite strand); dominant in tobacco-driven lung cancer, with a transcriptional strand bias consistent with transcription-coupled repair of those adducts.
  • SBS6, SBS15, SBS20, SBS26 — mismatch-repair deficiency signatures, found in MSI-H tumours from Lynch-syndrome germline carriers (see inherited cancer predisposition) or somatic dMMR contexts.
  • SBS7a / SBS7b / SBS7c / SBS7d — ultraviolet light, dominant in cutaneous melanoma; characterised by C>T transitions at dipyrimidine sites and CC>TT doublet substitutions (the hallmark UV fingerprint).
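  • SBS9 — associated with somatic hypermutation in lymphoid cells and attributed in part to error-prone polymerase η activity downstream of activation-induced cytidine deaminase (AID); found in B-cell lymphomas and chronic lymphocytic leukaemia.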
  • SBS22, SBS24 — environmental aristolochic acid and aflatoxin exposure respectively; clinically relevant in upper-tract urothelial cancers (Balkan endemic nephropathy regions) and hepatocellular carcinoma in regions with aflatoxin exposure.
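The sequence contexts named in the list above can be checked mechanically for a single substitution. The sketch below takes a pyrimidine-normalised label of the form 5'-base[ref>alt]3'-base (for example 'T[C>T]A', a labelling convention assumed here) and reports which of the described contexts it is compatible with. It is a teaching heuristic only: a single mutation cannot be assigned to a signature, and the hint strings and function name are hypothetical.

```python
def context_hints(channel: str) -> list[str]:
    """Contexts from the signature list that one substitution is compatible with."""
    if len(channel) != 7 or channel[1] != "[" or channel[3] != ">" or channel[5] != "]":
        raise ValueError("expected a label like 'T[C>T]A'")
    five, sub, three = channel[0], channel[2:5], channel[6]

    hints = []
    if sub == "C>T" and three == "G":
        hints.append("CpG context (5-methylcytosine deamination, SBS1-like)")
    if sub in ("C>T", "C>G") and five == "T":
        hints.append("TpCpN context (APOBEC-type, SBS2/SBS13-like)")
    if sub == "C>T" and (five in "CT" or three in "CT"):
        hints.append("dipyrimidine context (UV-type, SBS7-like)")
    if sub == "C>A":
        hints.append("C>A transversion (seen in SBS4 among others)")
    return hints


for label in ("A[C>T]G", "T[C>T]A", "A[T>A]G"):
    print(label, context_hints(label))
```

The second example is compatible with both the APOBEC-type and the UV-type context, which is why signatures are inferred from the whole spectrum of a tumour and not from individual mutations.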

HRD, MSI, and TMB as aggregate biomarkers

Three aggregate biomarkers derived from the somatic genome are routinely discussed in published research and treatment-decision frameworks. None is a property of a single variant; each is a property of the genome.

HRD (homologous-recombination deficiency)

Loss of homologous-recombination repair, through germline or somatic biallelic inactivation of BRCA1, BRCA2, PALB2, RAD51 paralogues, or other HR-pathway genes, produces a characteristic genome-wide footprint that includes SBS3, large structural variants, and chromosomal-instability features (loss of heterozygosity, telomeric allelic imbalance, large-scale state transitions). Davies et al. 2017 (Nat Med 23:517) introduced HRDetect, a supervised classifier that scores HRD probability from whole-genome sequencing. Genomic-scar scores built from LOH, telomeric-allelic-imbalance, and large-scale-state-transition counts (as in Myriad myChoice), or from genome-wide LOH alone (as in Foundation Medicine's assay), underlie published reporting frameworks. HRD-positive tumours are described in the published literature as a population with selective sensitivity to PARP inhibitors and to platinum chemotherapy, a finding extensively studied in ovarian and breast cancer.
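A composite genomic-scar score of the LOH + TAI + LST kind is, at its simplest, an unweighted sum of three event counts compared with an assay-specific cutoff. The sketch below assumes the three counts have already been derived from copy-number segmentation, which is the hard part and is not shown. The class and function names are hypothetical, and the cutoff of 42 in the example is a commonly cited value for one commercial genomic-instability score, used here as an assumption and not as a recommendation. This is not HRDetect, which is a separate supervised classifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScarCounts:
    loh: int  # loss-of-heterozygosity regions
    tai: int  # telomeric allelic imbalance events
    lst: int  # large-scale state transitions


def composite_hrd_score(counts: ScarCounts) -> int:
    """Unweighted sum of the three genomic-scar counts."""
    return counts.loh + counts.tai + counts.lst


def is_hrd_positive(counts: ScarCounts, threshold: int) -> bool:
    """Compare the composite score with a caller-supplied, assay-specific cutoff."""
    return composite_hrd_score(counts) >= threshold


# Toy counts for illustration only.
sample = ScarCounts(loh=18, tai=15, lst=20)
print(composite_hrd_score(sample), is_hrd_positive(sample, threshold=42))
```

Keeping the threshold as a required argument makes the dependence on assay calibration explicit in any calling code.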

Microsatellite instability (MSI)

Loss of mismatch repair causes runaway insertion / deletion errors at microsatellite repeats. It arises either in Lynch syndrome, where a germline pathogenic variant in one allele is followed by somatic loss of the second, or sporadically through somatic biallelic inactivation, most often MLH1 promoter hypermethylation. MSI is detected by PCR (the Bethesda panel and successors); loss of the MMR proteins MLH1, MSH2, MSH6, and PMS2 is detected by IHC; sequencing-based MSI scores from large targeted panels are a common alternative. MSI-H tumours show very high TMB, an SBS6/15/20/26 signature profile, and a characteristic indel signature. MSI-H is the foundational biomarker for the published response of dMMR colorectal cancer (and cross-tumour-type MSI-H disease) to immune-checkpoint inhibition, established in Le et al. 2015 (NEJM 372:2509) and extended in subsequent regulatory approvals of pembrolizumab for MSI-H / dMMR disease across tumour types.
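PCR-based MSI classification reduces to counting unstable markers. The sketch below follows the widely used convention for the five-marker Bethesda panel (two or more unstable markers, or at least 30% of markers on larger panels, is MSI-H; one unstable marker is MSI-L; none is MSS). Per-marker instability calls are assumed to be given, and the function name and the 0.30 default are illustrative.

```python
def classify_msi(marker_unstable: dict[str, bool], high_fraction: float = 0.30) -> str:
    """Classify a tumour as MSS, MSI-L, or MSI-H from per-marker instability calls."""
    if not marker_unstable:
        raise ValueError("at least one marker is required")
    unstable = sum(marker_unstable.values())
    if unstable == 0:
        return "MSS"
    if unstable / len(marker_unstable) >= high_fraction:
        return "MSI-H"
    return "MSI-L"


# Toy calls on the five Bethesda reference markers.
calls = {
    "BAT25": True,
    "BAT26": True,
    "D2S123": False,
    "D5S346": False,
    "D17S250": True,
}
print(classify_msi(calls))
```

Sequencing-based MSI callers use many more loci and their own scoring schemes, so the fraction threshold here should not be carried over to them without calibration.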

Tumour mutational burden (TMB)

TMB is the count of non-synonymous somatic mutations per megabase of sequenced coding territory (a whole exome or a calibrated gene panel). High TMB associates with mutational processes that generate mutations at high rates (UV, tobacco, MMR-deficiency, POLE/POLD1 proofreading defects). The biological logic for the immunotherapy connection is that high TMB increases neoantigen load, increasing the probability of T-cell-mediated tumour recognition. Published thresholds (10 mutations/Mb is one widely-cited cutoff; assay-specific calibration applies) have been used in tissue-agnostic checkpoint-inhibitor approval frameworks. TMB is correlated with but distinct from MSI; high TMB without MSI occurs in UV-driven and tobacco-driven cancers.
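The TMB arithmetic is a count divided by the covered territory in megabases. The sketch below assumes the mutation count has already been filtered to the variant classes the assay counts and that the covered-base total reflects adequately sequenced positions; the function names are illustrative, and the default cutoff of 10 mutations/Mb is the widely cited value mentioned above, not a calibrated one.

```python
def tumour_mutational_burden(n_mutations: int, covered_bases: int) -> float:
    """Mutations per megabase of adequately covered sequence."""
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    if n_mutations < 0:
        raise ValueError("n_mutations cannot be negative")
    return n_mutations / (covered_bases / 1_000_000)


def is_tmb_high(tmb: float, cutoff: float = 10.0) -> bool:
    """Compare with a cutoff; the default is an assumption and is assay-dependent."""
    return tmb >= cutoff


# Toy example: 450 counted mutations over 30 Mb of covered exome.
tmb = tumour_mutational_burden(450, 30_000_000)
print(tmb, is_tmb_high(tmb))  # 15.0 True
```

The same count gives a very different TMB on a 1 Mb panel than on a 30 Mb exome, which is one reason panel-derived values need calibration against exome-derived ones.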

Targeted-therapy biomarker landscape

The single-variant biomarkers that map a tumour-genomic finding to a targeted therapy are the operational endpoint of much of cancer genomics. The canonical examples covered in published research and standard oncology teaching:

  • HER2 (ERBB2) amplification → trastuzumab in breast and gastric cancer, and the antibody-drug conjugates trastuzumab emtansine (HER2-positive disease) and trastuzumab deruxtecan (HER2-positive and HER2-low disease).
  • BCR-ABL1 fusion → imatinib in chronic myeloid leukaemia (CML); later-generation TKIs (dasatinib, nilotinib, ponatinib) cover many imatinib-resistance mutations, with ponatinib active against the gatekeeper T315I, which dasatinib and nilotinib do not cover.
  • BRAF V600E → vemurafenib / dabrafenib in cutaneous melanoma (and BRAF V600E colorectal cancer in combination with EGFR blockade); MEK inhibitor combinations (trametinib, cobimetinib, binimetinib) extend duration of response.
  • KRAS G12C → sotorasib / adagrasib in lung adenocarcinoma; the first selective inhibitors of a mutant KRAS allele, after decades of an “undruggable” reputation.
  • EGFR L858R / exon-19 deletion → osimertinib and predecessor EGFR-TKIs in NSCLC; T790M (after first- and second-generation TKIs) and C797S (after osimertinib) as published resistance mutations.
  • ALK fusions (typically EML4-ALK) → crizotinib, alectinib, lorlatinib, brigatinib in NSCLC; ROS1 fusions (for example CD74-ROS1) → crizotinib and entrectinib in NSCLC.
  • FLT3-ITD → midostaurin / gilteritinib in AML.
  • NTRK fusions → larotrectinib / entrectinib in tissue-agnostic settings.
  • BRCA1/2 / HRD → PARP inhibitors (olaparib, talazoparib, niraparib, rucaparib) in ovarian, breast, pancreatic, and prostate cancer; the published mechanism is synthetic lethality between PARP inhibition and HR deficiency.
  • MSI-H / dMMR → immune-checkpoint inhibitors (pembrolizumab, dostarlimab, nivolumab) tissue-agnostically (Le et al. 2015).
  • PD-L1 expression / TMB-high → immune-checkpoint inhibitors in numerous solid tumours; thresholds and assay calibration are tissue and assay specific in the published frameworks.
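For teaching purposes, a biomarker list like the one above is naturally represented as a lookup table. The sketch below encodes a handful of the entries exactly as listed; the dictionary name, the key normalisation, and the function are hypothetical, and the table is an educational data-structure example, not a decision-support tool.

```python
# A small subset of the list above, keyed by an upper-cased biomarker label.
BIOMARKER_THERAPIES: dict[str, tuple[str, ...]] = {
    "BCR-ABL1": ("imatinib",),
    "BRAF V600E": ("vemurafenib", "dabrafenib"),
    "KRAS G12C": ("sotorasib", "adagrasib"),
    "FLT3-ITD": ("midostaurin", "gilteritinib"),
    "NTRK FUSION": ("larotrectinib", "entrectinib"),
}


def normalise(label: str) -> str:
    """Upper-case and collapse whitespace so 'braf  v600e' matches 'BRAF V600E'."""
    return " ".join(label.upper().split())


def therapies_for(biomarker: str) -> tuple[str, ...]:
    """Return the listed agents for a biomarker, or an empty tuple if not in the table."""
    return BIOMARKER_THERAPIES.get(normalise(biomarker), ())


print(therapies_for("kras g12c"))
print(therapies_for("KRAS G12D"))  # different allele, not in the table
```

Exact-match keys make the allele specificity visible: a G12D query returns nothing even though the gene is the same.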

Cross-link: the Lynch / MMRpro / MSI pathway

The Lynch-syndrome / dMMR / MSI-H axis is the cleanest worked example of how the three pillars of cancer genetics (germline predisposition, somatic change, and therapeutic response) intersect.

  1. Germline. A pathogenic variant in MLH1, MSH2, MSH6, or PMS2 supplies one inactivating allele in every cell, including every colonic / endometrial epithelial cell. Family-history modelling for the carrier probability uses BayesMendel MMRpro; family-history pattern recognition uses the Amsterdam II / revised Bethesda criteria as published frameworks.
  2. Somatic. Loss of the second allele — by LOH, point mutation, or epigenetic silencing — produces a dMMR cell. Microsatellite instability accumulates rapidly; the cell becomes hypermutated; SBS6/15/20/26 mutational signatures are evident; TMB is very high.
  3. Therapeutic. The same dMMR / MSI-H phenotype confers selective sensitivity to immune-checkpoint inhibition, established for colorectal cancer in Le et al. 2015 and extended tissue-agnostically.

The same template (germline first hit → somatic second hit → characteristic mutational signature → therapeutic vulnerability) applies, with adaptations, to BRCA1/2 / HR-deficiency / signature 3 / PARP inhibition. Two foundational pathways, two clean illustrations of how somatic genomics is interpreted in the context of inherited predisposition.
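The two-hit template can be written as a few lines of logic. The sketch below assumes a single tumour-suppressor gene with two alleles, treats a germline pathogenic variant as the first hit, and counts each somatic inactivating event as hitting a different allele; that last point is a simplification, since real events can fall on the same allele. The function name and the event labels are hypothetical.

```python
def two_hit_status(germline_pathogenic: bool, somatic_events: list[str]) -> str:
    """Summarise allele loss for one gene under a simple two-hit model."""
    # Assumption: every somatic event inactivates a distinct, still-functional allele.
    hits = (1 if germline_pathogenic else 0) + len(somatic_events)
    if hits >= 2:
        return "biallelic loss"
    if hits == 1:
        return "monoallelic loss"
    return "intact"


# Germline carrier whose tumour lost the second allele by LOH.
print(two_hit_status(True, ["LOH"]))
# Sporadic route: both alleles silenced somatically.
print(two_hit_status(False, ["promoter methylation", "promoter methylation"]))
# Carrier cell before any somatic event.
print(two_hit_status(True, []))
```

The carrier and sporadic routes converge on the same "biallelic loss" state, which is why the downstream signature and therapeutic vulnerability are shared between them.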

Where Evagene fits (and does not)

Evagene is an academic, research, and educational pedigree modelling platform. It documents family-history information, applies published risk-model algorithms (BRCAPRO, MMRpro, PancPRO, Tyrer-Cuzick IBIS-style approximation, Claus, Couch, Frank, Manchester, NICE family-history triage representation, Gail, Amsterdam II / revised Bethesda representations) to that pedigree, and exports CanRisk pedigree files for off-platform BOADICEA. Outputs are illustrative and for research / education purposes.

Evagene does not compute somatic-tumour mutational signatures, decompose a SBS / DBS / ID spectrum, calculate HRD scores from sequencing data, run MSI calling, compute TMB, or interpret tumour-genomic variants for therapeutic relevance. The somatic-genomics content here is educational reference. Tumour-genomic computation is performed by dedicated bioinformatics pipelines and reporting platforms operated by clinical genomic laboratories; this page links to the published methods and standards rather than implementing them.

Selected sources

  • Nowell PC. The clonal evolution of tumor cell populations. Science 1976; 194:23. PMID 959840.
  • Alexandrov LB, Nik-Zainal S, Wedge DC, et al. Signatures of mutational processes in human cancer. Nature 2013; 500:415. PMID 23945592.
  • Alexandrov LB, Kim J, Haradhvala NJ, et al. The repertoire of mutational signatures in human cancer. Nature 2020; 578:94. PMID 32025018.
  • ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes. Nature 2020; 578:82. PMID 32025007.
  • Genovese G, Kähler AK, Handsaker RE, et al. Clonal hematopoiesis and blood-cancer risk inferred from blood DNA sequence. NEJM 2014; 371:2477. PMID 25426838.
  • Davies H, Glodzik D, Morganella S, et al. HRDetect is a predictor of BRCA1 and BRCA2 deficiency based on mutational signatures. Nat Med 2017; 23:517. PMID 28288110.
  • Wan JCM, Massie C, Garcia-Corbacho J, et al. Liquid biopsies come of age: towards implementation of circulating tumour DNA. Nat Rev Cancer 2017; 17:223. PMID 28233803.
  • Le DT, Uram JN, Wang H, et al. PD-1 blockade in tumors with mismatch-repair deficiency. NEJM 2015; 372:2509. PMID 26028255.
  • Jamal-Hanjani M, Wilson GA, McGranahan N, et al. Tracking the evolution of non-small-cell lung cancer (TRACERx). NEJM 2017; 376:2109. PMID 28445112.
  • COSMIC mutational signatures — cancer.sanger.ac.uk/signatures.
  • The Cancer Genome Atlas Program — cancer.gov/tcga.
