DNA and chromatin organisation: from base pairs to topologically associating domains
From the chemistry of nucleotides through to the megabase-scale organisation of interphase chromosomes, the genome is folded across nine orders of magnitude in length. This page covers the canonical layers: base-pairing energetics, supercoiling and the topoisomerases that manage it, the histone octamer and the nucleosome core particle, the unresolved status of the 30 nm fibre, Hi-C and what it revealed, A/B compartments, and topologically associating domains.
Short version. DNA is an antiparallel double helix held together by hydrogen bonds and stabilised by base stacking. The duplex is supercoiled, and topoisomerases manage that supercoiling. ~147 bp wraps around a histone octamer to form the nucleosome core particle. Whether the 30 nm fibre exists in nuclei is contested. Above the nucleosome level, the genome partitions into chromatin loops, topologically associating domains (TADs), A/B compartments, and chromosome territories — a layered architecture mapped genome-wide by Hi-C since 2009.
Nucleotide chemistry
Each deoxyribonucleotide is a 2'-deoxyribose sugar bearing a nitrogenous base on its 1' carbon and a phosphate on its 5' carbon. The four canonical bases are adenine and guanine (purines, two-ring) and cytosine and thymine (pyrimidines, one-ring). Adjacent nucleotides are linked by a phosphodiester bond between the 3'-OH of one sugar and the 5'-phosphate of the next, giving the chain its 5'→3' polarity. The polymer is hydrophilic on the outside (sugar-phosphate backbone, deprotonated at physiological pH and therefore polyanionic) and hydrophobic on the inside (stacked bases).
Two strands form an antiparallel duplex through base pairing. Adenine pairs with thymine through two hydrogen bonds; guanine pairs with cytosine through three. The 5'→3' direction of one strand runs opposite the 5'→3' direction of the other — the antiparallel orientation that allows complementary base contacts to form. The duplex is right-handed in B-form, with ~10.5 bp per turn under physiological salt and a mean rise of 3.4 Å per base pair. Major and minor grooves of distinct widths and chemistries expose base edges to protein readers; many DNA-binding proteins read sequence through major-groove contacts. The double helix was proposed by Watson and Crick 1953.
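The pairing rules and B-form dimensions above lend themselves to quick arithmetic. The following is a minimal Python sketch, assuming an ideal B-form duplex with the rise and helical repeat quoted in the text; the function names are illustrative and do not come from any library.

```python
RISE_NM_PER_BP = 0.34   # 3.4 Å mean rise per base pair (B-form, value from the text)
BP_PER_TURN = 10.5      # helical repeat under physiological salt (value from the text)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the partner strand, also written 5'->3'.

    Complementing gives the paired bases; reversing accounts for the
    antiparallel orientation of the two strands.
    """
    return seq.upper().translate(_COMPLEMENT)[::-1]


def watson_crick_h_bonds(seq: str) -> int:
    """Count Watson-Crick hydrogen bonds: two per A·T pair, three per G·C pair."""
    return sum(3 if base in "GC" else 2 for base in seq.upper())


def contour_length_nm(n_bp: int) -> float:
    """Axial length of an ideal B-form duplex of n_bp base pairs."""
    return n_bp * RISE_NM_PER_BP


def helical_turns(n_bp: int) -> float:
    """Number of helical turns in an ideal B-form duplex of n_bp base pairs."""
    return n_bp / BP_PER_TURN
```

For example, reverse_complement("ATGC") returns "GCAT", and a 105 bp duplex makes ten helical turns under these assumptions.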
Watson–Crick versus Hoogsteen pairing
The canonical pairing geometry uses the Watson–Crick edge of each base. In Hoogsteen pairing, named for Karst Hoogsteen's 1959 X-ray crystallography of A·T co-crystals, the purine flips into the syn conformation around the glycosidic bond and presents its major-groove edge to the pyrimidine. Both A·T and protonated G·C+ Hoogsteen pairs are held by two hydrogen bonds, so the G·C+ pair has one fewer than its Watson–Crick counterpart and requires protonation of cytosine N3. In duplex DNA, NMR relaxation-dispersion experiments have shown that Watson–Crick pairs transiently convert to Hoogsteen pairs, which are populated at ~1% or below in normal contexts, and that the Hoogsteen population rises sharply at distorted lesions, abasic sites, and certain protein-bound complexes. Hoogsteen pairs are the basis for triplex DNA and several G-quadruplex topologies, and they participate in the recognition of damaged bases by certain DNA repair machineries. The textbook position is that the duplex spends most of its time in Watson–Crick conformation but has measurable Hoogsteen excursions.
Base-pairing thermodynamics
Duplex stability is dominated by base stacking, not hydrogen bonding. Stacking arises from van der Waals dispersion forces and the hydrophobic effect of removing aromatic surface from water; it is sequence-dependent through the geometry and electrostatics of adjacent base steps. Nearest-neighbour models (the SantaLucia 1998 unified parameters are the standard) compute ΔG°, ΔH°, and ΔS° for a duplex from its sequence by summing the contributions of the ten distinct nearest-neighbour dinucleotide steps. The melting temperature Tm — the temperature at which 50% of duplexes are denatured — depends on length, GC content, salt, and the specific nearest-neighbour composition. GC pairs add stability not only through their three hydrogen bonds but also through favourable stacking, and CG/GC steps are among the most thermodynamically stable. The same model underlies primer-design, hybridisation-probe, and oligonucleotide-aptamer engineering.
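The nearest-neighbour summation can be sketched in a few lines of Python. This is an illustrative sketch, not a validated implementation: the ΔG°37 values are transcribed from the SantaLucia 1998 unified table (1 M NaCl) and should be checked against the paper before any real use, and the function names are hypothetical. Salt correction and Tm calculation are deliberately left out.

```python
# ΔG°37 in kcal/mol for the ten distinct nearest-neighbour steps, keyed by the
# 5'->3' dinucleotide on one strand (transcribed from SantaLucia 1998; verify
# against the original table before relying on these numbers).
NN_DG37 = {
    "AA": -1.00, "AT": -0.88, "TA": -0.58, "CA": -1.45, "GT": -1.44,
    "CT": -1.28, "GA": -1.30, "CG": -2.17, "GC": -2.24, "GG": -1.84,
}
INIT_TERMINAL_GC = 0.98   # initiation term for a duplex end closed by G·C
INIT_TERMINAL_AT = 1.03   # initiation term for a duplex end closed by A·T
SYMMETRY_PENALTY = 0.43   # applied only to self-complementary duplexes

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def step_dg37(step: str) -> float:
    """Look up a dinucleotide step; the other six of the sixteen possible
    dinucleotides are reverse complements of steps already in the table."""
    if step in NN_DG37:
        return NN_DG37[step]
    return NN_DG37[reverse_complement(step)]


def duplex_dg37(seq: str) -> float:
    """Sum nearest-neighbour, initiation, and symmetry terms for a duplex
    formed by seq and its perfect complement (A/C/G/T only)."""
    seq = seq.upper()
    total = sum(step_dg37(seq[i:i + 2]) for i in range(len(seq) - 1))
    for terminal_base in (seq[0], seq[-1]):
        total += INIT_TERMINAL_GC if terminal_base in "GC" else INIT_TERMINAL_AT
    if seq == reverse_complement(seq):
        total += SYMMETRY_PENALTY
    return total
```

With these values, duplex_dg37("CGTTGA") sums five steps and two initiation terms to about −5.35 kcal/mol, and a GC-rich duplex such as GCGC comes out more stable than ATAT of the same length.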
DNA supercoiling and topoisomerases
For a closed circular duplex (or any duplex with constrained ends), the linking number Lk is the number of times one strand winds around the other; for a closed circle it is an integer. Lk is a topological invariant: it cannot change without strand breakage. The identity Lk = Tw + Wr partitions Lk into the twist Tw (helical turns of the duplex around its axis) and the writhe Wr (coiling of the duplex axis through space). Supercoiling is quantified by the superhelical density σ = (Lk − Lk0)/Lk0, where Lk0 is the linking number of the relaxed duplex (its length in base pairs divided by ~10.5 bp per turn). Cellular DNA is maintained slightly negatively supercoiled, with σ of ~−0.05 (about 5% under-wound relative to a relaxed duplex of the same length). Negative supercoiling lowers the free-energy cost of strand separation during transcription and replication and biases certain non-B-form geometries (cruciforms, Z-DNA, R-loops, G-quadruplexes).
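The bookkeeping behind Lk, Tw, Wr, and σ can be made concrete with a short Python sketch. It assumes a relaxed helical repeat of 10.5 bp per turn and uses the standard definition σ = (Lk − Lk0)/Lk0, where Lk0 is the relaxed linking number; the plasmid size in the usage note is a hypothetical example, not a value from the text.

```python
BP_PER_TURN = 10.5  # relaxed B-form helical repeat (assumed)


def relaxed_linking_number(n_bp: int) -> float:
    """Lk0: linking number of the relaxed duplex of n_bp base pairs."""
    return n_bp / BP_PER_TURN


def superhelical_density(lk: float, n_bp: int) -> float:
    """sigma = (Lk - Lk0) / Lk0; negative values mean under-winding."""
    lk0 = relaxed_linking_number(n_bp)
    return (lk - lk0) / lk0


def linking_number_at(sigma: float, n_bp: int) -> int:
    """Nearest integer Lk that gives the requested superhelical density."""
    return round(relaxed_linking_number(n_bp) * (1 + sigma))


def writhe(lk: float, twist: float) -> float:
    """Rearranged identity Lk = Tw + Wr."""
    return lk - twist
```

For a hypothetical 5,250 bp plasmid, Lk0 is 500, so σ = −0.05 corresponds to Lk = 475. If the twist stayed at its relaxed value of 500, the whole deficit would appear as a writhe of −25; in practice the deficit is shared between twist and writhe.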
Topoisomerases are the enzymes that change Lk. Type I topoisomerases (Top1 in eukaryotes) introduce a transient single-strand break, allow controlled rotation, and re-ligate, changing Lk in steps of one. Type II topoisomerases (Top2α and Top2β in eukaryotes) introduce a transient double-strand break, pass a second duplex through the gap, and re-ligate, changing Lk in steps of two. Top2 activity is essential for sister-chromatid decatenation at mitosis and for relieving the positive supercoiling that accumulates ahead of replication forks. Wang 1996 remains a canonical review of topoisomerase mechanism. Topoisomerase II is a major target of clinically used anti-cancer drugs (etoposide, doxorubicin), which act by trapping the cleaved-DNA-enzyme intermediate.
The histone octamer and the nucleosome core particle
Eukaryotic DNA is packaged with a near-stoichiometric quantity of histone protein. The four core histones (H2A, H2B, H3, H4) are small, basic proteins of ~100–130 amino acids, each built around a histone-fold domain (three α-helices linked by two short loops). The four assemble into an (H3-H4)2 tetramer flanked by two H2A-H2B dimers, giving an octameric core. ~147 bp of DNA wraps around the octamer in 1.65 left-handed superhelical turns, contacting the histone folds at fourteen sites, spaced roughly one helical turn apart, where the minor groove faces the octamer. The 2.8 Å X-ray crystal structure of the nucleosome core particle (Luger et al. 1997) is the canonical reference; subsequent cryo-EM and crystallographic work has refined the picture to sub-3 Å resolution under varying conditions and modifications.
Linker DNA (~20–80 bp depending on cell type and species) connects adjacent nucleosomes; the linker histone H1 binds at the dyad and stabilises ~166 bp of DNA in the chromatosome (core particle plus H1). The flexible amino-terminal histone tails project out from the core and are the substrate for the great majority of post-translational modifications (acetylation, methylation, phosphorylation, ubiquitylation, sumoylation, ADP-ribosylation), which are read and written by the conserved chromatin-modification machinery and which underpin much of epigenetic inheritance.
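The numbers in this section combine into simple packaging arithmetic. The sketch below is illustrative only: it assumes a uniform linker length (real linkers vary within a cell), takes the 3.4 Å rise per base pair of B-form DNA, and uses hypothetical function names.

```python
CORE_BP = 147              # DNA wrapped in the nucleosome core particle (from the text)
SUPERHELICAL_TURNS = 1.65  # left-handed turns around the octamer (from the text)
RISE_NM_PER_BP = 0.34      # B-form rise per base pair (assumed)


def nucleosome_repeat_length(linker_bp: int) -> int:
    """Core DNA plus one linker: the spacing between adjacent nucleosomes."""
    return CORE_BP + linker_bp


def nucleosome_count(region_bp: int, linker_bp: int) -> int:
    """Whole nucleosomes that fit in a region, assuming uniform spacing."""
    return region_bp // nucleosome_repeat_length(linker_bp)


def bp_per_superhelical_turn() -> float:
    """Base pairs of DNA per turn around the histone octamer."""
    return CORE_BP / SUPERHELICAL_TURNS


def wrapped_dna_length_nm() -> float:
    """Contour length of the DNA held in one core particle."""
    return CORE_BP * RISE_NM_PER_BP
```

With a 53 bp linker the repeat length is 200 bp, so a 1 Mb region holds about 5,000 nucleosomes, and each core particle stores roughly 50 nm of DNA.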
The 30 nm fibre debate
For decades, textbooks pictured a regular 30 nm chromatin fibre intermediate between the 10 nm "beads on a string" and higher-order folds. Two competing geometries dominated: the one-start solenoid (six nucleosomes per turn, helical) and the two-start zigzag (alternating nucleosomes contact each other, with a more open helical lattice). Both geometries are observable in vitro at appropriate ionic strengths and nucleosome repeat lengths. In situ, however, the 30 nm fibre has been hard to visualise: cryo-electron tomography of mitotic chromosomes and nuclei has not shown a regular 30 nm fibre, and ChromEMT (Ou et al. 2017) imaged interphase chromatin as a disordered chain of nucleosomes, 5–24 nm in diameter, packed at varying densities. The contemporary consensus is that the 30 nm fibre forms in vitro and may form in some specific in-vivo contexts (heterochromatin in particular), but that the dominant interphase fold is better described as a disordered polymer than as a regular fibre.
Hi-C and the genome-wide contact map
Chromosome conformation capture (3C, Dekker et al. 2002) crosslinks chromatin in vivo, digests with a restriction enzyme, ligates spatially proximal fragments, and quantifies the resulting junction frequencies. Hi-C generalises 3C to a genome-wide contact matrix: after crosslinking and digestion, the fragment ends are filled in with biotin-labelled nucleotides and ligated, and the DNA is then sheared so that the biotin-marked junctions can be captured and paired-end sequenced. The result is a contact frequency matrix whose resolution ranges from megabases to kilobases depending on sequencing depth. Lieberman-Aiden et al. 2009 introduced Hi-C and described two principal findings: contact-frequency-versus-genomic-distance scaling consistent with a fractal-globule rather than equilibrium-globule polymer, and the partitioning of the genome into A and B compartments.
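The final step, turning sequenced junctions into a contact matrix, amounts to binning read pairs. The following is a minimal sketch for a single chromosome, assuming the read pairs have already been mapped and filtered; the function name and input format are hypothetical, and real pipelines also remove duplicates and normalise the matrix, which is not shown here.

```python
def bin_contacts(pairs, bin_size, n_bins):
    """Build a symmetric contact matrix from mapped read pairs.

    pairs    : iterable of (pos1, pos2) genomic coordinates in bp, both on the
               same chromosome (hypothetical input format)
    bin_size : bin width in bp, i.e. the resolution of the matrix
    n_bins   : number of bins covering the chromosome
    """
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in pairs:
        i, j = pos1 // bin_size, pos2 // bin_size
        if i >= n_bins or j >= n_bins:
            continue  # ignore pairs that fall outside the binned region
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # keep the matrix symmetric
    return matrix
```

Three read pairs at 100 kb resolution, for instance, give a 3 × 3 matrix in which each pair adds one count to the cell for its two bins.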
A/B compartments
Principal-component analysis of the Hi-C matrix at 1 Mb resolution decomposes the genome into two compartments. The A compartment is gene-dense, transcriptionally active, GC-rich, replicates early in S phase, and contacts other A regions across the chromosome and between chromosomes. The B compartment is gene-poor, transcriptionally inactive, AT-rich, late-replicating, often associated with the nuclear lamina, and contacts other B regions. Compartment identity correlates with chromatin marks (H3K4me1/3, H3K27ac for A; H3K9me2/3, lamin association for B) and differs between cell types, with some regions switching compartment.
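The compartment call can be sketched as a correlation matrix followed by its leading eigenvector. This pure-Python sketch is a simplification under stated assumptions: the input is taken to be already normalised for distance decay (observed/expected), power iteration stands in for a full principal-component analysis, and all function names are hypothetical. The sign of the eigenvector is arbitrary, so deciding which sign is "A" needs an external track such as gene density.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length rows (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / norm if norm > 0 else 0.0


def correlation_matrix(matrix):
    """Correlate every row of the contact matrix with every other row."""
    n = len(matrix)
    return [[pearson(matrix[i], matrix[j]) for j in range(n)] for i in range(n)]


def leading_eigenvector(matrix, iterations=200):
    """Power iteration; suitable here because a correlation matrix has no
    negative eigenvalues, so the dominant eigenvalue is the largest one."""
    n = len(matrix)
    v = [1.0 + 0.01 * i for i in range(n)]  # deterministic, non-uniform start
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v


def compartment_signs(oe_matrix):
    """Return +1 or -1 per bin from the sign of the leading eigenvector."""
    pc1 = leading_eigenvector(correlation_matrix(oe_matrix))
    return [1 if x > 0 else -1 for x in pc1]
```

On an idealised matrix in which bins of the same compartment contact each other more than bins of different compartments, the bins separate cleanly into two sign groups.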
Topologically associating domains (TADs)
At sub-megabase resolution, the contact matrix decomposes into self-interacting domains: TADs. Dixon et al. 2012 identified ~2,200 TADs in mouse embryonic stem cells with a median size of ~880 kb and showed that TAD boundaries are largely conserved across cell types and partially conserved with human syntenic regions. Nora et al. 2012 independently identified the same domain organisation at the X-inactivation centre, with the boundary between the Xist-encompassing and Tsix-encompassing domains controlling the cis-regulatory landscape.
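One simple way to see domain boundaries in a contact matrix is an insulation-style score: slide a square window along the diagonal and measure how many contacts cross each position. Note that this is a swapped-in technique chosen for brevity, not the method Dixon et al. used, and the function names, window size, and local-minimum rule are illustrative assumptions.

```python
def insulation_scores(matrix, window):
    """Mean contact count between the `window` bins upstream and the `window`
    bins downstream of each position.

    scores[i] describes the junction between bin i-1 and bin i; positions too
    close to the matrix edge are left as None.
    """
    n = len(matrix)
    scores = [None] * n
    for i in range(window, n - window + 1):
        total = sum(
            matrix[a][b]
            for a in range(i - window, i)
            for b in range(i, i + window)
        )
        scores[i] = total / (window * window)
    return scores


def boundary_positions(scores):
    """Positions whose score is a strict local minimum (few contacts cross)."""
    boundaries = []
    for i in range(1, len(scores) - 1):
        left, mid, right = scores[i - 1], scores[i], scores[i + 1]
        if None in (left, mid, right):
            continue
        if mid < left and mid < right:
            boundaries.append(i)
    return boundaries
```

On an idealised matrix with two four-bin domains, the score dips at the junction between them and that junction is reported as the only boundary.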
TADs are largely produced by cohesin-mediated loop extrusion: the cohesin ring loads onto chromatin and extrudes a progressively larger loop until it encounters a convergent CTCF site, where it is stalled. The result is a population of dynamic loops with anchors at convergent CTCF sites and a TAD-like contact pattern in the population average. Mutation of CTCF anchor sites or depletion of cohesin abolishes TAD structure and can alter cis-regulatory contacts — the mechanism behind several developmental phenotypes in which TAD-boundary disruption brings normally separated enhancers and promoters into contact. Bonev and Cavalli 2016 review the integrated picture.
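The stalling rule can be illustrated with a deliberately simple one-dimensional toy. The sketch below is deterministic and assumes a single extruder, fully impermeable CTCF sites, and no unloading, none of which holds in real cells; the function name and the '>' / '<' orientation encoding are hypothetical conventions introduced here.

```python
def extrude(load_site, ctcf_sites, n_bins, max_steps=10_000):
    """Extrude a loop outward from load_site on a lattice of n_bins positions.

    ctcf_sites maps a position to '>' (motif pointing toward higher
    coordinates) or '<' (pointing toward lower coordinates). The leg moving
    left stalls at a '>' site and the leg moving right stalls at a '<' site,
    so a stable loop is anchored by a convergent pair whose motifs point into
    the loop. Legs also stop at the ends of the lattice.
    """
    left = right = load_site
    left_stalled = right_stalled = False
    for _ in range(max_steps):
        if not left_stalled:
            if left == 0 or ctcf_sites.get(left) == ">":
                left_stalled = True
            else:
                left -= 1
        if not right_stalled:
            if right == n_bins - 1 or ctcf_sites.get(right) == "<":
                right_stalled = True
            else:
                right += 1
        if left_stalled and right_stalled:
            break
    return left, right
```

An extruder loaded between a '>' site at position 10 and a '<' site at position 20 ends with anchors at (10, 20). If the nearest sites face the wrong way, both legs pass them and the loop grows until it meets correctly oriented sites.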
A polymer-physics view
Polymer physics provides a quantitative scaffold. The contact-frequency-versus-genomic-distance scaling P(s) decays as a power law whose exponent depends on the polymer state: ~−1.5 for an equilibrium globule, ~−1 for a fractal globule (the regime observed in Hi-C between ~500 kb and several megabases), and ~−2 for a self-avoiding walk. Loop-extrusion simulations reproduce TADs, the s⁻¹ scaling regime, and the loss of TAD structure under cohesin depletion. The polymer model has displaced the older purely-hierarchical view: chromatin is now treated as a heterogeneous polymer that adopts characteristic statistical conformations rather than a rigid hierarchy of fibres.
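The exponent of P(s) can be estimated from a contact matrix by averaging each diagonal and fitting a straight line in log-log space. This is a minimal sketch with hypothetical function names; it assumes a single chromosome, ignores normalisation, and fits the whole distance range, whereas a real analysis would restrict the fit to the regime of interest and use logarithmically spaced distance bins.

```python
import math


def contact_decay(matrix):
    """Mean contact count at each genomic separation s (in bins), for s >= 1."""
    n = len(matrix)
    distances, means = [], []
    for s in range(1, n):
        diagonal = [matrix[i][i + s] for i in range(n - s)]
        distances.append(s)
        means.append(sum(diagonal) / len(diagonal))
    return distances, means


def fit_power_law_exponent(distances, values):
    """Least-squares slope of log(value) against log(distance).

    Points with non-positive values are skipped because their logarithm is
    undefined.
    """
    points = [(math.log(s), math.log(p)) for s, p in zip(distances, values) if p > 0]
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in points)
    denominator = sum((x - mean_x) ** 2 for x, _ in points)
    return numerator / denominator
```

A synthetic curve with P(s) proportional to 1/s returns an exponent of −1, the fractal-globule value, and one proportional to s^(−1.5) returns −1.5.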
Why this matters for pedigree-modelling teaching
Disruption of chromatin architecture — through TAD-boundary deletion, structural variant rearrangement of CTCF anchors, or topoisomerase deficiency — can alter cis-regulatory contacts and drive disease phenotypes that look misleadingly like simple coding mutations on a pedigree. A working understanding of the architecture is therefore part of the interpretive context for any inherited-condition discussion. Evagene's pedigree drawing tool and Mendelian inheritance calculator are educational and research tools; outputs are illustrative for teaching and study only. See the companion pages on DNA replication and repair and genome structure and variation.
Key references
- Watson JD, Crick FHC. Molecular structure of nucleic acids: a structure for deoxyribose nucleic acid. Nature 171:737–738 (1953). PMID 13054692.
- Luger K et al. Crystal structure of the nucleosome core particle at 2.8 Å resolution. Nature 389:251–260 (1997). PMID 9305837.
- Wang JC. DNA topoisomerases. Annu Rev Biochem 65:635–692 (1996). PMID 8811193.
- Lieberman-Aiden E et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326:289–293 (2009). PMID 19815776.
- Dixon JR et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485:376–380 (2012). PMID 22495300.
- Nora EP et al. Spatial partitioning of the regulatory landscape of the X-inactivation centre. Nature 485:381–385 (2012). PMID 22495304.
- Bonev B, Cavalli G. Organization and function of the 3D genome. Nat Rev Genet 17:661–678 (2016). PMID 27739533.
- SantaLucia J Jr. A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. PNAS 95:1460–1465 (1998). doi:10.1073/pnas.95.4.1460.
- Dekker J et al. Capturing chromosome conformation. Science 295:1306–1311 (2002). PMID 11847345.
Frequently asked questions
How do Watson–Crick and Hoogsteen base pairs differ?
Watson–Crick pairs use the canonical edges and form the standard B-form duplex. Hoogsteen pairs use the major-groove edge of the purine, which must flip into the syn glycosidic conformation. They are populated at ~1% or below in normal duplex DNA and are enriched at lesions and in triplex / quadruplex structures.
What does Lk = Tw + Wr mean?
For a closed circular duplex, the linking number Lk — a topological invariant — equals the sum of the twist Tw and the writhe Wr. Topoisomerases change Lk by transient breakage and re-ligation.
What is a topologically associating domain?
A self-interacting genomic region (typically several hundred kilobases to a few megabases) within which contacts are enriched and across whose boundaries contacts are depleted. Identified in 2012 by Hi-C (Dixon et al.) and by 5C at the X-inactivation centre (Nora et al.).
Is the 30 nm fibre real?
It forms in vitro under specific conditions and is a real structural state. Whether it dominates interphase chromatin in nuclei is contested; cryo-electron tomography and ChromEMT support a more disordered polymer in vivo, 5–24 nm in diameter in the ChromEMT data.
What is Hi-C?
A chromosome-conformation-capture method that produces a genome-wide contact map by crosslinking, restriction digestion, proximity ligation, and paired-end sequencing of the resulting junctions.