Chromosome structure and mapping
A chromosome is the highly compacted package in which a linear DNA molecule is stored, replicated, and segregated. Two structural elements — the centromere and the telomeres — do most of the work that distinguishes a chromosome from a long piece of chromatin. This page covers the packaging hierarchy, the molecular biology of centromeres and telomeres, the cytogenetic and molecular techniques used to read chromosomes (G-banding, FISH, spectral karyotyping, chromosomal microarray), the ISCN nomenclature in which findings are recorded, and the T2T-CHM13 long-read assembly that finally completed the human reference.
Short version. A human chromosome is one continuous linear DNA molecule, packaged in nucleosomes, looped onto a protein scaffold, and folded into the rod-shaped metaphase chromosome that is photographed in karyotypes. The centromere is an epigenetically defined region marked by CENP-A nucleosomes, where the kinetochore assembles to attach to the spindle. Telomeres are TTAGGG repeats bound by the shelterin complex; they protect chromosome ends and are extended by telomerase, the enzyme whose discovery won the 2009 Nobel Prize. The chromosome is read at progressively higher resolution by G-banding (megabase scale), FISH (gene-locus scale), chromosomal microarray (kilobase scale), and short- and long-read sequencing (base scale). The 2022 T2T-CHM13 assembly completed the first end-to-end human reference.
Packaging: from DNA to metaphase chromosome
If the ~2 metres of DNA in a human cell were unwound, it would be roughly 250,000 times longer than the nucleus that contains it. The packaging that makes this geometrically possible is hierarchical:
- Nucleosome (~11 nm fibre). Approximately 146 base pairs of DNA wrap 1.65 times around a histone octamer (two each of H2A, H2B, H3, H4). Linker DNA of 20-80 base pairs separates one nucleosome from the next. This is the "beads on a string" structure visible by electron microscopy of unfolded chromatin.
- Higher-order chromatin folding. Nucleosome arrays fold into compact fibres and into loops anchored to a protein scaffold. The historical "30 nm fibre" model has been complicated by in-cellulo imaging suggesting more irregular and dynamic packing, but the principle — that nucleosome arrays form loops and domains — remains.
- Topologically associating domains (TADs) and compartments. Hi-C and related methods reveal that interphase chromatin organises into self-interacting domains and into A/B compartments correlating with active and inactive chromatin.
- Mitotic compaction. At entry into mitosis, condensin complexes (condensin I and II) extrude chromatin into loops and compact each chromatid into a stiff rod, while cohesin holds sister chromatids together. The metaphase chromosome is roughly 10,000-fold shorter than the extended DNA molecule it contains.
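The scale of the packaging problem can be checked with back-of-envelope arithmetic. The sketch below is illustrative only: the diploid genome size, the 0.34 nm rise per base pair, and the 8 micrometre nuclear diameter are assumed round figures that do not come from this page, while the nucleosome core and linker lengths are the ones quoted in the list above.

```python
# Back-of-envelope packaging arithmetic. Constants marked "assumed" are
# illustrative round numbers, not measurements taken from this page.
BP_RISE_NM = 0.34            # assumed rise per base pair of B-form DNA (nm)
DIPLOID_BP = 6.2e9           # assumed diploid human genome size (base pairs)
NUCLEUS_DIAMETER_UM = 8.0    # assumed nuclear diameter (micrometres)
CORE_BP = 146                # DNA wrapped around one histone octamer (from the text)
LINKER_BP_RANGE = (20, 80)   # linker DNA range (from the text)


def dna_length_m(base_pairs: float) -> float:
    """Contour length of B-form DNA in metres."""
    return base_pairs * BP_RISE_NM * 1e-9


def nucleosome_count(base_pairs: float, linker_bp: int) -> float:
    """Approximate nucleosome number for a given linker length."""
    return base_pairs / (CORE_BP + linker_bp)


length_m = dna_length_m(DIPLOID_BP)
ratio = length_m / (NUCLEUS_DIAMETER_UM * 1e-6)

print(f"DNA contour length: {length_m:.2f} m")
print(f"Length relative to nuclear diameter: {ratio:,.0f}x")
for linker in LINKER_BP_RANGE:
    count = nucleosome_count(DIPLOID_BP, linker)
    print(f"Linker of {linker} bp: about {count:.1e} nucleosomes")
```

Under these assumptions the DNA comes out at a little over 2 metres, and the nucleosome count is in the tens of millions per cell. Changing the assumed nuclear diameter moves the length ratio proportionally.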
Each metaphase chromosome consists of two sister chromatids, joined at the centromere, with the short arm "p" above and the long arm "q" below by convention. The autosomes are numbered 1 to 22 in approximately decreasing size; the sex chromosomes are X and Y. The diploid count of 46 was established by Tjio and Levan in 1956, replacing a long-standing erroneous count of 48.
The centromere
The centromere is the region of a chromosome where sister chromatids are held together and where the kinetochore — the protein machine that captures spindle microtubules — is built. In humans, centromeric DNA consists of megabase-scale arrays of alpha-satellite (alphoid) repeats, organised into higher-order repeat units that are chromosome-specific. Centromeres are not, however, defined by DNA sequence alone. Cleveland and colleagues (2003) review the evidence that centromere identity is propagated epigenetically by the histone H3 variant CENP-A (also known as CENH3). CENP-A nucleosomes mark the active centromere and recruit a hierarchy of constitutive centromere-associated proteins (the CCAN), which in turn recruit the outer-kinetochore KMN network (KNL1, the Mis12 complex, the Ndc80 complex) at mitosis to make the actual microtubule attachments.
Cohesion between sister chromatids is established by cohesin, a ring-shaped complex (SMC1, SMC3, RAD21, SA1/SA2) that is loaded onto chromatin before replication and becomes cohesive as the DNA is replicated in S phase. Cohesin holds sisters together until anaphase, when separase cleaves the RAD21 subunit and sisters are pulled apart. Loss of cohesion is one of the proposed contributors to age-related non-disjunction in human oocytes.
The telomere
Linear chromosomes have ends, and ends are a problem for two reasons. First, conventional DNA polymerases cannot copy the extreme 3' end of the lagging-strand template, so each round of replication shortens the chromosome: the "end-replication problem". Second, a free DNA end looks like a double-strand break and would be repaired (badly) if not protected. Both problems are solved at the telomere.
Human telomeres are tandem repeats of the hexamer TTAGGG, extending several kilobases from each chromosome end and ending in a single-stranded G-rich overhang that folds back to form a "t-loop". The repeats and the overhang are bound by the shelterin complex (TRF1, TRF2, TIN2, RAP1, TPP1, POT1), reviewed in detail by de Lange (2005). Shelterin both protects the end from being recognised as damage and regulates access of telomerase, the reverse-transcriptase enzyme that adds new TTAGGG repeats and so compensates for end-replication loss. Greider and Blackburn's 1985 Cell paper reporting telomerase activity in Tetrahymena, together with Szostak's earlier yeast work, was recognised by the 2009 Nobel Prize in Physiology or Medicine.
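As a small illustration of what "tandem repeats of the hexamer TTAGGG" means at the sequence level, the sketch below counts perfect repeat copies at the end of a string. It assumes the sequence is written 5' to 3' on the G-rich strand, it ignores the variant repeats found in real telomeres, and both the function name and the toy sequence are made up for the example.

```python
import re

TELOMERE_UNIT = "TTAGGG"  # human telomeric hexamer (from the text)


def terminal_repeat_count(seq: str, unit: str = TELOMERE_UNIT) -> int:
    """Count perfect tandem copies of `unit` at the 3' end of `seq`.

    Assumes `seq` is written 5'->3' on the G-rich strand; variant or
    interrupted repeats end the count.
    """
    match = re.search(f"(?:{re.escape(unit)})+$", seq.upper())
    return len(match.group(0)) // len(unit) if match else 0


# Toy sequence for illustration only, not real genomic data.
toy_end = "ACGTACGGTCA" + "TTAGGG" * 5
print(terminal_repeat_count(toy_end))  # 5
```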
Most somatic human cells express little or no telomerase; they undergo progressive telomere shortening with each division, and when telomeres become critically short they signal a DNA-damage response that drives the cell into replicative senescence (the "Hayflick limit"). Germline cells, stem cells, and most cancers maintain telomerase activity; a minority of cancers maintain telomeres by an alternative recombination-based mechanism (ALT). Inherited mutations affecting telomere maintenance components (e.g. TERT, TERC, DKC1) underlie the telomere biology disorders catalogued in OMIM, including dyskeratosis congenita.
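The logic of progressive shortening can be sketched as a simple countdown. This is a toy model, not a quantitative description of any cell type: the starting length, the loss per division, and the critical length used below are hypothetical placeholder values, and real shortening is variable and chromosome-end specific.

```python
def divisions_until_senescence(initial_bp: int,
                               loss_per_division_bp: int,
                               critical_bp: int) -> int:
    """Divisions needed for telomere length to fall to or below `critical_bp`.

    Toy model: constant loss per division, no telomerase, no ALT.
    """
    if loss_per_division_bp <= 0:
        raise ValueError("loss per division must be positive in this model")
    divisions = 0
    length = initial_bp
    while length > critical_bp:
        length -= loss_per_division_bp
        divisions += 1
    return divisions


# Hypothetical parameters chosen only to make the arithmetic easy to follow.
print(divisions_until_senescence(initial_bp=10_000,
                                 loss_per_division_bp=100,
                                 critical_bp=5_000))  # 50
```

In this model a cell with active telomerase would correspond to zero net loss, which the function deliberately rejects rather than looping forever.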
Reading chromosomes by light microscopy: G-banding
The dominant cytogenetic technique for over fifty years has been karyotype analysis by G-banding. Cultured cells (typically lymphocytes for a constitutional karyotype, or bone marrow for haematological neoplasms) are arrested in metaphase with colchicine or its analogue Colcemid, swollen in hypotonic solution, fixed, dropped onto slides, treated briefly with trypsin, and stained with Giemsa. The trypsin pre-treatment differentially digests chromosomal proteins, producing the reproducible pattern of dark and light bands along each chromosome. Q-banding with quinacrine, introduced by Caspersson and colleagues in 1970, was the original demonstration that chromosomes carry reproducible longitudinal patterns; G-banding gives essentially the same pattern with light microscopy and a conventional stain.
Resolution is reported as the number of bands visible per haploid set in metaphase: 450-band karyotypes are routine, 550-band is standard for prenatal cytogenetics, and 700-band requires earlier mitotic arrest (prometaphase), giving more elongated chromosomes. Resolution of around 5-10 megabases is achievable; smaller imbalances are below the detection limit of any banding technique and require molecular methods.
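The link between band level and megabase resolution can be approximated by dividing the haploid genome by the number of bands. The sketch below assumes a haploid genome of about 3,100 megabases, a round figure that does not come from this page. It gives only an average band size. Real bands vary widely in size, and the practical detection limit also depends on preparation quality.

```python
HAPLOID_GENOME_MB = 3100  # assumed haploid genome size in megabases


def mean_band_size_mb(bands_per_haploid_set: int) -> float:
    """Average DNA content per band at a given banding resolution."""
    return HAPLOID_GENOME_MB / bands_per_haploid_set


for level in (450, 550, 700):
    print(f"{level}-band level: about {mean_band_size_mb(level):.1f} Mb per band")
```

Under this assumption the three band levels average roughly 6.9, 5.6, and 4.4 megabases per band, consistent with the 5-10 megabase figure quoted above.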
Reading chromosomes by hybridisation: FISH and SKY
Fluorescence in situ hybridisation (FISH) uses fluorescently labelled DNA probes to detect specific sequences on chromosomes. Locus-specific probes detect the presence, absence, or rearrangement of a chosen genomic region (e.g. a deletion in 22q11.2, or a fusion signal across the t(9;22) translocation breakpoint). Centromeric probes count copies of a given chromosome. Painting probes label whole chromosomes for translocation analysis. Spectral karyotyping (SKY) and multiplex FISH (M-FISH) use combinatorial labelling to paint all 24 chromosome types in different colours simultaneously, making complex rearrangements much easier to interpret.
FISH offers higher resolution than banding for the regions covered by the probe (kilobase scale for locus-specific probes), but requires that the abnormality is in a region for which a probe has been chosen. It complements rather than replaces banding.
Reading chromosomes by array: aCGH and SNP arrays
Chromosomal microarray analysis — array comparative genomic hybridisation (aCGH) and SNP arrays — covers the genome in tens to hundreds of thousands of probes and detects copy-number gains and losses at sub-megabase resolution. aCGH compares test and reference DNA labelled with different dyes hybridised to a probe array; SNP arrays simultaneously type single-nucleotide polymorphisms, allowing detection of copy-neutral runs of homozygosity (relevant to consanguinity and uniparental disomy) as well as copy-number variation. The systematic association between recurrent copy-number variants and developmental phenotypes was established at scale by Cooper and colleagues (2011) in a morbidity map covering more than 15,000 cases. Microarray is now the recommended first-tier test in much of constitutional cytogenetics for unexplained intellectual disability or congenital anomalies, alongside or in place of karyotype.
Microarray cannot detect balanced rearrangements (which involve no net gain or loss of material), low-level mosaicism below ~10-20%, or single-base changes; for these, banding and sequencing remain complementary.
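The reason balanced rearrangements and low-level mosaicism escape microarray follows from the signal the array measures. A minimal sketch, assuming the idealised log2 ratio of test to reference copy number with a diploid reference and no noise model (the function name and parameters are illustrative):

```python
import math


def expected_log2_ratio(copy_number: int,
                        mosaic_fraction: float = 1.0,
                        reference_copies: int = 2) -> float:
    """Idealised log2(test/reference) for a region.

    `mosaic_fraction` is the fraction of cells carrying `copy_number`;
    the remaining cells are assumed to carry `reference_copies`.
    """
    if not 0.0 <= mosaic_fraction <= 1.0:
        raise ValueError("mosaic_fraction must be between 0 and 1")
    mean_copies = (mosaic_fraction * copy_number
                   + (1.0 - mosaic_fraction) * reference_copies)
    if mean_copies <= 0:
        raise ValueError("log2 ratio is undefined when no copies are present")
    return math.log2(mean_copies / reference_copies)


print(expected_log2_ratio(1))        # heterozygous deletion: -1.0
print(expected_log2_ratio(3))        # duplication or trisomy: about 0.585
print(expected_log2_ratio(2))        # balanced rearrangement: 0.0
print(expected_log2_ratio(3, 0.10))  # trisomy in 10% of cells: about 0.07
```

A balanced translocation leaves copy number at two everywhere, so the expected ratio is zero, and a gain present in only 10% of cells shifts the ratio by far less than a full single-copy change does.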
Reading chromosomes by sequencing
Short-read sequencing (Illumina) detects copy-number variation by read depth, structural variation by paired-end and split-read signals, and uniparental disomy by allelic imbalance. Long-read sequencing (PacBio HiFi, Oxford Nanopore) reads tens to hundreds of kilobases per molecule, resolving repeats and structural variants that short reads cannot span. Whole-genome sequencing increasingly substitutes for microarray as cost falls.
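Read-depth copy-number calling rests on a proportionality: a region present in one copy attracts about half the reads of a two-copy region. The sketch below is a deliberately naive illustration of that idea, using the median window depth as the two-copy baseline. The depth values are invented, and production callers additionally correct for GC content and mappability and segment the signal.

```python
from statistics import median


def copy_number_from_depth(window_depths: list, ploidy: int = 2) -> list:
    """Naive per-window copy number from read depth.

    Assumes most windows are at the baseline ploidy, so the median
    depth represents `ploidy` copies.
    """
    baseline = median(window_depths)
    return [round(ploidy * depth / baseline) for depth in window_depths]


# Invented mean depths for seven consecutive windows.
depths = [30, 31, 29, 15, 14, 30, 46]
print(copy_number_from_depth(depths))  # [2, 2, 2, 1, 1, 2, 3]
```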
The most striking demonstration of the long-read era is the completion of a gapless human reference. Nurk et al. (2022) reported the T2T-CHM13 assembly of a homozygous hydatidiform mole (a genome with two identical haploid sets, so there is no heterozygosity to complicate assembly), sequenced with Oxford Nanopore ultra-long and PacBio HiFi reads. Relative to GRCh38, T2T-CHM13 added approximately 200 megabases of previously unassembled sequence, including all centromeric alpha-satellite arrays, the short arms of the acrocentric chromosomes, and the rDNA arrays. For the first time, every chromosome in an assembled human genome (the 22 autosomes and the X) had a complete, end-to-end sequence: telomere to telomere. CHM13 carries no Y chromosome, so the Y was completed separately, from a different sample, in 2023.
ISCN: the language for cytogenetic findings
The International System for Human Cytogenomic Nomenclature (ISCN) is the standardised language used to describe karyotypes, FISH, microarray, and sequencing-based cytogenomic findings. The current edition is ISCN 2024. ISCN provides:
- Format for karyotype designation: total chromosome count, sex chromosomes, abnormalities (e.g. 47,XX,+21 for trisomy 21 in a female; 46,XY,t(9;22)(q34;q11.2) for a male with the Philadelphia translocation).
- Band-level addressing: chromosome arm + region + band + sub-band (e.g. 17q21.31).
- Specific operators for translocations (t), inversions (inv), deletions (del), duplications (dup), insertions (ins), isochromosomes (i), and ring chromosomes (r).
- Microarray and sequence formats integrating coordinate-based descriptions with karyotype-style designations.
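The structure of these designations lends itself to simple parsing. The sketch below handles only the two simplest forms described in this list, a comma-separated karyotype string and a band address. It is not a full ISCN parser, and the class, function, and field names are hypothetical, not part of any ISCN specification or of Evagene's software.

```python
import re
from dataclasses import dataclass

# Band address: chromosome, arm, region digit, band digit, optional sub-band.
BAND_RE = re.compile(
    r"(?P<chrom>[1-9]|1\d|2[0-2]|X|Y)"
    r"(?P<arm>[pq])"
    r"(?P<region>\d)(?P<band>\d)"
    r"(?:\.(?P<sub_band>\d+))?"
)


@dataclass
class Karyotype:
    count: int            # total chromosome count
    sex: str              # sex chromosome complement, e.g. "XX"
    abnormalities: list   # remaining comma-separated fields, unparsed


def parse_karyotype(text: str) -> Karyotype:
    """Split a simple karyotype string into count, sex chromosomes, abnormalities."""
    parts = [part.strip() for part in text.split(",")]
    if (len(parts) < 2 or not parts[0].isdigit()
            or not re.fullmatch(r"[XY]+", parts[1])):
        raise ValueError(f"not a simple karyotype string: {text!r}")
    return Karyotype(int(parts[0]), parts[1], parts[2:])


def parse_band(text: str) -> dict:
    """Split a band address such as 17q21.31 into its components."""
    match = BAND_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a band address: {text!r}")
    return match.groupdict()


print(parse_karyotype("47,XX,+21"))
print(parse_karyotype("46,XY,t(9;22)(q34;q11.2)"))
print(parse_band("17q21.31"))
```

Splitting on commas works here because, in these simple forms, breakpoints inside parentheses are separated by semicolons. Mosaic, composite, microarray, and sequence-based designations need a proper grammar.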
ISCN-format descriptions are interoperable across cytogenetic laboratories worldwide and are the basis on which findings can be searched, compared, and exchanged. Evagene's pedigree drawing tool stores cytogenetic findings in ISCN-format strings against the relevant pedigree node; see ISCN pedigree symbols for the way ISCN findings are integrated with NSGC pedigree notation, and the karyogram viewer for educational visualisation of named bands and rearrangements.
References
- Tjio JH, Levan A. The chromosome number of man. Hereditas 1956;42:1.
- Caspersson T, Zech L, Johansson C, Modest EJ. Identification of human chromosomes by DNA-binding fluorescent agents. Experimental Cell Research 1970;60:315.
- Greider CW, Blackburn EH. Identification of a specific telomere terminal transferase activity in Tetrahymena extracts. Cell 1985;43:405.
- Cleveland DW, Mao Y, Sullivan KF. Centromeres and kinetochores: from epigenetics to mitotic checkpoint signaling. Cell 2003;112:407.
- de Lange T. Shelterin: the protein complex that shapes and safeguards human telomeres. Genes & Development 2005;19:2100.
- Cooper GM, Coe BP, Girirajan S et al. A copy number variation morbidity map of developmental delay. Nature Genetics 2011;43:838.
- Nurk S, Koren S, Rhie A et al. The complete sequence of a human genome. Science 2022;376:44.
- ISCN 2024: An International System for Human Cytogenomic Nomenclature. Karger.
Related reading
- Chromosomes and cell division (pillar)
- Cell cycle, mitosis and meiosis
- Chromosomal abnormalities
- ISCN pedigree symbols
- Karyogram viewer
- NSGC pedigree notation
Evagene is an academic, research, and educational pedigree modelling platform. It is not a medical device, not clinical decision support, and not a diagnostic or screening tool. The cytogenetic concepts described here are taught at undergraduate and postgraduate level in genetics; the platform itself does not interpret cytogenetic findings for individual patient care.