Gene expression mechanisms: from DNA to functional protein
A pillar overview of the four stages by which a eukaryotic gene becomes a functional protein: transcription, RNA processing, translation, and post-translational regulation. The page introduces the foundational papers — Crick on the central dogma, Berget, Moore and Sharp on mRNA splicing, Ban and colleagues on the ribosome — and signposts the three subtopic pages where each stage is covered in depth.
Short version. Gene expression is the process by which the information stored in a DNA sequence is decoded into a functional product, most often a protein but in some cases a functional RNA. In eukaryotes the journey to a protein has four stages: a polymerase reads the gene into pre-messenger RNA (transcription); the pre-mRNA is capped, spliced, polyadenylated, and exported (RNA processing); the ribosome reads the mature mRNA into a polypeptide (translation); and the polypeptide is folded, modified, sorted, and eventually degraded (post-translational regulation). Each stage is regulated, each stage is a source of disease when it goes wrong, and each stage has yielded Nobel Prizes for the people who first worked it out. This pillar introduces the four stages and the canonical literature; the three subtopic pages cover the stages in depth.
Educational positioning. Evagene is an academic, research, and educational pedigree modelling platform. This page and its subtopic pages are written for students, educators, and researchers; they are not a medical device, not clinical decision support, and not a diagnostic or screening tool. The literature cited here is for teaching and reference.
The central dogma
The framework for thinking about gene expression is the central dogma of molecular biology, articulated by Francis Crick in 1958 and restated in his classic Nature paper, "Central dogma of molecular biology" (Crick 1970, Nature 227:561). The dogma is a statement about residue-by-residue sequence information: information flows from nucleic acid to nucleic acid (DNA to DNA, DNA to RNA, RNA to RNA, RNA to DNA) and from nucleic acid to protein, but never out of protein back into nucleic acid. The discovery of reverse transcription does not violate the dogma, because the RNA-to-DNA transfer is itself a sequence-information transfer that Crick's framework explicitly allows.
For gene expression in particular, the dogma supplies the spine of the process: a gene is transcribed from one DNA strand into pre-mRNA; the pre-mRNA is processed into mature mRNA; the mature mRNA is translated into a polypeptide; the polypeptide folds into a functional protein. Each arrow on that diagram is the substance of a subtopic page below.
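The chain described above can be sketched in a few lines of Python. This is a toy illustration rather than a model of real biology: the gene sequence, the intron coordinates, and the function names (`transcribe`, `splice`, `translate`) are all hypothetical, introns are supplied by hand rather than recognised from splice-site signals, and translation simply starts at the first AUG.

```python
from itertools import product

# Standard genetic code, built from the conventional UCAG ordering.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def transcribe(coding_strand):
    """Return the pre-mRNA for a coding (sense) strand written 5'->3'."""
    return coding_strand.upper().replace("T", "U")


def splice(pre_mrna, introns):
    """Remove introns given as 0-based, half-open (start, end) intervals."""
    exons, position = [], 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])
        position = end
    exons.append(pre_mrna[position:])
    return "".join(exons)


def translate(mrna):
    """Translate from the first AUG to the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    residues = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        residues.append(amino_acid)
    return "".join(residues)


# Hypothetical toy gene: exon 1, one intron, exon 2.
toy_gene = "GCATGGCT" + "GTAAGTCCAG" + "AAATAAGG"
pre_mrna = transcribe(toy_gene)
mature_mrna = splice(pre_mrna, introns=[(8, 18)])
print(mature_mrna)             # GCAUGGCUAAAUAAGG
print(translate(mature_mrna))  # MAK
```

Each function corresponds to one arrow in the chain; folding and post-translational modification have no sequence-level analogue and are left out of the sketch.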
Stage 1 · Transcription
Transcription is the synthesis of an RNA copy from a DNA template. In eukaryotes there are three RNA polymerases, each dedicated to a class of genes: RNA polymerase I transcribes the ribosomal RNA precursor in the nucleolus; RNA polymerase II transcribes all protein-coding genes plus most snRNAs and many lncRNAs and miRNAs; RNA polymerase III transcribes the tRNAs, the 5S rRNA, and a small set of additional short non-coding RNAs. The structure of RNA polymerase II, whose core architecture is shared with the other two enzymes, was solved at 2.8 Å resolution by Cramer, Bushnell and Kornberg (2001, Science 292:1863), work that contributed to Roger Kornberg's 2006 Nobel Prize in Chemistry.
The Pol II carboxy-terminal domain (CTD) is a long heptad repeat (consensus YSPTSPS) whose phosphorylation pattern serves as a code that recruits RNA processing factors at each step of transcription. Capping enzymes engage early; splicing factors engage during elongation; cleavage and polyadenylation factors engage at termination. Promoter architecture, general transcription factors (TFIID, TFIIH and others), the Mediator complex, and enhancer-promoter looping are the regulators that determine which Pol II transcripts are made, when, and how much.
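To make the heptad-repeat idea concrete, the following minimal sketch splits a CTD-like sequence into seven-residue repeats and counts how far each one departs from the YSPTSPS consensus. The sequence `toy_ctd` and the helper names are hypothetical and chosen for illustration; a real CTD is much longer, and the sketch says nothing about phosphorylation state.

```python
CONSENSUS = "YSPTSPS"


def split_heptads(ctd):
    """Split a CTD-like sequence into consecutive seven-residue repeats."""
    if len(ctd) % 7 != 0:
        raise ValueError("sequence length must be a multiple of 7")
    return [ctd[i:i + 7] for i in range(0, len(ctd), 7)]


def mismatches(heptad):
    """Count positions at which a heptad differs from the consensus."""
    return sum(a != b for a, b in zip(heptad, CONSENSUS))


# Hypothetical four-repeat sequence: two consensus and two degenerate heptads.
toy_ctd = "YSPTSPS" + "YSPTSPS" + "YSPTSPK" + "YTPTSPS"
print([mismatches(h) for h in split_heptads(toy_ctd)])  # [0, 0, 1, 1]
```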
Coverage of these mechanisms in depth is on Transcriptional machinery: RNA polymerases, transcription factors, and co-transcriptional processing.
Stage 2 · RNA processing
The pre-mRNA emerging from Pol II is not the molecule that gets translated. It is capped at the 5' end with 7-methylguanosine, has its introns excised by the spliceosome, is cleaved at a defined point and polyadenylated at the 3' end, and is then exported from the nucleus. The discovery that eukaryotic genes are split — that mRNA is assembled from non-contiguous regions of DNA by removing intervening sequences — was made independently in 1977 by Berget, Moore and Sharp at MIT (PNAS 74:3171) and Chow, Gelinas, Broker and Roberts at Cold Spring Harbor (Cell 12:1), working on adenovirus. Phillip Sharp and Richard Roberts shared the 1993 Nobel Prize in Physiology or Medicine for the discovery of split genes.
Splicing is performed by the spliceosome, a megadalton ribonucleoprotein machine assembled from five small nuclear ribonucleoprotein particles (U1, U2, U4, U5, U6 snRNPs). The chemistry is two transesterification reactions; the substrate selectivity is encoded in the splice-site sequences and modulated by SR proteins, hnRNPs, and the kinetic interplay with elongating Pol II. Alternative splicing — the production of multiple transcript isoforms from a single gene by combinatorial splice-site selection — affects the great majority of human multi-exon genes (Wang et al. 2008, Nature 456:470; Pan et al. 2008, Nature Genetics 40:1413).
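The combinatorial nature of alternative splicing can be illustrated with a short sketch that enumerates the isoforms of a gene in which some exons are constitutive and others are cassette exons that may be included or skipped. The exon names and the `isoforms` function are hypothetical, and the model assumes every cassette exon is chosen independently, which real splicing regulation does not guarantee.

```python
from itertools import product


def isoforms(exons):
    """Enumerate isoforms for a list of (name, is_cassette) exons.

    Constitutive exons are always kept; each cassette exon is either
    included or skipped, independently of the others.
    """
    choices = [(True, False) if is_cassette else (True,) for _, is_cassette in exons]
    result = []
    for combination in product(*choices):
        result.append(tuple(name for (name, _), keep in zip(exons, combination) if keep))
    return result


# Hypothetical four-exon gene with two cassette exons.
toy_exons = [("E1", False), ("E2", True), ("E3", True), ("E4", False)]
for isoform in isoforms(toy_exons):
    print("-".join(isoform))
# E1-E2-E3-E4
# E1-E2-E4
# E1-E3-E4
# E1-E4
```

With n independent cassette exons the count grows as 2^n, which is one way to see why a modest number of alternative exons can yield many transcript isoforms.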
Cytoplasmic mRNA is also subject to surveillance and decay: deadenylation, decapping, exonucleolytic degradation, and nonsense-mediated decay (NMD) all shape the half-life of any given transcript and the response of the cell to misprocessed messages. Coverage in depth is on RNA processing and stability: splicing, polyadenylation, export, and decay.
Stage 3 · Translation
The mature mRNA is decoded into protein by the ribosome. In eukaryotes the ribosome is an 80S ribonucleoprotein complex assembled from a small (40S) and a large (60S) subunit. The atomic-resolution structures of prokaryotic ribosomal subunits, solved at the turn of the millennium by Venki Ramakrishnan, Tom Steitz, and Ada Yonath, transformed the field from a catalogue of biochemistry into a structural science. The large (50S) subunit, from the archaeon Haloarcula marismortui, was solved by Ban, Nissen, Hansen, Moore and Steitz (2000, Science 289:905); the small (30S) subunit, from the bacterium Thermus thermophilus, by Wimberly, Brodersen, Clemons, Morgan-Warren, Carter, Vonrhein, Hartsch and Ramakrishnan (2000, Nature 407:327). The three shared the 2009 Nobel Prize in Chemistry "for studies of the structure and function of the ribosome".
Translation is divided into initiation, elongation, and termination. In the eukaryotic cap-dependent pathway, the eIF4F complex recognises the 5' cap, eIF2-GTP delivers the initiator methionyl-tRNA, and the small subunit scans for the start codon (Jackson, Hellen and Pestova 2010, Nature Reviews Molecular Cell Biology 11:113). Regulation of initiation, in particular through eIF2α phosphorylation in the integrated stress response and through mTOR-dependent control of eIF4E availability, is the dominant mode of acute translational control (Sonenberg and Hinnebusch 2009, Cell 136:731).
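As a rough sketch of the scanning step, the code below walks an mRNA from its 5' end, takes the first AUG as the start codon, and reads in frame to the first stop codon. This "first AUG" rule is a deliberate simplification: it ignores the sequence context around the start codon, upstream open reading frames, and cap-independent initiation. The function name `scan_for_orf` and the sequence are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def scan_for_orf(mrna):
    """Return (start_index, orf) for the first AUG, scanning 5'->3'.

    start_index is 0-based and equals the 5' UTR length. orf runs from the
    AUG through the first in-frame stop codon, or is None if no stop is found.
    Returns None when the message contains no AUG at all.
    """
    mrna = mrna.upper().replace("T", "U")
    start = mrna.find("AUG")
    if start == -1:
        return None
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return start, mrna[start:i + 3]
    return start, None


# Hypothetical message with a six-nucleotide 5' UTR.
toy_mrna = "GGCACCAUGGCUUGGUAAGC"
print(scan_for_orf(toy_mrna))  # (6, 'AUGGCUUGGUAA')
```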
The most important quantitative method for studying translation today is ribosome profiling, introduced by Ingolia, Ghaemmaghami, Newman and Weissman (2009, Science 324:218): deep sequencing of the short mRNA fragments protected by translating ribosomes from nuclease digestion gives a codon-resolution map of where ribosomes are at any moment.
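The core bookkeeping behind a codon-resolution map can be sketched as follows: each ribosome-protected fragment is reduced to the position of its 5' end, an offset is added to estimate where the ribosomal P-site sits, and the result is binned by codon. The 12-nucleotide offset used here is an assumption for illustration; in practice it is calibrated from the data and depends on fragment length. The function name and the read positions are hypothetical.

```python
from collections import Counter


def codon_occupancy(footprint_starts, cds_start, cds_end, psite_offset=12):
    """Count footprints per codon of a coding sequence.

    footprint_starts: 0-based transcript positions of footprint 5' ends.
    cds_start, cds_end: 0-based, half-open bounds of the coding sequence.
    psite_offset: assumed distance from the 5' end to the P-site.
    """
    counts = Counter()
    for five_prime_end in footprint_starts:
        p_site = five_prime_end + psite_offset
        if cds_start <= p_site < cds_end:
            counts[(p_site - cds_start) // 3] += 1
    n_codons = (cds_end - cds_start) // 3
    return [counts[i] for i in range(n_codons)]


# Hypothetical footprints on a transcript whose CDS spans positions 30-44.
toy_starts = [18, 18, 19, 24, 31, 5]
print(codon_occupancy(toy_starts, cds_start=30, cds_end=45))  # [3, 0, 1, 0, 1]
```

The read starting at position 5 maps its P-site to the 5' UTR and is therefore not counted.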
Coverage of translation, the chaperones, and post-translational modification is on Translation and post-translational control: ribosomes, chaperones, and protein quality control.
Stage 4 · Post-translational regulation
A polypeptide leaving the ribosome is not yet a functional protein. Folding is assisted by chaperones (Hsp70, Hsp90, and the chaperonins, of which bacterial GroEL/GroES is the best-studied example and TRiC/CCT the eukaryotic cytosolic counterpart), often co-translationally; covalent modifications (phosphorylation, acetylation, methylation, ubiquitination, glycosylation, SUMOylation) tune activity, localisation, and turnover; and the ubiquitin-proteasome system and autophagy clear damaged or supernumerary proteins. Mis-handling of any of these processes underlies a swathe of human disease, from cystic fibrosis (a folding defect) to neurodegeneration (an aggregation defect).
Quantitative gene expression analysis
The dominant method for measuring gene expression is RNA sequencing (RNA-seq), introduced in mammalian transcriptomics by Mortazavi, Williams, McCue, Schaeffer and Wold (2008, Nature Methods 5:621). RNA-seq replaced microarrays as the default platform because it samples the full transcriptome, distinguishes splice isoforms, and captures novel transcripts and antisense expression. Single-cell RNA-seq, ribosome profiling, and spatial transcriptomics have extended the toolkit; large reference catalogues such as the Genotype-Tissue Expression (GTEx) project and the FANTOM atlases give tissue- and cell-type-level profiles of human gene expression for cross-referencing.
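Raw RNA-seq read counts depend on both transcript length and sequencing depth, so they are normally converted to a normalised unit before comparison. The sketch below computes RPKM (reads per kilobase per million mapped reads), the unit used by Mortazavi et al., alongside TPM (transcripts per million), a later and widely used alternative that is not drawn from that paper. The gene names, counts, and lengths are invented, and the total number of mapped reads is assumed to be the sum of the counts shown.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    total_reads = sum(counts.values())
    return {gene: counts[gene] * 1e9 / (lengths_bp[gene] * total_reads) for gene in counts}


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalise first, then rescale to 1e6."""
    rates = {gene: counts[gene] / lengths_bp[gene] for gene in counts}
    scale = sum(rates.values())
    return {gene: rates[gene] / scale * 1e6 for gene in counts}


# Hypothetical three-gene experiment.
toy_counts = {"geneA": 100, "geneB": 200, "geneC": 700}
toy_lengths = {"geneA": 1000, "geneB": 2000, "geneC": 1000}

print(rpkm(toy_counts, toy_lengths))
print(tpm(toy_counts, toy_lengths))
```

In this example geneA and geneB receive the same value in either unit, because geneB has twice the reads but is twice as long. TPM values always sum to one million within a sample, which RPKM values do not.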
Why gene expression matters for human genetics
Mendelian inheritance, the topic of much of the rest of the Evagene site, is the inheritance of variants whose effect on phenotype runs through gene expression. A nonsense or frameshift variant in a coding exon can truncate the protein; a splice-site variant can cause exon skipping or intron retention and so shift the reading frame; a 5' UTR variant can alter translation initiation; a promoter variant can change expression level; a missense variant in a chaperone-binding site can destabilise the folded product. The four stages on this page are the substrate on which Mendelian and complex-trait genetics operate.
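For variants that fall inside a coding sequence, the link between a DNA change and its effect on the protein can be sketched by translating the reference and altered sequences and comparing the results. The classifier below is a teaching toy under stated assumptions: both sequences begin at the start codon, only the coding sequence is considered, and the category names and the `classify` function are illustrative rather than taken from any annotation tool. Splice-site, UTR, and promoter variants are outside its scope.

```python
from itertools import product

# Standard genetic code in DNA letters, built from the TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds):
    """Translate a coding sequence from its first base to the first stop."""
    residues = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        residues.append(amino_acid)
    return "".join(residues)


def classify(ref_cds, alt_cds):
    """Assign a coarse consequence class to an altered coding sequence."""
    if (len(alt_cds) - len(ref_cds)) % 3 != 0:
        return "frameshift"
    if len(alt_cds) != len(ref_cds):
        return "in-frame indel"
    ref_protein, alt_protein = translate(ref_cds), translate(alt_cds)
    if alt_protein == ref_protein:
        return "synonymous"
    if len(alt_protein) < len(ref_protein):
        return "nonsense"   # premature stop codon truncates the protein
    if len(alt_protein) > len(ref_protein):
        return "stop-lost"
    return "missense"


# Hypothetical reference CDS encoding M-A-K-W followed by a stop codon.
reference = "ATGGCTAAATGGTAA"
print(classify(reference, "ATGGCCAAATGGTAA"))  # synonymous
print(classify(reference, "ATGGCTGAATGGTAA"))  # missense
print(classify(reference, "ATGGCTAAATGATAA"))  # nonsense
print(classify(reference, "ATGGCTAATGGTAA"))   # frameshift
```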
For pedigree-based reasoning about how variants segregate within families, see Mendelian inheritance calculator and Autosomal dominant calculator; for the broader pedigree-drawing context, see Pedigree chart and Pedigree drawing tool.
Subtopic pages
- Transcriptional machinery — RNA polymerases I, II, III; the Pol II CTD code; promoters and core elements; general transcription factors and TFIID/TFIIH; the Mediator complex; transcription-factor DNA-binding domains; enhancer-promoter looping; co-transcriptional capping, splicing, and 3' processing.
- RNA processing and stability — spliceosome assembly and the two-step transesterification mechanism; constitutive vs alternative splicing; SR proteins and hnRNPs; cleavage and polyadenylation (CPSF, CstF, CFI/II); cap structure; nuclear export via TREX; cytoplasmic mRNA decay; nonsense-mediated decay.
- Translation and post-translational control — ribosome structure and function; cap-dependent vs IRES-mediated initiation; mTOR and the integrated stress response; ribosome profiling; co-translational folding; chaperone networks; ubiquitin-proteasome system and autophagy; the major post-translational modifications.
Frequently asked questions
What is the central dogma of molecular biology?
The principle, articulated by Crick in 1958 and restated in Nature in 1970, that residue-by-residue sequence information flows from nucleic acid to nucleic acid and from nucleic acid to protein, but never out of protein back into nucleic acid. Reverse transcription is consistent with the dogma.
What are the four stages of gene expression?
Transcription, RNA processing, translation, and post-translational regulation. Each is regulated; each is a source of disease when it fails; and the three subtopic pages cover them in depth, with translation and post-translational regulation sharing one page.
Why was the discovery of mRNA splicing important?
It showed that eukaryotic genes are split into exons and introns, that the mature mRNA is assembled by removing introns, and that splicing shapes the eukaryotic transcriptome, which is the basis of alternative isoform diversity. Sharp and Roberts shared the 1993 Nobel Prize in Physiology or Medicine for the work.
How is gene expression measured today?
RNA-seq is the dominant quantitative method, introduced in mammalian transcriptomics by Mortazavi et al. 2008. Single-cell RNA-seq and ribosome profiling extend the toolkit.
Is this page clinical advice?
No. This is an educational page. Evagene is an academic, research, and educational pedigree modelling platform; it is not a medical device, not clinical decision support, and not a diagnostic or screening tool.