Transcriptional machinery: RNA polymerases, transcription factors, and co-transcriptional processing
A subtopic guide on the eukaryotic transcriptional machinery: the three RNA polymerases and their substrates; the structure of Pol II as solved by Cramer and colleagues; the carboxy-terminal domain code that orchestrates co-transcriptional processing; the core promoter elements; the general transcription factors and the Mediator complex; the major transcription-factor DNA-binding domain families; and enhancer-promoter contact through chromatin looping.
Short version. Eukaryotes have three nuclear RNA polymerases. Pol I transcribes ribosomal RNA in the nucleolus; Pol II transcribes the protein-coding transcriptome plus most regulatory RNAs; Pol III transcribes tRNAs, 5S rRNA, and a handful of short non-coding RNAs. Pol II's carboxy-terminal domain, a repetitive tail found on no other nuclear polymerase, is phosphorylated through the transcription cycle to coordinate co-transcriptional capping, splicing, and 3'-end processing. The core promoter recruits the general transcription factors and Pol II at the start site; transcription factors bound at enhancers contact the promoter through chromatin looping; the Mediator complex bridges enhancer-bound factors and Pol II. This page covers each layer in turn.
Educational positioning. This is an educational page written for students, educators, and researchers. Evagene is an academic, research, and educational pedigree modelling platform; it is not a medical device, not clinical decision support, and not a diagnostic or screening tool.
The three eukaryotic RNA polymerases
Eukaryotic nuclei contain three multi-subunit DNA-dependent RNA polymerases, each transcribing a distinct subset of the genome:
- RNA polymerase I (Pol I): localised to the nucleolus, transcribes a single substrate — the 45S ribosomal RNA precursor — from the tandemly repeated rDNA arrays. The 45S precursor is processed into the mature 28S, 18S, and 5.8S rRNAs that, together with the Pol III-transcribed 5S rRNA, build the ribosome. Pol I transcription accounts for the majority of cellular RNA output by mass.
- RNA polymerase II (Pol II): transcribes all protein-coding genes, most snRNAs (U1, U2, U4, U5; U6 is a Pol III substrate), most long non-coding RNAs (lncRNAs), and the primary transcripts of most miRNAs. Pol II transcription is the most heavily regulated and the focus of this page.
- RNA polymerase III (Pol III): transcribes the transfer RNAs, the 5S ribosomal RNA, U6 snRNA, 7SL RNA, 7SK RNA, and a small number of additional short non-coding RNAs. Pol III promoters are unusual in that they sit, for many genes, downstream of the transcription start site, internal to the transcribed RNA itself.
Each polymerase has its own dedicated set of general transcription factors and its own promoter architecture. The three polymerases nonetheless share a conserved core architecture: five subunits are common to all three enzymes, and most of the remaining core subunits have homologous counterparts in each, reflecting their common evolutionary origin.
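The division of labour above can be captured as a small lookup table. The following is a minimal sketch in Python that uses only the substrates named in the list; the names `POLYMERASE_BY_RNA`, `polymerase_for`, and `substrates_of` are hypothetical, chosen for illustration, and the table flattens the "most" qualifiers in the text into a single default assignment.

```python
# Illustrative lookup built only from the substrate lists above.
# All names here are hypothetical; "most" in the text is simplified
# to a single default polymerase per RNA class.
POLYMERASE_BY_RNA = {
    "45S pre-rRNA": "Pol I",
    "mRNA": "Pol II",
    "U1 snRNA": "Pol II",
    "U2 snRNA": "Pol II",
    "U4 snRNA": "Pol II",
    "U5 snRNA": "Pol II",
    "lncRNA": "Pol II",      # most lncRNAs
    "pri-miRNA": "Pol II",   # most miRNA primary transcripts
    "tRNA": "Pol III",
    "5S rRNA": "Pol III",
    "U6 snRNA": "Pol III",
    "7SL RNA": "Pol III",
    "7SK RNA": "Pol III",
}


def polymerase_for(rna_class):
    """Return the polymerase recorded for an RNA class, or raise KeyError."""
    if rna_class not in POLYMERASE_BY_RNA:
        raise KeyError(f"no polymerase recorded for {rna_class!r}")
    return POLYMERASE_BY_RNA[rna_class]


def substrates_of(polymerase):
    """Return the sorted RNA classes assigned to one polymerase."""
    return sorted(rna for rna, pol in POLYMERASE_BY_RNA.items() if pol == polymerase)


print(polymerase_for("U6 snRNA"))
print(substrates_of("Pol I"))
```

The one entry worth noticing is U6: it is an snRNA like U1, U2, U4, and U5, but it is assigned to Pol III rather than Pol II.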
Pol II structure
The atomic structure of yeast Pol II was solved by Cramer, Bushnell and Kornberg (2001, Science 292:1863), with subsequent transcribing-elongation complex structures from the same group (Gnatt, Cramer, Fu, Bushnell and Kornberg 2001) showing the polymerase poised on a DNA-RNA hybrid with a nucleotide entering the active site. The structure clarified how the clamp closes around the DNA template, how the bridge helix and trigger loop coordinate nucleotide addition, and how the RNA exit channel routes the nascent transcript out of the polymerase. Roger Kornberg received the 2006 Nobel Prize in Chemistry for this body of work.
Pol II is a 12-subunit complex; its largest subunit, Rpb1, carries an unusual carboxy-terminal domain, the CTD, that is unique to Pol II among the three nuclear polymerases.
The CTD code
The Pol II CTD consists of tandem heptad repeats with the consensus YSPTSPS — 52 copies in humans, 26 in yeast. Five of the seven residues can be reversibly phosphorylated; in practice the dominant marks are phospho-Ser5 (early in the cycle, around initiation) and phospho-Ser2 (later, during elongation), with phospho-Ser7, phospho-Tyr1, and phospho-Thr4 contributing additional regulatory marks. The pattern of phosphorylation is read by sets of factors that engage Pol II at the appropriate stage:
- Initiation: TFIIH-dependent phosphorylation of Ser5 recruits the capping enzymes to the nascent transcript.
- Elongation: P-TEFb-dependent phosphorylation of Ser2 recruits splicing factors and the histone-modifying machinery that travels with the elongating polymerase.
- Termination: a Ser2-phosphorylated CTD also recruits the cleavage and polyadenylation machinery at 3'-ends.
The model that the CTD acts as a moving platform for co-transcriptional processing is the "CTD code" or "CTD cycle" reviewed in Buratowski (2009, Molecular Cell 36:541). The phosphorylation pattern is dynamic: kinases (CDK7 / TFIIH, CDK9 / P-TEFb, CDK12) and phosphatases (FCP1, SSU72) sculpt it through the cycle. The CTD is essential for viability in every organism in which it has been tested.
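The heptad arithmetic and the stage-to-mark mapping described above can be sketched in a few lines of Python. This is an idealised model under stated assumptions: every repeat is taken to match the YSPTSPS consensus exactly (real CTD repeats diverge from it in places), and the names `build_ctd`, `phospho_positions`, and `recruited_at` are hypothetical helpers, not part of any library.

```python
HEPTAD = "YSPTSPS"  # consensus repeat from the text

# 0-based offset of each phosphorylatable residue within one heptad.
PHOSPHO_SITES = {"Tyr1": 0, "Ser2": 1, "Thr4": 3, "Ser5": 4, "Ser7": 6}

# Stage -> (dominant mark, machinery recruited), as listed above.
CTD_CYCLE = {
    "initiation": ("Ser5", "capping enzymes"),
    "elongation": ("Ser2", "splicing factors and histone-modifying machinery"),
    "termination": ("Ser2", "cleavage and polyadenylation machinery"),
}


def build_ctd(n_repeats):
    """Idealised CTD: n exact copies of the consensus heptad (an assumption)."""
    return HEPTAD * n_repeats


def phospho_positions(n_repeats, site):
    """0-based positions of one named site across all repeats."""
    offset = PHOSPHO_SITES[site]
    return [7 * i + offset for i in range(n_repeats)]


def recruited_at(stage):
    """Describe which mark recruits which machinery at a given stage."""
    mark, machinery = CTD_CYCLE[stage]
    return f"phospho-{mark} recruits {machinery}"


human_ctd = build_ctd(52)  # 52 repeats in humans
yeast_ctd = build_ctd(26)  # 26 repeats in yeast

print(len(human_ctd), len(yeast_ctd))
print(len(phospho_positions(52, "Ser5")))
print(recruited_at("initiation"))
```

Under these assumptions the human CTD spans 52 × 7 = 364 residues and offers 52 × 5 = 260 potential phosphorylation sites, which gives a sense of how much combinatorial state the tail can carry.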
Promoter architecture
The Pol II core promoter is the stretch of DNA, typically about 80 bp around the transcription start site (TSS), that contains the elements recognised by the general transcription factors. The most studied core elements, reviewed in Smale and Kadonaga (2003, Annual Review of Biochemistry 72:449), are:
- TATA box: a TATA(A/T)A(A/T) consensus, located about 25-30 bp upstream of the TSS in TATA-containing promoters; bound by the TBP subunit of TFIID.
- Initiator (INR): a pyrimidine-rich element overlapping the TSS; bound by TFIID and supporting transcription start-site selection.
- Downstream promoter element (DPE): located ~28-32 bp downstream of the TSS in many TATA-less promoters; binds TFIID and acts in combination with the INR.
- TFIIB recognition element (BRE): flanks the TATA box and contributes to TFIIB binding.
- Motif Ten Element (MTE): a downstream element working with the INR.
- CpG islands: roughly 60-70% of human Pol II promoters lie within CpG islands — G+C-rich, CpG-rich stretches of DNA that are typically unmethylated and that lack a TATA box. CpG-island promoters often have multiple, dispersed transcription start sites.
Most human promoters are TATA-less and CpG-island-associated. The TATA-containing, sharply-defined-TSS promoter is the textbook archetype but is not the most common type in mammals.
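The TATA consensus given above, TATA(A/T)A(A/T), translates directly into a regular expression. The sketch below, in Python, scans a sequence for the motif and reports hits in a window upstream of a known TSS. The function name `find_tata_boxes`, the default window of -35 to -20, and the toy sequence are all assumptions made for illustration; a real analysis would normally use position weight matrices rather than an exact consensus match.

```python
import re

# TATA(A/T)A(A/T) as a regular expression; the lookahead allows
# overlapping matches to be reported.
TATA_BOX = re.compile(r"(?=(TATA[AT]A[AT]))")


def find_tata_boxes(seq, tss_index, window=(-35, -20)):
    """Return (position relative to TSS, matched motif) for each hit in the window.

    tss_index is the 0-based index of the +1 nucleotide in seq.
    Positions follow the usual convention: -1 is the base just upstream
    of +1, and there is no position 0. The window is a hypothetical
    default bracketing the 25-30 bp upstream location given in the text.
    """
    seq = seq.upper()
    hits = []
    for match in TATA_BOX.finditer(seq):
        start = match.start()
        if start < tss_index:
            rel = start - tss_index
        else:
            rel = start - tss_index + 1
        if window[0] <= rel <= window[1]:
            hits.append((rel, match.group(1)))
    return hits


# Toy promoter (not a real sequence): a TATA box starting 30 bp upstream of +1.
toy_promoter = "G" * 20 + "TATAAAA" + "G" * 23 + "A" + "C" * 20
toy_tss = 50

print(find_tata_boxes(toy_promoter, toy_tss))
```

A sequence with no match in the window returns an empty list, the outcome the text leads one to expect for most human promoters, which are TATA-less and CpG-island-associated.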
General transcription factors and the pre-initiation complex
Pol II does not bind DNA productively on its own. Productive transcription requires the assembly of the pre-initiation complex (PIC) on the core promoter, comprising Pol II and the general transcription factors:
- TFIID: a multi-subunit complex containing TATA-binding protein (TBP) and about 13 TBP-associated factors (TAFs). TFIID is the first factor at the promoter; TBP bends the TATA-containing DNA and the TAFs contact INR / DPE / MTE elements.
- TFIIA and TFIIB: stabilise TBP-DNA contacts and bridge TBP to Pol II.
- TFIIF: travels with Pol II and helps load it onto the promoter.
- TFIIE: regulates the catalytic activity of TFIIH.
- TFIIH: a 10-subunit complex with two enzymatic activities: ATP-dependent helicase subunits (XPB and XPD), of which XPB drives promoter opening, and a CDK7-cyclin H kinase that phosphorylates the CTD on Ser5. TFIIH is also a core component of the nucleotide excision repair machinery, which is why mutations in XPB / XPD cause xeroderma pigmentosum, Cockayne syndrome, and trichothiodystrophy.
Once the PIC is assembled, ATP-dependent DNA opening by TFIIH gives Pol II access to single-stranded template DNA, the polymerase initiates RNA synthesis, the CTD is phosphorylated on Ser5, and Pol II escapes the promoter.
The Mediator complex
Mediator is a large multi-subunit complex (about 30 subunits in metazoans) that bridges sequence-specific transcription factors bound at enhancers and the Pol II initiation machinery at the promoter. It is divided structurally into head, middle, and tail modules plus a dissociable CDK module; the tail interacts with transcription-factor activation domains, the head and middle contact Pol II and the general transcription factors, and the CDK module (CDK8-cyclin C, MED12, MED13) associates reversibly with the rest of the complex and supplies a regulatory kinase activity. The complex is reviewed in Allen and Taatjes (2015, Nature Reviews Molecular Cell Biology 16:155).
Mediator is a general coregulator of essentially all Pol II transcription. Its conformation and subunit composition are tuned by the regulatory inputs it integrates, which is why distinct disease phenotypes can map to individual subunits: MED12 variants, for example, underlie Opitz-Kaveggia (FG) syndrome and Lujan-Fryns syndrome, and MED12 is recurrently mutated in certain tumours.
Transcription-factor DNA-binding domains
Sequence-specific transcription factors bind enhancer / promoter DNA through compact DNA-binding domains. The four most common families:
- Helix-turn-helix (HTH): the founding family of structurally characterised DNA-binding domains. The homeodomain (a 60-amino-acid HTH variant) is the canonical metazoan HTH; HOX genes, OCT factors, and many others use it.
- Zinc finger: most commonly the Cys2-His2 ("classical") finger, which coordinates a Zn ion and presents an alpha-helix into the DNA major groove. Tandem fingers read consecutive DNA triplets. The human transcription-factor catalogue has more zinc-finger proteins than any other family. Other zinc-coordinated families include the Cys4 nuclear receptor domain (e.g. the oestrogen and androgen receptors) and the GATA-type fingers.
- Leucine zipper (bZIP): a basic region that contacts DNA and a coiled-coil dimerisation segment with a leucine every seventh residue. AP-1 (Fos-Jun), CREB, and ATF factors are bZIPs.
- Helix-loop-helix (HLH) and basic HLH (bHLH): a basic DNA-contacting region followed by two amphipathic helices separated by a loop, with the helices mediating dimerisation. MyoD, c-Myc, the E-proteins, and the achaete-scute family are bHLH factors.
Each family combines a particular mode of major-groove contact with a particular mode of dimerisation; combinatorial pairing of dimer partners is one mechanism by which a relatively small set of structural folds generates a large set of DNA-sequence specificities.
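The "leucine every seventh residue" rule for bZIP dimerisation segments is simple enough to check mechanically. The following Python sketch reports runs of leucines spaced exactly seven residues apart. The function name `leucine_heptad_runs`, the `min_repeats` threshold of four, and the toy sequence are assumptions for illustration; real zippers tolerate substitutions at some heptad positions, so an exact-spacing scan like this is a teaching aid rather than a predictor.

```python
def leucine_heptad_runs(protein, min_repeats=4):
    """Find runs of leucines spaced exactly seven residues apart.

    Returns a list of runs, each a list of 0-based positions.
    min_repeats is a hypothetical threshold, not a published cutoff.
    """
    protein = protein.upper()
    runs = []
    for start, residue in enumerate(protein):
        if residue != "L":
            continue
        # Skip leucines that merely continue a run begun seven residues earlier.
        if start >= 7 and protein[start - 7] == "L":
            continue
        positions = [start]
        nxt = start + 7
        while nxt < len(protein) and protein[nxt] == "L":
            positions.append(nxt)
            nxt += 7
        if len(positions) >= min_repeats:
            runs.append(positions)
    return runs


# Synthetic sequence (not a real protein): four leucines at heptad spacing.
toy_zipper = "MK" + "LAAAAAA" * 4

print(leucine_heptad_runs(toy_zipper))
```

The seven-residue spacing matters because it places each leucine on the same face of an alpha-helix, which is what lets two such helices pack against each other in the coiled-coil dimer described above.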
Enhancers and enhancer-promoter looping
Enhancers are short cis-regulatory DNA segments — typically a few hundred base pairs, tens to hundreds of kilobases from the gene they regulate — that bind transcription factors and increase transcription of a target promoter. The term was coined for an SV40 element shown to act independently of distance, orientation, and position relative to the gene, and the principles generalise to metazoan genomes. Enhancers function by physical contact with the promoter through chromatin looping, mediated by structural factors (cohesin, CTCF) and transcriptional coactivators (Mediator, BRD4); their activity is integrated by the Pol II PIC at the target promoter.
An overview of enhancers and their developmental logic is given by Levine, Cattoglio and Tjian (2014, Cell 157:13). The mechanisms of enhancer-promoter contact — loop extrusion by cohesin, boundary insulation by CTCF, the formation of topologically associating domains (TADs), and the dynamics of contact at the single-allele level — are reviewed by Schoenfelder and Fraser (2019, Nature Reviews Genetics 20:437). Genome-scale data on enhancer location come from chromatin-state mapping (ENCODE, Roadmap Epigenomics) and from chromatin-conformation capture methods (3C, Hi-C, Capture-C, Micro-C).
Co-transcriptional processing
Transcription does not happen in isolation. The capping enzymes engage the nascent RNA as soon as the 5'-end is roughly 25 nucleotides long, recruited by the Ser5-phosphorylated CTD. The spliceosome assembles on the elongating transcript as introns emerge, with kinetic coupling to elongation rate: slower elongation gives an upstream (proximal) splice site more time to be recognised before a competing downstream site is transcribed, so the pace of Pol II can tune alternative splicing. Cleavage and polyadenylation factors are loaded onto the Ser2-phosphorylated CTD during elongation and engage the polyadenylation signal as it is transcribed; cleavage of the transcript at that site then leads to termination by the polymerase. The detail of capping, splicing, and 3'-processing is covered on RNA processing and stability.
Regulation in summary
Pol II transcription is regulated at every stage: PIC assembly, promoter escape, promoter-proximal pausing (and release by P-TEFb), elongation rate, and termination. Transcription factors bind enhancers; coactivators (Mediator, p300/CBP, BRD4) integrate the inputs; the Pol II PIC at the promoter is the integration point. The picture that emerges from the last two decades of structural, biochemical, and genomic work is of a polymerase whose activity is shaped at every step by a combinatorial network of inputs — sequence-specific factors, chromatin state, three-dimensional genome organisation, and the CTD code that links the polymerase to the co-transcriptional processing machinery.
Frequently asked questions
How many RNA polymerases do eukaryotes have?
Three: Pol I (rRNA), Pol II (mRNA and most regulatory RNAs), Pol III (tRNA, 5S rRNA, U6 snRNA, and other short non-coding RNAs).
What is the Pol II CTD?
The carboxy-terminal heptad-repeat tail of the largest Pol II subunit, Rpb1. It is phosphorylated dynamically through the transcription cycle to coordinate capping, splicing, and 3'-end processing.
What is the difference between a core promoter and an enhancer?
The core promoter is the DNA sequence around the TSS that recruits the general transcription factors. Enhancers are distal regulatory sequences that bind transcription factors and modulate transcription level by physical contact with the promoter through chromatin looping.
What is the Mediator complex?
A large multi-subunit complex that bridges enhancer-bound transcription factors and the Pol II initiation machinery at the promoter. It acts as a general coregulator of Pol II transcription.
What are the main transcription-factor DNA-binding domain families?
Helix-turn-helix (homeodomains), zinc finger (Cys2-His2 and others), leucine zipper (bZIP), helix-loop-helix (bHLH).