RNA Processing and Stability — Splicing, Polyadenylation, Decay

Q: What is the spliceosome?

The spliceosome is a megadalton ribonucleoprotein machine that excises introns from pre-mRNA and ligates the flanking exons. The major (U2-dependent) spliceosome is built from five small nuclear ribonucleoproteins — U1, U2, U4, U5 and U6 snRNPs — together with around 100 additional protein factors. A minor U12-dependent spliceosome handles a small subset of atypical introns.

Q: What is the chemistry of splicing?

Splicing proceeds by two transesterification reactions. In the first step, the 2'-OH of the branch-point adenosine attacks the 5' splice site, generating a free 5' exon and a lariat intermediate in which the intron is linked to the branch-point A by a 2'-5' bond. In the second step, the 3'-OH of the free 5' exon attacks the 3' splice site, joining the two exons and releasing the intron as a lariat. Both steps are RNA-catalysed; U6 snRNA holds the catalytic metal ions.

Q: What is nonsense-mediated decay?

Nonsense-mediated decay (NMD) is a cytoplasmic mRNA surveillance pathway that degrades transcripts with a premature termination codon. In mammals, the canonical signature is a stop codon located more than ~50 nucleotides upstream of the last exon-exon junction, marked by the exon junction complex; UPF1, UPF2, UPF3, and the SMG kinases coordinate target recognition and degradation. NMD also serves as a regulatory pathway, controlling the abundance of a sizeable subset of normal transcripts.

Q: Where does mRNA decay take place?

Cytoplasmic mRNA decay typically begins with shortening of the poly(A) tail (deadenylation) by the CCR4-NOT and PAN2-PAN3 complexes. Deadenylated transcripts are then either decapped by DCP2 and degraded 5' to 3' by XRN1, or degraded 3' to 5' by the cytoplasmic exosome. Specialised pathways — nonsense-mediated, no-go, and non-stop decay — handle aberrant transcripts.

Short version. A nascent pre-mRNA emerging from RNA polymerase II (Pol II) is capped at the 5' end, has its introns excised by the spliceosome, is cleaved at a defined site and given a poly(A) tail at the 3' end, and is then exported. In the cytoplasm the mature mRNA is translated and eventually deadenylated, decapped, and degraded by exonucleases; aberrant messages are caught by surveillance pathways, including nonsense-mediated decay. Most of these steps are co-transcriptional, and most are regulated; alternative splicing alone affects more than 90% of human multi-exon genes and is the largest single source of transcript-level diversity. This page covers each step in turn.

Educational positioning. This page is written for students, educators, and researchers. Evagene is an academic, research, and educational pedigree modelling platform; it is not a medical device, not clinical decision support, and not a diagnostic or screening tool.

5' capping

The first processing event happens when the nascent transcript is around 25 nucleotides long. The 5' triphosphate is hydrolysed to a diphosphate by an RNA triphosphatase; a guanylyltransferase adds a GMP in an inverted 5'-5' triphosphate linkage; an N7-methyltransferase methylates the added guanine to give the cap-0 structure (m7Gppp-N). In higher eukaryotes the first transcribed nucleotide is also 2'-O-methylated to give cap-1, and the second to give cap-2.

The cap is recognised in the nucleus by the cap-binding complex (CBC, comprising CBP80 and CBP20) and in the cytoplasm by eIF4E. It protects the transcript from 5' to 3' exonucleolytic decay, supports the first round of splicing through CBC contacts with U1 and U6 snRNPs, contributes to nuclear export, and serves as the loading point for cap-dependent translation initiation. Capping enzymes are recruited to the elongating Pol II by the Ser5-phosphorylated CTD — the canonical example of co-transcriptional coupling on the Pol II CTD code.

The spliceosome

Most metazoan protein-coding genes are split into exons separated by introns; the average human gene has eight or nine exons and an average intron length of several kilobases. Splicing is the excision of introns and ligation of flanking exons, performed by the spliceosome. The major U2-dependent spliceosome processes more than 99% of mammalian introns; the minor U12-dependent spliceosome handles a small set of atypical introns. The spliceosome and its mechanism are reviewed in Will and Lührmann (2011, Cold Spring Harbor Perspectives in Biology 3:a003707).

The major spliceosome is assembled de novo on each intron from five small nuclear ribonucleoprotein particles (snRNPs) and around 100 additional protein factors. The five snRNPs are U1, U2, U4, U5, and U6, each containing a uridine-rich snRNA and a set of associated proteins. Sequential assembly proceeds through complexes E, A, B, B*, and C: U1 binds the 5' splice site; U2 binds the branch point in an ATP-dependent step; the U4/U6.U5 tri-snRNP joins; U1 and U4 are released, U6 base-pairs with the 5' splice site, and the catalytic centre is formed; the two transesterification steps occur; the spliced exons are released and the intron lariat is debranched.

Splicing chemistry

Splicing proceeds by two transesterification reactions:

Step 1 (branching): the 2'-OH of the branch-point adenosine attacks the phosphate at the 5' splice site, breaking the upstream exon-intron bond and forming a 2'-5' phosphodiester linkage between the branch-point A and the first nucleotide of the intron — the lariat intermediate. The 5' exon is released with a free 3'-OH.
Step 2 (exon ligation): the 3'-OH of the free 5' exon attacks the phosphate at the 3' splice site, joining the two exons into a single mature mRNA backbone and releasing the intron as a lariat.

Both reactions are RNA-catalysed; the catalytic core is built around U6 snRNA, which holds two divalent metal ions (typically Mg²⁺) coordinated to the leaving group and the nucleophile. The spliceosome is a ribozyme. Once released, the intron lariat is debranched by DBR1 and degraded.
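The two steps can be traced on a toy sequence. The sketch below is illustrative only: the function name `splice`, the `SplicingProducts` container, and the coordinate convention are assumptions made for this example, and the model treats RNA as a plain string, so it records where the 2'-5' branch forms without modelling any chemistry.

```python
from dataclasses import dataclass


@dataclass
class SplicingProducts:
    mrna: str          # 5' exon ligated to 3' exon
    lariat: str        # excised intron, written as a linear sequence
    branch_index: int  # position of the branch-point A within the intron


def splice(pre_mrna: str, five_ss: int, branch: int, three_ss: int) -> SplicingProducts:
    """Toy two-step splicing on a string (hypothetical helper).

    five_ss  : index of the first intron nucleotide (the G of GU)
    branch   : index of the branch-point adenosine
    three_ss : index of the first nucleotide of the 3' exon
    """
    if not (0 <= five_ss < branch < three_ss <= len(pre_mrna)):
        raise ValueError("expected 0 <= five_ss < branch < three_ss <= len(pre_mrna)")
    if pre_mrna[branch] != "A":
        raise ValueError("branch point must be an adenosine")

    # Step 1 (branching): the 5' exon is released with a free 3'-OH, while the
    # intron stays attached to the 3' exon as the lariat intermediate.
    five_exon = pre_mrna[:five_ss]
    lariat_intermediate = pre_mrna[five_ss:]

    # Step 2 (exon ligation): the free 5' exon attacks the 3' splice site,
    # joining the exons and releasing the intron.
    intron_length = three_ss - five_ss
    intron = lariat_intermediate[:intron_length]
    three_exon = lariat_intermediate[intron_length:]

    return SplicingProducts(
        mrna=five_exon + three_exon,
        lariat=intron,
        branch_index=branch - five_ss,
    )
```

Calling `splice` with the coordinates of one intron returns the ligated exons plus the excised intron, with `branch_index` marking the adenosine that would carry the 2'-5' linkage to the first intron nucleotide.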

Splice-site signals and branch-point recognition

The spliceosome recognises three short sequence elements at each intron:

The 5' splice site: a near-invariant GU at the 5' end of the intron (GT in DNA notation, hence the GT-AG rule), embedded in a longer consensus that base-pairs with U1 snRNA.
The branch point: an adenosine in a YNYURAC consensus, typically 18-40 nucleotides upstream of the 3' splice site, recognised by U2 snRNP through base-pairing with U2 snRNA.
The 3' splice site: a polypyrimidine tract followed by a terminal AG.

These signals are short and degenerate, which is why splice-site selection is heavily dependent on auxiliary regulatory factors. SR proteins bind exonic splicing enhancers and recruit U1 / U2AF; hnRNPs bind exonic and intronic splicing silencers and antagonise U1 / U2AF binding. The combinatorial output of these positive and negative regulators tunes splice-site choice in a tissue- and cell-state-specific manner. The general regulatory logic is reviewed in Black (2003, Annual Review of Biochemistry 72:291).
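To see how little sequence information the three core signals carry, the sketch below checks an intron for them with simple pattern matching. It is an illustration under stated assumptions, not a splice-site predictor: the function name `scan_intron` is hypothetical, the branch-point distance is counted from the branch A to the last intron nucleotide (one of several possible conventions), and the polypyrimidine tract and all auxiliary enhancers and silencers are ignored.

```python
import re

# YNYURAC with Y = C/U, N = any base, R = A/G. A lookahead is used so that
# overlapping candidate motifs are all reported.
BRANCH_MOTIF = re.compile(r"(?=([CU][ACGU][CU]U[AG]AC))")


def scan_intron(intron: str, min_dist: int = 18, max_dist: int = 40) -> dict:
    """Report the core splice signals found in an intron sequence.

    Accepts RNA or DNA letters; T is converted to U before matching.
    """
    intron = intron.upper().replace("T", "U")
    candidates = []
    for match in BRANCH_MOTIF.finditer(intron):
        a_index = match.start() + 5           # the A of YNYURAC is the 6th base
        dist = len(intron) - 1 - a_index      # assumed distance convention
        if min_dist <= dist <= max_dist:
            candidates.append(a_index)
    return {
        "gu_start": intron.startswith("GU"),  # 5' splice site dinucleotide
        "ag_end": intron.endswith("AG"),      # 3' splice site dinucleotide
        "branch_candidates": candidates,      # indices of candidate branch A's
    }
```

Because the motifs are so short, random sequence will often satisfy this check, which is the practical reason real splice-site choice depends on the auxiliary factors described above.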

Constitutive vs alternative splicing

A constitutively spliced exon is included in essentially every mature transcript; an alternatively spliced exon is included in some transcripts and skipped in others, generating distinct isoforms from a single gene. The major modes of alternative splicing are exon skipping (cassette exons), alternative 5' or 3' splice-site choice, mutually exclusive exons, and intron retention. RNA-seq surveys put the prevalence of alternative splicing in human multi-exon genes at over 90% (Wang et al. 2008, Nature 456:470; Pan et al. 2008, Nature Genetics 40:1413).
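The combinatorial effect of cassette exons can be made concrete with a small enumeration. The sketch below is a simplification: the function name `cassette_isoforms` is hypothetical, every cassette exon is treated as independently included or skipped, and the other modes (alternative splice sites, mutually exclusive exons, intron retention) are not modelled.

```python
from itertools import product


def cassette_isoforms(exons: list[str], cassette: set[str]) -> list[list[str]]:
    """Enumerate isoforms when each cassette exon may be included or skipped.

    exons    : exon names in transcript order
    cassette : names of the exons that are alternatively spliced
    """
    optional = [name for name in exons if name in cassette]
    isoforms = []
    # One include/skip decision per cassette exon gives 2**n combinations.
    for choices in product([True, False], repeat=len(optional)):
        included = dict(zip(optional, choices))
        # Constitutive exons are absent from `included` and are always kept.
        isoforms.append([name for name in exons if included.get(name, True)])
    return isoforms
```

With n independent cassette exons this yields 2**n isoforms, so even a handful of cassette exons gives a single gene a sizeable isoform repertoire; in real genes the choices are often coupled, so the observed number is usually smaller.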

Alternative splicing is regulated by tissue-specific splicing factors (e.g. NOVA, RBFOX, PTBP, MBNL, CELF families), by chromatin state (H3K36me3 marks influence exon inclusion), and by the kinetic interplay with Pol II elongation rate. Slower elongation tends to favour proximal splice-site choice and exon inclusion, faster elongation the opposite. Splicing dysregulation underlies a large class of human disease: spinal muscular atrophy (SMN1 / SMN2 splicing), familial dysautonomia (IKBKAP intron 20), myotonic dystrophy (MBNL1 sequestration), and many more. Splice-modulating antisense oligonucleotides (nusinersen for SMA, eteplirsen for DMD) are an established therapeutic modality.

3' cleavage and polyadenylation

The 3' end of a Pol II transcript is generated by a coupled cleavage and polyadenylation reaction. The substrate signals are the AAUAAA hexamer (typically 10-30 nucleotides upstream of the cleavage site), a U-rich or GU-rich downstream sequence element, and auxiliary upstream sequence elements. The protein machinery comprises:

CPSF (cleavage and polyadenylation specificity factor): recognises AAUAAA through the CPSF-30 / WDR33 subunits; CPSF-73 is the endonuclease that performs the cleavage.
CstF (cleavage stimulation factor): binds the downstream U/GU-rich element.
CFI / CFII (cleavage factors I and II): contribute to substrate recognition and cleavage-site positioning.
Poly(A) polymerase (PAP): adds the poly(A) tail to the cleaved 3' end, with PABPN1 stimulating processivity once the tail reaches a threshold length.
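The positional relationship between the AAUAAA hexamer and the cleavage site can be checked with a few lines of code. This is a minimal sketch under assumptions: the function name `find_pas` is hypothetical, the gap is counted from the nucleotide after the hexamer to the cleavage position (other conventions exist), and only the canonical hexamer is searched, not its variants or the downstream U/GU-rich element.

```python
def find_pas(seq: str, cleavage_index: int, min_gap: int = 10,
             max_gap: int = 30, hexamer: str = "AAUAAA") -> list[tuple[int, int]]:
    """Find polyadenylation-signal hexamers positioned upstream of a cleavage site.

    seq            : transcript sequence (RNA or DNA letters)
    cleavage_index : index of the first nucleotide removed by cleavage
    Returns (hexamer_start, gap) pairs for hexamers whose gap to the
    cleavage site lies within [min_gap, max_gap].
    """
    seq = seq.upper().replace("T", "U")
    hits = []
    start = seq.find(hexamer)
    while start != -1:
        gap = cleavage_index - (start + len(hexamer))  # assumed gap convention
        if min_gap <= gap <= max_gap:
            hits.append((start, gap))
        start = seq.find(hexamer, start + 1)
    return hits
```

Running the same function against several candidate cleavage positions in one 3' UTR gives a crude picture of which sites have a correctly spaced hexamer.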

The mature poly(A) tail is around 200-250 nucleotides in mammals. Many human genes have multiple polyadenylation signals, and the choice between them is regulated, generating alternative polyadenylation (APA). APA changes 3' UTR length, which alters miRNA-binding and RNA-binding-protein landscapes and thereby alters mRNA stability, localisation, and translation efficiency. The biology of APA is reviewed in Tian and Manley (2017, Nature Reviews Molecular Cell Biology 18:18).

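Cleavage and polyadenylation are tightly coupled to Pol II termination: the Ser2-phosphorylated CTD recruits CPSF and CstF, and once the polyadenylation signal has been transcribed, cleavage of the nascent RNA leads to termination. Two non-exclusive models describe how. In the "torpedo" model, the 5' to 3' exonuclease XRN2 degrades the uncapped downstream cleavage product and dislodges Pol II when it catches up with it; in the "allosteric" model, transcription through the polyadenylation signal alters the elongation complex so that it becomes termination-prone.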

Nuclear export

Mature mRNAs are exported from the nucleus through the nuclear pore complex by the TREX (transcription-export) machinery. TREX components are recruited co-transcriptionally and during splicing; the export adaptor ALYREF / THOC4 binds the mRNA and hands it to the heterodimeric export receptor NXF1-NXT1 (TAP-p15), which translocates the mRNP through the pore. The exon junction complex (EJC), deposited 20-24 nucleotides upstream of each exon-exon junction by the splicing machinery, contributes to export competence and also marks the transcript for the first round of translation, where it is interrogated by the NMD machinery.

Cytoplasmic mRNA decay

Once in the cytoplasm, an mRNA has a finite half-life. Bulk mRNA decay typically begins with progressive shortening of the poly(A) tail (deadenylation) by the PAN2-PAN3 complex (early phase) and the CCR4-NOT complex (late phase). A deadenylated transcript is then channelled into one of two pathways:

5' to 3' decay: decapping by DCP2, with cofactors EDC4, DDX6, and the LSM1-7 ring; the resulting 5'-monophosphate body is degraded by the cytoplasmic exonuclease XRN1.
3' to 5' decay: the cytoplasmic exosome (a 3' to 5' exoribonuclease complex with a barrel of nine non-catalytic subunits and the catalytic DIS3 / DIS3L / EXOSC10 nucleases) degrades the transcript from the deadenylated 3' end inward; the residual capped fragment is hydrolysed by the scavenger decapping enzyme DCPS.

Specific pathways include:

Nonsense-mediated decay (NMD): degrades mRNAs with a premature termination codon. In mammals, the canonical signature is a stop codon located more than ~50 nucleotides upstream of the last exon-exon junction (and therefore upstream of an EJC). UPF1 is the central effector; UPF2 and UPF3 are activators; SMG1 phosphorylates UPF1; SMG5/7 and SMG6 trigger decay. NMD also acts as a regulatory pathway, controlling the steady-state abundance of a sizeable subset of normal transcripts. The pathway is reviewed in Lykke-Andersen and Jensen (2015, Nature Reviews Molecular Cell Biology 16:665).
No-go decay (NGD): clears mRNAs on which ribosomes stall during elongation; effectors include Pelota / HBS1L (paralogues of the translation termination factors eRF1 and eRF3) and ribosome-associated quality-control factors.
Non-stop decay (NSD): clears mRNAs that lack an in-frame stop codon, typically through Ski7-mediated recruitment of the cytoplasmic exosome.
miRNA-mediated decay: Argonaute proteins loaded with a microRNA recruit the GW182 / TNRC6 scaffold, which in turn recruits CCR4-NOT to drive deadenylation, decapping, and degradation of the targeted transcript.
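The ~50-nucleotide rule for nonsense-mediated decay lends itself to a short worked check. The sketch below is a rule-of-thumb illustration, not an NMD predictor: the function name `predicts_nmd` and the coordinate convention (distance measured from the nucleotide after the stop codon to the last exon-exon junction, in spliced mRNA coordinates) are assumptions made here, and known exceptions to the rule are ignored.

```python
def predicts_nmd(stop_codon_end: int, exon_lengths: list[int],
                 threshold: int = 50) -> bool:
    """Apply the ~50-nucleotide rule to a spliced transcript.

    stop_codon_end : 0-based index of the nucleotide just after the stop codon
    exon_lengths   : exon lengths in transcript order
    Returns True when the stop codon lies more than `threshold` nucleotides
    upstream of the last exon-exon junction.
    """
    # Junction positions in mRNA coordinates: cumulative ends of all exons
    # except the last one.
    junctions = []
    position = 0
    for length in exon_lengths[:-1]:
        position += length
        junctions.append(position)

    if not junctions:
        return False  # single-exon transcript: no junction, so no EJC mark

    last_junction = junctions[-1]
    return last_junction - stop_codon_end > threshold
```

For a transcript with exons of 100, 200, and 150 nucleotides, the last junction sits at position 300: a stop codon ending at position 200 is flagged, while one ending at position 280 or anywhere in the last exon is not.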

The integrated picture is one of competing pathways whose balance determines transcript half-life: surveillance pathways prune aberrant messages, regulatory pathways tune the steady state, and decay kinetics set how quickly mRNA levels respond to any change in transcription.
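The link between half-life, steady state, and response time can be made explicit with a simple model. The sketch below assumes constant synthesis and first-order decay, a common simplification that this page does not itself specify; the function names are hypothetical, and the units are arbitrary as long as they are used consistently.

```python
import math


def decay_constant(half_life: float) -> float:
    """First-order decay constant k for a given half-life (k = ln 2 / t_half)."""
    return math.log(2) / half_life


def steady_state(synthesis_rate: float, half_life: float) -> float:
    """Steady-state mRNA level when synthesis balances decay (s / k)."""
    return synthesis_rate / decay_constant(half_life)


def level_at(t: float, initial: float, synthesis_rate: float,
             half_life: float) -> float:
    """mRNA level at time t after the synthesis rate is set to a new value.

    Solves dm/dt = s - k*m with m(0) = initial.
    """
    k = decay_constant(half_life)
    target = synthesis_rate / k
    return target + (initial - target) * math.exp(-k * t)
```

In this model the new steady state depends on both synthesis and decay, but the time needed to reach it depends only on the half-life: after one half-life the level has covered half the distance to its new steady state, whether transcription went up, went down, or was shut off entirely. Short-lived mRNAs therefore respond quickly to transcriptional changes, and stable mRNAs respond slowly.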

Regulation summary

Every step covered on this page is regulated. Capping is constitutive but cap structure (cap-0 vs cap-1 vs cap-2) influences innate-immune sensing. Splicing is regulated by SR proteins, hnRNPs, tissue-specific factors, chromatin state, and Pol II elongation rate. Polyadenylation site choice is regulated by core machinery levels and auxiliary factors and shapes 3'-UTR-dependent stability and localisation. Export is gated by TREX engagement. Decay is regulated by RNA-binding proteins (HuR, TTP, AUF1) and microRNAs that recruit the deadenylation machinery, and by surveillance factors that recognise translation-coupled signals.

Frequently asked questions

What is the spliceosome?

A megadalton ribonucleoprotein machine assembled from U1, U2, U4, U5 and U6 snRNPs that excises introns from pre-mRNA.

What is the chemistry of splicing?

Two transesterification reactions: branch-point attack on the 5' splice site to form a lariat intermediate, then 5' exon attack on the 3' splice site to ligate the exons.

How widespread is alternative splicing in humans?

More than 90% of human multi-exon genes show evidence of alternative splicing (Wang et al. 2008; Pan et al. 2008).

What is nonsense-mediated decay?

A cytoplasmic surveillance pathway that degrades transcripts with a premature termination codon, recognised through the EJC-marked junction signature; UPF1 is the central effector.

Where does mRNA decay take place?

In the cytoplasm. Deadenylation by CCR4-NOT and PAN2-PAN3, followed by decapping and 5' to 3' degradation by XRN1, or 3' to 5' degradation by the cytoplasmic exosome.
