Cis- and trans-regulatory elements: enhancers, silencers, insulators, and the transcription-factor toolkit
An educational guide to the DNA elements that locally control transcription — promoters, enhancers, silencers, insulators, and super-enhancers — and to the trans-acting transcription factors that bind them. Covers position weight matrices, JASPAR, ChIP-seq for transcription-factor mapping, ATAC-seq and DNase-seq for chromatin accessibility, expression QTLs, and how to interrogate ENCODE and GTEx for cis-regulatory annotation.
Short version. Cis-regulatory elements are non-coding DNA sequences on the same chromosome as the gene they regulate. Four canonical classes (promoters, enhancers, silencers, and insulators), together with super-enhancers, define the local DNA architecture of transcriptional control. Trans-acting transcription factors recognise short DNA motifs (captured as position weight matrices in databases like JASPAR), and the in-vivo occupancy and accessibility of these elements are mapped at genome scale by ChIP-seq, ATAC-seq, and DNase-seq. ENCODE provides the integrated catalogue; GTEx provides the population-scale eQTL view.
What "cis" and "trans" mean
The terminology is inherited from classical genetics. A cis-regulatory element acts on a gene located on the same DNA molecule (the same chromosome). A trans-acting factor — typically a diffusible protein or RNA — can act on any target sequence anywhere in the genome, regardless of which chromosome encoded the factor. An autoregulatory transcription factor binds in trans to a cis-element on its own gene, completing a feedback loop. The cis/trans distinction maps cleanly onto the two genetic-experiment paradigms by which regulatory architecture is interrogated: cis-elements are revealed by sequence-level mutations that affect a single nearby gene; trans-acting factors are revealed by mutations that affect many genes in coordinated ways.
Promoters
The promoter is the cis-element at which the basal transcription machinery assembles. For RNA polymerase II in metazoans, this includes the core promoter (often containing a TATA-box, BRE, INR, DPE, or some combination), where TFIID and the other general transcription factors assemble. Many promoters are CpG-island associated and lack a canonical TATA-box, instead recruiting the polymerase via interactions with sequence-specific transcription factors bound in the proximal promoter. RNA-seq and CAGE (cap analysis of gene expression) provide complementary views of promoter activity: RNA-seq measures transcript abundance, while CAGE captures capped 5' ends, so CAGE peaks define active transcription start sites at single-base resolution.
Enhancers
Enhancers are cis-regulatory sequences that activate transcription of a target gene from a distance, often tens or hundreds of kilobases away, with no requirement for a fixed orientation or position. The first enhancer was characterised by Banerji and colleagues in the SV40 viral genome (Banerji et al. 1981, Cell 27:299), where a 72-bp repeat upstream of the early promoter increased transcription regardless of where in the construct it was placed. The metazoan enhancer concept generalised rapidly: by the 1990s tissue-specific enhancers had been mapped for many genes, and by the 2000s the typical mammalian gene was understood to be regulated by multiple enhancers acting combinatorially.
In the modern view, an enhancer is a densely packed regulatory element: it binds a combination of cell-type-specific transcription factors, recruits the Mediator complex, makes physical contact with its target promoter via three-dimensional chromatin looping, and is itself often transcribed into short, unstable enhancer RNAs (eRNAs). Active enhancers are marked by H3K4me1 and H3K27ac in the surrounding chromatin and are accessible (DNase- or ATAC-positive). Levine, Cattoglio, and Tjian (2014, Cell 157:13) reviewed the biology comprehensively and argued for the central role of long-range looping in enhancer function.
Super-enhancers
Whyte and colleagues (Whyte et al. 2013, Cell 153:307) and Hnisz and colleagues (Hnisz et al. 2013, Cell 155:934) characterised super-enhancers as clusters of enhancer elements with exceptionally high densities of transcription factor and coactivator binding (notably Mediator and BRD4). Super-enhancers tend to mark cell-identity genes (master transcription factors, pluripotency factors in embryonic stem cells, lineage-defining factors in differentiated cells), are highly cell-type specific, and are unusually sensitive to perturbations that disrupt the cooperative binding architecture (BRD4 inhibition with JQ1 was the prototypical demonstration). The super-enhancer concept has been clarified and contested in subsequent work, but the empirical observation — that some loci attract many-fold higher binding densities than typical enhancers — is robust.
Silencers
Silencers are cis-elements that repress transcription of a target gene: they are the counterpart of enhancers, with the opposite effect on expression. Mechanistically, silencers recruit repressive transcription factors and chromatin modifiers (such as Polycomb complexes or HDACs), establishing a repressive chromatin environment over the target. Silencers are less well catalogued at genome scale than enhancers because the standard active-mark signatures (H3K27ac, accessible chromatin) are absent or muted; massively parallel reporter assays and CRISPR-interference tiling screens are now being used to map silencer activity systematically.
Insulators and CTCF
Insulators block cis-regulatory contacts that cross them. They prevent enhancers in one chromatin neighbourhood from inappropriately activating promoters in an adjacent neighbourhood. The principal mammalian insulator protein is CTCF, a zinc-finger transcription factor that binds sequences with a defined motif throughout the genome. Cohesin, the ring-shaped complex that holds sister chromatids together until they separate in mitosis, also extrudes chromatin loops in interphase; the loops anchor at convergent CTCF sites, producing the characteristic "loop domains" visible in Hi-C contact maps. The architecture is reviewed in Phillips and Corces (2009, Cell 137:1194); the loop-extrusion mechanism is treated in the companion epigenetics and chromatin dynamics page. CTCF binding sites are now annotated at high resolution across cell types in ENCODE.
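The convergent-orientation rule can be illustrated with a small sketch. It assumes a hypothetical list of CTCF motif sites on one chromosome, each given as a position and a strand, and lists the pairs in which a forward-strand (+) site lies upstream of a reverse-strand (-) site within a chosen distance. The coordinates, the function name, and the 1 Mb cut-off are illustrative assumptions; real loop calls come from Hi-C contact data, not from motif orientation alone.

```python
def convergent_pairs(sites, max_distance=1_000_000):
    """Return (left, right) position pairs with a '+' site upstream of a '-' site.

    sites: iterable of (position, strand) tuples on a single chromosome.
    max_distance: largest separation considered (an arbitrary illustrative cut-off).
    """
    ordered = sorted(sites)
    pairs = []
    for i, (left_pos, left_strand) in enumerate(ordered):
        if left_strand != "+":
            continue  # the upstream anchor must point downstream
        for right_pos, right_strand in ordered[i + 1:]:
            if right_pos - left_pos > max_distance:
                break  # sites are sorted, so later ones are even further away
            if right_strand == "-":
                pairs.append((left_pos, right_pos))
    return pairs


# Hypothetical toy sites, not real CTCF coordinates.
toy_ctcf_sites = [
    (100_000, "+"),
    (250_000, "-"),
    (400_000, "+"),
    (900_000, "-"),
    (2_500_000, "-"),
]
toy_pairs = convergent_pairs(toy_ctcf_sites)
for left, right in toy_pairs:
    print(f"candidate convergent pair: {left} (+) ... {right} (-)")
```

The sketch lists every convergent pair within the cut-off, including nested ones; which of these actually anchor a loop in a given cell type is an empirical question.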
Transcription-factor binding motifs and JASPAR
A transcription factor recognises a short DNA sequence — typically 6 to 20 bp — with some tolerance for variation. The standard mathematical representation of motif preference is the position weight matrix (PWM) or position-specific scoring matrix (PSSM), which gives, for each position in the motif, the log-likelihood ratio of each base relative to a background. PWMs are derived empirically from collections of bound sequences (originally curated; now, predominantly, from ChIP-seq peak sequences) and are catalogued in JASPAR (Castro-Mondragon et al. 2022, Nucleic Acids Res 50:D165). JASPAR 2022 catalogued motifs for hundreds of transcription factors across multiple taxa and is the standard reference for motif scanning in non-coding sequence interpretation.
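As a minimal sketch of the PWM idea, the code below builds a log2-odds matrix from a handful of aligned sites and scans a sequence on both strands. The toy sites, the pseudocount of 0.5, the uniform background, and the score threshold are all illustrative assumptions, and the input is assumed to contain only uppercase A, C, G, and T. JASPAR distributes its own curated matrices, which would be used in place of the toy matrix built here.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def build_pwm(sites, pseudocount=0.5, background=None):
    """Build a log2-odds PWM from aligned, equal-length binding sites."""
    background = background or {base: 0.25 for base in BASES}
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for position in range(width):
        # Pseudocounts avoid log(0) for bases never observed at this position.
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[position]] += 1
        total = sum(counts.values())
        pwm.append(
            {base: math.log2(counts[base] / total / background[base]) for base in BASES}
        )
    return pwm


def score_window(pwm, window):
    """Sum the per-position log-odds for one window of motif width."""
    return sum(column[base] for column, base in zip(pwm, window))


def reverse_complement(sequence):
    return sequence.translate(COMPLEMENT)[::-1]


def scan(pwm, sequence, threshold):
    """Return (start, strand, score) for every window scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        for strand, oriented in (("+", window), ("-", reverse_complement(window))):
            window_score = score_window(pwm, oriented)
            if window_score >= threshold:
                hits.append((start, strand, window_score))
    return hits


# Hypothetical aligned sites for illustration only.
toy_sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGAGTCA"]
toy_pwm = build_pwm(toy_sites)
toy_hits = scan(toy_pwm, "GGGTGACTCAGGG", threshold=8.0)
for start, strand, window_score in toy_hits:
    print(start, strand, round(window_score, 2))
```

Because the toy motif is its own reverse complement apart from the central base, the same window is reported on both strands; that is expected behaviour for palindromic motifs.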
Motif scanning a candidate regulatory region against JASPAR identifies plausible transcription-factor binding sites, but motif occurrence alone is a poor predictor of in-vivo occupancy — chromatin accessibility, cooperative binding, and cofactor availability all matter. Motif analysis is therefore typically combined with experimental data: ChIP-seq peaks are scanned for the cognate motif to localise the binding event within the peak; differential ATAC-seq peaks are tested for enrichment of motifs to nominate the transcription factors driving the change.
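One simple way to test whether a motif is enriched in a set of differential peaks is a one-sided hypergeometric test, sketched below. This is an illustrative choice rather than the method of any particular tool; the peak counts are invented, and real analyses usually also match the background peaks for GC content and length and correct for testing many motifs.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    population: all peaks considered (differential plus background)
    successes:  peaks in the population that contain the motif
    draws:      number of differential peaks
    k:          differential peaks that contain the motif
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / total


# Hypothetical counts: 1000 peaks, 200 with the motif;
# 100 differential peaks, 45 of which contain the motif (20 expected by chance).
p_value = hypergeom_sf(k=45, population=1000, successes=200, draws=100)
print(f"enrichment p-value: {p_value:.3g}")
```

A small p-value nominates the factor as a candidate driver of the accessibility change; it does not show that the factor is bound at those sites.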
ChIP-seq, ATAC-seq, and DNase-seq
The experimental toolkit for mapping cis-regulatory elements has three pillars. ChIP-seq uses a transcription-factor-specific or histone-modification-specific antibody to immunoprecipitate cross-linked chromatin fragments bearing the target, then sequences the recovered DNA. The result is a genome-wide map of where the target was bound at the moment of cross-linking. ChIP-seq for sequence-specific transcription factors localises binding to within tens of base-pairs around the cognate motif; ChIP-seq for histone modifications identifies broader regulatory regions.
ATAC-seq (Buenrostro et al. 2013, Nat Methods 10:1213) uses a hyperactive Tn5 transposase loaded with sequencing adapters; the transposase preferentially inserts adapters into accessible (nucleosome-free) DNA, producing a library that, when sequenced, reveals open chromatin at base-pair resolution. ATAC-seq requires far less input material than DNase-seq (on the order of 50,000 cells in the standard protocol, with single-cell variants that profile individual cells), is faster than DNase-seq, and has become the standard chromatin-accessibility assay. DNase-seq uses DNase I to cleave accessible DNA preferentially; it remains the historical reference for accessible-chromatin mapping and is still used where deep coverage and high signal-to-noise are required.
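To make the idea of mapping insertions concrete, the sketch below converts aligned reads into estimated Tn5 insertion positions and counts them in fixed bins. It assumes the commonly used offsets of +4 bp for forward-strand reads and -5 bp for reverse-strand reads, applied to the 5' end of each read, and BED-like 0-based half-open coordinates; the reads and the 100-bp bin size are invented for illustration, and individual pipelines differ in the exact coordinate convention.

```python
from collections import Counter

PLUS_SHIFT = 4    # assumed convention: shift + strand 5' ends by +4 bp
MINUS_SHIFT = -5  # assumed convention: shift - strand 5' ends by -5 bp


def insertion_site(start, end, strand):
    """Estimate the Tn5 insertion position for one aligned read.

    start, end: 0-based, half-open alignment coordinates (BED-like).
    """
    if strand == "+":
        return start + PLUS_SHIFT          # 5' end is the leftmost base
    if strand == "-":
        return (end - 1) + MINUS_SHIFT     # 5' end is the rightmost base
    raise ValueError(f"unknown strand: {strand!r}")


def insertion_counts(reads, bin_size=100):
    """Count estimated insertions per fixed-width bin (bin index -> count)."""
    counts = Counter()
    for start, end, strand in reads:
        counts[insertion_site(start, end, strand) // bin_size] += 1
    return counts


# Hypothetical reads on one chromosome.
toy_reads = [(100, 150, "+"), (120, 170, "+"), (60, 110, "-"), (300, 350, "-")]
toy_counts = insertion_counts(toy_reads)
for bin_index in sorted(toy_counts):
    print(f"bin {bin_index * 100}-{bin_index * 100 + 99}: {toy_counts[bin_index]} insertions")
```

Bins with many insertions relative to their neighbours correspond to accessible regions; dedicated peak callers model the background far more carefully than this count does.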
ENCODE and GTEx as cis-regulatory data resources
ENCODE (ENCODE Project Consortium 2012, Nature 489:57) is the canonical integrated atlas of cis-regulatory annotation for the human and mouse genomes. The candidate cis-Regulatory Elements (cCRE) registry classifies several million regions by chromatin signature into categories such as "promoter-like", "proximal enhancer-like", "distal enhancer-like", and "CTCF-only". For any genomic region, the cCRE registry provides an immediate hypothesis about its likely regulatory role; ENCODE also provides per-element ChIP-seq, ATAC-seq, DNase-seq, and histone-modification tracks across hundreds of cell types and tissues.
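The lookup described above can be sketched as an interval query. The records below are invented stand-ins for cCRE entries (chromosome, start, end, class), stored as 0-based half-open intervals, and the sketch assumes the intervals on each chromosome do not overlap; none of this reflects the registry's actual file format or API.

```python
from bisect import bisect_right


def build_index(records):
    """Group (chrom, start, end, label) records by chromosome, sorted by start."""
    index = {}
    for chrom, start, end, label in records:
        index.setdefault(chrom, []).append((start, end, label))
    for intervals in index.values():
        intervals.sort()
    return index


def classify_position(index, chrom, position):
    """Return the class label of the interval containing position, or None."""
    intervals = index.get(chrom, [])
    # Find the last interval whose start is <= position.
    i = bisect_right(intervals, (position, float("inf"), "")) - 1
    if i >= 0:
        start, end, label = intervals[i]
        if start <= position < end:
            return label
    return None


# Hypothetical records for illustration only.
toy_ccres = [
    ("chr1", 1_000, 1_350, "promoter-like"),
    ("chr1", 5_000, 5_400, "distal enhancer-like"),
    ("chr2", 200, 520, "CTCF-only"),
]
toy_index = build_index(toy_ccres)
print(classify_position(toy_index, "chr1", 5_100))  # inside the second toy interval
print(classify_position(toy_index, "chr1", 3_000))  # between intervals
```

A position that falls outside every interval returns None, which in practice means only that the registry offers no hypothesis for it, not that the position is non-functional.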
GTEx (GTEx Consortium 2020, Science 369:1318) provides a complementary population-scale view. GTEx genotyped hundreds of donors, RNA-sequenced samples from 49 human tissues, and computed expression quantitative trait loci (eQTLs) and splicing QTLs (sQTLs). For a given gene in a given tissue, GTEx returns the cis variants (within 1 Mb of the gene's transcription start site) whose genotype correlates with expression level. Combining ENCODE annotation with GTEx eQTL data is a standard approach for assigning a non-coding variant to a candidate regulatory mechanism: the variant is checked for overlap with an annotated cCRE in the relevant tissue, motif disruption is assessed against JASPAR, and the eQTL data are checked for an association between the variant and expression of a nearby gene. Because an eQTL is a correlation, and the associated variant may only be in linkage disequilibrium with the causal one, this evidence supports rather than proves a causal effect on expression.
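At its core, a cis-eQTL test asks whether expression changes with allele dosage. The sketch below fits an ordinary least-squares line of expression on dosage (0, 1, or 2 copies of the alternative allele) and reports the slope, intercept, and Pearson correlation. The six donors and their values are invented, and the function name is hypothetical; real eQTL mapping normalises expression, adjusts for covariates, and corrects for the very large number of variant-gene pairs tested.

```python
import math


def eqtl_regression(dosages, expression):
    """Least-squares fit of expression on allele dosage.

    Returns (slope, intercept, pearson_r). The slope is the estimated
    change in expression per additional copy of the alternative allele.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need matched dosage and expression values for >= 3 donors")
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in expression)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("dosage and expression must both vary across donors")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    pearson_r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, pearson_r


# Hypothetical donors: alternative-allele dosage and normalised expression.
toy_dosages = [0, 0, 1, 1, 2, 2]
toy_expression = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
toy_slope, toy_intercept, toy_r = eqtl_regression(toy_dosages, toy_expression)
print(f"slope={toy_slope:.2f} intercept={toy_intercept:.2f} r={toy_r:.3f}")
```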
Why this matters for pedigree-level analysis
Family-history-based pedigree modelling does not directly model cis-regulatory variation, but regulatory variation underlies several phenomena that pedigree analysis routinely encounters. Variable expressivity, where individuals with the same coding variant differ in phenotypic severity, often reflects modifier alleles in regulatory regions, and reduced penetrance can have the same basis. The contribution of common-variant polygenic risk in multifactorial disease is dominated by non-coding, presumptively regulatory, variation rather than coding variation. Polygenic risk scores in hereditary cancer risk assessment are weighted sums over hundreds to thousands of common variants, most of them non-coding and many lying in or near cCREs annotated by ENCODE.
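The weighted sum behind a polygenic risk score is simple to state in code. In the sketch below the variant identifiers, effect weights, and dosages are invented for illustration; published scores specify their own variant lists, effect alleles, and weights, and a raw score is only interpretable relative to a reference distribution.

```python
def polygenic_score(weights, dosages):
    """Weighted sum of effect-allele dosages.

    weights: variant id -> per-allele effect weight (for example a log odds ratio)
    dosages: variant id -> number of effect alleles carried (0, 1, or 2)
    """
    missing = [variant for variant in weights if variant not in dosages]
    if missing:
        raise KeyError(f"no genotype for variants: {missing}")
    return sum(weight * dosages[variant] for variant, weight in weights.items())


# Hypothetical weights and one hypothetical individual's genotypes.
toy_weights = {"variant_a": 0.10, "variant_b": -0.05, "variant_c": 0.20}
toy_dosages_by_variant = {"variant_a": 2, "variant_b": 1, "variant_c": 0}
toy_score = polygenic_score(toy_weights, toy_dosages_by_variant)
print(f"raw polygenic score: {toy_score:.2f}")
```

Raising an error on missing genotypes is one design choice among several; imputing the population-average dosage is a common alternative.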
Frequently asked questions
What is a cis-regulatory element?
A non-coding DNA sequence on the same chromosome as the gene it regulates that influences that gene's transcription. Promoters, enhancers, silencers, and insulators are the four canonical classes.
What is a super-enhancer?
A cluster of enhancer elements occupied by exceptionally high densities of transcription factors and coactivators (notably Mediator and BRD4). Whyte et al. 2013 and Hnisz et al. 2013 are the foundational descriptions.
How does CTCF establish regulatory boundaries?
CTCF binds defined DNA motifs and, with cohesin, anchors chromatin loops that favour enhancer-promoter contact within a loop and restrict contact across loop boundaries. Phillips and Corces 2009 reviews the architecture.
What does ATAC-seq measure?
Accessible chromatin, mapped by transposing sequencing adapters into nucleosome-free DNA. Buenrostro et al. 2013, Nat Methods 10:1213, is the foundational paper.
What is an eQTL?
An expression quantitative trait locus — a DNA sequence variant whose genotype correlates with the expression level of a gene. The GTEx Consortium catalogued eQTLs across 49 human tissues.
Is this a clinical resource?
No. Evagene is an academic, research, and educational pedigree modelling platform. This page is educational content for students, researchers, and educators; it is not medical advice and does not constitute clinical decision support.