Pedigree analysis and variable expression
Real pedigrees rarely fit the textbook patterns cleanly. Reduced penetrance, variable expressivity, age-dependent penetrance, somatic and germline mosaicism, pleiotropy, locus and allelic heterogeneity, and phenocopies all complicate inheritance-pattern recognition. This page covers each, with worked loci, and walks through the standard Bayesian arithmetic for refining a carrier probability given pedigree information. For research, education, and teaching.
Short version. A pedigree drawn in NSGC notation is the input to inheritance-pattern recognition and to Bayesian carrier-probability arithmetic. Recognising departures from the textbook pattern — reduced penetrance, variable expressivity, age-dependent penetrance, mosaicism, locus heterogeneity, phenocopies — is the analytic skill the page covers. The arithmetic combines a population-derived prior with conditional information from the pedigree to produce a posterior probability; the standard textbook example is the X-linked recessive carrier probability given unaffected sons.
Pedigree analysis fundamentals: NSGC notation
The reference for the symbol set is the National Society of Genetic Counselors (NSGC) Pedigree Standardization Work Group, originally published as Bennett et al. 1995 in Journal of Genetic Counseling 4:267-279 (PMID 24234812), updated as Bennett et al. 2008 (J Genet Couns 17:424-433, PMID 18792771), and most recently revised as Bennett et al. 2022 to incorporate sex-and-gender-inclusive notation (J Genet Couns 31:1238-1248, PMID 36106433). The 2022 update separates sex assigned at birth (encoded by the shape) from gender identity (encoded by symbol annotation), and is the current reference for new pedigree work; see NSGC pedigree notation and gender-inclusive pedigree drawing.
Three fundamentals of pedigree analysis matter for everything that follows. First, the proband — the individual through whom the family came to attention — is marked with an arrow; ascertainment bias is the systematic effect of the proband's affected status on the apparent rate of affected individuals in the family, and Bayesian carrier-probability arithmetic must condition on the proband's status. Second, the minimum useful pedigree is three generations from the proband; two-generation patterns are almost always uninformative. Third, the pedigree is the input to formal segregation analysis, linkage analysis, and the family-history risk-model algorithms (BRCAPRO, MMRpro, PancPRO, Tyrer-Cuzick, BOADICEA via the CanRisk file bridge). The drawing tool is at pedigree drawing tool; the chart conventions are at pedigree chart.
Reduced penetrance
Penetrance is the probability that an individual carrying a particular genotype expresses the associated phenotype within the observation window. A penetrance of 1.0 means every carrier is affected; reduced penetrance means a fraction of carriers are unaffected at the time of observation. Reduced penetrance is the single most important reason that real autosomal dominant pedigrees fail to show vertical transmission across every generation: an apparently "skipped" generation is often a non-penetrant carrier rather than a true non-carrier.
Worked examples: BRCA1-associated breast cancer has approximately 65% penetrance by age 70 (Antoniou et al. 2003, Am J Hum Genet 72:1117), meaning that roughly a third of BRCA1 carriers have not developed breast cancer by that age. HNF1A in MODY3 has approximately 60-70% penetrance by age 25. RYR1 in malignant hyperthermia susceptibility has even lower penetrance, because the phenotype is conditional on exposure to triggering anaesthetics. Reduced penetrance has direct consequences for pedigree analysis: an unaffected obligate carrier should not be misclassified as a non-carrier on the basis of phenotype alone, and the recurrence-risk arithmetic must use the conditional probability of affected status given carrier status, not the unconditional ratio.
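The conditional-probability point can be made concrete with a minimal sketch. It assumes an autosomal dominant condition with a single penetrance value, no phenocopies, and no de novo events; the function names and the 70% penetrance are illustrative choices, not part of any Evagene tool.

```python
from fractions import Fraction

def posterior_carrier_if_unaffected(prior, penetrance):
    """Posterior carrier probability for a relative who is unaffected.

    Assumes non-carriers are never affected (no phenocopies).
    """
    joint_carrier = prior * (1 - penetrance)   # carrier but non-penetrant
    joint_noncarrier = (1 - prior) * 1         # non-carrier, always unaffected
    return joint_carrier / (joint_carrier + joint_noncarrier)

def risk_to_child(carrier_probability, penetrance):
    """Chance a child is affected: parent carries, transmits, child is penetrant."""
    return carrier_probability * Fraction(1, 2) * penetrance

# Unaffected child of an affected parent, illustrative penetrance of 70%.
penetrance = Fraction(7, 10)
posterior = posterior_carrier_if_unaffected(Fraction(1, 2), penetrance)
print(posterior)                              # 3/13
print(risk_to_child(posterior, penetrance))   # 21/260
```

Under these assumptions the unaffected relative keeps a carrier probability of 3/13 rather than zero, which is why a "skipped" generation still contributes risk downstream.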
Variable expressivity versus reduced penetrance
Variable expressivity is the variation in phenotype severity, organ involvement, or symptom combination among individuals with the same genotype. It is distinct from reduced penetrance: a fully penetrant condition may show extreme variable expressivity, and a partially penetrant condition may show stereotyped expression in those who are penetrant. Neurofibromatosis type 1 (NF1) is the canonical example of variable expressivity: penetrance is essentially complete by age 5, but the combination of café-au-lait macules, axillary or inguinal freckling, neurofibromas, optic glioma, Lisch nodules, learning difficulty, and skeletal abnormalities varies dramatically between members of the same family carrying the same NF1 variant. Marfan syndrome (FBN1) is similar: cardiovascular, skeletal, and ocular features in different proportions in different individuals.
Three mechanisms underlie variable expressivity: modifier loci elsewhere in the genome, environmental exposures, and stochastic developmental processes (the same individual sometimes shows the phenotype on one side of the body and not the other — for example unilateral vestibular schwannoma in NF2).
Age-dependent penetrance
Several conditions show penetrance that is a strong function of age. Huntington disease (HTT CAG repeat expansion) has near-zero penetrance by age 20 and approaches 1.0 by age 70; the precise age-of-onset curve is a function of repeat length. BRCA1 and BRCA2 breast and ovarian cancer risk increases steadily across decades, with the age-stratified penetrance curves underlying every BRCAPRO and BOADICEA computation. Familial adenomatous polyposis (APC) shows polyp burden and colorectal cancer risk that rise steeply through the third and fourth decades of life. Age-dependent penetrance has direct consequences for pedigree analysis: a young unaffected obligate carrier of a late-onset condition cannot be reclassified on the basis of being currently unaffected, and the "has not yet expressed" arithmetic must use the carrier's current age-stratified conditional probability of being unaffected given the genotype.
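The "has not yet expressed" arithmetic can be sketched as follows. The cumulative penetrance table is hypothetical and chosen only to make the numbers easy to follow; real curves are condition-specific (and, for HTT, repeat-length-specific). Non-carriers are assumed never to be affected.

```python
from fractions import Fraction

# HYPOTHETICAL cumulative penetrance by age, for illustration only.
CUMULATIVE_PENETRANCE = {
    30: Fraction(1, 10),
    50: Fraction(1, 2),
    70: Fraction(9, 10),
}

def posterior_carrier_unaffected_at(age, prior=Fraction(1, 2)):
    """Carrier probability for an at-risk relative still unaffected at `age`."""
    unaffected_given_carrier = 1 - CUMULATIVE_PENETRANCE[age]
    joint_carrier = prior * unaffected_given_carrier
    joint_noncarrier = 1 - prior   # non-carriers assumed always unaffected
    return joint_carrier / (joint_carrier + joint_noncarrier)

for age in sorted(CUMULATIVE_PENETRANCE):
    print(age, posterior_carrier_unaffected_at(age))
# 30 9/19
# 50 1/3
# 70 1/11
```

With this illustrative curve, being unaffected at 30 barely moves the prior of 1/2, whereas being unaffected at 70 reduces it to 1/11.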
Somatic mosaicism
Somatic mosaicism is the presence of two or more genetically distinct cell populations within an individual, arising from a post-zygotic mutation. The earlier in development the mutation occurs, the larger the fraction of cells affected, and the more body tissues involved. Somatic mosaicism explains the partial or asymmetric phenotypes seen in segmental forms of otherwise generalised conditions and in conditions where a fully constitutional mutation would be lethal: segmental NF1, McCune-Albright syndrome (GNAS, post-zygotic activating mutation, with café-au-lait, fibrous dysplasia, and endocrine hyperfunction in a mosaic distribution), and Proteus syndrome (AKT1 p.Glu17Lys, arising post-zygotically). Detection of somatic mosaicism requires deep sequencing of multiple tissues and is increasingly the explanation for previously unexplained de novo phenotypes.
Germline mosaicism — worked example
Germline mosaicism is the presence of a mutation in a fraction of the gametes of an apparently unaffected (and somatically uninvolved) parent. Germline mosaicism is the explanation for the empirical observation that two affected siblings can be born to two unaffected parents in conditions where the underlying mutation appears, on parental sequencing, to be absent.
The textbook worked example is paternal germline mosaicism in osteogenesis imperfecta type II (lethal perinatal form, COL1A1 or COL1A2 dominant negative variants; OMIM 166210). The classical paper is Cohn et al. 1990 (Am J Hum Genet 46:591) which documented recurrence of OI type II in two pregnancies of an apparently unaffected couple, with the COL1A1 variant detectable in the father's sperm at low allele fraction but absent from his blood. The empirical observation drove the introduction of an explicit germline-mosaicism term into recurrence-risk counselling for autosomal dominant lethal conditions: even when neither parent is somatically affected, the recurrence risk after one affected child is empirically of order 5-7%, not zero. The arithmetic for the recurrence-risk computation given germline mosaicism is at germline mosaicism calculator; see also osteogenesis imperfecta pedigree.
Pleiotropy, locus heterogeneity, allelic heterogeneity, phenocopies
Four further phenomena complicate pedigree analysis:
- Pleiotropy. A single gene affects multiple phenotypic features, often in apparently unrelated organ systems. Marfan syndrome (cardiovascular, skeletal, ocular features all from FBN1) is the standard example.
- Locus heterogeneity. The same phenotype is produced by variants at any of several different loci. Hereditary non-syndromic deafness has more than 100 known loci; retinitis pigmentosa has more than 80; Lynch syndrome / hereditary non-polyposis colorectal cancer is associated with germline variants in MLH1, MSH2, MSH6, PMS2, and EPCAM. A pedigree consistent with autosomal dominant transmission gives no information about which locus is responsible without molecular testing. See Lynch syndrome risk calculator for one example.
- Allelic heterogeneity. Many different variants at the same locus produce the same or related phenotypes. CFTR has over 2,000 reported variants, BRCA1 over 3,000; the variant catalogue is curated in ClinVar.
- Phenocopies. Affected individuals in a family do not all share the genetic cause. A breast cancer in a woman who is not a BRCA1 carrier in a BRCA1 family is a phenocopy — a sporadic case occurring in a high-risk family by chance. Phenocopy probability is informed by the population incidence of the phenotype.
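The role of population incidence in the phenocopy question can be sketched with Bayes's rule. All three inputs below are illustrative assumptions (a prior of 1/2, a penetrance of 0.65, and a sporadic lifetime risk of 0.12), and the function name is hypothetical.

```python
def carrier_probability_if_affected(prior, penetrance, phenocopy_rate):
    """P(carrier | affected), by Bayes's rule.

    penetrance     : P(affected | carrier)
    phenocopy_rate : P(affected | non-carrier), i.e. the population risk
    """
    joint_carrier = prior * penetrance
    joint_noncarrier = (1 - prior) * phenocopy_rate
    return joint_carrier / (joint_carrier + joint_noncarrier)

# Illustrative inputs only: first-degree relative of a known carrier.
p = carrier_probability_if_affected(0.5, 0.65, 0.12)
print(f"carrier: {p:.3f}, phenocopy: {1 - p:.3f}")
# carrier: 0.844, phenocopy: 0.156
```

The more common the phenotype is in the general population, the larger the phenocopy share of affected relatives becomes for the same prior and penetrance.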
Bayesian risk calculation in pedigree analysis
The standard textbook framework for pedigree-based risk computation is Bayes's rule applied as a before-and-after probability table. The framework was systematised in genetic-counselling textbooks (Murphy & Chase, Principles of Genetic Counseling, 1975; Young, Introduction to Risk Calculation in Genetic Counseling, current standard text), with earlier antecedents in Edwards 1960 (Acta Genet Stat Med 10:63) and the wider statistical-genetics literature.
Worked example: X-linked recessive carrier probability
Consider a woman whose brother and maternal uncle both have Duchenne muscular dystrophy (X-linked recessive, DMD). Her mother, who has an affected brother and an affected son, is therefore an obligate carrier with prior probability close to 1 (subject to the de novo argument; we will set that aside for the worked example). The woman herself has a prior probability of 1/2 of being a carrier, inherited from her mother. The woman has three unaffected sons.
The Bayesian table:
| Hypothesis | Carrier (Cc) | Non-carrier (CC) |
|---|---|---|
| Prior probability | 1/2 | 1/2 |
| Conditional probability of three unaffected sons | (1/2)^3 = 1/8 | 1 |
| Joint probability | 1/16 | 1/2 = 8/16 |
| Posterior probability | 1/9 | 8/9 |
The conditional row captures the information added by the pedigree: a non-carrier mother transmits no DMD alleles to any son (probability of three unaffected sons given non-carrier = 1), whereas a carrier mother has a 1/2 chance of transmitting per son (probability of three unaffected sons given carrier = 1/8). The posterior probability of carrier status, given three unaffected sons, drops from the prior of 1/2 to 1/9. The arithmetic generalises: each additional unaffected son multiplies the conditional probability of the observation given carrier status by 1/2, which halves the posterior odds of carrier status exactly and the posterior probability approximately; with a prior of 1/2, the posterior after n unaffected sons is 1/(2^n + 1). The interactive computation, with adjustments for de novo events and partial penetrance, is at carrier probability calculator; the wider Mendelian arithmetic is at mendelian inheritance calculator.
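The table can be reproduced for any number of unaffected sons with a short sketch. It makes the same simplifications as the worked example (full penetrance in males, no de novo events); the function name is illustrative.

```python
from fractions import Fraction

def xlr_carrier_posterior(prior, unaffected_sons):
    """Posterior carrier probability for a woman with n unaffected sons.

    Assumes full penetrance in males and ignores de novo events.
    """
    conditional_carrier = Fraction(1, 2) ** unaffected_sons
    conditional_noncarrier = Fraction(1)
    joint_carrier = prior * conditional_carrier
    joint_noncarrier = (1 - prior) * conditional_noncarrier
    return joint_carrier / (joint_carrier + joint_noncarrier)

for n in range(5):
    print(n, xlr_carrier_posterior(Fraction(1, 2), n))
# 0 1/2
# 1 1/3
# 2 1/5
# 3 1/9
# 4 1/17
```

Exact fractions are used so that the output can be checked directly against the rows of the table.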
Hardy-Weinberg-derived priors
For autosomal recessive priors, the carrier frequency in a population under Hardy-Weinberg equilibrium is 2pq = 2q(1 - q) ≈ 2q, where q is the disease-allele frequency and p = 1 - q. Because disease incidence is q^2 under the same assumption, q can be estimated as the square root of the incidence. The prior probability that a randomly sampled individual from a population is a carrier of an autosomal recessive condition is therefore 2pq. In a counselling scenario, the prior on the partner of a known carrier of cystic fibrosis (CFTR) is the population-specific carrier frequency for CFTR: approximately 1 in 25 in northern European populations, 1 in 65 in African populations, lower in East Asian populations. Population-specific allele frequencies are tabulated in gnomAD and are the source of the population-stratified prior in any computation.
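A minimal sketch of the Hardy-Weinberg prior follows. The incidence of 1 in 2,500 is an assumed round figure used for illustration, not a value taken from gnomAD, and the function name is hypothetical.

```python
from math import sqrt

def carrier_frequency_from_incidence(incidence):
    """Heterozygote frequency 2pq for an autosomal recessive condition.

    Assumes Hardy-Weinberg equilibrium, so incidence = q**2.
    """
    q = sqrt(incidence)
    p = 1 - q
    return 2 * p * q

# Illustrative incidence of 1 in 2,500 births (an assumed round figure).
carrier = carrier_frequency_from_incidence(1 / 2500)
print(f"carrier frequency ~ 1 in {1 / carrier:.1f}")        # 1 in 25.5

# Child of a known carrier and a partner with only the population prior:
# 1 (known carrier) x carrier prior x 1/4 (both parents transmit).
print(f"risk to child ~ 1 in {1 / (carrier * 0.25):.0f}")   # 1 in 102
```

The exact 2pq value (about 1 in 25.5) and the 2q approximation (1 in 25) differ only slightly when q is small, which is why the approximation is common in hand calculation.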
For autosomal dominant priors, the relevant population parameter is the disease-allele frequency itself, and for X-linked priors the carrier frequency in females is 2q. De novo rates — particularly relevant for X-linked lethal conditions — are an empirical input. For DMD, the standard de novo / inherited equilibrium argument gives an a priori expectation that approximately 1/3 of cases are de novo (the Haldane equilibrium), and the Bayesian table can be extended to incorporate this.
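The extension of the Bayesian table to de novo events can be sketched for the mother of an isolated affected male. The assumptions are the textbook ones and are not specific to any tool: a genetically lethal X-linked condition at mutation-selection equilibrium, with the same mutation rate mu per X chromosome per generation in both sexes, which gives an equilibrium female carrier frequency of 4 mu. The value of mu below is illustrative.

```python
def mother_carrier_posterior_isolated_case(mu):
    """P(mother is a carrier | one affected son, no other family history).

    Assumes an X-linked genetically lethal condition at equilibrium with
    equal mutation rate `mu` in both sexes (labelled assumptions).
    """
    prior_carrier = 4 * mu                     # equilibrium carrier frequency
    prior_noncarrier = 1 - prior_carrier
    joint_carrier = prior_carrier * 0.5        # carrier transmits the variant
    joint_noncarrier = prior_noncarrier * mu   # new mutation in the egg
    return joint_carrier / (joint_carrier + joint_noncarrier)

posterior = mother_carrier_posterior_isolated_case(1e-4)  # illustrative mu
print(round(posterior, 3))       # 0.667
print(round(1 - posterior, 3))   # 0.333
```

Under these assumptions the mother of an isolated case is a carrier with probability about 2/3, and the remaining 1/3 corresponds to the de novo fraction quoted above; the result is nearly independent of the particular mu chosen as long as it is small.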
Knudson's two-hit model in pedigree analysis
For dominantly inherited cancer-predisposition syndromes, Knudson's two-hit model (Knudson 1971, PNAS 68:820, PMID 5279523) provides the framework that connects germline carrier status to lifetime cancer risk. The first hit is the inherited variant; the second hit is a somatic event in the affected tissue. The two-hit model explains the apparent "dominant" transmission of recessive-at-the-cellular-level tumour suppressors (RB1, TP53, BRCA1, BRCA2, NF1, NF2, APC, VHL, the Lynch-syndrome MMR genes), the age-of-onset distribution (the second hit takes time), and the unilateral / bilateral asymmetry seen in retinoblastoma (bilateral retinoblastoma is the carrier signature). The two-hit model is the framework that underlies BRCAPRO, MMRpro, and PancPRO, and is summarised in the standard genetics textbook Genes in Medicine (Wright et al.).
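The link between second hits and laterality can be illustrated with a simple Poisson sketch. It assumes that second hits arise independently and are split evenly between the two eyes; the mean of three tumours per carrier is an illustrative input, and the function name is hypothetical. This is a toy model of the idea, not the computation used by any of the risk models named above.

```python
from math import exp

def retinoblastoma_laterality(mean_tumours):
    """Split carriers into no-tumour, unilateral and bilateral classes.

    Assumes tumour counts per eye are independent Poisson variables,
    each with mean `mean_tumours / 2`.
    """
    p_eye_clear = exp(-mean_tumours / 2)
    none = p_eye_clear ** 2
    unilateral = 2 * p_eye_clear * (1 - p_eye_clear)
    bilateral = (1 - p_eye_clear) ** 2
    return none, unilateral, bilateral

none, uni, bi = retinoblastoma_laterality(3.0)  # illustrative mean
print(f"none {none:.3f}, unilateral {uni:.3f}, bilateral {bi:.3f}")
# none 0.050, unilateral 0.347, bilateral 0.604
```

In this sketch most carriers are bilateral and a small fraction develop no tumour at all, which ties the two-hit model back to reduced penetrance in the pedigree.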
Putting it together
Pedigree analysis combines symbol-set documentation (NSGC), pattern recognition (the patterns covered on inheritance patterns), and Bayesian arithmetic for refining recurrence and carrier-status probabilities given the observed pedigree. The challenge in real families is that several non-textbook mechanisms are usually at play simultaneously: a BRCA1 family typically shows reduced penetrance, age-dependent penetrance, and at least one phenocopy, all of which need to be accommodated in the recurrence-risk arithmetic. Risk-model implementations (BRCAPRO, MMRpro, PancPRO) are the systematic application of Bayesian arithmetic across thousands of pedigree configurations; the family-history scoring criteria (Manchester for BRCA; Amsterdam II, Vasen 1999, and Bethesda, Umar 2004, for Lynch) are heuristic shortcuts to the same posterior. Each of these is implemented in Evagene as illustrative, for-research, for-teaching computation. The IBIS-style approximation of Tyrer / Duffy / Cuzick 2004 used by Evagene is not the official IBIS Breast Cancer Risk Evaluator binary. BOADICEA is licensed by the University of Cambridge and is not bundled in Evagene; instead, Evagene exports a CanRisk 2.0 pedigree file for upload at canrisk.org.
Evagene is an academic, research, and educational pedigree modelling platform; outputs are illustrative and for educational and research purposes only. Disease-specific pedigree pages collect the worked examples by condition: achondroplasia pedigree, Duchenne muscular dystrophy pedigree, osteogenesis imperfecta pedigree, hereditary cardiac pedigree, Dravet syndrome pedigree, imprinting and UPD pedigree.
Key references
- Bennett RL, Steinhaus KA, Uhrich SB, et al. 1995. Recommendations for standardized human pedigree nomenclature. J Genet Couns 4:267-279. PMID 24234812.
- Bennett RL, French KS, Resta RG, Doyle DL. 2008. Standardized human pedigree nomenclature: update and assessment of the recommendations of the National Society of Genetic Counselors. J Genet Couns 17:424-433. PMID 18792771.
- Bennett RL, French KS, Resta RG, Austin J. 2022. Practice resource-focused revision: standardized pedigree nomenclature update centered on sex and gender inclusivity. J Genet Couns 31:1238-1248. PMID 36106433.
- Knudson AG. 1971. Mutation and cancer: statistical study of retinoblastoma. PNAS 68:820-823. PMID 5279523.
- Edwards JH. 1960. The simulation of mendelism. Acta Genet Stat Med 10:63-70.
- Murphy EA, Chase GA. 1975. Principles of Genetic Counseling. Year Book Medical Publishers.
- Wright CF, FitzPatrick DR, Firth HV. Genes in Medicine: A Practical Guide.
- Antoniou A, Pharoah PDP, Narod S, et al. 2003. Average risks of breast and ovarian cancer associated with BRCA1 or BRCA2 mutations. Am J Hum Genet 72:1117-1130. PMID 12677558.
- gnomAD — Genome Aggregation Database. gnomad.broadinstitute.org.
- ClinVar. ncbi.nlm.nih.gov/clinvar.
Related Evagene pages
- Mendelian genetics and gene discovery — pillar
- Inheritance patterns
- Mapping and gene identification
- Germline mosaicism calculator
- Carrier probability calculator
- Mendelian inheritance calculator
- Consanguinity calculator
- Lynch syndrome risk calculator
- Pedigree drawing tool
- Pedigree chart
- NSGC pedigree notation
- Achondroplasia pedigree
- Duchenne muscular dystrophy pedigree
- Osteogenesis imperfecta pedigree
- Hereditary cardiac pedigree
- Dravet syndrome pedigree
- Imprinting and UPD pedigree