GEDCOM and Pedigree Software: Bridging Genealogy and Clinical Genetics
Genealogy and clinical genetics share a common foundation: the family. For decades, genealogists have meticulously recorded lineages, births, marriages, and migrations. In parallel, clinical geneticists have constructed pedigrees to track the inheritance of disease, assess risk, and guide management decisions. The GEDCOM file format sits at the intersection of these two disciplines, providing a standardised way to exchange family structure data between genealogy platforms and clinical pedigree tools. This article explores what GEDCOM is, how it works, and why it matters for anyone working at the crossroads of family history and genetic medicine.
What Is GEDCOM?
GEDCOM, an acronym for Genealogical Data Communication, is a plain-text file format created in 1984 by The Church of Jesus Christ of Latter-day Saints. Its purpose was straightforward: provide a universal format for exchanging family history information between the growing number of genealogy software applications. Before GEDCOM, transferring a family tree from one programme to another meant re-entering every individual, date, and relationship by hand.
The format evolved through several revisions. GEDCOM 5.5, released in 1996, became the de facto standard. Its successor, GEDCOM 5.5.1, published in 1999, refined character encoding rules (adding UTF-8 support), clarified ambiguous tag definitions, and improved handling of multimedia references. Despite being over two decades old, GEDCOM 5.5.1 remains the most widely supported version across genealogy software. A newer specification, GEDCOM 7.0 (sometimes called FamilySearch GEDCOM 7), was released in 2021, but adoption has been gradual and many platforms still export exclusively in 5.5.1.
At its core, a GEDCOM file is a hierarchical text document. Each line begins with a level number (0, 1, 2, etc.), followed by an optional cross-reference identifier, a tag, and an optional value. The primary record types are:
- INDI (Individual) — represents a person, storing their name (NAME), sex (SEX), birth event (BIRT), death event (DEAT), and any number of additional events or attributes.
- FAM (Family) — represents a family unit, linking a husband (HUSB), wife (WIFE), and children (CHIL) through cross-references to INDI records.
- SOUR (Source) — documents the provenance of information, whether a census record, a birth certificate, or an oral family account.
- NOTE — free-text annotations that can be attached to any record or sub-record.
- OBJE (Object) — references to multimedia files such as photographs or scanned documents.
A typical GEDCOM file for a three-generation family might contain a few dozen INDI records, a handful of FAM records linking them, and associated events. Larger exports from platforms like Ancestry can contain tens of thousands of records spanning many generations and multiple branches.
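The line structure described above (level, optional cross-reference identifier, tag, optional value) can be illustrated with a small parser. This is a minimal sketch rather than a complete GEDCOM reader: the names `GedLine` and `parse_line` are hypothetical, the sample data is invented, and continuation tags (CONT, CONC) and character-set handling are ignored.

```python
import re
from typing import NamedTuple, Optional

# level, optional @XREF@ identifier, tag, optional value
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\w+)(?: (.*))?$")


class GedLine(NamedTuple):
    level: int
    xref: Optional[str]
    tag: str
    value: Optional[str]


def parse_line(raw: str) -> GedLine:
    """Split one GEDCOM line into its four parts."""
    match = LINE_RE.match(raw.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"not a GEDCOM line: {raw!r}")
    level, xref, tag, value = match.groups()
    return GedLine(int(level), xref, tag, value)


# Invented sample, trimmed to the bare structure (not a complete valid file).
SAMPLE = """\
0 HEAD
1 GEDC
2 VERS 5.5.1
0 @I1@ INDI
1 NAME Jane /Doe/
1 SEX F
1 BIRT
2 DATE 12 MAR 1950
0 @F1@ FAM
1 WIFE @I1@
0 TRLR
"""

lines = [parse_line(raw) for raw in SAMPLE.splitlines()]
# Level-0 lines open a new record; deeper levels belong to the record above.
top_level = [(line.xref, line.tag) for line in lines if line.level == 0]
print(top_level)
```

Each level-0 line opens a record, and the lines beneath it with higher level numbers belong to that record, which is how the BIRT event and its DATE end up nested under the individual.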
Why GEDCOM Matters for Genetic Pedigrees
Clinical pedigrees and genealogical family trees capture much of the same structural information: who is related to whom, parent-child and spousal relationships, sex, dates of birth and death. The difference lies in the overlay. A genealogist annotates their tree with historical events, places, occupations, and source citations. A clinical geneticist annotates their pedigree with diagnoses, ages of onset, genetic test results, carrier status, and pregnancy outcomes using standardised pedigree notation.
GEDCOM provides the structural scaffold. A family tree exported from Ancestry, FamilySearch, or Gramps as a GEDCOM file already contains the individuals, their sexes, their birth and death dates, and the family relationships that link them. Importing this file into a clinical pedigree tool saves considerable data entry time, particularly for large or complex families. Instead of manually constructing a pedigree from a patient's verbal account (which is often incomplete or inconsistent), a clinician or genetic counsellor can start with a structurally accurate family tree and then add the clinical information.
This workflow is especially valuable when patients have already invested significant effort in building their family tree on a genealogy platform. Many patients referred for genetic counselling have extensive trees on Ancestry or FamilySearch, sometimes spanning five or more generations. Rather than asking them to recount their family structure verbally, the counsellor can request a GEDCOM export, import it, and focus the consultation on gathering medical history and discussing risk.
The bridge also works in the other direction. After a clinical pedigree has been constructed and annotated, the structural data can be exported back to GEDCOM for the patient to incorporate into their genealogy platform. This is particularly useful for families who wish to maintain a combined record of their ancestry and health history, even though the clinical annotations themselves are not part of the GEDCOM standard.
Importing GEDCOM Files into Pedigree Software
Importing a GEDCOM file into pedigree software is typically a straightforward process, but understanding what transfers well and what does not will help set expectations.
What transfers reliably
- Individual identity — names (given name, surname, maiden name), sex, and unique identifiers.
- Life events — birth date and place, death date and place, marriage date.
- Family structure — parent-child relationships, spousal links, sibling order (when encoded).
- Notes — free-text notes attached to individuals or families, which may contain informally recorded health information.
What typically does not transfer
- Clinical diagnoses — GEDCOM has no standard tag for medical conditions, ICD codes, or OMIM numbers.
- Genetic test results — variant classifications, gene names, and laboratory reports are outside the GEDCOM specification.
- Pedigree-specific symbols — carrier status, affected status, pregnancy details, and adoption symbols defined by the Pedigree Standardization Work Group have no GEDCOM equivalents.
- Risk scores — calculated values from models such as BRCAPRO, MMRpro, or PancPRO are generated within the pedigree tool and are not represented in GEDCOM.
Some GEDCOM files contain health-related information in NOTE fields or in custom (non-standard) tags. A well-designed import process should surface these notes for review, allowing the user to decide whether to incorporate them as clinical annotations. However, the absence of a structured medical vocabulary within GEDCOM means that any health data found in notes requires manual interpretation and classification.
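One way to surface such notes for review is a simple keyword scan. The sketch below is illustrative only: the keyword list is a hypothetical placeholder (a real tool would use a curated vocabulary), the function name `flag_health_notes` is invented, and the output is a review queue for a human reader, not a classification.

```python
import re

# Hypothetical keyword list, for illustration only.
HEALTH_TERMS = ("cancer", "diabetes", "stroke", "heart", "died of", "diagnosed")
PATTERN = re.compile("|".join(re.escape(t) for t in HEALTH_TERMS), re.IGNORECASE)


def flag_health_notes(notes):
    """notes: mapping of individual ID -> list of free-text note strings.

    Returns (individual_id, note_text, matched_terms) tuples so that a
    person can review each note and decide how to record it.
    """
    flagged = []
    for indi_id, texts in notes.items():
        for text in texts:
            hits = sorted({m.group(0).lower() for m in PATTERN.finditer(text)})
            if hits:
                flagged.append((indi_id, text, hits))
    return flagged


# Invented example notes.
notes = {
    "@I1@": ["Emigrated to Canada in 1912."],
    "@I2@": ["Diagnosed with breast cancer at 42."],
}
for indi_id, text, hits in flag_health_notes(notes):
    print(indi_id, hits, text)
```

A scan like this only narrows down where to look. Deciding whether "heart" refers to a diagnosis or a figure of speech still needs the manual interpretation described above.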
Practical considerations
Large GEDCOM files can contain many individuals who are genealogically interesting but clinically irrelevant. A pedigree for Mendelian risk assessment typically focuses on three to four generations of the patient's direct lineage, not the entire extended tree. Good pedigree software should allow the user to select a proband and extract the relevant subset of the imported tree, pruning distant branches while preserving the core family structure.
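Pruning around a proband can be sketched as a breadth-first walk over parent-child links that stops after a fixed number of steps. This is one possible rule among several, shown under stated assumptions: the input is a hypothetical mapping from each child to their parents, the function name `extract_subset` is invented, and spouses without shared children in the subset are not pulled in.

```python
from collections import deque


def extract_subset(parents, proband, max_steps=3):
    """Return {person: steps from proband} for everyone within max_steps
    parent-child links of the proband.

    parents: dict mapping child ID -> list of parent IDs.
    """
    # Invert the parent map so the walk can also go downwards.
    children = {}
    for child, ps in parents.items():
        for p in ps:
            children.setdefault(p, []).append(child)

    dist = {proband: 0}
    queue = deque([proband])
    while queue:
        person = queue.popleft()
        if dist[person] == max_steps:
            continue  # do not expand beyond the limit
        for nxt in parents.get(person, []) + children.get(person, []):
            if nxt not in dist:
                dist[nxt] = dist[person] + 1
                queue.append(nxt)
    return dist


# Invented family: the cousin is four links away and is pruned at max_steps=3.
family = {
    "proband": ["mum", "dad"],
    "sister": ["mum", "dad"],
    "mum": ["gran", "grandad"],
    "aunt": ["gran", "grandad"],
    "cousin": ["aunt"],
}
print(sorted(extract_subset(family, "proband", max_steps=3)))
```

Counting links rather than generations keeps siblings, aunts, and uncles in view while dropping more distant branches. A tool aimed at a specific risk model might choose a different cut-off.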
Character encoding is another consideration. Older GEDCOM files may use ANSEL or ASCII encoding, whilst newer ones use UTF-8. Importing software needs to handle all three gracefully, particularly for names containing diacritical marks or non-Latin characters, which are common in international family histories.
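A reader can pick the decoder from the character set declared in the header (the CHAR line under HEAD). The sketch below is a simplified illustration with a hypothetical function name, `decode_gedcom`. It handles UTF-8 and ASCII only, treats a missing declaration as UTF-8 (an assumption), and raises for ANSEL because Python's standard library ships no ANSEL codec, so a real importer would need a third-party or hand-written decoder.

```python
import re

# The declared character set sits on a level-1 CHAR line inside HEAD.
CHAR_RE = re.compile(rb"^\s*1\s+CHAR\s+(\S+)", re.MULTILINE)

# "utf-8-sig" also strips a leading byte-order mark if one is present.
CODECS = {"UTF-8": "utf-8-sig", "ASCII": "ascii"}


def decode_gedcom(data: bytes) -> str:
    """Decode raw GEDCOM bytes using the character set named in the header."""
    match = CHAR_RE.search(data[:4096])
    # Assumption: fall back to UTF-8 when no CHAR line is found.
    declared = match.group(1).decode("ascii", "replace").upper() if match else "UTF-8"
    if declared == "ANSEL":
        raise NotImplementedError("ANSEL needs a dedicated decoder")
    codec = CODECS.get(declared)
    if codec is None:
        raise ValueError(f"unsupported character set: {declared}")
    return data.decode(codec)


# Invented sample with diacritics in the name.
sample = "0 HEAD\n1 CHAR UTF-8\n0 @I1@ INDI\n1 NAME Zoë /Müller/\n0 TRLR\n".encode("utf-8")
print(decode_gedcom(sample).splitlines()[3])
```

Failing loudly on an unsupported character set is a deliberate choice here: silently decoding ANSEL bytes as something else would corrupt exactly the accented names this step is meant to protect.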
Exporting Pedigrees to GEDCOM
Exporting a clinical pedigree to GEDCOM serves a different purpose from importing one. The primary use case is sharing the family structure with a family member who uses genealogy software. A patient might say, "Can I get a copy of my family tree to add to my Ancestry account?" In that context, a GEDCOM export is the natural answer.
The export process maps each individual in the pedigree back to a GEDCOM INDI record, each family unit to a FAM record, and each relevant event (birth, death, marriage) to the appropriate GEDCOM tag. Clinical data, by its nature, does not transfer. This is generally desirable from a privacy perspective: a patient sharing their tree with a cousin likely does not want to include genetic test results, risk assessments, or diagnostic details for other family members.
Selection-only export is a useful feature in this context. Rather than exporting the entire pedigree, the user selects specific individuals or branches to include in the GEDCOM file. This allows targeted sharing — for example, exporting only the maternal lineage for a relative who is researching that side of the family, without exposing the complete pedigree structure.
When exporting, the software must generate valid GEDCOM 5.5.1 syntax, including proper header records (HEAD), submitter records (SUBM), and a trailer (TRLR). Cross-references between INDI and FAM records must be internally consistent, and character encoding should be declared in the header. A poorly formed GEDCOM file will fail to import in the recipient's genealogy software, defeating the purpose of the exchange.
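The export requirements above can be sketched as a small writer that emits a header, a submitter record, the selected individuals and families, and a trailer. This is a minimal illustration, not a full GEDCOM 5.5.1 serialiser: the function name `export_gedcom`, the dictionary layout, and the `EXAMPLE_EXPORTER` source name are all hypothetical, and only names, sex, birth dates, and family links are written.

```python
def export_gedcom(individuals, families, selected=None, submitter="Unknown"):
    """individuals: id -> {"name": "Given /Surname/", "sex": "F", "birth": "12 MAR 1950"}
    families:    id -> {"husb": id, "wife": id, "children": [ids]}
    selected:    optional set of individual IDs to include (selection-only export).
    """
    keep = set(individuals) if selected is None else set(selected) & set(individuals)

    # Keep only links to exported individuals, and drop families that no
    # longer connect at least two people, so every pointer has a target.
    fams = {}
    for fid, fam in families.items():
        spouses = [(tag, fam.get(key)) for tag, key in (("HUSB", "husb"), ("WIFE", "wife"))
                   if fam.get(key) in keep]
        kids = [c for c in fam.get("children", []) if c in keep]
        if len(spouses) + len(kids) >= 2:
            fams[fid] = (spouses, kids)

    lines = ["0 HEAD", "1 SOUR EXAMPLE_EXPORTER", "1 SUBM @SUB1@", "1 GEDC",
             "2 VERS 5.5.1", "2 FORM LINEAGE-LINKED", "1 CHAR UTF-8",
             "0 @SUB1@ SUBM", f"1 NAME {submitter}"]

    for iid in sorted(keep):
        person = individuals[iid]
        lines += [f"0 @{iid}@ INDI", f"1 NAME {person['name']}"]
        if person.get("sex"):
            lines.append(f"1 SEX {person['sex']}")
        if person.get("birth"):
            lines += ["1 BIRT", f"2 DATE {person['birth']}"]
        for fid, (spouses, kids) in fams.items():
            if iid in kids:
                lines.append(f"1 FAMC @{fid}@")  # family in which this person is a child
            if any(sid == iid for _, sid in spouses):
                lines.append(f"1 FAMS @{fid}@")  # family in which this person is a spouse

    for fid, (spouses, kids) in fams.items():
        lines.append(f"0 @{fid}@ FAM")
        lines += [f"1 {tag} @{sid}@" for tag, sid in spouses]
        lines += [f"1 CHIL @{cid}@" for cid in kids]

    lines.append("0 TRLR")
    return "\n".join(lines) + "\n"


# Invented family; exporting only mother and child leaves the father out entirely.
individuals = {
    "I1": {"name": "Jane /Doe/", "sex": "F", "birth": "12 MAR 1950"},
    "I2": {"name": "John /Doe/", "sex": "M"},
    "I3": {"name": "Ann /Doe/", "sex": "F"},
}
families = {"F1": {"husb": "I2", "wife": "I1", "children": ["I3"]}}
print(export_gedcom(individuals, families, selected={"I1", "I3"}))
```

Because nothing outside the three structural fields is ever read, clinical annotations held elsewhere in a pedigree cannot leak into the file by accident.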
23andMe and Consumer Genomics Data
The rise of consumer genomics has added another dimension to family pedigrees. Services like 23andMe provide customers with raw genotype data files containing hundreds of thousands of SNP (single nucleotide polymorphism) genotypes. These files offer a molecular layer of information that can enrich a pedigree beyond what genealogical records alone can provide.
Understanding SNP data and rs numbers
A SNP is a variation at a single position in the genome. The human genome contains millions of known SNPs, each catalogued in the NCBI's dbSNP database with a unique identifier called an rs number (Reference SNP cluster ID). For example, rs8176719 is a well-characterised variant in the ABO gene that determines blood type. When 23andMe genotypes a customer, they assay a selected panel of these SNPs and report the results as a tab-delimited text file listing each rs number alongside the customer's genotype at that position.
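Reading such a file takes only a few lines. The sketch below assumes the commonly seen layout of comment lines starting with `#` followed by four tab-separated columns (rsid, chromosome, position, genotype). The function name `read_23andme` is hypothetical, and the sample rows use placeholder identifiers and values invented for illustration.

```python
import io


def read_23andme(handle, wanted=None):
    """Return {rsid: genotype} from a tab-delimited raw data file.

    handle: any iterable of text lines (an open file or io.StringIO).
    wanted: optional set of rs numbers to keep; None keeps everything.
    """
    genotypes = {}
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 4:
            continue  # ignore rows that do not fit the expected layout
        rsid, _chromosome, _position, genotype = fields
        if wanted is None or rsid in wanted:
            genotypes[rsid] = genotype
    return genotypes


SAMPLE = (
    "# Invented rows for illustration; identifiers and values are placeholders.\n"
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t1000\tAG\n"
    "rs0000002\t9\t2000\tCC\n"
    "rs0000003\t9\t3000\tTT\n"
)
print(read_23andme(io.StringIO(SAMPLE), wanted={"rs0000002", "rs0000003"}))
```

Passing a `wanted` set keeps memory use small when only a handful of markers matter, which is the usual case for trait inference.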
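Each genotype consists of two alleles (one inherited from each parent). For rs8176719, written in dbSNP-style notation, the possible genotypes are -/- (homozygous deletion, associated with type O), -/G (heterozygous), and G/G (no deletion). 23andMe raw files typically encode this kind of variant with D (deletion) and I (insertion) instead, giving DD, DI, and II, and use -- for a position that could not be called, so importing software should not read -- as a homozygous deletion. By examining a cluster of SNPs in the ABO gene, including rs8176746 and rs8176747, it is possible to infer whether the individual has blood type A, B, AB, or O.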
Blood type inference from SNP markers
The ABO blood group system is determined by variants in the ABO gene on chromosome 9. The key SNPs are:
- rs8176719: a single-nucleotide deletion that distinguishes the O allele from the A and B alleles. The deletion introduces a frameshift that inactivates the glycosyltransferase enzyme.
- rs8176746 and rs8176747: missense variants that distinguish the A allele from the B allele. The B allele carries specific nucleotide changes that alter the enzyme's substrate specificity.
By combining the genotypes at these positions, software can infer the ABO blood type with high accuracy. Rh factor (positive or negative) is determined by the RHD gene, with rs590787 serving as a proxy SNP on consumer genotyping arrays. Similarly, secretor status — whether ABO antigens are secreted into bodily fluids — can be inferred from the FUT2 gene variant rs601338, where a nonsense mutation (G428A) abolishes secretor activity.
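The ABO combination step can be sketched as allele counting. This is a deliberately simplified illustration, not a validated algorithm. It assumes the caller has already counted deletion (O-type) alleles at rs8176719 and B-type alleles at one A/B marker, and that every O allele carries the A-type base at that marker, so the B count reflects functional B alleles only. The function names are hypothetical. Which letter marks the B allele, and how the deletion is spelled in a given file, depend on the array and its strand convention and must be checked against dbSNP.

```python
from typing import Optional


def count_allele(genotype: str, allele: str) -> int:
    """Count copies (0 to 2) of one allele symbol in a two-character genotype."""
    return genotype.count(allele)


def infer_abo(o_count: int, b_count: int) -> Optional[str]:
    """Infer the ABO group from allele counts, or None if they are inconsistent.

    o_count: number of O (deletion) alleles at rs8176719.
    b_count: number of B-type alleles at the A/B marker.
    """
    if min(o_count, b_count) < 0 or o_count + b_count > 2:
        return None  # more than two alleles implied: the simple model does not fit
    a_count = 2 - o_count - b_count  # whatever is neither O nor B is treated as A
    phenotype = ("A" if a_count else "") + ("B" if b_count else "")
    return phenotype or "O"


# "D" as the deletion symbol is an assumption about the input file.
o_count = count_allele("DI", "D")
print(infer_abo(o_count, b_count=1))  # one O allele plus one B allele
```

Returning `None` for inconsistent counts, rather than guessing, leaves room for the rare alleles and array limitations that make these inferences unreliable in some individuals.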
It is important to note that SNP-based inferences are probabilistic. Rare alleles, population-specific variants, and the limited coverage of consumer arrays mean that some inferences may be incorrect. For clinical decision-making, serological confirmation of blood type and Rh status remains the standard of care.
Ancestry and trait information
Beyond blood type, 23andMe data provides ancestry composition estimates (percentage breakdowns of geographic ancestry), haplogroup assignments for maternal (mtDNA) and paternal (Y-DNA) lineages, and trait reports covering characteristics such as earwax type, asparagus metabolite detection, and cilantro aversion. When multiple family members have been genotyped, their data can be cross-referenced within a pedigree to confirm relationships, identify segments of shared DNA, and observe the segregation of specific alleles through generations.
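One simple form of cross-referencing is a Mendelian consistency check: at an autosomal marker, a child's genotype should be obtainable by taking one allele from each parent. The sketch below is a minimal illustration with a hypothetical function name. It assumes complete two-letter genotypes at an autosomal position and does not handle no-calls or sex chromosomes.

```python
from itertools import product


def mendelian_consistent(child: str, mother: str, father: str) -> bool:
    """True if the child's unordered genotype can be formed from one
    maternal allele and one paternal allele."""
    possible = {"".join(sorted(pair)) for pair in product(mother, father)}
    return "".join(sorted(child)) in possible


# Invented genotypes at a single marker.
print(mendelian_consistent("AG", mother="AA", father="GG"))
print(mendelian_consistent("AA", mother="AG", father="GG"))
```

A single inconsistent marker is weak evidence on its own, since genotyping errors occur. It is the pattern across many markers that supports or undermines a recorded relationship.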
How Evagene Handles GEDCOM and Genomics Data
Evagene is a web-based pedigree management system designed to serve both clinical and genealogical use cases. Its approach to GEDCOM and genomics data reflects the dual nature of its user base: genetic counsellors and clinicians who need standardised pedigree notation and risk models, and families who want to understand their health history in the context of their ancestry.
GEDCOM 5.5.1 import and export
Evagene's GEDCOM parser processes the full GEDCOM 5.5.1 specification, including INDI, FAM, SOUR, NOTE, and OBJE records. On import, the parser constructs an internal family graph from the GEDCOM data, resolving cross-references and mapping events to pedigree-relevant fields. Users can select a proband from the imported individuals, and Evagene renders the surrounding family structure as a clinical pedigree with standard notation.
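Resolving cross-references into a family graph can be illustrated generically. The sketch below is not Evagene's implementation. It assumes the file has already been split into (level, xref, tag, value) tuples, uses a hypothetical function name, and builds only a child-to-parents map while reporting pointers that refer to no INDI record.

```python
def build_parent_map(lines):
    """lines: iterable of (level, xref, tag, value) tuples.

    Returns (parents, dangling):
      parents  maps each child ID to the list of parent IDs found in FAM records
      dangling is the set of IDs referenced by a family but never defined
    """
    indis, fams, current = set(), {}, None
    for level, xref, tag, value in lines:
        if level == 0:
            current = None
            if tag == "INDI":
                indis.add(xref)
            elif tag == "FAM":
                current = fams.setdefault(xref, {"parents": [], "children": []})
        elif level == 1 and current is not None:
            if tag in ("HUSB", "WIFE"):
                current["parents"].append(value)
            elif tag == "CHIL":
                current["children"].append(value)

    parents, dangling = {}, set()
    for fam in fams.values():
        members = fam["parents"] + fam["children"]
        dangling.update(m for m in members if m not in indis)
        for child in fam["children"]:
            if child in indis:
                parents.setdefault(child, []).extend(
                    p for p in fam["parents"] if p in indis)
    return parents, dangling


# Invented records; @I9@ is referenced as a child but never defined.
records = [
    (0, "@I1@", "INDI", None), (1, None, "SEX", "F"),
    (0, "@I2@", "INDI", None),
    (0, "@I3@", "INDI", None),
    (0, "@F1@", "FAM", None),
    (1, None, "HUSB", "@I2@"), (1, None, "WIFE", "@I1@"),
    (1, None, "CHIL", "@I3@"), (1, None, "CHIL", "@I9@"),
    (0, None, "TRLR", None),
]
print(build_parent_map(records))
```

Collecting dangling references instead of raising on the first one lets an importer show the user everything that is wrong with a file in a single pass.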
For export, Evagene generates valid GEDCOM 5.5.1 files with proper header declarations, UTF-8 encoding, and internally consistent cross-references. The export supports selection-only mode, allowing users to choose specific individuals or branches for inclusion. This is particularly useful when a patient wishes to share their family structure with a relative without exposing the entire pedigree or any clinical annotations.
23andMe raw data import
Evagene accepts 23andMe raw data files (the tab-delimited text format provided when customers download their data). Upon import, the system scans for clinically relevant SNPs and performs automated inferences including ABO blood type, Rh factor, and secretor status. These inferred traits are attached to the corresponding individual in the pedigree and flagged as inferred from SNP data to distinguish them from clinically confirmed values.
Conflict resolution
When importing data from multiple sources (for example, a GEDCOM file and a 23andMe file for the same individual), conflicts may arise. A GEDCOM file might record a different birth date than the one already held in the pedigree, or an existing clinical annotation might disagree with a SNP-inferred blood type. Evagene surfaces these conflicts to the user for resolution, presenting both values side by side and allowing the user to choose which to retain or whether to flag the discrepancy for further investigation.
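Detecting such conflicts is easier when every value carries its source. The sketch below is a generic illustration and not a description of Evagene's internals. The `Observation` class, the source labels, and the function name `find_conflicts` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    attribute: str  # e.g. "birth_date" or "abo_blood_type"
    value: str
    source: str     # hypothetical labels: "gedcom", "clinical", "snp-inferred"


def find_conflicts(observations):
    """Group observations by attribute and return only the attributes
    whose sources disagree, keeping every competing observation."""
    by_attribute = {}
    for obs in observations:
        by_attribute.setdefault(obs.attribute, []).append(obs)
    return {
        name: group
        for name, group in by_attribute.items()
        if len({obs.value for obs in group}) > 1
    }


# Invented observations for one individual.
observations = [
    Observation("birth_date", "12 MAR 1950", "gedcom"),
    Observation("birth_date", "12 MAR 1950", "clinical"),
    Observation("abo_blood_type", "A", "clinical"),
    Observation("abo_blood_type", "O", "snp-inferred"),
]
for name, group in find_conflicts(observations).items():
    print(name, [(obs.value, obs.source) for obs in group])
```

Keeping the source alongside each value is what makes a side-by-side presentation possible, and it also preserves the distinction between inferred and clinically confirmed results after the user has chosen.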
Additional import formats
Beyond GEDCOM and 23andMe, Evagene supports several other import pathways. JSON import allows programmatic data exchange with other clinical systems. XEG (XML-based Exchange for Genetics) provides a structured format for pedigree data that includes clinical fields not present in GEDCOM. Image import via OCR enables users to upload photographs or scans of hand-drawn pedigrees, which Evagene processes using optical character recognition to extract individuals and relationships. This last feature is particularly valuable for digitising legacy paper pedigrees that exist only as photocopies or faxes in clinical files.
Full documentation on Evagene's data import and export capabilities, including step-by-step guides, is available at the Evagene help centre.
Genealogy Software Compatibility
GEDCOM's principal value lies in its universality. The format enables interoperability across a wide range of genealogy platforms, each of which has its own strengths and user base.
- Ancestry — the largest consumer genealogy platform, with over 30 million users and billions of historical records. Ancestry allows users to export their trees as GEDCOM 5.5.1 files from the tree settings menu. These files include individuals, relationships, events, and notes, but do not include Ancestry-specific features such as DNA matches, hints, or record attachments. The exported GEDCOM can be imported into any compatible pedigree tool for clinical annotation.
- FamilySearch — a free genealogy platform operated by The Church of Jesus Christ of Latter-day Saints. FamilySearch maintains a single shared world tree, and users can export portions of it as GEDCOM files. Because the tree is collaborative, exported data may include contributions from many users and should be reviewed for accuracy before clinical use.
- Gramps — an open-source genealogy programme available for Windows, macOS, and Linux. Gramps provides robust GEDCOM 5.5.1 import and export, along with a rich data model that supports custom attributes, which some users employ to record health information. Its open-source nature makes it a popular choice among technically inclined genealogists and researchers.
Other platforms that support GEDCOM include RootsMagic, Legacy Family Tree, MyHeritage, and MacFamilyTree. In each case, the GEDCOM file serves as the lingua franca that allows data to move between platforms and, crucially, into clinical pedigree software where it can be annotated with medical information and used for genetic risk assessment.
Frequently Asked Questions
What is a GEDCOM file?
GEDCOM (Genealogical Data Communication) is a plain-text file format developed by The Church of Jesus Christ of Latter-day Saints for exchanging genealogical data between different software applications. The most widely supported version is GEDCOM 5.5.1, which stores individuals, family relationships, events (births, deaths, marriages), notes, and source citations in a hierarchical tagged structure.
Can I import a GEDCOM file into pedigree software?
Yes. Most pedigree software that supports GEDCOM 5.5.1 can import .ged files exported from genealogy platforms such as Ancestry, FamilySearch, and Gramps. The import process maps GEDCOM individuals and family records to pedigree nodes, preserving names, dates, sex, and family relationships. Clinical details such as diagnoses and genetic test results are not part of the GEDCOM standard and typically require manual entry after import.
What data does GEDCOM 5.5.1 store?
GEDCOM 5.5.1 stores individual records (INDI) with name, sex, birth, death, and other life events; family records (FAM) linking spouses and children; source citations (SOUR); notes (NOTE); and multimedia references (OBJE). It uses a hierarchical line-based format with numbered levels and three- or four-character tags to identify each data element.
How does 23andMe data enhance a pedigree?
23andMe raw data files contain hundreds of thousands of SNP (single nucleotide polymorphism) genotypes. When imported into pedigree software, these genotypes can be used to infer blood type (ABO and Rh), secretor status, and certain heritable traits. The data can also confirm or refine family relationships and provide ancestry composition estimates that contextualise the pedigree geographically.
What are rs numbers in SNP data?
rs numbers (Reference SNP cluster IDs) are unique identifiers assigned by the NCBI dbSNP database to specific single nucleotide polymorphisms. For example, rs8176719 is associated with the ABO blood group gene. Each rs number identifies a specific position in the human genome where variation has been catalogued, allowing researchers and clinicians to reference the same variant unambiguously.
Can I export a clinical pedigree back to GEDCOM?
Yes, many pedigree tools support GEDCOM export. However, clinical data such as diagnoses, genetic test results, and risk scores are not part of the GEDCOM standard and will not transfer. The export typically includes names, dates, sex, and family structure. Some tools offer selection-only export, allowing you to choose which individuals to include when sharing with family members who use genealogy software.
Is GEDCOM compatible with Ancestry and FamilySearch?
Yes. Both Ancestry and FamilySearch support GEDCOM 5.5.1 import and export. Ancestry allows you to download your tree as a GEDCOM file from the tree settings page, and FamilySearch provides GEDCOM export for the portions of the tree you have contributed. These files can then be imported into clinical pedigree software for medical annotation and risk analysis.
What is the difference between a family tree and a clinical pedigree?
A family tree is primarily a genealogical record focused on ancestry, names, dates, and historical events. A clinical pedigree is a standardised medical diagram that uses specific symbols (defined by the National Society of Genetic Counselors and the Pedigree Standardization Work Group) to represent individuals, their relationships, and health information including diagnoses, carrier status, genetic test results, and pregnancy outcomes. GEDCOM bridges the two by providing the structural family data that clinical tools can then annotate with medical information.
Does Evagene support GEDCOM import?
Yes. Evagene supports full GEDCOM 5.5.1 import, parsing individual records, family groups, events, and notes. After import, the family structure is rendered as a clinical pedigree with standard genetic notation. Evagene also supports 23andMe raw data import for SNP-based trait inference, as well as JSON, XEG, and image-based import via OCR.
Can blood type be determined from 23andMe data?
Blood type can be inferred from 23andMe raw data by examining specific SNP markers in the ABO gene (e.g., rs8176719, rs8176746, rs8176747) and the RHD gene (e.g., rs590787). The inference determines ABO group (A, B, AB, or O) and Rh factor (positive or negative). Similarly, secretor status can be inferred from FUT2 gene variants (e.g., rs601338). These inferences are probabilistic and should be confirmed with serological testing for clinical purposes.