GEDCOM to pedigree converter: GEDCOM 5.5.1 for clinical genetics

A technical guide to converting GEDCOM 5.5.1 genealogy files into clinical pedigree data — what the format carries, what needs re-annotation, what gets lost, and how Evagene's import, annotation, and round-trip export workflow handles the common cases.


Short version. GEDCOM 5.5.1 is the de facto standard for exchanging pedigree structure between genealogy applications and clinical pedigree tools. It carries individuals, families, names, sex, birth and death events, and free-text notes cleanly. It does not carry structured clinical disease annotations, risk model outputs, HPO phenotypes, or AI interpretation — those are added on top after import. Evagene imports GEDCOM 5.5.1 and exports it back out for round-trip compatibility; the structural data is preserved exactly, and Evagene-specific annotations are encoded as notes where sensible or omitted when they have no GEDCOM representation. For an overview of GEDCOM beyond the technical plumbing, see GEDCOM pedigree software.

This page is for developers and clinicians who want to move pedigree data in and out of Evagene via GEDCOM cleanly, and want to know exactly what survives the round trip.

What GEDCOM is

GEDCOM (Genealogical Data Communication) is a text-based, line-oriented format developed by the Church of Jesus Christ of Latter-day Saints for exchanging genealogical data. GEDCOM 5.5.1 is the version most tools produce and consume. Newer 5.5.5 and 7.0 specifications exist; 5.5.1 remains the most interoperable. A GEDCOM file is a sequence of records (INDI for individuals, FAM for families, OBJE for media, etc.) with hierarchical tagged lines:

0 @I1@ INDI
1 NAME Jane /Smith/
1 SEX F
1 BIRT
2 DATE 14 MAR 1952
1 DEAT
2 DATE 22 JUN 2019
1 FAMC @F1@
1 FAMS @F2@
1 NOTE Breast cancer diagnosed age 47.

0 @F2@ FAM
1 HUSB @I2@
1 WIFE @I1@
1 CHIL @I3@
1 CHIL @I4@

The tree structure is conveyed by FAM records that link a husband (HUSB), wife (WIFE), and children (CHIL). Individuals point back at the families they are a child of (FAMC) and families they are a spouse in (FAMS). This is exactly what a pedigree needs.
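
The line structure above can be read with very little code. The following is a minimal sketch, assuming well-formed input with one GEDCOM line per text line; the names `GedLine`, `parse_line`, and `split_records` are illustrative and not part of any Evagene API.

```python
import re
from typing import NamedTuple, Optional

# GEDCOM line grammar: LEVEL [@XREF@] TAG [VALUE]
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?: (.*))?$")


class GedLine(NamedTuple):
    level: int
    xref: Optional[str]  # set only on record-defining lines such as "0 @I1@ INDI"
    tag: str
    value: str


def parse_line(raw: str) -> GedLine:
    """Split one GEDCOM line into level, optional xref, tag and value."""
    m = LINE_RE.match(raw.rstrip("\r\n"))
    if m is None:
        raise ValueError(f"not a GEDCOM line: {raw!r}")
    level, xref, tag, value = m.groups()
    return GedLine(int(level), xref, tag, value or "")


def split_records(text: str) -> list[list[GedLine]]:
    """Group lines into records; every level-0 line starts a new record."""
    records: list[list[GedLine]] = []
    for raw in text.splitlines():
        if not raw.strip():
            continue  # tolerate blank lines between records
        line = parse_line(raw)
        if line.level == 0:
            records.append([line])
        elif records:
            records[-1].append(line)
        else:
            raise ValueError("file does not start with a level-0 line")
    return records
```

Calling `split_records` on the sample above yields two records, one beginning with the `INDI` line and one with the `FAM` line, each followed by its subordinate lines in order.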

What carries over cleanly to a clinical pedigree

These GEDCOM elements map directly to Evagene's pedigree data model with no loss:

  • Individuals — every INDI record becomes an individual in the pedigree, with stable identifiers preserved where possible.
  • Sex — the SEX tag (M, F, U, or other values depending on the exporter's support for intersex and non-binary coding).
  • Names — given and family names from NAME records.
  • Birth and death — BIRT and DEAT events with their DATE subtags.
  • Deceased status — inferred from presence of DEAT or an explicit DEAT flag.
  • Family structure — parent-child relationships via FAMC, spouse relationships via FAMS / HUSB / WIFE, siblings by shared FAMC.
  • Free-text notes — NOTE records attached to individuals, which often contain clinically relevant free text imported from genealogy applications.

For many pedigrees, this is already enough to reconstruct the family graph faithfully and start clinical annotation on top.
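
To make the mapping concrete, here is a rough sketch of how the elements listed above could be turned into a family graph. It assumes only level-1 tags matter for structure, treats any `DEAT` line (including the bare `1 DEAT Y` form) as deceased, and uses an illustrative dictionary layout that is not Evagene's data model.

```python
from collections import defaultdict


def build_pedigree(text: str) -> dict:
    """Map each INDI xref to name, sex, deceased flag, parents and children."""
    people: dict = {}
    families = defaultdict(lambda: {"parents": [], "children": []})
    current = None  # (record type, xref) of the enclosing level-0 record

    for raw in text.splitlines():
        parts = raw.split(maxsplit=2)
        if len(parts) < 2:
            continue
        level, second = int(parts[0]), parts[1]
        value = parts[2].strip() if len(parts) == 3 else ""

        if level == 0:
            # "0 @I1@ INDI" carries an xref; "0 HEAD" and "0 TRLR" do not.
            current = (value, second) if second.startswith("@") else None
            if current and current[0] == "INDI":
                people[second] = {"name": "", "sex": "U", "deceased": False,
                                  "parents": [], "children": []}
            continue
        if current is None or level != 1:
            continue

        kind, xref = current
        if kind == "INDI":
            if second == "NAME":
                # "Jane /Smith/" becomes "Jane Smith"
                people[xref]["name"] = " ".join(value.replace("/", " ").split())
            elif second == "SEX":
                people[xref]["sex"] = value
            elif second == "DEAT":
                # Covers both "1 DEAT" with sub-lines and the bare "1 DEAT Y".
                people[xref]["deceased"] = True
        elif kind == "FAM":
            if second in ("HUSB", "WIFE"):
                families[xref]["parents"].append(value)
            elif second == "CHIL":
                families[xref]["children"].append(value)

    # Turn FAM membership into direct parent/child edges.
    for fam in families.values():
        for child in fam["children"]:
            for parent in fam["parents"]:
                if child in people and parent in people:
                    people[child]["parents"].append(parent)
                    people[parent]["children"].append(child)
    return people
```

Siblings fall out of this structure without extra work: two individuals are full siblings when their `parents` lists contain the same two identifiers.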

What needs re-annotation after import

GEDCOM 5.5.1 has no standard clinical tags. These items are part of a clinical pedigree but are not reliably carried by GEDCOM:

  • Structured disease annotations. GEDCOM has no ICD-10 or OMIM tag. Diseases usually arrive as free text in NOTE records ("breast cancer diagnosed age 47"). After import, use Evagene's structured annotation with its 200+ disease catalogue to attach ICD-10 / OMIM codes that risk models and AI interpretation can reason about.
  • Affected status / proband flag. Which individual is the proband, and which are affected for which condition, is not a GEDCOM concept. Mark these in Evagene after import.
  • Consanguinity loops. GEDCOM represents consanguinity implicitly through shared ancestors; Evagene's explicit consanguinity loop annotation is created on review.
  • Monozygotic twin status. Some GEDCOM exporters emit twin flags under vendor-specific tags, but there is no standard. Annotate in Evagene.
  • Risk analysis outputs. BRCAPRO, MMRpro, PancPRO, Mendelian — these are clinical computations on the pedigree, not pedigree data. Run them after import.
  • AI interpretation. Generated on demand via Evagene's Analysis Templates.
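
Because consanguinity is only implicit in GEDCOM, a reviewer or importer has to find it by walking the graph. The sketch below shows one simple way to flag couples who share an ancestor. It assumes a `parents` mapping from each individual to a list of parent identifiers; the function names are hypothetical and this is not a description of Evagene's own detection logic.

```python
def ancestors(person: str, parents: dict[str, list[str]]) -> set[str]:
    """All ancestors of `person`, found by walking parent links upward."""
    seen: set[str] = set()
    stack = list(parents.get(person, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def consanguineous_couples(couples, parents):
    """Return (a, b, shared_ancestors) for every couple with a common ancestor."""
    flagged = []
    for a, b in couples:
        shared = ancestors(a, parents) & ancestors(b, parents)
        if shared:
            flagged.append((a, b, shared))
    return flagged
```

For a first-cousin couple, the shared set contains the two common grandparents, which is the information needed to draw the loop explicitly on review.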

What gets dropped

Much of what a typical GEDCOM file contains is genealogy-specific and has no place in a clinical record. Evagene's import either drops these fields entirely or retains them as low-priority notes:

  • Place names (PLAC) for birth, death, and residence.
  • Occupation (OCCU) and education (EDUC).
  • Religious affiliation (RELI) and religious or burial events (BAPM, BURI, etc.).
  • Source citations (SOUR records and cross-references).
  • Media references (OBJE records, photos, external files).
  • Custom vendor tags (tags starting with underscore, e.g. _UID, _FSFTID).

A clinical pedigree is not a family tree. It is a smaller, more focused artefact. Dropping genealogy fields is the right default; retaining them as notes is the option when lineage-tracing information needs to travel with the record.
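
The drop-or-retain choice can be illustrated with a small filter over a single INDI record. This is a sketch under stated assumptions: the `KEEP_LEVEL1` and `DROP_NESTED` sets are read off the lists in this article, the `retain_as_notes` flag and the `[TAG]` note prefix are invented for illustration, and none of it is taken from Evagene's implementation.

```python
# Assumed policy, read off the lists in this article rather than from Evagene's code.
KEEP_LEVEL1 = {"NAME", "SEX", "BIRT", "DEAT", "FAMC", "FAMS", "NOTE"}
DROP_NESTED = {"PLAC", "SOUR", "OBJE"}


def filter_record(lines: list[str], retain_as_notes: bool = False) -> list[str]:
    """Filter one INDI record; dropped values optionally come back as NOTE lines."""
    kept = [lines[0]]        # the level-0 line is always kept
    notes: list[str] = []
    skip_deeper_than = None  # while set, skip the children of a dropped line

    for raw in lines[1:]:
        parts = raw.split(maxsplit=2)
        level, tag = int(parts[0]), parts[1]
        value = parts[2] if len(parts) == 3 else ""

        if skip_deeper_than is not None:
            if level > skip_deeper_than:
                continue
            skip_deeper_than = None

        dropped = (level == 1 and tag not in KEEP_LEVEL1) or \
                  (level > 1 and tag in DROP_NESTED)
        if dropped:
            skip_deeper_than = level
            # Pointers such as "@S1@" are meaningless as note text, so skip them.
            if retain_as_notes and value and not value.startswith("@"):
                notes.append(f"1 NOTE [{tag}] {value}")
            continue
        kept.append(raw)

    # Notes go last so they never split an event's sub-lines.
    return kept + notes
```

With `retain_as_notes=False` the record shrinks to its clinically useful core; with `True`, dropped values such as a birthplace reappear as tagged notes at the end of the record.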

Import workflow in Evagene

Evagene supports GEDCOM import both interactively and via the REST API.

Interactive

  1. From the dashboard, create a new pedigree.
  2. Drop the .ged file into the import dialog, or click to select.
  3. The importer parses the file, creates individuals and relationships, and presents a summary: number of individuals, number of families, unrecognised tags, any date-parsing warnings.
  4. Review the import, mark the proband, add structured disease annotations, run risk models.
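
The summary in step 3 is easy to approximate before uploading a file. The sketch below counts individuals and families and tallies tags outside a small known set. The `KNOWN_TAGS` set is an assumption chosen for illustration; the tags Evagene's importer actually recognises may differ.

```python
from collections import Counter

# Assumed subset of tags a structural importer would recognise; illustrative only.
KNOWN_TAGS = {
    "SOUR", "GEDC", "VERS", "FORM", "CHAR",          # header lines
    "NAME", "SEX", "BIRT", "DEAT", "DATE",           # individual data
    "FAMC", "FAMS", "HUSB", "WIFE", "CHIL",          # family links
    "NOTE", "CONT", "CONC",                          # notes and continuations
}


def summarise(text: str) -> dict:
    """Count INDI and FAM records and tally tags outside KNOWN_TAGS."""
    records: Counter = Counter()
    unknown: Counter = Counter()
    for raw in text.splitlines():
        parts = raw.split(maxsplit=2)
        if len(parts) < 2:
            continue
        if parts[0] == "0":
            # "0 @I1@ INDI" puts the record type third; "0 HEAD" puts it second.
            has_xref = len(parts) == 3 and parts[1].startswith("@")
            records[parts[2].split()[0] if has_xref else parts[1]] += 1
        elif parts[1] not in KNOWN_TAGS:
            unknown[parts[1]] += 1
    return {
        "individuals": records["INDI"],
        "families": records["FAM"],
        "unrecognised_tags": dict(unknown),
    }
```

Running this locally gives a quick sanity check that the counts in the import dialog match what the file actually contains.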

Programmatic (via REST API)

POST /api/pedigrees/import  HTTP/1.1
X-API-Key: evg_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Content-Type: multipart/form-data; boundary=...

--...
Content-Disposition: form-data; name="format"

gedcom
--...
Content-Disposition: form-data; name="file"; filename="family.ged"
Content-Type: text/plain

0 HEAD
1 SOUR FamilyTreeApp
... rest of GEDCOM ...
--...--

For large files the API returns an import_id and fires an import.completed webhook when parsing finishes. Bulk imports, such as loading a research cohort, call the same endpoint once per file and are subject to rate limits. See the pedigree REST API for endpoint detail.
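
The same request can be assembled from Python's standard library. This sketch only builds the request object and does not send it. The endpoint path, the `X-API-Key` header, and the `format` and `file` field names come from the example above; the base URL is a placeholder and `build_import_request` is a hypothetical helper name.

```python
import urllib.request
import uuid


def build_import_request(base_url: str, api_key: str, gedcom: bytes,
                         filename: str = "family.ged") -> urllib.request.Request:
    """Build (but do not send) the multipart POST for a GEDCOM import."""
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode()
    body = b"\r\n".join([
        delimiter,
        b'Content-Disposition: form-data; name="format"',
        b"",
        b"gedcom",
        delimiter,
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode(),
        b"Content-Type: text/plain",
        b"",
        gedcom,
        delimiter + b"--",
        b"",
    ])
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/pedigrees/import",
        data=body,
        method="POST",
        headers={
            "X-API-Key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the result to `urllib.request.urlopen` would perform the upload. The shape of the response body is not specified here, so any code that reads `import_id` from it should be checked against the REST API documentation.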

Round-trip export

Exporting a pedigree as GEDCOM produces a 5.5.1 file that any GEDCOM-compatible tool can read. Evagene's export preserves:

  • Individuals, sex, names, birth and death dates, deceased status.
  • Family structure (FAM records with HUSB/WIFE/CHIL).
  • Free-text notes on individuals and families.
  • Structured disease annotations, encoded as notes in a documented convention so that they survive a round trip through another GEDCOM tool and back into Evagene, provided the other tool does not strip notes.

What is not round-tripped:

  • Risk analysis outputs (BRCAPRO, MMRpro, PancPRO, Mendelian) — these are pedigree-derived computations, not pedigree data.
  • AI interpretation output — retained in Evagene, not emitted in GEDCOM.
  • Evagene-specific UI layout metadata.

For a lossless round trip between two Evagene instances, use JSON export rather than GEDCOM. JSON preserves the full Evagene data model; GEDCOM preserves what GEDCOM natively supports.
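
To show how annotations can ride inside notes, here is a sketch of an encode and decode pair. The `EVG-DX` prefix and the `key=value; key=value` layout are invented for this example and are not Evagene's documented convention; consult the product documentation for the real format.

```python
from typing import Optional

PREFIX = "EVG-DX "  # hypothetical marker, not Evagene's actual convention


def encode_annotation(fields: dict[str, str]) -> str:
    """Serialise annotation fields into a single level-1 NOTE line."""
    payload = "; ".join(f"{key}={value}" for key, value in sorted(fields.items()))
    return f"1 NOTE {PREFIX}{payload}"


def decode_annotation(line: str) -> Optional[dict[str, str]]:
    """Return the fields from an encoded NOTE line, or None for ordinary notes."""
    parts = line.split(maxsplit=2)
    if len(parts) < 3 or parts[1] != "NOTE" or not parts[2].startswith(PREFIX):
        return None
    fields: dict[str, str] = {}
    for pair in parts[2][len(PREFIX):].split(";"):
        key, sep, value = pair.strip().partition("=")
        if sep:
            fields[key] = value
    return fields
```

Ordinary free-text notes decode to `None` and pass through untouched, which is what lets structured and unstructured notes coexist in one file. This simple layout cannot carry values containing `;` or `=`, and a real encoder would also need to split long payloads across CONC or CONT lines.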

How this works in Evagene

The import pipeline is straightforward: parse the file with a GEDCOM 5.5.1 parser that is tolerant of common vendor deviations, normalise dates (GEDCOM permits several date formats), resolve family references into relationship edges, and emit an Evagene pedigree. Unknown tags are collected into a warnings payload the UI or API surfaces for review.
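
Date normalisation is the step most likely to produce warnings, so a sketch may help. The function below handles exact dates, month and year, year only, and the single-date qualifiers ABT, CAL, EST, BEF, and AFT. Ranges and periods (BET ... AND ..., FROM ... TO ...) are deliberately left unhandled and raise `ValueError`, which a caller could collect as a warning. The output shape is an assumption for illustration, not Evagene's internal representation.

```python
import re

MONTHS = {name: number for number, name in enumerate(
    "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), start=1)}
QUALIFIERS = {"ABT", "CAL", "EST", "BEF", "AFT"}


def normalise_date(value: str) -> dict:
    """Return {"iso": "YYYY[-MM[-DD]]", "qualifier": str or None}."""
    tokens = value.upper().split()
    qualifier = None
    if tokens and tokens[0] in QUALIFIERS:
        qualifier = tokens.pop(0)
    if not tokens or len(tokens) > 3:
        raise ValueError(f"unsupported date: {value!r}")

    year = tokens[-1]
    if not re.fullmatch(r"\d{3,4}", year):
        raise ValueError(f"bad year in date: {value!r}")
    iso = f"{int(year):04d}"

    if len(tokens) >= 2:
        month = MONTHS.get(tokens[-2])
        if month is None:
            raise ValueError(f"bad month in date: {value!r}")
        iso += f"-{month:02d}"

    if len(tokens) == 3:
        if not tokens[0].isdigit() or not 1 <= int(tokens[0]) <= 31:
            raise ValueError(f"bad day in date: {value!r}")
        iso += f"-{int(tokens[0]):02d}"

    return {"iso": iso, "qualifier": qualifier}
```

Keeping the qualifier separate from the ISO string preserves the uncertainty in the source instead of silently turning "ABT 1950" into an exact year.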

GEDCOM is one of five import formats Evagene supports. The others are JSON (Evagene's native format), 23andMe raw data (genotype / traits / health history), XEG (legacy), and pedigree images with OCR. Regardless of source, once a pedigree is in Evagene, the same downstream capabilities apply: BayesMendel risk models, Mendelian inheritance analysis, AI interpretation via Analysis Templates, embeddable viewers, REST API, webhooks, and the MCP server.

Frequently asked questions

What is GEDCOM?

A text-based, line-oriented genealogy file format. 5.5.1 is the most interoperable version and the de facto pedigree-exchange standard.

What carries over?

Individuals, family structure, sex, names, dates of birth and death, deceased status, and free-text notes. Clinical annotations encoded as notes are preserved where possible.

What gets lost?

Place names, occupations, baptisms, source citations, media references — not clinically relevant. Optionally retained as low-priority notes.

Is round-trip export supported?

Yes. Structural data and compatible notes round-trip cleanly. Risk outputs and AI interpretation are Evagene-side and do not appear in GEDCOM.

How do I import a GEDCOM?

Drag and drop in the UI, or POST to /api/pedigrees/import. Large imports return an import_id and emit an import.completed webhook.

Can GEDCOM carry disease annotations?

Not in a standard tag. They usually come in as free text and get re-annotated structurally in Evagene against ICD-10 / OMIM.
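
As an illustration of why re-annotation is needed, the sketch below scans a free-text note for a few condition keywords and an age at diagnosis. The keyword list and the regular expression are hypothetical and deliberately small; the output is a list of candidates for a clinician to confirm, not a coded diagnosis, and it does not reflect how Evagene's annotation works.

```python
import re

# Hypothetical keyword list for illustration; not Evagene's disease catalogue.
KEYWORDS = ("breast cancer", "ovarian cancer", "colorectal cancer", "pancreatic cancer")
AGE_RE = re.compile(r"\b(?:diagnosed|dx|onset)\b[^.;]*?\baged?\s+(\d{1,3})", re.IGNORECASE)


def candidate_annotations(note: str) -> list[dict]:
    """Suggest conditions mentioned in a note, with onset age when stated."""
    lowered = note.lower()
    age_match = AGE_RE.search(note)
    onset_age = int(age_match.group(1)) if age_match else None
    return [
        {"condition": keyword, "onset_age": onset_age, "source_text": note}
        for keyword in KEYWORDS
        if keyword in lowered
    ]
```

Keeping `source_text` alongside each candidate lets the reviewer see exactly which note the suggestion came from before attaching an ICD-10 or OMIM code.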

