GA4GH Phenopackets and pedigree data: Phenopacket v2, HPO, and how Evagene fits

A technical guide for rare disease teams on GA4GH Phenopackets v2, how Phenopackets represent pedigrees through the Family schema, comparison with FHIR FamilyMemberHistory, and how Evagene's current disease-centric model relates to Phenopacket interoperability.


Short version. GA4GH Phenopackets v2 is the current version of a genomics-research-oriented standard for capturing deep phenotypic and genomic information about an individual, with HPO as the core phenotype vocabulary. A Phenopacket Family bundles a proband Phenopacket, relative Phenopackets, and a PED-style Pedigree structure — together a machine-readable rare-disease record suitable for Matchmaker Exchange and similar networks. Evagene does not currently ship native Phenopacket v2 export; its clinical model is disease-centric via ICD-10 / OMIM rather than HPO-first. JSON export from Evagene carries enough structural pedigree and coded disease data that an adapter can produce a Phenopacket Family; services doing HPO-first deep phenotyping still need to source HPO terms separately. PedigreeTool supports Phenopackets v2 natively and is a closer fit for those workflows today. Phenopacket support is on the Evagene roadmap.

This page is for rare disease teams who need to place Evagene in a Phenopacket-oriented pipeline, and for platform evaluators comparing approaches across tools.

What Phenopackets are

Phenopackets is a standard published by the Global Alliance for Genomics and Health (GA4GH) for representing phenotypic and genomic information about a subject in a structured, machine-processable way. The v2 schema is the current published version, defined as Protocol Buffers with canonical JSON encoding. Its core building blocks:

  • Subject — demographics and status.
  • PhenotypicFeature — observed phenotypes, coded against the Human Phenotype Ontology (HPO) with optional modifiers such as onset, severity, evidence, and excluded status.
  • Measurement — quantitative clinical measurements.
  • Biosample — tissue or fluid samples, with procedure and histology.
  • Disease — diagnosed conditions, coded against OMIM, Mondo, or similar.
  • Interpretation — the diagnostic interpretation, typically linking variants to the disease.
  • MetaData — authoring resource, created-by, ontologies used and their versions.

A single Phenopacket represents one individual. For family-level information, the schema defines Family: a proband Phenopacket, a list of relative Phenopackets, and a Pedigree structure that describes the relationships.
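A single-individual Phenopacket assembled from the building blocks listed above might look like the following. This is a minimal sketch, not a validated record: it uses plain Python dictionaries with the lowerCamelCase field names that protobuf's JSON mapping emits by default, and every identifier, term, and version string is an illustrative assumption. Measurement, Biosample, and Interpretation are omitted for brevity.

```python
import json

# Illustrative values only: none of this is real patient data.
phenopacket = {
    "id": "phenopacket-proband-001",
    "subject": {"id": "proband", "sex": "FEMALE"},
    "phenotypicFeatures": [
        # An observed phenotype, coded against HPO.
        {"type": {"id": "HP:0001166", "label": "Arachnodactyly"}},
        # A phenotype that was looked for and explicitly excluded.
        {"type": {"id": "HP:0001083", "label": "Ectopia lentis"},
         "excluded": True},
    ],
    "diseases": [
        {"term": {"id": "OMIM:154700", "label": "Marfan syndrome"}},
    ],
    "metaData": {
        "created": "2024-01-01T00:00:00Z",
        "createdBy": "example-adapter",  # hypothetical author name
        "resources": [
            {
                "id": "hp",
                "name": "human phenotype ontology",
                "url": "http://purl.obolibrary.org/obo/hp.owl",
                # Example release string; record the release you actually used.
                "version": "2024-01-16",
                "namespacePrefix": "HP",
                "iriPrefix": "http://purl.obolibrary.org/obo/HP_",
            },
        ],
        "phenopacketSchemaVersion": "2.0",
    },
}

# Serialise to JSON for interchange.
encoded = json.dumps(phenopacket, indent=2)
```

Keeping excluded findings alongside observed ones matters for matching: "looked for and absent" carries different information from "not recorded".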

The Phenopacket Family schema and pedigree representation

The Pedigree inside a Family uses the long-familiar PED representation: one row per person, with paternal and maternal IDs, sex, and affected status. It is deliberately simple; the rich phenotypic content lives in the per-individual Phenopackets bundled alongside.

{
  "id": "family-smith-001",
  "proband": { /* Phenopacket for proband */ },
  "relatives": [ /* Phenopacket per relative */ ],
  "pedigree": {
    "persons": [
      { "family_id": "family-smith-001",
        "individual_id": "proband",
        "paternal_id": "father",
        "maternal_id": "mother",
        "sex": "FEMALE",
        "affected_status": "AFFECTED" },
      { "family_id": "family-smith-001",
        "individual_id": "mother",
        "paternal_id": "0",
        "maternal_id": "0",
        "sex": "FEMALE",
        "affected_status": "UNAFFECTED" }
      /* ... */
    ]
  },
  "meta_data": { /* ontology versions */ }
}

What is special here: the family's structural graph is compact and unambiguous (PED), while the clinical depth sits in HPO-coded phenotypes on each individual Phenopacket. This is exactly what undiagnosed-rare-disease workflows want — enough structure to run matchmaking and gene prioritisation pipelines, enough phenotype depth to make those pipelines useful.
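Because the structural graph is this simple, it is cheap to sanity-check before it leaves your system. The sketch below is one possible check, not part of any Phenopackets tooling: the function name check_pedigree is an assumption, and it reads the same field names as the example above. It verifies that every non-"0" parent ID points at a row that exists and that the referenced parent has the expected sex.

```python
def check_pedigree(persons):
    """Return a list of human-readable problems in a PED-style person list."""
    by_id = {p["individual_id"]: p for p in persons}
    problems = []
    for p in persons:
        for role, expected_sex in (("paternal_id", "MALE"),
                                   ("maternal_id", "FEMALE")):
            parent_id = p.get(role, "0")
            if parent_id == "0":
                continue  # "0" means the parent is not in the pedigree
            parent = by_id.get(parent_id)
            if parent is None:
                problems.append(
                    f"{p['individual_id']}: {role} '{parent_id}' has no row")
            elif parent["sex"] != expected_sex:
                problems.append(
                    f"{p['individual_id']}: {role} '{parent_id}' "
                    f"has sex {parent['sex']}, expected {expected_sex}")
    return problems


# The two rows spelled out in the example above; the father's row is elided
# there, so the check reports it as missing.
persons = [
    {"family_id": "family-smith-001", "individual_id": "proband",
     "paternal_id": "father", "maternal_id": "mother",
     "sex": "FEMALE", "affected_status": "AFFECTED"},
    {"family_id": "family-smith-001", "individual_id": "mother",
     "paternal_id": "0", "maternal_id": "0",
     "sex": "FEMALE", "affected_status": "UNAFFECTED"},
]
problems = check_pedigree(persons)
```

Running a check like this before export catches dangling parent references, which are a common source of rejected submissions downstream.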

Why Phenopackets matter for rare disease

Rare disease diagnosis is frequently a cross-institutional activity. A patient's combination of phenotypes may match only a handful of cases worldwide; finding those matches requires a shared vocabulary and a shared data shape. Phenopackets is the de facto data shape for this work:

  • Matchmaker Exchange and similar networks consume Phenopacket-like payloads for patient-matching.
  • Research submission to programmes such as the Genomics England research environment, NIH Undiagnosed Diseases Network, and similar is often Phenopacket-aligned.
  • Gene-prioritisation pipelines (Exomiser, LIRICAL, and others) consume HPO-coded phenotype lists; a Phenopacket is a natural input.
  • Cross-platform interoperability. A Phenopacket leaves one system and arrives at another with its meaning largely intact, given the declared ontology versions in meta_data.

For services whose core business is undiagnosed rare disease, HPO-first phenotype capture and Phenopacket-oriented interchange are often the default workflow.
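To make the "natural input" point concrete, the sketch below pulls the HPO term list out of a Phenopacket so it can be handed to a prioritisation tool. It is a minimal illustration: the function name hpo_terms is an assumption, the Phenopacket is a plain dictionary in lowerCamelCase JSON form, and how the resulting lists are passed to any particular tool is outside its scope.

```python
def hpo_terms(phenopacket):
    """Split a Phenopacket's HPO-coded features into observed and excluded IDs."""
    observed, excluded = [], []
    for feature in phenopacket.get("phenotypicFeatures", []):
        term_id = feature["type"]["id"]
        if not term_id.startswith("HP:"):
            continue  # skip features coded against other ontologies
        if feature.get("excluded", False):
            excluded.append(term_id)
        else:
            observed.append(term_id)
    return observed, excluded


# Illustrative input only.
example = {
    "id": "phenopacket-proband-001",
    "subject": {"id": "proband"},
    "phenotypicFeatures": [
        {"type": {"id": "HP:0001166", "label": "Arachnodactyly"}},
        {"type": {"id": "HP:0001083", "label": "Ectopia lentis"},
         "excluded": True},
    ],
}
observed, excluded = hpo_terms(example)
```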

Phenopacket vs FHIR FamilyMemberHistory

Both formats touch pedigree data but from different angles:

  • Audience. FHIR is for healthcare IT interoperability inside hospital estates; Phenopackets is for genomics research and rare-disease diagnostic workflows.
  • Phenotype depth. FamilyMemberHistory lists conditions with optional onset; Phenopacket PhenotypicFeature is HPO-coded with onset, severity, evidence, and excluded-status modifiers.
  • Variant and interpretation. Phenopacket has first-class Interpretation and Variant support; FHIR does too, but through the Genomics IG and additional resources.
  • Family structure. FHIR uses one FamilyMemberHistory per relative; Phenopacket Family bundles full Phenopackets and a PED-style pedigree.
  • Production pattern. A rare-disease service may emit both: FHIR for the EHR, Phenopacket for matchmaking and research submission.

They are complementary, not competing. See HL7 FHIR and pedigree data for the FHIR side of the story.
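The difference in shape is easier to see side by side. The sketch below renders one relative both ways from the same record. The input record layout and both function names are assumptions made for illustration; the FHIR output is a bare FamilyMemberHistory with free-text condition codes, and the Phenopacket output omits metaData for brevity, so neither is a complete, validated resource.

```python
ROLE_CODE_SYSTEM = "http://terminology.hl7.org/CodeSystem/v3-RoleCode"


def to_family_member_history(relative, patient_ref):
    """FHIR view: one resource per relative, attached to the patient."""
    return {
        "resourceType": "FamilyMemberHistory",
        "status": "completed",
        "patient": {"reference": patient_ref},
        "relationship": {
            "coding": [{"system": ROLE_CODE_SYSTEM,
                        "code": relative["role_code"]}],
        },
        # Conditions only; there is no place here for deep HPO phenotyping.
        "condition": [{"code": {"text": c["label"]}}
                      for c in relative["conditions"]],
    }


def to_relative_phenopacket(relative):
    """Phenopacket view: a full per-individual record for the relative."""
    return {
        "id": f"phenopacket-{relative['id']}",
        "subject": {"id": relative["id"], "sex": relative["sex"]},
        "phenotypicFeatures": [{"type": {"id": h["id"], "label": h["label"]}}
                               for h in relative["hpo"]],
        "diseases": [{"term": {"id": c["id"], "label": c["label"]}}
                     for c in relative["conditions"]],
    }


# Hypothetical source record for the proband's mother.
mother = {
    "id": "mother",
    "sex": "FEMALE",
    "role_code": "MTH",  # v3-RoleCode for "mother"
    "conditions": [{"id": "OMIM:154700", "label": "Marfan syndrome"}],
    "hpo": [{"id": "HP:0001166", "label": "Arachnodactyly"}],
}
fhir_view = to_family_member_history(mother, "Patient/proband")
phenopacket_view = to_relative_phenopacket(mother)
```

In the FHIR view the relationship to the patient is stated on the resource itself; in the Phenopacket view it is carried separately by the Pedigree rows of the enclosing Family.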

How Evagene fits today

Evagene's current clinical model is disease-centric. The 200+ disease catalogue is coded in ICD-10 and OMIM, individuals are annotated with diseases (with onset and affected status), and risk models and AI interpretation run on that structure. HPO deep phenotyping is not the primary capture mode.

This has two consequences for Phenopacket interoperability:

  1. The Pedigree section of a Phenopacket Family is straightforward to generate from an Evagene pedigree — the sex, affected status, and parental relationships are all available and map cleanly.
  2. The per-individual PhenotypicFeature lists are not generated by Evagene natively today. If your workflow produces HPO terms (in a separate tool, in a spreadsheet, in the EHR), the adapter building the Phenopacket Family needs to merge them in from that source.
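The merge described in the second point can be sketched as follows. This assumes the HPO terms arrive as CSV with columns individual_id, hpo_id, label, and excluded; that layout, like the function name merge_hpo, is a hypothetical choice for illustration and not an Evagene or Phenopackets format. The individual IDs in the CSV must match the subject IDs used in the Phenopackets.

```python
import csv
import io


def merge_hpo(phenopackets, hpo_csv_text):
    """Attach HPO features from CSV text to matching Phenopackets in place.

    Returns the individual IDs that appear in the CSV but have no Phenopacket.
    """
    by_subject = {pp["subject"]["id"]: pp for pp in phenopackets}
    unmatched = []
    for row in csv.DictReader(io.StringIO(hpo_csv_text)):
        packet = by_subject.get(row["individual_id"])
        if packet is None:
            unmatched.append(row["individual_id"])
            continue
        feature = {"type": {"id": row["hpo_id"], "label": row["label"]}}
        if (row.get("excluded") or "").strip().lower() == "true":
            feature["excluded"] = True
        packet.setdefault("phenotypicFeatures", []).append(feature)
    return unmatched


# Illustrative data only.
packets = [{"id": "pp-proband", "subject": {"id": "proband"}}]
hpo_csv = (
    "individual_id,hpo_id,label,excluded\n"
    "proband,HP:0001166,Arachnodactyly,false\n"
    "proband,HP:0001083,Ectopia lentis,true\n"
    "uncle,HP:0001166,Arachnodactyly,false\n"
)
unmatched = merge_hpo(packets, hpo_csv)
```

Returning the unmatched IDs rather than silently dropping them makes ID drift between the pedigree and the phenotype source visible early.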

Services that do use Evagene in a rare-disease context typically do so for the pedigree management, cancer risk, Mendelian inheritance, and AI interpretation layers — and pair it with a Phenopacket-oriented HPO capture tool for the phenotype depth. For services whose primary workflow is HPO-first rare-disease diagnostics with Phenopacket interchange, dedicated tools like PedigreeTool are a closer fit today.

Building a Phenopacket Family from Evagene

The adapter shape (not a copy-paste implementation):

1. Use Evagene REST API or JSON export to read the pedigree,
   individuals, sex, affected status, and disease annotations.

2. For each individual, create a Phenopacket:
   - Subject: id, sex (map Evagene's sex to the Phenopacket Sex enum:
     FEMALE, MALE, OTHER_SEX, or UNKNOWN_SEX), date of birth, and
     vital status.
   - Disease: map ICD-10 and OMIM codes to Phenopacket Disease
     entries with appropriate ontology references.
   - PhenotypicFeature: populate from your HPO source, if any.
   - MetaData: record ontology versions and source resources.

3. Build a Pedigree:
   - For each individual emit a Person with family_id,
     individual_id, paternal_id and maternal_id ("0" where that
     parent is not in the pedigree), sex, and affected_status.

4. Assemble Family:
   - proband = Phenopacket for the proband.
   - relatives = list of other Phenopackets.
   - pedigree = the Pedigree from step 3.
   - meta_data at the Family level too.

Adapters of this shape are typically a few hundred lines of Python using the official phenopackets Python package. They sit alongside your Evagene integration rather than inside it.
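A compressed version of the four steps above, using plain dictionaries instead of the protobuf classes, could look like this. Everything about the input is an assumption: the keys pedigree_id, proband_id, individuals, father_id, mother_id, affected, and diseases describe a hypothetical export shape, not Evagene's actual JSON schema, and disease codes are assumed to arrive already in CURIE form such as "OMIM:154700". Output field names follow protobuf's default lowerCamelCase JSON naming.

```python
from datetime import datetime, timezone

SEX = {"female": "FEMALE", "male": "MALE", "other": "OTHER_SEX"}
AFFECTED = {True: "AFFECTED", False: "UNAFFECTED"}


def meta_data():
    return {
        "created": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "createdBy": "evagene-phenopacket-adapter",  # hypothetical name
        "resources": [],  # list the ontologies and versions you used here
        "phenopacketSchemaVersion": "2.0",
    }


def sex_of(ind):
    return SEX.get(str(ind.get("sex", "")).lower(), "UNKNOWN_SEX")


def to_phenopacket(ind):
    """Step 2: one Phenopacket per individual (HPO features merged elsewhere)."""
    return {
        "id": f"phenopacket-{ind['id']}",
        "subject": {"id": ind["id"], "sex": sex_of(ind)},
        "diseases": [{"term": {"id": d["code"], "label": d["label"]}}
                     for d in ind.get("diseases", [])],
        "metaData": meta_data(),
    }


def to_person(family_id, ind):
    """Step 3: one PED-style row per individual."""
    return {
        "familyId": family_id,
        "individualId": ind["id"],
        "paternalId": ind.get("father_id") or "0",
        "maternalId": ind.get("mother_id") or "0",
        "sex": sex_of(ind),
        "affectedStatus": AFFECTED.get(ind.get("affected"), "MISSING"),
    }


def build_family(export):
    """Step 4: assemble the Family from a (hypothetical) pedigree export."""
    family_id = export["pedigree_id"]
    proband_id = export["proband_id"]
    packets = {ind["id"]: to_phenopacket(ind) for ind in export["individuals"]}
    return {
        "id": family_id,
        "proband": packets[proband_id],
        "relatives": [p for i, p in packets.items() if i != proband_id],
        "pedigree": {"persons": [to_person(family_id, ind)
                                 for ind in export["individuals"]]},
        "metaData": meta_data(),
    }


# Illustrative input in the assumed export shape.
export = {
    "pedigree_id": "family-smith-001",
    "proband_id": "proband",
    "individuals": [
        {"id": "proband", "sex": "female", "affected": True,
         "father_id": "father", "mother_id": "mother",
         "diseases": [{"code": "OMIM:154700", "label": "Marfan syndrome"}]},
        {"id": "mother", "sex": "female", "affected": False},
        {"id": "father", "sex": "male"},
    ],
}
family = build_family(export)
```

A production adapter would add the parts this sketch leaves out: populated metaData resources, ICD-10 to ontology mapping, date of birth and vital status, and schema validation of the result.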

How this works in Evagene

The Evagene platform surfaces relevant to Phenopacket workflows:

  • JSON export — the complete pedigree structure and annotations in Evagene's native schema, suitable for adapter consumption.
  • REST API — programmatic access for pipelines that build Phenopackets on demand from the current Evagene state.
  • Webhooks: pedigree.updated, individual.updated, and analysis.completed let the adapter refresh its generated Phenopacket when the pedigree changes.
  • GEDCOM export — a useful secondary interchange if your Phenopacket pipeline needs to consume GEDCOM for structural data and layer HPO on top.
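On the adapter side, reacting to those webhook events can be as small as the sketch below. The three event names come from the list above; the payload keys event and pedigree_id, and the handler and callback names, are hypothetical, since the actual webhook payload format is not described here.

```python
REFRESH_EVENTS = {"pedigree.updated", "individual.updated", "analysis.completed"}


def handle_webhook(payload, rebuild):
    """Call rebuild(pedigree_id) for events that can change the Phenopacket.

    `payload` is the decoded webhook body; its keys are assumed, not documented.
    Returns True if a rebuild was triggered.
    """
    if payload.get("event") not in REFRESH_EVENTS:
        return False
    rebuild(payload["pedigree_id"])
    return True


# Usage with a stand-in rebuild function that just records what it was asked for.
rebuilt = []
handled = handle_webhook(
    {"event": "pedigree.updated", "pedigree_id": "family-smith-001"},
    rebuilt.append,
)
ignored = handle_webhook(
    {"event": "some.other.event", "pedigree_id": "family-smith-001"},
    rebuilt.append,
)
```

Passing the rebuild step in as a callback keeps the event filtering separate from the code that reads the pedigree and regenerates the Family.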

What is on the roadmap: native Phenopacket v2 export, optional HPO annotation alongside the existing ICD-10 / OMIM disease model, and closer integration with rare-disease gene-prioritisation pipelines. If you are building on Phenopackets today and Evagene fits the pedigree and risk layer, please flag your use case via the Alpha waiting list so we can prioritise accordingly.

Frequently asked questions

What is a Phenopacket?

A GA4GH standard data structure for subject-level phenotype and genomic data. v2 is current; Family bundles a proband and relatives with a PED-style pedigree.

How does Family represent a pedigree?

PED-style structural rows plus per-individual Phenopackets carrying phenotypes and diagnoses.

Does Evagene support Phenopackets v2 natively?

Not today. JSON export plus an adapter is the current path; native export is on the roadmap.

How does Evagene compare to PedigreeTool?

PedigreeTool supports Phenopacket v2 natively and is a closer fit for HPO-first rare-disease workflows. Evagene is stronger on pedigree-centric clinical workflow, BayesMendel risk, AI interpretation, and platform integration.

How do Phenopackets compare with FHIR FamilyMemberHistory?

Complementary: FHIR for EHR interoperability, Phenopacket for genomics research and matchmaking.

Can I build a Phenopacket from Evagene today?

Yes — a small adapter reads Evagene's JSON export or REST API and emits a Phenopacket Family, merging HPO terms from your phenotype source.

Evaluate Evagene for your service

Join the Alpha waiting list. No credit card, no enterprise sales cycle — free access during Alpha for clinicians and research teams.
