23andMe pedigree import: turning consumer genotype data into clinical family-history context
What 23andMe exports, how to export it, what can be inferred from a consumer SNP array, and how Evagene brings genotype, traits, and health predispositions into a clinical pedigree — with clear limits on what the data can and cannot tell you.
Short version. 23andMe customers can export a tab-delimited raw genotype file containing several hundred thousand SNPs, plus trait and, where available, health-predisposition reports. Imported into a clinical pedigree platform, this data enriches the pedigree with SNP-inferred blood type and secretor status, trait annotations, and — where relevant — disease-related variants that can cross-reference with the disease catalogue. It is not a clinical test, and its findings do not replace accredited laboratory results, but it adds real signal to a family-history workflow. Evagene's 23andMe import pipeline is designed around that principle: infer what can be inferred, label it clearly as inferred, and let a clinician decide what to do with each flag.
What 23andMe exports
Three kinds of export are available from a standard 23andMe account.
Raw genotype data. A tab-delimited text file listing each SNP assayed on the array — identified by its rs number — alongside the customer's genotype at that position. Depending on the chip version in use when the sample was processed, the file contains between approximately 600,000 and 700,000 rows. This is the richest and most reusable of the three exports because it contains the underlying data, not any interpretation; any downstream tool can recompute its own inferences from it.
Trait reports. A set of prose and visual reports covering consumer-oriented traits: bitter taste perception, cilantro aversion, asparagus metabolite detection, earwax type, lactose tolerance, muscle composition, and several dozen others. These are calls computed by 23andMe from the raw genotype plus their proprietary reference population data.
Health predisposition and carrier-status reports. Where available by market — coverage varies by country and by a customer's service tier — 23andMe reports a selected set of disease-related variants. This includes three BRCA1/2 founder variants common in people of Ashkenazi Jewish ancestry, MUTYH-associated polyposis markers, several hereditary haemochromatosis variants, Gaucher disease, and others. These reports are variant-specific rather than comprehensive: the absence of a positive result for "BRCA1/2" in a 23andMe report means the three founder variants were not detected, not that BRCA1 and BRCA2 have been sequenced exhaustively.
How to export your 23andMe data
The process is straightforward and takes under five minutes of active time, with a short wait for 23andMe to prepare the file.
- Sign in to your 23andMe account in a browser.
- Open the account settings page and find the 23andMe Data section.
- Select the option to download your raw genotype data. 23andMe will confirm by email when the file is ready — typically within a few minutes, occasionally longer.
- Download the zipped tab-delimited text file when the email arrives. Keep it somewhere safe; the file is your personal genetic data.
- For trait and ancestry reports, open each report in the browser and print to PDF if you want a static copy.
- For health reports, view them within the 23andMe interface; the detailed PDF export option, where present, is reached from the individual report page.
The raw genotype file is the only export you strictly need to bring into a pedigree platform. Trait and health inferences can be recomputed from the raw file using the same underlying SNPs, and often with more flexibility than the original report allows.
What you can and cannot infer from a consumer SNP array
A consumer SNP array is not a sequencer. It assays a preselected set of variants chosen for their population frequency, medical relevance, or ancestry-informativeness. Within that set, genotype calls are usually accurate; outside that set, the array is silent. A few implications follow.
Things that can be inferred well
- Common polymorphisms. Blood type, secretor status, and many consumer traits rely on a small number of well-characterised SNPs that consumer arrays routinely include. Evagene uses this set directly.
- Ancestry composition. Population reference panels combined with hundreds of thousands of SNPs produce robust geographic ancestry estimates, especially for populations well-represented in the reference.
- Relationship verification. Two people on 23andMe who have both tested can be matched for shared DNA; the sharing pattern confirms or refines pedigree-inferred relationships.
- Specific founder mutations. Where a variant is explicitly assayed on the array, its absence or presence can be called. This is the basis of the 23andMe BRCA1/2 founder report and of several carrier-status findings.
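23andMe's own relative matching works on shared DNA segments, which is more than a short example can show. One simple, well-known check for a claimed parent-child link is the opposite-homozygote rate: a parent and child share at least one allele at every autosomal SNP, so positions where one is AA and the other GG should arise only from genotyping error or rare events such as deletions or uniparental disomy. The sketch below assumes each person's data is a dict of rs number to genotype; `opposite_homozygote_rate` is a hypothetical name, and this is not how 23andMe or Evagene verifies relationships.

```python
from __future__ import annotations

from typing import Mapping


def opposite_homozygote_rate(a: Mapping[str, str], b: Mapping[str, str]) -> float:
    """Fraction of shared, called, diploid SNPs where the two people are
    homozygous for different alleles (e.g. AA vs GG).

    Close to zero is expected for a true parent-child pair; unrelated
    people typically show a clearly non-zero rate over many SNPs.
    """
    compared = 0
    opposite = 0
    for rsid in a.keys() & b.keys():
        g1, g2 = a[rsid], b[rsid]
        # Skip no-calls ("--") and haploid calls (X/Y/MT in males).
        if "-" in g1 or "-" in g2 or len(g1) != 2 or len(g2) != 2:
            continue
        compared += 1
        if g1[0] == g1[1] and g2[0] == g2[1] and g1[0] != g2[0]:
            opposite += 1
    return opposite / compared if compared else 0.0
```

Over a few hundred SNPs this statistic is noisy; over a full array file it separates parent-child pairs from unrelated pairs clearly, though it cannot by itself distinguish other relationship degrees.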
Things that cannot be inferred reliably
- Gene coverage. The array genotypes selected SNPs, not whole genes. Absence of a reported variant in a gene does not mean the gene has been sequenced.
- Rare mutations. Any variant not on the array is invisible. Most disease-causing mutations in most hereditary cancer or cardiac genes are not on consumer arrays.
- Structural variation. Copy-number variants, large deletions, and rearrangements are beyond the array's design.
- Polygenic risk at clinical scale. Polygenic risk scores can be computed from consumer genotype, but their clinical utility remains an active research area and their population calibration is limited for non-European ancestries.
A good pedigree platform will make these limits explicit to the user rather than presenting inferences as clinical findings. Evagene flags each SNP-inferred value in the pedigree as inferred from SNP data so a clinician reading the tree knows the provenance of every entry.
Evagene's 23andMe import pipeline
Evagene's import pipeline has four stages: parse, map, infer, and annotate.
1. Parse
The tab-delimited raw genotype file is parsed row-by-row. Each row contains an rs number (or an internal 23andMe identifier, for a small number of variants), a chromosome, a position, and the customer's genotype at that position (two alleles, concatenated). Evagene handles all commercial chip versions that 23andMe has released.
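A minimal parsing sketch, assuming the layout described above: comment and header lines prefixed with "#", then four tab-separated columns. `GenotypeCall` and `parse_raw_genotype` are illustrative names, not Evagene's code. No-calls (which 23andMe writes as "--") and single-letter haploid calls are kept as-is so that later stages decide how to treat them.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class GenotypeCall:
    snp_id: str      # rs number, or a 23andMe internal "i..." identifier
    chromosome: str  # "1".."22", "X", "Y", or "MT"
    position: int
    genotype: str    # e.g. "AG"; one letter for haploid calls; "--" for a no-call


def parse_raw_genotype(lines: Iterable[str]) -> Iterator[GenotypeCall]:
    """Yield one GenotypeCall per data row of a 23andMe-style raw file."""
    for line_number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments, and the column header
        fields = line.split("\t")
        if len(fields) != 4:
            raise ValueError(
                f"line {line_number}: expected 4 tab-separated fields, got {len(fields)}"
            )
        snp_id, chromosome, position, genotype = fields
        yield GenotypeCall(snp_id, chromosome, int(position), genotype)
```

Because iterating an open text file yields lines, the function can be fed a file handle directly, e.g. `parse_raw_genotype(open(path, encoding="utf-8"))`, and processes the 600,000-plus rows lazily.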
2. Map SNPs to features
Evagene maintains an internal mapping from SNPs to features — blood type, secretor status, consumer traits, and a set of catalogued disease-related variants. The mapping uses rs numbers as the primary key so the same entry works across chip versions. Where a customer's chip omits a specific SNP, that feature is marked as unavailable rather than computed from an incomplete set.
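The rule that a feature with a missing SNP is marked unavailable, rather than computed from a partial set, can be sketched as follows. `FEATURE_SNPS` is a hypothetical two-entry stand-in for Evagene's internal mapping, keyed by rs number as described above; an absent row and a "--" no-call are treated the same way.

```python
from __future__ import annotations

from enum import Enum
from typing import Mapping


class Availability(Enum):
    AVAILABLE = "available"
    UNAVAILABLE = "unavailable"  # at least one required SNP is missing or no-called


# Hypothetical feature-to-SNP table; not Evagene's actual mapping.
FEATURE_SNPS: dict[str, tuple[str, ...]] = {
    "abo_blood_group": ("rs8176719", "rs8176746", "rs8176747"),
    "secretor_status": ("rs601338",),
}


def feature_inputs(
    genotypes: Mapping[str, str], feature: str
) -> tuple[Availability, dict[str, str]]:
    """Collect the genotypes a feature needs, or mark the feature unavailable."""
    required = FEATURE_SNPS[feature]
    inputs = {rsid: genotypes.get(rsid, "--") for rsid in required}
    if any(call == "--" for call in inputs.values()):
        # Never infer from an incomplete SNP set: report the gap instead.
        return Availability.UNAVAILABLE, {}
    return Availability.AVAILABLE, inputs
```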
3. Infer
For each feature, Evagene computes an inference from the relevant SNP cluster:
- ABO blood type. A deletion variant, rs8176719, distinguishes O from A/B; missense variants rs8176746 and rs8176747 distinguish A from B. Combining genotypes yields the ABO group (A, B, AB, or O).
- Rh factor. rs590787 and surrounding SNPs in the RHD gene region inform Rh-positive/negative status. Rh inference is probabilistic because the underlying biology includes a gene deletion that consumer arrays assay only indirectly.
- Secretor status. rs601338 in FUT2: a nonsense mutation at position 428 (G428A) abolishes secretor activity. Non-secretor status is recessive: homozygotes for the mutation are non-secretors, while heterozygotes and homozygous wild-type individuals are secretors. Secretor status is relevant to norovirus susceptibility, certain infection risks, and (in ongoing research) microbiome composition.
- Consumer traits. A curated set of over 50 traits, each computed from the trait's underlying SNP cluster.
- Allergies and specific disease variants. Where a variant in the disease catalogue is directly assayed, a match is flagged for clinician review.
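The ABO and secretor rules above can be sketched as small pure functions. This is a simplified illustration, not Evagene's algorithm. `infer_abo` takes pre-computed counts of O alleles (the rs8176719 deletion) and B alleles, because turning raw 23andMe letters into those counts depends on strand and indel encoding that this sketch does not model. It also assumes O chromosomes carry the A-type allele at the A/B SNPs, which holds for the common O alleles but not every rare one. `infer_secretor` assumes rs601338 is reported on the forward strand with A as the nonsense (428A) allele; verify against dbSNP before relying on it.

```python
from __future__ import annotations


def count_allele(genotype: str, allele: str) -> int:
    """Count copies of `allele` in a two-letter diploid call such as 'AG'."""
    if len(genotype) != 2 or "-" in genotype:
        raise ValueError(f"expected a diploid call, got {genotype!r}")
    return genotype.count(allele)


def infer_abo(o_allele_count: int, b_allele_count: int) -> str:
    """Simplified ABO group from O-deletion and B-allele counts (0, 1, or 2)."""
    if not (0 <= o_allele_count <= 2 and 0 <= b_allele_count <= 2):
        raise ValueError("allele counts must be 0, 1, or 2")
    if o_allele_count == 2:
        return "O"
    if o_allele_count == 1:
        # One functional chromosome: B if it carries a B allele, otherwise A.
        if b_allele_count == 2:
            raise ValueError("two B alleles contradict one O allele under this model")
        return "B" if b_allele_count == 1 else "A"
    # No O allele: both chromosomes are A or B.
    return {0: "A", 1: "AB", 2: "B"}[b_allele_count]


def infer_secretor(fut2_genotype: str, null_allele: str = "A") -> str:
    """Secretor status from rs601338; only two null alleles give non-secretor."""
    copies = count_allele(fut2_genotype, null_allele)
    return "non-secretor" if copies == 2 else "secretor"
```

Raising on an inconsistent combination, rather than returning a best guess, follows the pipeline's principle of flagging for review instead of guessing.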
4. Annotate
Each inferred value is attached to the individual in the pedigree and tagged as inferred from SNP data, with the source rs numbers recorded so any clinician can audit how the inference was made. Where the same field already has a clinically confirmed value — from serological blood-typing, say — Evagene surfaces a conflict for the user to resolve rather than overwriting.
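The annotate step's two rules (record the source rs numbers, and never overwrite an existing value) can be sketched with two small dataclasses. `FieldValue`, `Individual`, and `annotate` are hypothetical names, not Evagene's data model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FieldValue:
    value: str
    source: str                       # e.g. "inferred from SNP data" or "clinical"
    rs_numbers: tuple[str, ...] = ()  # SNPs behind an inference, kept for audit


@dataclass
class Individual:
    name: str
    fields: dict[str, FieldValue] = field(default_factory=dict)
    conflicts: list[tuple[str, FieldValue, FieldValue]] = field(default_factory=list)


def annotate(person: Individual, field_name: str, inferred: FieldValue) -> None:
    """Attach an SNP-inferred value, surfacing (never overwriting) disagreements."""
    existing = person.fields.get(field_name)
    if existing is None:
        person.fields[field_name] = inferred
    elif existing.value != inferred.value:
        # Keep the existing (e.g. serologically confirmed) value; queue a conflict.
        person.conflicts.append((field_name, existing, inferred))
    # If the values agree, the existing entry is left as it is.
```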
Where the SNP layer adds value to a pedigree
The most obvious benefit is data enrichment. A pedigree with blood type, Rh, secretor status, and ancestry data on multiple individuals tells a richer story than one without. Blood type is routinely clinically useful; secretor status and ancestry can be relevant in specific contexts.
A less obvious benefit is the ability to cross-reference with the disease catalogue. Where the pedigree already records a phenotype — say, early-onset colorectal cancer in a first-degree relative — and 23andMe data reveals a carrier finding for MUTYH, the two data points together prompt a different clinical conversation than either alone. Evagene's AI interpretation engine can flag such combinations automatically using its analysis templates.
A third benefit is polygenic context. Several common conditions — type 2 diabetes, coronary artery disease, inflammatory bowel disease, certain cancers — have a polygenic component that a monogenic pedigree does not capture. A pedigree platform that also holds consumer genotype data can contextualise the family history with an individual polygenic signal, even if that signal remains research-grade rather than clinical-grade. Evagene does not currently report polygenic risk scores clinically, but the underlying data is imported and available for downstream analysis.
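In its most common additive form, a polygenic score is the sum over scored SNPs of the effect-allele dosage (0, 1, or 2 copies) times a per-SNP weight taken from a published model. The sketch below assumes that form; `polygenic_score` is a hypothetical helper and any weights passed to it would come from an external model, not from Evagene. Real scores also need effect alleles aligned to the array's strand and a reference population for calibration, which is part of why such signals remain research-grade.

```python
from __future__ import annotations

from typing import Mapping


def polygenic_score(
    genotypes: Mapping[str, str],
    weights: Mapping[str, tuple[str, float]],
) -> tuple[float, int]:
    """Additive score: sum of effect-allele dosage times per-SNP weight.

    `weights` maps rs number -> (effect allele, beta). SNPs that are absent
    or no-called are skipped and counted, so coverage can be reported.
    """
    score = 0.0
    missing = 0
    for rsid, (effect_allele, beta) in weights.items():
        call = genotypes.get(rsid, "--")
        if call == "--":
            missing += 1
            continue
        score += beta * call.count(effect_allele)  # dosage is 0, 1, or 2
    return score, missing
```

Returning the missing count alongside the score matters because a consumer array may not assay every SNP in a published model.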
What the 23andMe data is not
It is worth restating three limits explicitly because they are often misunderstood.
It is not a clinical genetic test. A positive finding in a 23andMe report should be confirmed in an accredited laboratory before any clinical decision is made. A negative finding does not rule out a condition because most disease-causing variants are not on the array. Clinicians receiving 23andMe-derived information from a patient should treat it as a signal to consider proper testing, not as a result in itself.
It is not a full genome. Even at over 600,000 SNPs, a consumer array covers roughly 0.02% of the three billion base pairs of the human genome. Most of the variation that matters clinically is not on the array.
It is not equally informative across ancestries. Consumer arrays were originally designed with European reference populations in mind, and while coverage has improved, performance on some variants in African, East Asian, South Asian, and admixed populations is weaker than in European-ancestry samples. Ancestry estimates and polygenic scores carry similar caveats.
How Evagene supports 23andMe import
Evagene accepts the raw genotype file directly. You upload the .txt file (or the zipped version), pick the individual in the pedigree to whom the data belongs, and the pipeline runs locally within the platform — the raw genotype file is not forwarded off-platform. The inferences Evagene computes include ABO blood group, Rh factor, secretor status, and the curated set of consumer traits. Where a variant from the 200+ disease catalogue is assayed on the array, it is checked and flagged for clinician review.
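Accepting either the .txt file or the zipped download needs only the standard library. A minimal sketch, assuming the archive holds exactly one .txt member; `read_raw_genotype_text` is a hypothetical helper and says nothing about how Evagene stores an upload.

```python
from __future__ import annotations

import zipfile
from pathlib import Path


def read_raw_genotype_text(path: Path) -> str:
    """Return the raw genotype text from a plain .txt file or a .zip archive."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            members = [n for n in archive.namelist() if n.lower().endswith(".txt")]
            if len(members) != 1:
                raise ValueError(
                    f"expected one .txt file in the archive, found {len(members)}"
                )
            return archive.read(members[0]).decode("utf-8")
    # Not a zip archive: treat it as the plain tab-delimited text file.
    return path.read_text(encoding="utf-8")
```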
Each inferred value is labelled inferred from SNP data with the source rs numbers recorded on the individual. This distinguishes SNP-derived values from clinically confirmed ones and lets a clinician understand how any given entry came into the pedigree. Conflicts between SNP-inferred values and existing data are surfaced for resolution, not silently overwritten.
Beyond 23andMe, Evagene accepts GEDCOM 5.5.1, JSON, XEG, and pedigree image OCR, with the same conflict-resolution discipline applied. All imports feed the same internal pedigree model, which in turn drives the BayesMendel risk models and the Mendelian inheritance calculator.
Frequently asked questions
What does 23andMe actually export?
A tab-delimited raw genotype file of the SNPs on the consumer array (around 600,000 to 700,000 depending on chip version), plus trait and, where available, health-predisposition reports. The raw genotype file is the most useful single export for a downstream pedigree tool.
How do I export my 23andMe data?
From the 23andMe account settings, request a raw genotype download. 23andMe confirms by email when the file is ready; you download a zipped .txt file. Trait and ancestry reports can be printed to PDF from the browser.
Is 23andMe data a clinical test?
No. It is a consumer SNP array, not clinical-grade sequencing. Results usefully enrich a pedigree but are not a substitute for accredited genetic testing. Confirm any finding of clinical consequence in a certified laboratory.
What can be inferred from 23andMe raw data?
ABO blood group, Rh factor, secretor status, a wide set of consumer traits, ancestry composition, and specific founder variants that are directly assayed on the array. Most disease-causing variants in most hereditary-condition genes are not on the array.
How does Evagene handle 23andMe import?
The raw genotype file is parsed and mapped against a catalogue of relevant SNPs. Blood type, Rh, secretor status, traits, and catalogued disease variants are inferred and attached to the individual as inferred from SNP data. Conflicts with existing values are surfaced for review.
Can I import data for multiple family members?
Yes, with their consent. Genotype data from multiple relatives lets you observe how variants segregate through the family and cross-check relationships against the structural pedigree.
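With genotypes for both parents and a child, segregation can be checked SNP by SNP: each of the child's calls must be formed from one maternal and one paternal allele. A minimal sketch with a hypothetical `mendelian_errors` helper, skipping no-calls and haploid calls; a handful of errors across hundreds of thousands of SNPs is consistent with genotyping noise, while many suggest a sample mix-up or a misrecorded relationship.

```python
from __future__ import annotations

from typing import Mapping


def mendelian_errors(
    child: Mapping[str, str], mother: Mapping[str, str], father: Mapping[str, str]
) -> list[str]:
    """Return rs numbers where the child's unphased diploid call cannot be
    formed from one maternal and one paternal allele."""
    errors = []
    for rsid in child.keys() & mother.keys() & father.keys():
        c, m, f = child[rsid], mother[rsid], father[rsid]
        if any(len(g) != 2 or "-" in g for g in (c, m, f)):
            continue  # skip no-calls and haploid calls
        c1, c2 = c[0], c[1]
        # Either ordering of the child's alleles may be the maternal one.
        if not ((c1 in m and c2 in f) or (c2 in m and c1 in f)):
            errors.append(rsid)
    return sorted(errors)
```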