Preprint · not peer-reviewed · Researchers · Educators

Preprint details how CZ CELLxGENE Discover uses collaborative curation to balance scale and data quality

A bioRxiv preprint from the CZ CELLxGENE team describes a submission model in which data contributors partner with dedicated curators, enabling the resource to grow rapidly while maintaining metadata quality for AI-scale analysis.

Published · AI-drafted summary based on 1 public source

A preprint posted to bioRxiv on 5 June 2026 describes the collaborative submission model underpinning CZ CELLxGENE Discover, a community data resource that aggregates single-cell and spatial transcriptomics datasets across studies for large-scale biomedical research and AI model development.

The authors, drawn from the Chan Zuckerberg Initiative and collaborating institutions, describe a fundamental tension in building community resources: the desire for a large data corpus versus the need for high-quality, richly annotated metadata. They outline how CELLxGENE Discover addresses this tension by partnering data contributors directly with dedicated resource curators. Rather than requiring contributors to conform to standards on their own, curators work alongside submitters to harmonise data and metadata, reducing errors and improving downstream usability.

The preprint reports that this model has enabled CELLxGENE Discover to become a widely used infrastructure resource, supporting large-scale re-analysis and the training of foundation models in genomics. The authors discuss practical lessons for other community resources facing the same scale-versus-quality tension.

The work has not yet been peer-reviewed. It will be of primary interest to researchers building or contributing to genomic data infrastructures, and to those developing or evaluating AI models trained on single-cell data.

Sources

Read the original reporting — these are the public sources this summary draws from.

  1. Primary source · Preprint · bioRxiv (Cold Spring Harbor Laboratory) · 2026-06-05
    A collaborative submission model for building high-quality data resources at scale through partnership

Tags

single-cell-genomics data-infrastructure cellxgene metadata-curation community-resource ai-genomics preprint

About Genetic Current

Educational summaries of public genetics news

Genetic Current is the news section of Evagene, an academic, research, and educational pedigree-modelling platform. Stories are AI-drafted summaries of items from trusted public sources, written for researchers, clinicians, educators, students, genealogists, and patients with an interest in genetics. Summaries are for educational and research purposes only and are not medical advice.
