Sequence Data - EMO-BON Data Workflow Handbook

Genoscope

Sequence data are generated by Genoscope and hosted internally (closed access) on its infrastructure. These data consist of DNA sequence files, each identified by a unique Genoscope ID that is matched to the EMO BON material sample ID.

ENA

The generated sequence data are regularly submitted to the European Nucleotide Archive (ENA). In ENA, the EMO BON umbrella project is PRJEB51688 (https://www.ebi.ac.uk/ena/browser/view/PRJEB51688), under which sit the project accession numbers of the individual EMO BON observatories. Once the sequences are uploaded from Genoscope to ENA, each is assigned a unique ENA run accession number (for example: ERR13955095 or ERR13954264) and linked to its related metadata.
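As a rough illustration of how these accession numbers can be used, the sketch below builds two URLs for a given accession: the ENA browser page (the URL pattern shown above) and a query against the ENA Portal API file report. The Portal API endpoint, the `read_run` result type, and the chosen field names are assumptions drawn from ENA's public API and are not described in this handbook; the function names are hypothetical.

```python
from urllib.parse import urlencode

# Browser URL pattern as used for the umbrella project link above.
ENA_BROWSER = "https://www.ebi.ac.uk/ena/browser/view/"
# Assumption: ENA Portal API file-report endpoint (not mentioned in the handbook).
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def browser_url(accession: str) -> str:
    """Return the ENA browser page for any accession number."""
    return ENA_BROWSER + accession


def filereport_url(
    accession: str,
    fields=("run_accession", "experiment_accession", "sample_accession"),
) -> str:
    """Return a Portal API URL listing the runs under a project (TSV output)."""
    query = urlencode(
        {
            "accession": accession,
            "result": "read_run",
            "fields": ",".join(fields),
            "format": "tsv",
        },
        safe=",",  # keep the comma-separated field list readable
    )
    return f"{ENA_FILEREPORT}?{query}"


if __name__ == "__main__":
    print(browser_url("PRJEB51688"))
    print(filereport_url("PRJEB51652"))
```

The URLs are only constructed here, not fetched, so the sketch runs without network access; retrieving them with any HTTP client should return the browser page and a tab-separated run listing respectively.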

In the context of EMO-BON, the following data types / accession numbers are relevant:

(Bio)Sample accession number (e.g. SAMEA114561122)
experiment accession number, for metagenomics sequences (e.g. ERX13356190)
run accession number, for metagenomics sequences (e.g. ERR13955095)
project accession number (e.g. PRJEB51652)
umbrella accession number (i.e. all of EMO BON) (always PRJEB51688)
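
The accession types listed above can be told apart by their prefixes. The following is a minimal sketch, assuming the general accession formats published by ENA/INSDC (for example `SAMEA…` for BioSamples and `ERR…` for runs); the function name `classify_accession` and the returned labels are hypothetical and not part of any EMO BON tooling.

```python
import re

# Assumption: general INSDC/ENA accession formats, not EMO BON-specific rules.
ACCESSION_PATTERNS = {
    "biosample": re.compile(r"SAM[EDN][A-Z]?\d+"),
    "experiment": re.compile(r"[EDS]RX\d{6,}"),
    "run": re.compile(r"[EDS]RR\d{6,}"),
    "project": re.compile(r"PRJ[EDN][A-Z]\d+"),
}

# The EMO BON umbrella project, as stated in the list above.
UMBRELLA_ACCESSION = "PRJEB51688"


def classify_accession(accession: str):
    """Return the accession type as a string, or None if it is not recognised."""
    accession = accession.strip()
    # Check the umbrella first: it would otherwise match the project pattern.
    if accession == UMBRELLA_ACCESSION:
        return "umbrella"
    for kind, pattern in ACCESSION_PATTERNS.items():
        if pattern.fullmatch(accession):
            return kind
    return None


if __name__ == "__main__":
    for acc in ["SAMEA114561122", "ERX13356190", "ERR13955095",
                "PRJEB51652", "PRJEB51688"]:
        print(acc, classify_accession(acc))
```

A check of this kind can help catch accession numbers entered in the wrong metadata column, for example a run accession where an experiment accession is expected.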

Sequence Crate

Sequence metadata is available in the sequencing-logistics-crate repository, along with links to both Genoscope (via Genoscope IDs) and ENA (via ENA accession numbers).
The repository is organised by shipment batch (for example: batch-001). Each batch folder contains all the files pertaining to that batch: metadata tables, accession mappings, and any supplementary documentation.
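
To get an overview of a local clone of the repository, the batch folders can be enumerated with a few lines of Python. This is only a sketch: it assumes that batch folders are named `batch-` followed by a number and that they sit directly inside the directory passed in. The actual location and file names inside the repository are not specified here, and `list_batches` is a hypothetical helper.

```python
from pathlib import Path


def list_batches(batches_dir):
    """Map each batch folder name to the sorted list of files it contains.

    Assumption: batch folders are direct children of `batches_dir`
    and are named like 'batch-001'.
    """
    root = Path(batches_dir)
    batches = {}
    for folder in sorted(root.glob("batch-*")):
        if not folder.is_dir():
            continue
        # Record file paths relative to the batch folder, including subfolders.
        batches[folder.name] = sorted(
            p.relative_to(folder).as_posix()
            for p in folder.rglob("*")
            if p.is_file()
        )
    return batches


if __name__ == "__main__":
    # Hypothetical path to a local clone; adjust to the real folder layout.
    for name, files in list_batches("sequencing-logistics-crate").items():
        print(name, len(files), "file(s)")
```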