This section describes how EMO-BON sequences are stored at, and accessed from, the European Nucleotide Archive (ENA).
What is ENA?¶
The European Nucleotide Archive (ENA) is one of the world’s primary repositories for nucleotide sequence data.
Website: https://
Operated By: European Bioinformatics Institute (EMBL-EBI)
EMO-BON and ENA¶
All EMO-BON sequences are archived at ENA for:
Long-term preservation: Guaranteed storage
Public access: Open data for research community
Standardization: ENA metadata standards
Citability: Stable accession numbers
Compliance: Meeting data management requirements
EMO-BON Project Structure at ENA¶
Umbrella Project¶
All EMO-BON data is grouped under a single umbrella project:
Accession: PRJEB51688
URL: https://
Observatory Projects¶
Each observatory has its own ENA project:
Pattern: One project accession per observatory, grouped under the umbrella project
Examples:
Observatory BPNS: PRJEB[NUMBER]
Observatory HCMR-1: PRJEB[NUMBER]
Managed In: governance
Data Submission Process¶
Sequence Generation¶
Sample Collection: Observatory collects samples
Sample to Genoscope: Samples sent to sequencing facility
Sequencing: Genoscope performs sequencing
Raw Data: Sequences stored on Genoscope cloud
Submission to ENA¶
General Process:
Prepare sequence files
Prepare metadata (sample information)
Submit to ENA via their API or web interface
Receive accession numbers
Link accessions to EMO-BON crates
Metadata Requirements¶
ENA requires extensive metadata:
Sample metadata: Collection location, date, depth, etc.
Experiment metadata: Library preparation, sequencing method
Run metadata: Instrument, read length, quality scores
Study metadata: Project description, publications
EMO-BON logsheets provide much of this information.
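A check like the one below can flag logsheet rows that lack sample metadata before they are prepared for ENA. It is only a minimal sketch: the column names (`collection_date`, `latitude`, `longitude`, `depth`) and the `missing_fields` helper are hypothetical and are not taken from the EMO-BON logsheets or from an ENA checklist.

```python
# Hypothetical column names, chosen for illustration only; the real logsheet
# headers and the ENA checklist fields may differ.
REQUIRED_SAMPLE_FIELDS = ("collection_date", "latitude", "longitude", "depth")


def missing_fields(row, required=REQUIRED_SAMPLE_FIELDS):
    """Return the required field names that are absent or blank in a row."""
    missing = []
    for name in required:
        value = row.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# Example row with a blank longitude (values are made up).
row = {
    "collection_date": "2021-06-21",
    "latitude": "51.4",
    "longitude": " ",
    "depth": "1",
}
print(missing_fields(row))  # ['longitude']
```

Rows for which the function returns an empty list have all of the assumed fields filled in; anything else can be sent back to the observatory for completion.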
Accessing ENA Data¶
By Accession Number¶
Each study, sample, run, and analysis at ENA has a unique accession:
Formats:
Study: PRJEB[NUMBER]
Sample: SAMEA[NUMBER]
Run: ERR[NUMBER]
Analysis: ERZ[NUMBER]
URL Pattern: https://www.ebi.ac.uk/ena/browser/view/{ACCESSION}
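The accession formats and URL pattern above can be combined into a small helper. This is a sketch, not an official ENA client: the function names are made up for illustration, and the patterns cover only the four prefixes listed here, although ENA issues other accession types as well.

```python
import re

# Prefixes listed above, mapped to the record type they identify.
ACCESSION_PATTERNS = {
    "study": re.compile(r"^PRJEB\d+$"),
    "sample": re.compile(r"^SAMEA\d+$"),
    "run": re.compile(r"^ERR\d+$"),
    "analysis": re.compile(r"^ERZ\d+$"),
}

BROWSER_VIEW = "https://www.ebi.ac.uk/ena/browser/view/"


def accession_type(accession):
    """Return the record type for an accession, or None if no pattern matches."""
    for kind, pattern in ACCESSION_PATTERNS.items():
        if pattern.match(accession):
            return kind
    return None


def browser_url(accession):
    """Build the ENA Browser URL for a recognised accession."""
    if accession_type(accession) is None:
        raise ValueError(f"Unrecognised accession format: {accession!r}")
    return BROWSER_VIEW + accession


print(accession_type("PRJEB51688"))  # study
print(browser_url("PRJEB51688"))     # https://www.ebi.ac.uk/ena/browser/view/PRJEB51688
```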
Via ENA Portal¶
Search: https://
Filters:
By project (PRJEB51688)
By taxonomy
By location
By date range
Programmatic Access¶
ENA API: REST API for querying and downloading
Example:
curl "https://www.ebi.ac.uk/ena/portal/api/search?result=read_run&query=study_accession%3D%22PRJEB51688%22"
ENA Browser API: Direct file downloads
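The same kind of query can be issued from Python using only the standard library. The sketch below assumes the Portal API `search` endpoint with its `result`, `query`, `fields` and `format` parameters; the choice of `read_run` as the result type and of the three returned fields is an assumption made for illustration, and the helper names are hypothetical. What a query returns depends on how records are linked to the accession being searched.

```python
import csv
import io
from urllib.parse import urlencode
from urllib.request import urlopen

PORTAL_SEARCH = "https://www.ebi.ac.uk/ena/portal/api/search"

# Assumed selection of read_run fields; adjust to the columns you need.
DEFAULT_FIELDS = ("run_accession", "sample_accession", "fastq_ftp")


def build_search_url(study, result="read_run", fields=DEFAULT_FIELDS):
    """Build a Portal API search URL for all records of one study."""
    params = {
        "result": result,
        "query": f'study_accession="{study}"',
        "fields": ",".join(fields),
        "format": "tsv",
    }
    return f"{PORTAL_SEARCH}?{urlencode(params)}"


def parse_tsv(text):
    """Turn a TSV response body into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def fetch_runs(study):
    """Download and parse the run table for a study (requires network access)."""
    with urlopen(build_search_url(study), timeout=60) as response:
        return parse_tsv(response.read().decode("utf-8"))


print(build_search_url("PRJEB51688"))
```

Calling `fetch_runs("PRJEB51688")` would perform the request; it is left out of the example so the block runs without network access.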
Integration with EMO-BON¶
Linking Sequences to Samples¶
EMO-BON maintains links between:
Observatory samples (in observatory crates)
ENA accessions (in sequencing logistics crate)
Analysis results (in analysis results crates)
Sequencing Crate¶
Repository: sequencing-logistics-crate (planned)
Purpose: Central registry linking samples to their sequences
Contents:
Sample ID to ENA accession mappings
Batch information
Sequencing run metadata
Links to raw data on ENA
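Because the sequencing crate is still planned, its layout is not defined here. Purely as an illustration of what a sample-to-accession registry might look like, the sketch below uses a hypothetical record type; every field name and every identifier in it is a placeholder, not real EMO-BON or ENA data.

```python
from dataclasses import dataclass, field

BROWSER_VIEW = "https://www.ebi.ac.uk/ena/browser/view/"


@dataclass
class SequencedSample:
    """One registry entry; all field names are hypothetical."""
    sample_id: str                                 # EMO-BON sample identifier
    batch: str                                     # sequencing batch label
    ena_sample: str                                # ENA sample accession
    ena_runs: list = field(default_factory=list)   # ENA run accessions

    def run_urls(self):
        """Return ENA Browser links to the raw data for this sample."""
        return [BROWSER_VIEW + run for run in self.ena_runs]


def index_by_sample(entries):
    """Build a lookup from sample ID to entry, rejecting duplicate IDs."""
    registry = {}
    for entry in entries:
        if entry.sample_id in registry:
            raise ValueError(f"Duplicate sample ID: {entry.sample_id}")
        registry[entry.sample_id] = entry
    return registry


# Placeholder values only.
entries = [
    SequencedSample("SAMPLE_001", "batch-1", "SAMEA0000001", ["ERR0000001"]),
    SequencedSample("SAMPLE_002", "batch-1", "SAMEA0000002", ["ERR0000002", "ERR0000003"]),
]
registry = index_by_sample(entries)
print(registry["SAMPLE_002"].run_urls())
```

Keying the registry by sample ID keeps the lookup direction the same as the stated purpose of the crate: start from a sample and find its sequences.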
Metadata Flow¶
Logsheets (Google Sheets)
↓
Observatory Crates (GitHub)
↓
Sample Metadata
↓
ENA Submission
↓
ENA Accessions
↓
Sequencing Logistics Crate (GitHub)
ENA Metadata Standards¶
MIxS (Minimum Information about any Sequence)¶
ENA requires MIxS-compliant metadata.
EMO-BON Uses:
MIMARKS (Minimum Information about a Marker Gene Sequence)
MIxS environmental packages
Fields Include:
Geographic location
Collection date
Environmental context
Sample processing
Darwin Core¶
EMO-BON also aligns with Darwin Core for biodiversity data:
Occurrence records
Taxonomic information
Event details
Data Updates and Corrections¶
Metadata Updates¶
If metadata needs correction:
Update in ENA via their interface
Update in EMO-BON crates
Maintain synchronization
Data Withdrawal¶
In rare cases, data may need to be withdrawn:
Contact ENA support
Follow their withdrawal process
Update EMO-BON records
Benefits of ENA Storage¶
For Researchers¶
Free access: No cost to download
Stable URLs: Accessions never change
Searchable: Integrated with other databases
Standardized: Common metadata format
For EMO-BON¶
Reliable: ENA ensures long-term preservation
Trusted: Recognized by research community
Compliant: Meets funder requirements
Visible: Increases data discoverability
Future Plans¶
Automation¶
Automated submission pipeline
Regular synchronization
Validation before submission
Enhanced Integration¶
Richer links between EMO-BON and ENA
Bidirectional metadata flow
Integrated search across both systems