This section describes how EMO-BON sequences are stored at, and accessed from, the European Nucleotide Archive (ENA).

What is ENA?

The European Nucleotide Archive (ENA) is one of the world’s primary repositories for nucleotide sequence data.

Website: https://www.ebi.ac.uk/ena/

Operated By: European Bioinformatics Institute (EMBL-EBI)

EMO-BON and ENA

All EMO-BON sequences are archived at ENA for:

EMO-BON Project Structure at ENA

Umbrella Project

All EMO-BON data is grouped under a single umbrella project:

Accession: PRJEB51688

URL: https://www.ebi.ac.uk/ena/browser/view/PRJEB51688

Observatory Projects

Each observatory has its own ENA project:

Pattern: Each observatory receives a unique project accession

Examples:

Managed In: governance-data/observatories.csv
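
A lookup from observatory to project accession could be sketched as follows. The layout of governance-data/observatories.csv is not shown on this page, so the column names `observatory_id` and `ena_accession` are hypothetical, and the sample rows use placeholder values rather than real accessions.

```python
import csv
import io


def load_observatory_projects(csv_text,
                              id_column="observatory_id",
                              accession_column="ena_accession"):
    """Map each observatory to its ENA project accession.

    The column names are assumptions; adjust them to the real CSV header.
    Rows without an accession are skipped.
    """
    projects = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        observatory = (row.get(id_column) or "").strip()
        accession = (row.get(accession_column) or "").strip()
        if observatory and accession:
            projects[observatory] = accession
    return projects


# Placeholder content, not taken from the real file.
SAMPLE_CSV = """observatory_id,ena_accession
OBS_A,PRJEB_PLACEHOLDER_A
OBS_B,
"""

projects = load_observatory_projects(SAMPLE_CSV)
```

In practice the text would come from reading the CSV file in the governance-data repository instead of an inline string.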

Data Submission Process

Sequence Generation

  1. Sample Collection: The observatory collects the samples

  2. Sample to Genoscope: The samples are sent to Genoscope, the sequencing facility

  3. Sequencing: Genoscope sequences the samples

  4. Raw Data: The raw sequences are stored on the Genoscope cloud

Submission to ENA

General Process:

  1. Prepare sequence files

  2. Prepare metadata (sample information)

  3. Submit to ENA via their API or web interface

  4. Receive accession numbers

  5. Link accessions to EMO-BON crates
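
Steps 4 and 5 can be illustrated with a small sketch that reads the accessions out of a submission receipt and keys them by sample alias, ready to be written into a crate. It assumes the receipt is XML in the shape ENA's Webin service typically returns (a `RECEIPT` root with `SAMPLE` children carrying `accession` and `alias` attributes); the example receipt and all accessions in it are made up, and how EMO-BON actually records the links is not described here.

```python
import xml.etree.ElementTree as ET

# Made-up receipt modelled on the usual Webin receipt layout (an assumption).
EXAMPLE_RECEIPT = """<RECEIPT success="true">
  <SAMPLE accession="ERS0000001" alias="sample-alias-1" status="PRIVATE">
    <EXT_ID accession="SAMEA0000001" type="biosample"/>
  </SAMPLE>
  <SUBMISSION accession="ERA0000001" alias="submission-alias-1"/>
</RECEIPT>"""


def accessions_from_receipt(receipt_xml):
    """Return {sample alias: accessions} for every SAMPLE in the receipt."""
    root = ET.fromstring(receipt_xml)
    if root.get("success") != "true":
        raise ValueError("ENA reported an unsuccessful submission")

    links = {}
    for sample in root.findall("SAMPLE"):
        # The BioSample accession, when present, sits in a nested EXT_ID.
        biosample = sample.find("EXT_ID[@type='biosample']")
        links[sample.get("alias")] = {
            "sample_accession": sample.get("accession"),
            "biosample_accession": (
                biosample.get("accession") if biosample is not None else None
            ),
        }
    return links


links = accessions_from_receipt(EXAMPLE_RECEIPT)
```

Keying by alias keeps the link stable as long as the alias used at submission matches the sample identifier used in the crates.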

Metadata Requirements

ENA requires extensive metadata:

EMO-BON logsheets provide much of this information.

Accessing ENA Data

By Accession Number

Each sequence or sample has a unique accession:

Formats:

URL Pattern: https://www.ebi.ac.uk/ena/browser/view/{ACCESSION}
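
As a minimal sketch, the URL pattern can be wrapped in a helper that checks an accession's shape before building the link. The regular expressions follow the accession formats ENA publishes for projects, studies, samples, experiments and runs; they are an illustration and not a list taken from this page. `ERR0000001` below is a made-up run accession.

```python
import re

ENA_VIEW_URL = "https://www.ebi.ac.uk/ena/browser/view/{accession}"

# Accession shapes as commonly documented by ENA (an assumption here).
ACCESSION_PATTERNS = {
    "project": re.compile(r"PRJ[EDN][A-Z]\d+"),
    "study": re.compile(r"[EDS]RP\d{6,}"),
    "biosample": re.compile(r"SAM[EDN][A-Z]?\d+"),
    "sample": re.compile(r"[EDS]RS\d{6,}"),
    "experiment": re.compile(r"[EDS]RX\d{6,}"),
    "run": re.compile(r"[EDS]RR\d{6,}"),
}


def classify_accession(accession):
    """Return the record type for an accession, or None if unrecognised."""
    for kind, pattern in ACCESSION_PATTERNS.items():
        if pattern.fullmatch(accession):
            return kind
    return None


def browser_url(accession):
    """Build the ENA Browser URL for a recognised accession."""
    if classify_accession(accession) is None:
        raise ValueError(f"Unrecognised ENA accession: {accession!r}")
    return ENA_VIEW_URL.format(accession=accession)


umbrella_url = browser_url("PRJEB51688")
```

Rejecting unrecognised strings early avoids generating links that lead to an empty Browser page.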

Via ENA Portal

Search: https://www.ebi.ac.uk/ena/browser/advanced-search

Filters:

Programmatic Access

ENA Portal API: REST API for querying records and locating their files

Example:

curl "https://www.ebi.ac.uk/ena/portal/api/search?result=read_run&query=study_accession%3D%22PRJEB51688%22"
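
The same kind of query can be issued from Python using only the standard library. This is a sketch under assumptions: the `read_run` result type, the chosen fields and the TSV format are illustrative choices, not EMO-BON settings, and `build_search_url` and `fetch_rows` are hypothetical helper names.

```python
import csv
import io
from urllib.parse import urlencode
from urllib.request import urlopen

PORTAL_SEARCH = "https://www.ebi.ac.uk/ena/portal/api/search"


def build_search_url(study_accession,
                     result="read_run",
                     fields=("run_accession", "sample_accession", "fastq_ftp"),
                     fmt="tsv"):
    """Build a Portal API search URL for the records of one study."""
    params = {
        "result": result,  # which record type to search
        "query": f'study_accession="{study_accession}"',
        "fields": ",".join(fields),
        "format": fmt,
    }
    # urlencode percent-encodes the quotes, '=' and ',' in the values.
    return f"{PORTAL_SEARCH}?{urlencode(params)}"


def fetch_rows(url, timeout=30):
    """Download a TSV response and return its rows as dictionaries."""
    with urlopen(url, timeout=timeout) as response:
        text = response.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


search_url = build_search_url("PRJEB51688")
```

Calling `fetch_rows(search_url)` performs the request; it is left out of the sketch so that the block runs without network access.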

ENA Browser API: Direct file downloads
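
One common way to find downloadable files is the Portal API `filereport` endpoint, which lists file locations per run; note that this is a different route from the Browser API mentioned above. The sketch below builds such a request and turns a report into download links. The report content is made up, and the assumption that `fastq_ftp` holds semicolon-separated paths without a scheme should be checked against a real response.

```python
import csv
import io
from urllib.parse import urlencode

FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession):
    """Build a filereport request listing FASTQ locations for an accession."""
    params = {
        "accession": accession,
        "result": "read_run",
        "fields": "run_accession,fastq_ftp",
        "format": "tsv",
    }
    return f"{FILEREPORT}?{urlencode(params)}"


def fastq_links(tsv_text, scheme="ftp"):
    """Map each run accession to a list of FASTQ URLs."""
    links = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        # Assumed layout: paths separated by ';' and lacking a scheme.
        paths = [p for p in (row.get("fastq_ftp") or "").split(";") if p]
        links[row["run_accession"]] = [f"{scheme}://{p}" for p in paths]
    return links


# Made-up report for illustration only.
EXAMPLE_REPORT = (
    "run_accession\tfastq_ftp\n"
    "ERR0000001\thost.example/a_1.fastq.gz;host.example/a_2.fastq.gz\n"
)

links = fastq_links(EXAMPLE_REPORT)
```

Paired-end runs typically yield two entries per run, which is why each run maps to a list and not a single URL.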

Integration with EMO-BON

Linking Sequences to Samples

EMO-BON maintains links between:

Sequencing Crate

Repository: sequencing-logistics-crate (planned)

Purpose: Central registry linking samples to their sequences

Contents:

Metadata Flow

Logsheets (Google Sheets)
    ↓
Observatory Crates (GitHub)
    ↓
Sample Metadata
    ↓
ENA Submission
    ↓
ENA Accessions
    ↓
Sequencing Logistics Crate (GitHub)

ENA Metadata Standards

MIxS (Minimum Information about any Sequence)

ENA requires MIxS-compliant metadata.

EMO-BON Uses:

Fields Include:

Darwin Core

EMO-BON also aligns with Darwin Core for biodiversity data:

Data Updates and Corrections

Metadata Updates

If metadata needs correction:

  1. Update in ENA via their interface

  2. Update in EMO-BON crates

  3. Check that the ENA record and the crates agree, so the two copies stay synchronized
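
The synchronization check in the last step could look like the sketch below, which compares a record as held at ENA with the same record in a crate and reports the fields that differ. The function name, the field names and the values are all hypothetical; how the two records are retrieved is outside this sketch.

```python
def metadata_drift(ena_record, crate_record, fields):
    """Return {field: (ena_value, crate_value)} for fields that differ.

    A field missing from one side is reported with None for that side.
    """
    drift = {}
    for field in fields:
        ena_value = ena_record.get(field)
        crate_value = crate_record.get(field)
        if ena_value != crate_value:
            drift[field] = (ena_value, crate_value)
    return drift


# Illustrative records; the fields and values are invented.
ena_record = {"collection_date": "2021-06-21", "depth": "5"}
crate_record = {"collection_date": "2021-06-22", "depth": "5"}

drift = metadata_drift(ena_record, crate_record, ["collection_date", "depth"])
```

An empty result means the compared fields agree; any entry points to a correction that was applied on one side only.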

Data Withdrawal

In rare cases, data may need withdrawal:

Benefits of ENA Storage

For Researchers

For EMO-BON

Future Plans

Automation

Enhanced Integration