
This section describes how data enters the EMO-BON system.

Observatory Data Ingestion

Logsheet Downloader Action

The logsheet-downloader-action downloads spreadsheets from Google Sheets and stores them under the ./logsheets folder, with each spreadsheet tab saved as its own CSV file. The download runs on a schedule, once every 6 months.
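The tab-to-CSV step can be illustrated with a minimal Python sketch. It is not the action's actual code: the function names, the `<observatory>_<tab>.csv` file naming, and the in-memory `tabs` mapping are assumptions made for illustration. The `export_url` helper shows the commonly used Google Sheets CSV export URL pattern; whether the action relies on it is not stated here.

```python
import csv
from pathlib import Path


def export_url(spreadsheet_id: str, gid: str) -> str:
    """Build a CSV export URL for one tab (gid) of a Google Sheets document."""
    return (
        f"https://docs.google.com/spreadsheets/d/{spreadsheet_id}"
        f"/export?format=csv&gid={gid}"
    )


def write_tabs(observatory: str, tabs: dict[str, list[list[str]]], out_dir: Path) -> list[Path]:
    """Write each tab to its own CSV file under out_dir and return the paths.

    The file naming scheme is hypothetical; only the one-file-per-tab idea
    comes from the description above.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for tab_name, rows in tabs.items():
        path = out_dir / f"{observatory}_{tab_name}.csv"
        # newline="" lets the csv module control line endings itself.
        with path.open("w", newline="", encoding="utf-8") as handle:
            csv.writer(handle).writerows(rows)
        written.append(path)
    return written
```

Calling `write_tabs("OBS1", {"sampling": rows_a, "measured": rows_b}, Path("./logsheets"))` would then produce two files, one per tab, in the ./logsheets folder.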

Relating Samples with Observatories

Because a single observatory may take multiple samples, each observatory maintains a list of its samples in Google Sheets, recording each sample's unique identifier (sample id) and other relevant attributes. These spreadsheets are known as “logsheets” (see logsheets.csv).
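Since each sample is identified by its sample id, a downloaded logsheet can be indexed by that id. The sketch below is one possible way to do this and to catch duplicated ids; the column name `sample_id` and the `depth_m` attribute used in the example are assumptions, not the actual logsheet schema.

```python
import csv
import io
from collections import Counter


def index_samples(logsheet_csv: str, id_column: str = "sample_id") -> dict[str, dict[str, str]]:
    """Return {sample id: row} for one logsheet.

    Raises ValueError if a sample id appears more than once.
    The default column name is hypothetical.
    """
    rows = list(csv.DictReader(io.StringIO(logsheet_csv)))
    counts = Counter(row[id_column] for row in rows)
    duplicates = sorted(sid for sid, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"duplicate sample ids: {duplicates}")
    return {row[id_column]: row for row in rows}


# Example with made-up data: two samples from one observatory.
sheet = "sample_id,depth_m\nS1,5\nS2,10\n"
samples = index_samples(sheet)
```

Here `samples["S2"]` holds the full row for sample S2, so any other attribute of that sample can be looked up by its id.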

Repository Construction

To manage the observatories’ data on GitHub, a repository is automatically constructed for each observatory by a GitHub Action, repo-constructor-action, which acts on the governance-data repository.

More specifically, this action reads the logsheets.csv file and generates a repository with these properties:

The properties are eventually stored in the newly created repo under ./config/workflow_properties.yml
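As a rough illustration of this step, the sketch below turns each row of a logsheets.csv-style listing into a set of properties and renders them as simple YAML. The column names (`observatory_id`, `logsheet_url`) and both function names are hypothetical; the real property set and the action's implementation may differ.

```python
import csv
import io
import json


def properties_per_observatory(logsheets_csv: str, key: str = "observatory_id") -> dict[str, dict[str, str]]:
    """Map each observatory (one CSV row) to its properties.

    The key column name is an assumption made for this sketch.
    """
    reader = csv.DictReader(io.StringIO(logsheets_csv))
    return {row[key]: dict(row) for row in reader}


def to_yaml(properties: dict[str, str]) -> str:
    """Render flat string properties as 'name: "value"' lines."""
    # json.dumps yields double-quoted scalars, which are also valid YAML.
    return "".join(f"{name}: {json.dumps(value)}\n" for name, value in properties.items())


# Example with made-up data: one observatory and its logsheet location.
listing = "observatory_id,logsheet_url\nOBS1,https://example.org/sheet1\n"
props = properties_per_observatory(listing)
yaml_text = to_yaml(props["OBS1"])
```

The resulting `yaml_text` is what would be written to ./config/workflow_properties.yml in the repository created for OBS1.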

Sequence Data Ingestion

Sequence data follows a specific workflow:

Registration into Sequence Crate

ENA Submission