This section describes how EMO-BON data is transformed into linked data and integrated into a knowledge graph.
Semantic Uplifting Process
RO-Crate SemBench Setup
The RO-Crate SemBench setup action performs:
Initializing an RO-Crate from a default profile if necessary
Assembling required files and variables into the ~sembench_data_cache folder (files coming from the observatory-profile)
Creating the ~sembench_kwargs.json file with configuration parameters
These steps separate RO-Crate-specific logic from pysembench logic at a conceptual level. The utility files produced by this action are excluded from version control via .gitignore.
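The setup steps above could look roughly like the following Python sketch. It assumes the profile is a flat folder of files and treats the leading tilde as part of the folder and file names; the function name, the kwargs contents and the .gitignore entries are hypothetical and do not reflect the actual pysembench configuration schema.

```python
import json
import shutil
from pathlib import Path

# Hypothetical names: the tilde prefix is kept as part of the folder/file name.
CACHE_DIR_NAME = "~sembench_data_cache"
KWARGS_FILE_NAME = "~sembench_kwargs.json"


def prepare_sembench_workspace(crate_dir: Path, profile_dir: Path, kwargs: dict) -> Path:
    """Copy profile files into the cache folder, write the kwargs file and
    make sure both are listed in the crate's .gitignore."""
    cache_dir = crate_dir / CACHE_DIR_NAME
    cache_dir.mkdir(parents=True, exist_ok=True)

    # Copy every top-level file of the (assumed flat) profile folder.
    for src in sorted(profile_dir.iterdir()):
        if src.is_file():
            shutil.copy2(src, cache_dir / src.name)

    # Store the configuration parameters for the later pysembench run.
    kwargs_path = crate_dir / KWARGS_FILE_NAME
    kwargs_path.write_text(json.dumps(kwargs, indent=2, sort_keys=True), encoding="utf-8")

    # Keep the generated utility files untracked; append each entry only once.
    gitignore = crate_dir / ".gitignore"
    entries = gitignore.read_text(encoding="utf-8").splitlines() if gitignore.exists() else []
    for entry in (CACHE_DIR_NAME + "/", KWARGS_FILE_NAME):
        if entry not in entries:
            entries.append(entry)
    gitignore.write_text("\n".join(entries) + "\n", encoding="utf-8")
    return cache_dir
```

Running the function twice leaves a single .gitignore entry per utility file, so the setup can be repeated safely before each semantify run.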
Semantify Action
The semantify action performs:
Generate TTL (RDF in Turtle format) using the pysubyt task
Validate the TTL using the pyshacl task
Generate an LDES (Linked Data Event Streams) feed
Create a list of generated items for reuse by rocrate-validate
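The order of the semantify steps can be illustrated with a small orchestration sketch. The three callables are hypothetical placeholders for the pysubyt, pyshacl and LDES tasks, the generated_items.txt file name is invented for illustration, and stopping before the LDES step when validation fails is a design choice of this sketch rather than documented behaviour.

```python
from pathlib import Path
from typing import Callable, List

# Hypothetical stand-ins for the pysembench tasks named in the text.
GenerateStep = Callable[[Path], List[Path]]   # e.g. a pysubyt run
ValidateStep = Callable[[List[Path]], bool]   # e.g. a pyshacl run
LdesStep = Callable[[Path], List[Path]]       # LDES feed generation


def run_semantify(workdir: Path, generate: GenerateStep,
                  validate: ValidateStep, build_ldes: LdesStep) -> Path:
    """Run the steps in order and record every generated item in a list file."""
    generated = list(generate(workdir))
    # Stop early so that TTL failing validation never reaches the LDES feed.
    if not validate(generated):
        raise ValueError("SHACL validation failed; LDES feed not generated")
    generated.extend(build_ldes(workdir))

    # Hypothetical file name for the list that rocrate-validate can reuse.
    listing = workdir / "generated_items.txt"
    relative = sorted(p.relative_to(workdir).as_posix() for p in generated)
    listing.write_text("\n".join(relative) + "\n", encoding="utf-8")
    return listing
```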
Generating TTL
SAMPLING DATA (see data-sources)
CSV file
= CSV file(s) of the quality-controlled data;
available at observatory-{obs_id}-crate/transformed/...
Template
= Jinja template file, with the required input files specified at the top;
available at observatory-profile
TTL file(s)
= RDF representation of the input data in TTL, i.e. the input data translated into data graph entities;
available at observatory-{obs_id}-crate/{env_package}/*
Concrete Examples of Entities:
Observatory
BPNS-sediment-observatory.csv + Template sediment_observatory → BPNS observatory.ttl
Sampling
BPNS-water-sampling.csv & BPNS-water-observatory.csv + Template water_sampling → ttl files in BPNS-water-sampling-event
Sample
BPNS-sediment-sampling.csv & BPNS-sediment-observatory.csv + Template water_sample → ttl files of BPNS-sediment-samples
Observation
BPNS-water-measured.csv & BPNS-water-observatory.csv & logsheet_schema_extended.csv + Template water_measured → ttl files of BPNS-water-observations
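The CSV-plus-template pattern can be illustrated without pysubyt by using Python's string.Template as a stand-in for the Jinja templates of the observatory-profile. The ex: prefix IRI, the ex:Observatory class and the obs_id/obs_name columns are hypothetical; real templates define their own vocabulary and read the actual quality-controlled columns.

```python
import csv
import io
from string import Template

# Stand-in for a Jinja template from the observatory-profile. The prefix IRI,
# the ex:Observatory class and the column names obs_id/obs_name are hypothetical.
PREFIXES = (
    "@prefix ex: <https://example.org/emobon/> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
)
ENTITY = Template('ex:observatory-$obs_id a ex:Observatory ;\n    rdfs:label "$label" .\n')


def escape_literal(value: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def csv_to_ttl(csv_text: str) -> str:
    """Render one Turtle entity per CSV row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    entities = [
        ENTITY.substitute(obs_id=row["obs_id"], label=escape_literal(row["obs_name"]))
        for row in reader
    ]
    return PREFIXES + "\n" + "\n".join(entities)
```

obs_id is inserted into a prefixed name without escaping, so this sketch assumes identifiers such as BPNS contain only letters, digits and hyphens.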
ANALYSIS DATA
Input file(s)
= FASTA files of the various functional annotation data;
available in ./analysis-results-{clusterID}-crate/{source_mat_id}-ro-crate/...
Template
= Jinja template file, with the required input files specified at the top;
available at analysis-results-profile
TTL file(s)
= RDF representation of the input data in TTL, i.e. the input data translated into data graph entities;
available at ./analysis-results-{clusterID}-crate/{source_mat_id}-ro-crate/...
Concrete Examples of Entities:
Taxonomic annotation
taxinfo LSU + Template taxon-info → taxonomy-summary-LSU.ttl
taxinfo SSU + Template taxon-info → taxonomy-summary-SSU.ttl
Functional annotation
GO annotations + GO_slim annotations + IPS annotations + KO annotations + PFAM annotations + Template functional-annotation → functional annotation.ttl
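The taxonomic step can be sketched in the same way. The input layout (tab-separated count and semicolon-separated lineage, with # comment lines), the aggregation on the last lineage element and the ex: terms are all assumptions for illustration, not the format of the actual taxinfo files or the taxon-info template.

```python
import csv
import io
from collections import Counter

# Assumed input layout (hypothetical): tab-separated "count<TAB>lineage" rows,
# lineage ranks separated by ";", lines starting with "#" treated as comments.


def summarise_taxa(tsv_text: str) -> Counter:
    """Sum the counts per lineage leaf (the last non-empty lineage element)."""
    counts = Counter()
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        ranks = [rank.strip() for rank in row[1].split(";") if rank.strip()]
        if ranks:
            counts[ranks[-1]] += int(row[0])
    return counts


def taxa_to_ttl(sample_id: str, marker: str, counts: Counter) -> str:
    """Emit one summary node per taxon; the ex: terms are hypothetical."""
    lines = ["@prefix ex: <https://example.org/emobon/> .", ""]
    for index, (taxon, total) in enumerate(sorted(counts.items())):
        taxon_literal = taxon.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f"ex:{sample_id}-{marker}-{index} a ex:TaxonomySummaryEntry ;")
        lines.append(f'    ex:marker "{marker}" ; ex:taxon "{taxon_literal}" ; ex:count {total} .')
    return "\n".join(lines) + "\n"
```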
RO-Crate Validation
The rocrate-validate process:
Validates the RO-Crate structure and content
Repairs issues where possible
Reports validation results
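A sketch of the kind of checks rocrate-validate performs, written against the parsed ro-crate-metadata.json. The rules come from the RO-Crate 1.1 specification (a metadata descriptor with conformsTo and about, a root data entity of type Dataset), but the selection of checks, the repairs and the message format are illustrative assumptions, not the behaviour of the actual tool.

```python
CRATE_SPEC = "https://w3id.org/ro/crate/1.1"


def _as_list(value):
    """RO-Crate allows a single value or an array; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def validate_and_repair(metadata: dict) -> list:
    """Check a few RO-Crate 1.1 rules on a parsed ro-crate-metadata.json,
    repair the safe cases in place and return the report messages."""
    report = []
    if "@context" not in metadata:
        metadata["@context"] = CRATE_SPEC + "/context"
        report.append("repaired: added missing @context")

    graph = metadata.get("@graph")
    if not isinstance(graph, list):
        report.append("error: @graph is missing or not a list")
        return report
    by_id = {e["@id"]: e for e in graph if isinstance(e, dict) and "@id" in e}

    # The metadata descriptor must exist and declare the RO-Crate version.
    descriptor = by_id.get("ro-crate-metadata.json")
    if descriptor is None:
        report.append("error: metadata descriptor entity is missing")
        return report
    if "conformsTo" not in descriptor:
        descriptor["conformsTo"] = {"@id": CRATE_SPEC}
        report.append("repaired: added conformsTo to the metadata descriptor")

    # The descriptor's "about" points at the root data entity, a Dataset.
    about = descriptor.get("about")
    root = by_id.get(about.get("@id")) if isinstance(about, dict) else None
    if root is None or "Dataset" not in _as_list(root.get("@type")):
        report.append("error: root data entity is missing or not a Dataset")
        return report

    # Every hasPart reference should point to an entity described in the graph.
    for part in _as_list(root.get("hasPart")):
        part_id = part.get("@id") if isinstance(part, dict) else part
        if part_id not in by_id:
            report.append(f"warning: hasPart entry {part_id} has no entity in @graph")
    return report
```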
Publishing to Pages
The rocrate-to-pages process:
Converts the RO-Crate to HTML for GitHub Pages
Generates human-readable views of the data
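The HTML conversion can be approximated by rendering each entity of the crate graph as a table row. This is a hypothetical minimal preview; the actual rocrate-to-pages output and its layout are not reproduced here.

```python
import html


def crate_to_html(metadata: dict, title: str = "RO-Crate preview") -> str:
    """Render every @graph entity of a parsed ro-crate-metadata.json as a table row."""
    rows = []
    for entity in metadata.get("@graph", []):
        entity_types = entity.get("@type", "")
        type_text = ", ".join(entity_types) if isinstance(entity_types, list) else str(entity_types)
        cells = (str(entity.get("@id", "")), type_text, str(entity.get("name", "")))
        # Escape each value so that characters such as <, > and & stay literal text.
        rows.append("<tr>" + "".join(f"<td>{html.escape(cell)}</td>" for cell in cells) + "</tr>")
    head = f'<!DOCTYPE html>\n<html><head><meta charset="utf-8"><title>{html.escape(title)}</title></head>\n'
    table = "<body><table>\n<tr><th>@id</th><th>@type</th><th>name</th></tr>\n" + "\n".join(rows)
    return head + table + "\n</table></body></html>\n"
```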
Triple Store Construction
The EMO-BON triple store is built through a dockerized stack that:
Harvests links to datasets from data.emobon.embrc.eu/
Applies extensive harvest tricks to assemble all linked triples (including the Turtle data inside RO-Crates)
Exposes the triple store / SPARQL endpoint at a public URL (e.g., sparql.- or api.emobon.embrc.eu)
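One ingredient of the harvest can be sketched as collecting, per RO-Crate, the URLs that carry triples: the JSON-LD ro-crate-metadata.json itself plus the Turtle data files it describes. Treating an entity as Turtle when its @id ends in .ttl or its encodingFormat is text/turtle is an assumption of this sketch, not a description of the actual harvest tricks; example.org stands in for a real crate location.

```python
from urllib.parse import urljoin

TURTLE_MEDIA_TYPE = "text/turtle"


def triple_sources(crate_url: str, metadata: dict) -> list:
    """Return absolute URLs of the RDF sources of one RO-Crate: the JSON-LD
    metadata file itself plus every entity that looks like Turtle data."""
    if not crate_url.endswith("/"):
        crate_url += "/"  # so urljoin resolves ids relative to the crate folder
    sources = {urljoin(crate_url, "ro-crate-metadata.json")}
    for entity in metadata.get("@graph", []):
        entity_id = entity.get("@id", "")
        formats = entity.get("encodingFormat", [])
        formats = formats if isinstance(formats, list) else [formats]
        if entity_id.endswith(".ttl") or TURTLE_MEDIA_TYPE in formats:
            sources.add(urljoin(crate_url, entity_id))
    return sorted(sources)
```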
Catalogue Integration
Metadata is integrated into catalogues (e.g., FAIR EASE IDDAS) through:
Dockerized process execution
Harvesting links to datasets from data.emobon.embrc.eu/
Applying semantic harvest tricks to assemble linked triples (minimally ro-crate-metadata.json)
Exporting the harvest result into a dump file for import into the asset catalogue
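The export step can be sketched as merging harvested N-Triples into a single dump file. This assumes the harvested triples are already serialized as N-Triples; the function name and the choice of a sorted, de-duplicated .nt file are illustrative, and the asset catalogue may expect a different dump format.

```python
from pathlib import Path
from typing import Iterable


def write_dump(ntriples_sources: Iterable[str], dump_path: Path) -> int:
    """Merge N-Triples documents into one de-duplicated, sorted dump file and
    return the number of distinct triples written."""
    triples = set()
    for document in ntriples_sources:
        for line in document.splitlines():
            line = line.strip()
            # N-Triples holds one triple per line; skip blank and comment lines.
            if line and not line.startswith("#"):
                triples.add(line)
    dump_path.write_text("\n".join(sorted(triples)) + "\n", encoding="utf-8")
    return len(triples)
```

Deduplicating line by line only works when identical triples are serialized identically, and blank node labels from different sources would be merged by mistake, so a production export would relabel blank nodes per source or parse the data with an RDF library first.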