Knowledge Graph Building (Semantic Uplifting) - EMO-BON Data Workflow Handbook

This section describes how EMO-BON data is transformed into linked data and integrated into a knowledge graph.

Semantic Uplifting Process

RO-Crate SemBench Setup

The rocrate-sembench-setup action prepares for semantic uplifting by:

- Initializing an RO-Crate from a default profile if necessary
- Assembling the required files and variables into the ~sembench_data_cache folder (the files come from the observatory-profile)
- Creating the ~sembench_kwargs.json file with configuration parameters

These steps keep RO-Crate-specific logic conceptually separate from pysembench logic. The utility files produced by this action are excluded from version control via .gitignore.
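
As an illustration only, the preparation steps could be sketched as below. This is a minimal sketch, not the action's actual implementation: the function name prepare_sembench, the example parameter key, and the template file name used in the usage note are all assumptions. Only the folder and file names ~sembench_data_cache and ~sembench_kwargs.json come from the description above.

```python
import json
import shutil
from pathlib import Path

CACHE_DIR = "~sembench_data_cache"     # folder name taken from the text above
KWARGS_FILE = "~sembench_kwargs.json"  # file name taken from the text above


def prepare_sembench(crate_dir: Path, profile_files: list[Path], params: dict) -> Path:
    """Copy profile files into the cache folder and write the kwargs file.

    Hypothetical helper: the real action may assemble files differently.
    """
    cache = crate_dir / CACHE_DIR  # the leading "~" is a literal character here
    cache.mkdir(parents=True, exist_ok=True)
    for src in profile_files:
        shutil.copy2(src, cache / src.name)
    kwargs_path = crate_dir / KWARGS_FILE
    kwargs_path.write_text(json.dumps(params, indent=2), encoding="utf-8")
    return kwargs_path
```

A call such as prepare_sembench(Path("crate"), [Path("profile/template.ttl.j2")], {"some_key": "value"}) would leave one copied file in the cache folder and one JSON file next to it, both of which a .gitignore entry can then exclude.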

Semantify Action

The semantify action performs the following steps:

- Generate TTL (RDF in Turtle format) using the pysubyt task
- Validate the TTL using the pyshacl task
- Generate an LDES (Linked Data Event Streams) feed
- Create a list of generated items for reuse by rocrate-validate
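
The ordering of these steps matters, since validation needs the generated TTL and the final list needs every output. The sketch below shows one way such an ordered, fail-fast sequence could be organised while collecting the generated items. It is an assumption-laden illustration: run_steps and the Step type are hypothetical names, and the real tasks are provided by pysembench rather than by plain callables.

```python
from pathlib import Path
from typing import Callable

# Hypothetical step signature: takes the crate directory and
# returns the files that the step generated.
Step = Callable[[Path], list[Path]]


def run_steps(crate_dir: Path, steps: list[tuple[str, Step]]) -> list[str]:
    """Run steps in order, stop at the first failure, return generated paths."""
    generated: list[str] = []
    for name, step in steps:
        try:
            outputs = step(crate_dir)
        except Exception as exc:
            # Fail fast: later steps depend on the output of earlier ones.
            raise RuntimeError(f"step {name!r} failed: {exc}") from exc
        # Store paths relative to the crate so the list stays portable.
        generated.extend(p.relative_to(crate_dir).as_posix() for p in outputs)
    return generated
```

The returned list of relative paths is the kind of record a later validation step could reuse to check that every generated file is described in the crate.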

Generating TTL

CSV file(s) + Template → TTL file(s)    (1)

SAMPLING DATA (see data-sources)
- CSV file
  = CSV file(s) of the quality-controlled data;
  available at observatory-{obs_id}-crate/transformed/...
- Template
  = jinja template file, with required input files specified at the top;
  available at observatory-profile
- TTL file(s)
  = RDF representation of input data in TTL, input data translated into data graph entities;
  available at observatory-{obs_id}-crate/{env_package}/*
- Concrete Examples of Entities:
  - Observatory
    BPNS-sediment-observatory.csv + Template sediment_observatory → BPNS observatory.ttl
  - Sampling
    BPNS-water-sampling.csv & BPNS-water-observatory.csv + Template water_sampling → ttl files in BPNS-water-sampling-event
  - Sample
    BPNS-sediment-sampling.csv & BPNS-sediment-observatory.csv + Template water_sample → ttl files of BPNS-sediment-samples
  - Observation
    BPNS-water-measured.csv & BPNS-water-observatory.csv & logsheet_schema_extended.csv + Template water_measured → ttl files of BPNS-water-observations
ANALYSIS DATA
- Input file(s)
  = FASTA files of the various functional annotation data,
  available in ./analysis-results-{clusterID}-crate/{source_mat_id}-ro-crate/...
- Template
  = jinja template file, with required input files specified at the top;
  available at analysis-results-profile
- TTL file(s)
  = RDF representation of input data in TTL, input data translated into data graph entities;
  available at ./analysis-results-{clusterID}-crate/{source_mat_id}-ro-crate/...
- Concrete Examples of Entities:
  - Taxonomic annotation
    taxinfo LSU + Template taxon-info → taxonomy-summary-LSU.ttl
    taxinfo SSU + Template taxon-info → taxonomy-summary-SSU.ttl
  - Functional annotation
    GO annotations + GO_slim annotations + IPS annotations + KO annotations + PFAM annotations + Template functional-annotation → functional annotation.ttl
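
To make the "CSV + Template → TTL" idea concrete, here is a minimal sketch that uses only the Python standard library. It deliberately swaps in string.Template as a stand-in for the Jinja templates that pysubyt processes, so it does not reflect pysubyt's API. The column names (obs_id, label), the ex: namespace, the base IRI, and the ex:Observatory class are all assumptions made for illustration.

```python
import csv
import io
from string import Template
from urllib.parse import quote

# Hypothetical prefixes and row template; the real templates live in the profile.
PREFIXES = (
    "@prefix ex: <https://example.org/ns#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
)
ROW_TEMPLATE = Template(
    "<$base/observatory/$obs_id> a ex:Observatory ;\n"
    '    rdfs:label "$label" .\n'
)


def turtle_literal(value: str) -> str:
    """Escape a string for use inside a double-quoted Turtle literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def csv_to_turtle(csv_text: str, base: str = "https://example.org") -> str:
    """Render one Turtle block per CSV row."""
    parts = [PREFIXES]
    for row in csv.DictReader(io.StringIO(csv_text)):
        parts.append(
            ROW_TEMPLATE.substitute(
                base=base,
                obs_id=quote(row["obs_id"], safe=""),  # keep the IRI valid
                label=turtle_literal(row["label"]),
            )
        )
    return "\n".join(parts)
```

Each CSV row becomes one entity in the data graph; the template decides which columns end up as identifiers and which as property values.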

RO-Crate Validation

The rocrate-validate process:

- Validates the RO-Crate structure and content
- Repairs issues where possible
- Reports validation results
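
As a rough illustration of what a structural check can look like, the sketch below tests a few basic rules from the RO-Crate 1.1 specification: the metadata file needs an @context and an @graph, a metadata descriptor with @id ro-crate-metadata.json, and a root data entity of type Dataset that the descriptor points to via about. These are not necessarily the checks that rocrate-validate runs, and check_crate_metadata is a hypothetical name.

```python
def check_crate_metadata(metadata: dict) -> list[str]:
    """Return a list of problems found in a parsed ro-crate-metadata.json."""
    problems: list[str] = []
    if "@context" not in metadata:
        problems.append("missing @context")
    graph = metadata.get("@graph")
    if not isinstance(graph, list):
        problems.append("missing or invalid @graph")
        return problems

    by_id = {e.get("@id"): e for e in graph if isinstance(e, dict)}
    descriptor = by_id.get("ro-crate-metadata.json")
    if descriptor is None:
        problems.append("missing metadata descriptor 'ro-crate-metadata.json'")
        return problems

    # The descriptor's "about" property identifies the root data entity.
    about = descriptor.get("about")
    root_id = about.get("@id") if isinstance(about, dict) else None
    root = by_id.get(root_id) if root_id is not None else None
    if root is None:
        problems.append("descriptor 'about' does not point to an entity in @graph")
        return problems

    types = root.get("@type", [])
    if isinstance(types, str):
        types = [types]
    if "Dataset" not in types:
        problems.append("root data entity is not of type Dataset")
    return problems
```

An empty list means the basic structure is in place; each string in a non-empty list is a candidate for either automatic repair or the validation report.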

Publishing to Pages

The rocrate-to-pages process:

- Converts the RO-Crate to HTML for GitHub Pages
- Generates human-readable views of the data
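
A human-readable view can be as simple as a table of the entities listed in ro-crate-metadata.json. The sketch below is a hypothetical, much reduced stand-in for what rocrate-to-pages produces: the function name crate_to_html and the choice of columns (@id, @type, name) are assumptions.

```python
from html import escape


def crate_to_html(metadata: dict) -> str:
    """Render the entities of a parsed ro-crate-metadata.json as an HTML table."""
    rows = []
    for entity in metadata.get("@graph", []):
        types = entity.get("@type", "")
        if isinstance(types, list):
            types = ", ".join(str(t) for t in types)
        cells = (entity.get("@id", ""), types, entity.get("name", ""))
        # Escape every value so metadata cannot inject markup into the page.
        rows.append(
            "<tr>" + "".join(f"<td>{escape(str(c))}</td>" for c in cells) + "</tr>"
        )
    return (
        "<!DOCTYPE html>\n<html><body>\n<table>\n"
        "<tr><th>@id</th><th>@type</th><th>name</th></tr>\n"
        + "\n".join(rows)
        + "\n</table>\n</body></html>\n"
    )
```

Writing the returned string to an index.html file in the published folder would be enough for GitHub Pages to serve it.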

Triple Store Construction

The EMO-BON triple store is built through a dockerized stack that:

- Harvests links to datasets from data.emobon.embrc.eu/
- Applies extensive harvesting techniques to assemble all linked triples (including the Turtle data inside the RO-Crates)
- Exposes the triple store through a SPARQL endpoint at a public URL (e.g., sparql.emobon.embrc.eu or api.emobon.embrc.eu)
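
Once a SPARQL endpoint is public, clients can query it over HTTP as defined by the SPARQL 1.1 Protocol, for example with a GET request that carries the query in a query parameter. The sketch below only builds such a request and does not send it. The endpoint URL is a placeholder, since the final public address is not fixed in the text above, and build_sparql_request is a hypothetical helper name.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint: replace with the actual public SPARQL endpoint.
ENDPOINT = "https://sparql.example.org/sparql"

COUNT_QUERY = "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }"


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL 1.1 Protocol GET request asking for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, COUNT_QUERY)
```

Passing the request to urllib.request.urlopen would execute the query against a live endpoint; the response body would then be a SPARQL JSON results document.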

Catalogue Integration

Metadata is integrated into catalogues (e.g., FAIR EASE IDDAS) through:

- Running the process in a dockerized setup
- Harvesting links to datasets from data.emobon.embrc.eu/
- Applying semantic harvesting techniques to assemble the linked triples (at a minimum, those in ro-crate-metadata.json)
- Exporting the harvest result to a dump file for import into the asset catalogue