Systems overview
Dataflow
- system set-up:
- create governance crate & data space repository (the one creating the site = emo-bon.github.io)
- manual entries in governance-crate listing GH sheets by id and observatories
- create repositories for various profiles: observ, sequence & analysis-results
- create repositories for sequence, analysis-results
- how do we set up observatory-crates
–> github action driven by govenance crate linking to gsheets
- manual follow up, new crates should get listed in the dat space repository
-
how do we produce observatory crates –> github action: - fetch csv content from gsheets - apply QC - report on needed actions - generate triples using templates from observatory-profile - produce ro-crate - publish to gh-pages
- how do we register sequences into sequence crate
- ?? todo & follow up - code, automation to check
- how do sequences end up at ENA
- ?? todo & describe agreements with GENOSCOPE
- how do we get analysis results crate
- in future: get locations and availability of samples & their sequences through the system (currently: manual knowledge available to Cymon)
- todo: describe metagoflow process + post processing
- triplisation of results executed by dockerized semantic uplifting provided in (todo: ref to *docker repo)
- results added to GH repository analysis-results-XX-crate + large files S3 via dvc
- how to get metadata into catalogue (e.g. FAIR EASE IDDAS)
- todo: execution of dockerized process (ref to docker repo)
- harvest links to datasets from data.emobon.embrc.eu/
- apply sema harvest tricks to assemble linked triples (minimally ro-crate-metadata.json)
- export harvest result into dump file for import in asset catalogue
- todo: execution of dockerized process (ref to docker repo)
- how to build-up triple store
- todo: execution of dockerized stack (ref to docker repo) that:
- harvest links to datasets from data.emobon.embrc.eu/
- apply extensive harvest tricks to assemble ALL linked triples (including data turtle inside ro-crates)
- expose the triple store / SPARQL-endpoint at public URL (e.g. sparql.- or api.emobon.embrc.eu)
- todo: execution of dockerized stack (ref to docker repo) that:
- how to consume data & api
- todo: UDAL named queries
- todo: VREs consuming those via sema query + udal
Components Interaction
Description fo components-interactions