S-ENDA Architecture

About the architecture drafts:

  • They are described using the C4 model (https://c4model.com/). C4 does not define any properties based on the direction of the arrows, so each arrow should have a textual description to avoid ambiguity
  • They are work in progress, and many updates are expected as we dig into the details
  • They are to a high degree based on the use cases outlined in Use Case Descriptions
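
Because every arrow should carry a textual description, it can help to check the diagram sources for relations with a missing or empty label. The following is a minimal sketch, assuming the diagrams use the C4-PlantUML `Rel(...)` macro family with the label as the third argument; the function name `unlabeled_relations` is hypothetical and not part of any S-ENDA tooling.

```python
import re

# Matches C4-PlantUML relation macros such as Rel(a, b, "label", "tech").
# The label (third argument) is optional in the pattern so that a missing
# label is detected as well as an empty one.
REL_PATTERN = re.compile(
    r'^\s*Rel\w*\(\s*(\w+)\s*,\s*(\w+)\s*(?:,\s*"([^"]*)")?'
)


def unlabeled_relations(puml_text):
    """Return (source, target) pairs whose relation label is missing or empty."""
    missing = []
    for line in puml_text.splitlines():
        if line.lstrip().startswith("'"):  # skip PlantUML comment lines
            continue
        match = REL_PATTERN.match(line)
        if match and not (match.group(3) or "").strip():
            missing.append((match.group(1), match.group(2)))
    return missing


example = '''
Rel(geonorge, sendafind, "Harvests metadata", "CSW: ISO19115")
Rel(sendafind, met, "")
'Rel(edp, datanorge, "Harvests metadata")
'''

print(unlabeled_relations(example))
```

Run against a diagram source, the function lists the arrows that still need a description.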

General Contexts

Note

This is a draft under development. We highly appreciate input and help in correcting any mistakes.

S-ENDA is part of a larger effort within the national geodata strategy (“Alt skjer et sted”), and relates to this strategy through Geonorge, which is developed and operated by the Norwegian Mapping Authority (“Kartverket”). Geonorge, in turn, relates to the European Inspire Geoportal through the Inspire directive. In particular, S-ENDA is responsible for Action 20 of the Norwegian geodata strategy. The goal of Action 20 is to establish a distributed, virtual data center for use and management of dynamic geodata. S-ENDA’s vision is that everyone, from professional users to the general public, should have easy, secure and stable access to dynamic geodata.

The vision of S-ENDA and the goal of Action 20 are aligned with international guidelines, in particular the FAIR Guiding Principles for scientific data management and stewardship.

S-ENDA in a national and international context

Dynamic geodata is weather, environment and climate-related data that changes in space and time and is thus descriptive of processes in nature. Examples are weather observations, weather forecasts, pollution (environmental toxins) in water, air and sea, information on the drift of cod eggs and salmon lice, water flow in rivers, driving conditions on the roads and the distribution of sea ice. Dynamic geodata provides important constraints for many decision-making processes and activities in society.

Geonorge is the national website for map data and other location information in Norway. Here, users of map data can search and access such information. Dynamic geodata is one such information type. S-ENDA extends Geonorge by taking responsibility for harmonising the management of dynamic geodata in a consistent manner.

The figure below illustrates S-ENDA’s position in the national and international context. As illustrated, Geonorge CSW harvesting should also make S-ENDA datasets findable by other systems.

@startuml S-ENDA in a national and international context
!includeurl https://raw.githubusercontent.com/RicardoNiepel/C4-PlantUML/release/1-0/C4_Context.puml

LAYOUT_LEFT_RIGHT

'System_Ext(edp, "European Data Portal", "European data (general)")
System_Ext(inspire, "Inspire Geoportal", "European geodata")

System_Ext(datanorge, "Data Norge", "Norwegian data (general)")
System_Ext(geonorge, "GeoNorge", "Norwegian geodata")

System_Ext(other_harvester, "Other")

System_Boundary(senda, "Dynamic geodata"){
  System(sendafind, "S-ENDA")
  System(met, "MET", "Weather observations, weather forecasts, etc.")
  System(nina, "NINA", "Ecological data.")
  System(nilu, "NILU", "Atmosphere and climate, e.g., urban air quality.")
  System(niva, "NIVA", "Water and aquatic environments, e.g., environmental toxins in marine environments.")
  System(other, "OTHER", "E.g., drift of cod eggs and salmon lice, or driving conditions on the roads.")
}

Rel(other_harvester, sendafind, "Search and access")
Rel(geonorge, sendafind, "Harvests metadata", "CSW: ISO19115")
Rel(inspire, geonorge, "Harvests metadata")
Rel(datanorge, geonorge, "Harvests metadata")
'Rel(edp, datanorge, "Harvests metadata")
Rel(sendafind, met, "")
Rel(sendafind, nina, "")
Rel(sendafind, nilu, "")
Rel(sendafind, niva, "")
Rel(sendafind, other, "")

@enduml
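
To illustrate the "Harvests metadata" relation over CSW, the sketch below builds paged CSW 2.0.2 `GetRecords` requests in key-value form, asking for ISO 19115/19139 (`gmd`) output. The endpoint URL is a placeholder and the function names are hypothetical; how Geonorge actually configures its harvester is not described here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder endpoint, used only for illustration.
CSW_ENDPOINT = "https://csw.example.org/csw"


def getrecords_url(start=1, page_size=10, endpoint=CSW_ENDPOINT):
    """Build a CSW 2.0.2 GetRecords URL for one page of ISO metadata records."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "outputSchema": "http://www.isotc211.org/2005/gmd",
        "elementSetName": "full",
        "resultType": "results",
        "startPosition": start,   # CSW record positions are 1-based
        "maxRecords": page_size,
    }
    return f"{endpoint}?{urlencode(params)}"


def page_urls(total_records, page_size=10):
    """Yield one GetRecords URL per page needed to cover total_records."""
    for start in range(1, total_records + 1, page_size):
        yield getrecords_url(start=start, page_size=page_size)


for url in page_urls(25, page_size=10):
    print(parse_qs(urlsplit(url).query)["startPosition"][0])
```

A harvester would typically read the number of matched records from the first response and then request the remaining pages in the same way.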

S-ENDA C4 Context Diagram

The diagram below describes the S-ENDA system inside the dynamic geodata boundary shown above. The data consumers are defined in Users definition.

@startuml S-ENDA-C4-context-diagram
!includeurl https://raw.githubusercontent.com/RicardoNiepel/C4-PlantUML/release/1-0/C4_Context.puml
   
LAYOUT_TOP_DOWN

Boundary(consumers, "Data Consumers"){
    Person(advanced, "Advanced")
    Person(intermediate, "Intermediate")
    Person(simple, "Simple")
}
    
Boundary(providers, "Providers") {
    Person(dataprovider, "Dataset Producer")
    Person(datacurator, "Data curator")
    Person(serviceprovider, "Service Provider")
}

Boundary(systems, "S-ENDA") {
    System(senda, "S-ENDA Discovery Metadata Service")
    System(productionhub, "Production", "Dataset production hubs provide new datasets.")
    System(dist_systems, "Data Distribution")
    System(monitoring, "Service Monitoring")
}

System_Ext(portals, "Portals", "External portals harvest metadata on various standards. Can also prepare data delivery (e.g., basket solution).")
System_Ext(apps, "Web/mobile apps", "External apps present data in customized ways.")

Rel(advanced, portals, "Search portals", "Web-UI/API")
Rel(intermediate, portals, "Search portals", "Web-UI/API")
Rel(simple, apps, "Navigates to app", "Web/mobile UI")

System(senda, "Discovery Metadata Service")

Rel(advanced, senda, "Search", "CSW/OpenSearch: ISO19115/DCAT/MMD")
Rel(portals, senda, "Harvest metadata", "CSW/OAI-PMH: ISO19139")
Rel(apps, senda, "Harvest metadata", "CSW/OAI-PMH: ISO19139")

Rel(providers, senda, "Register metadata")
Rel(providers, senda, "Check usage statistics and metadata consistency/status.")

System(productionhub, "Production")
System(dist_systems, "Data Distribution")

Rel(dataprovider, productionhub, "Sets up data production")
Rel(productionhub, dist_systems, "Store")
Rel(senda, productionhub, "Listen in order to get last updated info, and harvest metrics")
Rel(senda, dist_systems, "Get metadata and metrics")
Rel(apps, dist_systems, "Stream data")
Rel(portals, dist_systems, "Stream data")
Rel(advanced, dist_systems, "Stream data")

'Rel(monitoring, dist_systems, "")
'Rel(monitoring, senda, "")

@enduml

S-ENDA Discovery Metadata Service - C4 container diagram

@startuml S-ENDA-metadata-service-container-diagram
!includeurl https://raw.githubusercontent.com/RicardoNiepel/C4-PlantUML/release/1-0/C4_Component.puml

LAYOUT_TOP_DOWN

Boundary(providers, "Providers") {
    Person(dataprovider, "Dataset Producer")
}

Boundary(systems, "S-ENDA") {
  System(productionhub, "Production")
  System(dist_systems, "Data Distribution")
  System(monitoring, "Service Monitoring")
  System_Boundary(mserviceSystem, "Discovery Metadata Service") {

    Container(updater, "Discovery Metadata Catalog Ingestor API", "Python/Flask", "Provides functionality to create content in metadata stores.")
    Container(metadata_store, "Backup Dataset Discovery Metadata Store", "Git", "MMD discovery and configuration metadata for datasets.")
    Container(csapi, "Dataset Catalog Service API", "pycsw", "CSW endpoint for search and harvesting. Serves INSPIRE, DIF etc. compliant metadata.")
    Container(solr, "Metadata Storage", "Solr or similar", "Metadata storage for Drupal websites.")
    Container(mms, "Messaging", "MMS", "<b>Optional</b> message production.")
    Container(web_app, "Web Application", "HTML", "<b>Optional.</b> Provides functionality to register dataset and service metadata, display dataset and service usage statistics, production status, and monitor metadata to display errors and warnings (e.g., about dead links).")

    Rel(updater, csapi, "Create/Update/Delete", "HTTP POST: INSPIRE")
    Rel(updater, metadata_store, "Create/Update/Delete", "git: MMD")
    Rel(updater, solr, "Create/Update/Delete")
    Rel(updater, mms, "Create")
  }
}


Boundary(consumers, "Data Consumers"){
  Person(advanced, "Advanced")
  Person(intermediate, "Intermediate")
  Person(simple, "Simple")
}

System_Ext(portals, "Portals", "External portals harvest metadata on various standards")
System_Ext(apps, "Web/mobile apps", "External apps present data in customized ways.")

Rel(dataprovider, productionhub, "Sets up data production")

Rel(simple, portals, "Search portals", "Web-UI/API")
Rel(intermediate, portals, "Search portals", "Web-UI/API")
Rel(advanced, portals, "Search portals", "Web-UI/API")

Rel(apps, csapi, "Search", "CSW")
Rel(apps, dist_systems, "Stream data")
Rel(simple, apps, "Navigates to app", "Web/mobile UI")

Rel(portals, csapi, "Harvest", "CSW/OAI-PMH")
Rel(advanced, csapi, "Search", "CSW/OpenSearch")
Rel(productionhub, dist_systems, "Store", "ACDD compliant netCDF-CF files")
Rel(productionhub, updater, "Create/Update/Delete", "HTTP POST: MMD")
Rel(dataprovider, web_app, "Check dataset statistics and metadata consistency/status.")

@enduml
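
The ingestor in the diagram above receives one MMD document and writes it to several stores (catalog service, backup store, metadata storage). The following is a minimal sketch of that fan-out, assuming each store can be wrapped in a callable that raises on failure. The class and backend names are hypothetical, the stores are in-memory stand-ins, and the HTTP layer (Flask in the diagram) is left out.

```python
from typing import Callable, Dict, List

# A backend writer takes (dataset_id, mmd_xml) and raises an exception on failure.
Writer = Callable[[str, str], None]


class Ingestor:
    """Fans one MMD document out to every registered metadata backend."""

    def __init__(self) -> None:
        self._backends: Dict[str, Writer] = {}

    def register(self, name: str, writer: Writer) -> None:
        self._backends[name] = writer

    def create(self, dataset_id: str, mmd_xml: str) -> List[str]:
        """Write to each backend and return the names of those that failed."""
        failed = []
        for name, writer in self._backends.items():
            try:
                writer(dataset_id, mmd_xml)
            except Exception:  # broad on purpose: report the failure, keep going
                failed.append(name)
        return failed


# In-memory stand-ins for the catalog service and the backup store.
catalog: Dict[str, str] = {}
backup: Dict[str, str] = {}


def broken_index(dataset_id: str, mmd_xml: str) -> None:
    """Simulates a metadata storage backend that is unreachable."""
    raise ConnectionError("index unavailable")


ingestor = Ingestor()
ingestor.register("catalog", catalog.__setitem__)
ingestor.register("backup", backup.__setitem__)
ingestor.register("index", broken_index)

failed = ingestor.create("dataset-1", "<mmd/>")
print(failed)
```

Returning the failed backends rather than aborting on the first error is one possible design choice; it lets the caller decide whether to retry or roll back.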

Note

  • File-level metadata is editable only via ACDD-compliant NetCDF-CF files. Higher-level datasets (i.e., collections and series) are added via the CLI Registrar or the Web Application, and stored in their own catalogue (IS THIS NECESSARY?). The file-level metadata can contain parent-child relationships to the higher-level datasets (series/collections). In this version, the Dynamic Geo-Assets API is essentially replaced by a set of tools that assist in creating ACDD metadata.
  • api.met.no and similar APIs that serve merged data point to the source datasets in the Service Discovery Metadata
  • APIs that serve single datasets (e.g., Frost, once it has been decided what constitutes a dataset, a collection and a series) need to be better represented here. At the moment we store NetCDF-CF files from Frost, but this is not the long-term intention
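
Since file-level metadata enters the system only through ACDD attributes in the NetCDF-CF files, a producer may want to check those attributes before registering a file. The sketch below checks the four global attributes that ACDD 1.3 lists as highly recommended. It works on a plain dictionary of global attributes, so reading them from an actual file is left to the caller, and the function name is hypothetical. S-ENDA may require more attributes than these.

```python
# Global attributes listed as "highly recommended" in ACDD 1.3.
ACDD_HIGHLY_RECOMMENDED = ("title", "summary", "keywords", "Conventions")


def missing_acdd_attributes(global_attrs):
    """Return the highly recommended ACDD attributes that are absent or blank."""
    return [
        name
        for name in ACDD_HIGHLY_RECOMMENDED
        if not str(global_attrs.get(name, "")).strip()
    ]


# Example attribute set with made-up values.
attrs = {
    "title": "Example sea ice concentration",
    "summary": "   ",
    "Conventions": "CF-1.8, ACDD-1.3",
}

print(missing_acdd_attributes(attrs))
```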

Dataset catalog service API - C4 component diagram

Production Hubs - C4 container diagram

@startuml S-ENDA-production-container-diagram
!includeurl https://raw.githubusercontent.com/RicardoNiepel/C4-PlantUML/release/1-0/C4_Component.puml

LAYOUT_TOP_DOWN

Boundary(providers, "Providers") {
    Person(dataprovider, "Dataset Producer")
}

Boundary(systems, "S-ENDA") {
  System_Boundary(productionhub, "Production"){
    Container(listen_update, "Production listener", "", "Listens for metadata updates, updates metadata in datafiles on behalf of authenticated user.")
    Container(listen_delete, "Production listener", "", "Listens for metadata, and deletes files on behalf of authenticated user.")
    Container(listen_start, "Production listener", "", "Listens for metadata, and starts new processing on behalf of authenticated user.")
    Container(job, "Job", "SMS/PPI", "Production Script")

    Container(mmd, "py-mmd-tools", "MMD production")
    Container(mms, "MMS", "Message production")

    Rel(listen_update, mms, "")
    Rel(listen_delete, mms, "")
    Rel(listen_start, mms, "")
    Rel(listen_start, job, "Start job")
  }
  System(monitoring, "Service Monitoring")
  System(mserviceSystem, "Discovery Metadata Service")
  System(datadist, "Data Distribution")
}

Boundary(consumers, "Data Consumers"){
  Person(advanced, "Advanced")
  Person(intermediate, "Intermediate")
  Person(simple, "Simple")
}

System_Ext(portals, "Portals", "External portals harvest metadata on various standards")
System_Ext(apps, "Web/mobile apps", "External apps present data in customized ways.")

Rel(dataprovider, job, "Sets up production of netCDF-CF files with ACDD metadata")

Rel(simple, portals, "Search portals", "Web-UI/API")
Rel(intermediate, portals, "Search portals", "Web-UI/API")
Rel(advanced, portals, "Search portals", "Web-UI/API")

Rel(apps, mserviceSystem, "Search", "CSW")
Rel(apps, datadist, "Stream", "WMS/WMTS/etc.")
Rel(simple, apps, "Navigates to app", "Web/mobile UI")

Rel(portals, mserviceSystem, "Harvest", "CSW/OAI-PMH")
Rel(advanced, mserviceSystem, "Search", "CSW/OpenSearch")
Rel(job, datadist, "Store")
Rel(job, mserviceSystem, "Create/Update/Delete", "HTTP POST: MMD")
Rel(dataprovider, mserviceSystem, "Check dataset statistics and metadata consistency/status.")

Rel(portals, datadist, "Stream", "OPeNDAP/WMS/WMTS/etc.")
Rel(advanced, datadist, "Stream", "OPeNDAP")

Rel(job, mmd, "Generate MMD file")
Rel(job, mserviceSystem,"Register MMD file","HTTPS")
Rel(job, mms,"Register MMS event","HTTPS")

@enduml
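
The three production listeners above react to different kinds of events (update, delete, start processing). As a rough illustration of that pattern, the sketch below routes events to handlers by type. The event type names, the payload layout and the `EventBus` class are assumptions made for this example; the actual MMS message format and the authentication step are not modelled.

```python
from collections import defaultdict


class EventBus:
    """Routes published events to the handlers subscribed to their type."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        """Call every handler for event_type and collect their results."""
        return [handler(payload) for handler in self._handlers.get(event_type, [])]


bus = EventBus()
# One handler per listener in the diagram; each returns a description
# of the action it would take instead of performing it.
bus.subscribe("update", lambda event: f"update metadata in {event['file']}")
bus.subscribe("delete", lambda event: f"delete {event['file']}")
bus.subscribe("start", lambda event: f"start job for {event['file']}")

actions = bus.publish("delete", {"file": "example.nc"})
print(actions)
```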

Distribution Systems - C4 container diagram

@startuml S-ENDA-data-distribution-container-diagram
!includeurl https://raw.githubusercontent.com/RicardoNiepel/C4-PlantUML/release/1-0/C4_Component.puml

LAYOUT_TOP_DOWN

Boundary(providers, "Providers") {
    Person(dataprovider, "Dataset Producer")
}

Boundary(systems, "S-ENDA") {
  System(productionhub, "Production")
  System(monitoring, "Service Monitoring")
  System(mserviceSystem, "Discovery Metadata Service")
  System_Boundary(datadist, "Data Distribution") {
    Container(visualization, "Visualization Service", "OGC WMS", "OGC WMS service for visualizing raster datasets (supported by mapserver).")
    Container(access, "Data Access Service", "OPeNDAP", "OPeNDAP service supported by, e.g., thredds or hyrax, to provide actual data access.")
  }
}


Boundary(consumers, "Data Consumers"){
  Person(advanced, "Advanced")
  Person(intermediate, "Intermediate")
  Person(simple, "Simple")
}

System_Ext(portals, "Portals", "External portals harvest metadata on various standards")
System_Ext(apps, "Web/mobile apps", "External apps present data in customized ways.")

Rel(dataprovider, productionhub, "Sets up production of netCDF-CF files with ACDD metadata")

Rel(simple, portals, "Search portals", "Web-UI/API")
Rel(intermediate, portals, "Search portals", "Web-UI/API")
Rel(advanced, portals, "Search portals", "Web-UI/API")

Rel(apps, mserviceSystem, "Search", "CSW")
Rel(apps, visualization, "Stream", "WMS/WMTS/etc.")
Rel(simple, apps, "Navigates to app", "Web/mobile UI")

Rel(portals, mserviceSystem, "Harvest", "CSW/OAI-PMH")
Rel(advanced, mserviceSystem, "Search", "CSW/OpenSearch")
Rel(productionhub, datadist, "Store")
Rel(productionhub, mserviceSystem, "Create/Update/Delete", "HTTP POST: MMD")
Rel(dataprovider, mserviceSystem, "Check dataset statistics and metadata consistency/status.")

Rel(portals, access, "Stream", "OPeNDAP/WMS/WMTS/etc.")
Rel(advanced, access, "Stream", "OPeNDAP")

@enduml
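
For the Data Access Service, an advanced consumer would usually request only part of a variable. The sketch below builds a DAP2 URL with a hyperslab constraint expression, where each dimension is given as `[start:stride:stop]` and the stop index is inclusive. The server URL and variable name are placeholders and the helper name is hypothetical; some clients percent-encode the brackets, which is omitted here.

```python
def dap2_subset_url(dataset_url, variable, slices, response="ascii"):
    """Build a DAP2 URL requesting a hyperslab of one variable.

    slices holds one (start, stride, stop) tuple per dimension;
    in DAP2 the stop index is inclusive.
    """
    hyperslab = "".join(
        f"[{start}:{stride}:{stop}]" for start, stride, stop in slices
    )
    return f"{dataset_url}.{response}?{variable}{hyperslab}"


# Placeholder dataset URL and variable name.
url = dap2_subset_url(
    "https://thredds.example.org/dodsC/demo.nc",
    "air_temperature",
    [(0, 1, 0), (10, 1, 20)],
)
print(url)
```

In practice most users would let a client library construct such requests; the URL form is shown only to make the "Stream" relation concrete.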

S3/Zarr - C4 component diagram

Note

This is part of a distribution system. The diagram below should be updated to reflect the distribution system container in the context diagram above.

We categorize data consumers in three levels:

  • advanced-consumers
  • intermediate-consumers
  • simple-consumers

The system described here is concerned with advanced-consumers and data-producers.

Functional requirements
  • data-producers should be able to produce a dataset and upload results to the data access service without time-consuming transformations
  • advanced-consumers must be able to download a copy of the entire dataset
  • advanced-consumers must be able to stream and filter parts of the dataset
  • advanced-consumers need access to enough use metadata to be able to locally post-process, reproject, etc., the dataset
  • The data access service must support the FAIR principles, in particular (meta)data interoperability and reusability
  • The transport mechanism used in the data access service needs to be a widely adopted standard solution, and it must be open-source
  • The dataset must be stored in a widely adopted, open data format standard
  • The data access service as a whole needs to be easy to use with familiar tools from the meteorological/climate/oceanographic domain, both for upload and download
  • The data access service must work together with an event-driven production system
  • The data access service should support the Harmonised Data API from the European Weather Cloud
  • advanced-consumers need to be able to give feedback on the data access service and each individual dataset
Quality attributes
  • The total throughput and storage size for the data access service need to scale with massively increasing dataset sizes
  • The total throughput, storage size and number of objects of the data access service need to scale with massively increasing number of datasets
  • Scaling for increasing size of datasets and increasing number of datasets must not significantly increase latency or decrease throughput for individual requests
  • Time to first byte of a response should be low, e.g., < 50 ms
  • The relationship between response time and size of data requested should be predictable, and not worse than a linear increase in response time with data size
Constraints
  • The data access service has no search mechanism for datasets, and assumes that the datasets can be listed/found/searched through a separate metadata catalog
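
The requirement that consumers can stream and filter parts of a dataset, with response time growing no worse than linearly with the requested size, fits chunked formats such as Zarr: a request only touches the chunks that overlap the selection. The sketch below computes which chunk keys a selection needs, assuming the Zarr v2 default layout where a chunk key is the chunk indices joined by a dot. The function name is hypothetical and array metadata handling is left out.

```python
from itertools import product


def chunk_keys(selection, chunk_shape, separator="."):
    """Return the chunk keys that overlap a rectangular selection.

    selection holds one half-open (start, stop) index range per dimension,
    chunk_shape the chunk length along each dimension.
    """
    per_dimension = []
    for (start, stop), size in zip(selection, chunk_shape):
        first = start // size
        last = (stop - 1) // size  # stop is exclusive
        per_dimension.append(range(first, last + 1))
    return [
        separator.join(str(index) for index in indices)
        for indices in product(*per_dimension)
    ]


# Rows 0..149 and columns 200..299 of an array stored in 100 x 100 chunks.
keys = chunk_keys([(0, 150), (200, 300)], (100, 100))
print(keys)
```

The number of objects fetched grows with the size of the selection rather than with the size of the whole dataset, which is what makes the cost of a request predictable.
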
S3/Zarr - C4 component diagram

@startuml Data-access-context-diagram
!includeurl https://raw.githubusercontent.com/RicardoNiepel/C4-PlantUML/release/1-0/C4_Context.puml

LAYOUT_LEFT_RIGHT

System_Boundary(MET, "Met Norway"){
    System(dataaccess, "Data access")
    System_Ext(PPI, "PPI")
    System_Ext(HPC, "HPC")

    Rel(PPI, dataaccess, "Push data", "S3/Zarr")
    Rel(HPC, dataaccess, "Push data", "S3/Zarr")

    Rel(dataaccess, PPI, "Access data", "CIFS")
}

Person(researcher, "Researcher")
System_Ext(jupyter, "Jupyter lab")
System_Ext(web_maps, "Web page with climate maps")

System_Ext(prodgen, "Product-generation system")

Rel(researcher, jupyter, "Interactive data mining", "Web GUI")

Rel(jupyter, dataaccess, "Access data", "S3/Zarr")
Rel(prodgen, dataaccess, "Access data", "OPeNDAP")
Rel(web_maps, dataaccess, "Access data", "OGC API")

@enduml