openEngiadina wiki

Data model and Data storage

As part of the NLnet project, openEngiadina is conducting research into a suitable data model and data storage for the platform.


There are a couple of problems with ActivityPub and the Web in general. These are some of the problems we are trying to address:


Availability

On the Web, the source of truth is usually a single centralized host, which has to manage all data and all state mutations. If the host goes down, the data is gone.

A subtle implication is that when newly created data references or links to existing data on a single host, the quality and availability of the new data depend on the availability of the existing data. This is especially problematic for Linked Data, which relies heavily on being able to reference remote content (see the non-technical issues mentioned in "A More Decentralized Vision for Linked Data" (2018)).

Semantic querying of data (e.g. via a SPARQL endpoint) makes things worse: such queries can be very expensive for the host, which further reduces availability.

Authenticity of content

On the Web, authenticity of content is usually guaranteed at the transport level (e.g. by TLS): the connection to the server is authenticated, not the content itself.

It would be much nicer to have a simple and efficient way of checking the authenticity of individual pieces of content directly. This would also allow content to be cached or forwarded by a third party while its first-party authenticity can still be proven.
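To illustrate checking authenticity at the level of the content rather than the connection, here is a minimal Python sketch of a digital signature over a piece of content. It assumes a Lamport one-time signature, chosen only because it can be written with the standard library; a real deployment would more likely use a scheme such as Ed25519, and each Lamport key pair must sign only one message. The names `generate_keypair`, `sign` and `verify` are hypothetical.

```python
import hashlib
import secrets

N_BITS = 256  # one pair of secrets per bit of the SHA-256 digest


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _digest_bits(message: bytes) -> list:
    digest = _h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(N_BITS)]


def generate_keypair():
    """Private key: 256 pairs of random secrets. Public key: their hashes."""
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(N_BITS)]
    public = [(_h(a), _h(b)) for a, b in private]
    return private, public


def sign(private, message: bytes) -> list:
    """Reveal one secret of each pair, selected by the bits of the digest.
    Assumption: the key pair is used for this one message only."""
    return [pair[bit] for pair, bit in zip(private, _digest_bits(message))]


def verify(public, message: bytes, signature: list) -> bool:
    """Check the content itself against the public key; it does not
    matter which host or cache delivered it."""
    if len(signature) != N_BITS:
        return False
    return all(
        _h(revealed) == pair[bit]
        for revealed, pair, bit in zip(signature, public, _digest_bits(message))
    )


private_key, public_key = generate_keypair()
note = b'{"type": "Note", "content": "Hello"}'
signature = sign(private_key, note)
# A third party may cache or forward note + signature; the receiver
# still verifies first-party authenticity.
print(verify(public_key, note, signature))         # True
print(verify(public_key, note + b"!", signature))  # False
```

Verification needs only the content, the signature and the author's public key. How receivers obtain and trust that public key is a separate question not addressed by this sketch.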


Secrecy

ActivityPub provides no way of ensuring secrecy: content is not encrypted end-to-end.

With openEngiadina we prioritize public/common data, so secrecy is less of a priority, but it is still worth considering.


Research and develop a data model for openEngiadina that allows:


The ingredients of the data model.

DONE Encoding for Robust Immutable Storage

Store content securely and immutably in content-addressable storage.

We have made a concrete proposal for how to solve this: ERIS.
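The following is a simplified sketch of the idea behind encrypted, content-addressed storage; it is not an implementation of ERIS. Content is padded and split into fixed-size blocks, each block is encrypted with a key derived from its own hash (convergent encryption), and the ciphertext is stored under its own hash. Assumptions: a 1 KiB block size, a toy BLAKE2b-based keystream in place of a vetted cipher (the ERIS specification uses ChaCha20), and a flat list of (reference, key) pairs as the read capability, whereas ERIS arranges these into a tree.

```python
import hashlib

BLOCK_SIZE = 1024  # bytes per block (assumption for this sketch)


def _hash(data: bytes, key: bytes = b"") -> bytes:
    return hashlib.blake2b(data, digest_size=32, key=key).digest()


def _xor_keystream(key: bytes, data: bytes) -> bytes:
    # Toy stream cipher: keyed BLAKE2b in counter mode. Not reviewed for
    # security; it only keeps the sketch within the standard library.
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += _hash(counter.to_bytes(8, "little"), key=key)
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


def _pad(content: bytes) -> bytes:
    # 0x80 marks the end of the content; zeros fill up to a block boundary.
    padded = content + b"\x80"
    return padded + b"\x00" * (-len(padded) % BLOCK_SIZE)


def _unpad(padded: bytes) -> bytes:
    return padded[: padded.rindex(b"\x80")]


def encode(content: bytes, store: dict) -> list:
    """Split, convergently encrypt and store blocks.
    Returns the read capability: a list of (reference, key) pairs."""
    padded = _pad(content)
    capability = []
    for i in range(0, len(padded), BLOCK_SIZE):
        block = padded[i : i + BLOCK_SIZE]
        key = _hash(block)                    # convergent: key derived from the block
        encrypted = _xor_keystream(key, block)
        reference = _hash(encrypted)          # address = hash of the stored bytes
        store[reference] = encrypted
        capability.append((reference, key))
    return capability


def decode(capability: list, store: dict) -> bytes:
    """Fetch blocks from any (untrusted) store, verify and decrypt them."""
    blocks = []
    for reference, key in capability:
        encrypted = store[reference]
        if _hash(encrypted) != reference:
            raise ValueError("block does not match its reference")
        blocks.append(_xor_keystream(key, encrypted))
    return _unpad(b"".join(blocks))


store = {}
capability = encode(b"Hello, Engiadina!", store)
print(decode(capability, store))  # b'Hello, Engiadina!'
```

The store only ever sees encrypted blocks and does not need to be trusted, since `decode` checks each block against its reference before decrypting it. Because the encryption is convergent, identical content yields identical blocks, which allows deduplication.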

DONE Content-addressable RDF

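Make RDF content-addressable, so that data is referenced by its content rather than by its location. This solves the problem of availability: any host holding a copy can serve the data, and the copy can be verified against its identifier.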

Challenges include:

We have made a concrete proposal on how to do this (see Content-addressable RDF).
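As a rough illustration of what content-addressable RDF means, the sketch below serializes a set of triples deterministically and derives an identifier from the hash of that serialization. It is not the proposal referenced above. It assumes terms are already written in N-Triples syntax and that the graph contains no blank nodes (those would require proper RDF canonicalization); the `urn:example:sha256:` prefix is a hypothetical placeholder.

```python
import base64
import hashlib


def canonical_ntriples(triples) -> bytes:
    """One N-Triples line per triple, duplicates dropped, lines sorted.
    Sorting Python strings orders by code point, which matches UTF-8 byte order."""
    lines = sorted({f"{s} {p} {o} .\n" for s, p, o in triples})
    return "".join(lines).encode("utf-8")


def content_address(triples) -> str:
    digest = hashlib.sha256(canonical_ntriples(triples)).digest()
    name = base64.b32encode(digest).decode("ascii").rstrip("=")
    return "urn:example:sha256:" + name  # hypothetical URN scheme


ALICE = "<http://example.org/alice>"
FOAF_NAME = "<http://xmlns.com/foaf/0.1/name>"
FOAF_KNOWS = "<http://xmlns.com/foaf/0.1/knows>"
g1 = [(ALICE, FOAF_NAME, '"Alice"'), (ALICE, FOAF_KNOWS, "<http://example.org/bob>")]
g2 = list(reversed(g1))  # same graph, different order
print(content_address(g1) == content_address(g2))  # True
```

Whichever host a copy of the graph comes from, it can be checked by recomputing the identifier.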

TODO Mutable data

Content-addressed storage solves the problem of availability. However, it imposes immutability: changing data changes its address.

Mutability is important for dynamic systems. Life would be very static without any changes.

Datashards (and Tahoe-LAFS) implement mutability. However, they require coordination between writers. We would like to avoid this by using eventual consistency and CRDTs (conflict-free replicated data types).
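To make the CRDT idea concrete, here is a minimal state-based observed-remove set (OR-Set) in Python, holding RDF triples as plain tuples. This is just one well-known CRDT, picked as an assumption for illustration; this section is still open and does not commit to a particular design. Each add gets a unique tag, and a remove only cancels the tags a replica has already seen, so replicas can update independently and still converge when merged.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ORSet:
    """Observed-remove set (state-based CRDT). An add that is concurrent
    with a remove survives the merge, because the remove never saw its tag."""
    adds: dict = field(default_factory=dict)   # element -> set of unique tags
    removed: set = field(default_factory=set)  # tombstoned tags

    def add(self, element):
        self.adds.setdefault(element, set()).add(uuid.uuid4().hex)

    def remove(self, element):
        # Tombstone only the tags observed locally.
        self.removed |= self.adds.get(element, set())

    def __contains__(self, element):
        return bool(self.adds.get(element, set()) - self.removed)

    def elements(self):
        return {e for e in self.adds if e in self}

    def merge(self, other: "ORSet") -> "ORSet":
        """Union of both states: commutative, associative and idempotent."""
        adds = {e: set(tags) for e, tags in self.adds.items()}
        for e, tags in other.adds.items():
            adds.setdefault(e, set()).update(tags)
        return ORSet(adds, self.removed | other.removed)


alice, bob = ORSet(), ORSet()
t = ("<#event>", "<http://schema.org/name>", '"Concert"')
alice.add(t)
bob = bob.merge(alice)  # bob has observed alice's add
bob.remove(t)           # bob removes it...
alice.add(t)            # ...while alice concurrently re-adds it
print(alice.merge(bob).elements() == bob.merge(alice).elements())  # True
print(t in alice.merge(bob))  # True: the concurrent add wins
```

Because merging is a union of the two states, replicas can exchange state in any order and as often as they like without coordinating. The price is that tombstoned tags accumulate, so a real design would need some form of garbage collection.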

TODO Multi-model database

How can content be stored and indexed so that it meets all of the requirements above while also supporting semantic queries?

Generic Tuple Store Database

Work done in the Scheme community (SRFI 168) on how to build custom databases that provide multi-faceted views of the data.
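The core idea of such a tuple store can be sketched in a few lines: write every tuple into several sorted indexes, each ordering the tuple's positions differently, so that any query pattern can be answered by a prefix scan on a suitable index. The Python below is a hypothetical in-memory stand-in for an ordered key-value store and does not follow the API of any particular library. For simplicity it keeps one index per permutation, although for triples three orderings (e.g. SPO, POS, OSP) are enough to cover every pattern.

```python
from bisect import bisect_left
from itertools import permutations


class TupleStore:
    """Hypothetical in-memory tuple store: every tuple is written into one
    sorted index per ordering of its positions."""

    def __init__(self, n: int = 3):
        self.n = n
        self.indexes = {perm: [] for perm in permutations(range(n))}

    def add(self, tup):
        for perm, index in self.indexes.items():
            key = tuple(tup[i] for i in perm)
            pos = bisect_left(index, key)
            if pos == len(index) or index[pos] != key:  # skip duplicates
                index.insert(pos, key)

    def query(self, pattern):
        """Yield tuples matching `pattern`, where None is a wildcard."""
        bound = {i for i, v in enumerate(pattern) if v is not None}
        # Pick an index whose leading positions are exactly the bound ones,
        # so all matches form one contiguous range of that sorted index.
        perm = next(p for p in self.indexes if set(p[: len(bound)]) == bound)
        prefix = tuple(pattern[i] for i in perm[: len(bound)])
        index = self.indexes[perm]
        pos = bisect_left(index, prefix)
        while pos < len(index) and index[pos][: len(prefix)] == prefix:
            tup = [None] * self.n
            for slot, i in enumerate(perm):  # undo the permutation
                tup[i] = index[pos][slot]
            yield tuple(tup)
            pos += 1


store = TupleStore()
store.add(("alice", "knows", "bob"))
store.add(("bob", "knows", "carol"))
store.add(("alice", "name", '"Alice"'))
print(sorted(store.query((None, "knows", None))))
# [('alice', 'knows', 'bob'), ('bob', 'knows', 'carol')]
print(list(store.query(("alice", None, "bob"))))
# [('alice', 'knows', 'bob')]
```

Writes become more expensive (one insert per index), but every read is a single range scan, which is what makes such a store usable as the basis for several views of the same data.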

Authoritative reasoning

How to reason over, and draw inferences from, partially trustworthy data.
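One way to approach this, borrowed from the idea of authoritative reasoning in the Linked Data literature, is to accept schema statements about a term only from sources that are authoritative for that term. The sketch below applies this to `rdfs:subClassOf`: a statement that class A is a subclass of B is only used if its source is authoritative for A, so a third party cannot change how everyone else's data about A is interpreted. Assumption: authority is approximated by a hypothetical IRI prefix check, standing in for checking where a term's IRI dereferences to; all names are illustrative.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def is_authoritative(source: str, term: str) -> bool:
    # Hypothetical rule: a source speaks for the terms in its own namespace.
    return term.startswith(source)


def infer_types(quads):
    """quads: (subject, predicate, object, source) tuples.
    Returns all (subject, class) rdf:type pairs, asserted and inferred."""
    # Only keep subClassOf axioms whose source is authoritative for the subclass.
    axioms = {
        (sub, sup)
        for sub, pred, sup, source in quads
        if pred == SUBCLASS_OF and is_authoritative(source, sub)
    }
    types = {(s, o) for s, p, o, _ in quads if p == RDF_TYPE}
    # Naive forward chaining until no new pairs appear (a fixpoint).
    while True:
        new = {(x, sup) for x, cls in types for sub, sup in axioms if cls == sub}
        if new <= types:
            return types
        types |= new


VOCAB = "https://vocab.example/"
SPAM = "https://spam.example/"
quads = [
    (VOCAB + "Concert", SUBCLASS_OF, VOCAB + "Event", VOCAB),  # authoritative
    (VOCAB + "Concert", SUBCLASS_OF, SPAM + "Ad", SPAM),       # not authoritative
    ("https://alice.example/gig", RDF_TYPE, VOCAB + "Concert", "https://alice.example/"),
]
print(sorted(infer_types(quads)))
# [('https://alice.example/gig', 'https://vocab.example/Concert'),
#  ('https://alice.example/gig', 'https://vocab.example/Event')]
```

The statement published by `https://spam.example/` is ignored, so the concert is not inferred to be an advertisement, while the vocabulary's own subclass statement is still applied.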

See also


The goal is to make the data model directly usable on the existing Web/Fediverse as well as in completely decentralized systems. We are collaborating on the data model with P2PCollab, who are building a completely decentralized system.


A huge source of inspiration. The project is working on a Linked-Data-inspired data model for decentralized systems.

We aim to be more compatible with existing RDF data, possibly at the cost of consistency and elegance.


Provides a nice foundation for immutable content-addressed storage.


Provides ideas on how to put access control lists (ACLs) on CRDTs.