As part of the NLNet project, openEngiadina is conducting research into a suitable data model and data storage for the platform.
There are a couple of problems with ActivityPub and the Web in general. Some of the problems we try to address:
On the web the source of truth is usually a centralized host. It needs to manage all data and all state mutation. If the host goes down, the data becomes unavailable.
A subtle implication is that when creating new data that references or links to existing data on a single host, the quality and availability of the newly created data depends on the availability of the existing data. This is especially problematic when working with Linked Data that relies heavily on being able to reference remote content (see the non-technical issues mentioned in A More Decentralized Vision for Linked Data (2018)).
What makes things worse is semantic querying of data (e.g. via a SPARQL endpoint), which can be very expensive for the host and further reduces availability.
Authenticity of content
On the web, authenticity of content is usually guaranteed at the transport level. The connection to the server is authenticated, not the content itself.
It would be much nicer to have a simple and efficient way of checking the authenticity of individual pieces of content directly. This would also allow content to be cached or forwarded by a third party while first-party authenticity can still be proven.
In ActivityPub there is no way of ensuring secrecy. Content is not encrypted.
With openEngiadina we prioritize public/common data. So this is less of a priority, but good to consider.
Research and develop a data model for openEngiadina that allows:
- Availability of content regardless of host availability (with content-addressable storage)
- Verifying authenticity of content on client side
- Semantic queries
The ingredients to the data model.
DONE Encoding for Robust Immutable Storage
Store content securely and immutably in content-addressable storage.
We have made a concrete proposal on how to solve this with ERIS.
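To illustrate the general idea of content-addressable storage, here is a minimal sketch. It is not ERIS: it uses plain SHA-256 hashing with no encryption or block splitting, and the `ContentStore` class and its methods are hypothetical names chosen for this example.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressed store (hypothetical, not ERIS)."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        # The identifier is derived from the content, not from a host name.
        key = hashlib.sha256(data).hexdigest()
        self._blocks[key] = data
        return key

    def get(self, key: str) -> bytes:
        data = self._blocks[key]
        # Re-hash on read so a faulty or malicious store is detected.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("content does not match its identifier")
        return data


# Two independent hosts holding the same content agree on its identifier,
# so a reference stays resolvable as long as any one of them is reachable.
host_a, host_b = ContentStore(), ContentStore()
key = host_a.put(b"hello openEngiadina")
assert host_b.put(b"hello openEngiadina") == key
assert host_b.get(key) == b"hello openEngiadina"
```

Because the identifier is a hash of the content, stored content cannot be changed without changing its identifier, which is where the immutability comes from.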
DONE Content-addressable RDF
Make RDF content-addressable. This solves the problem of availability.
- Finding a subset of RDF that works with content-addressing
- Finding a suitable grouping of statements
- Serialization format that can be normalized and hashed efficiently
We have made a concrete proposal on how to do this (see Content-addressable RDF).
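The three points above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the serialization from the proposal: statements are grouped as (predicate, object) pairs about a single implicit subject, blank nodes are assumed to be absent, canonical JSON stands in for the real serialization format, and the `urn:sha256:` prefix is a made-up identifier scheme.

```python
import hashlib
import json


def canonical_bytes(statements):
    """Serialize a group of (predicate, object) pairs deterministically.

    Duplicates are dropped and the pairs are sorted, so the result does not
    depend on the order in which the statements were written down.
    """
    normalized = sorted(set(statements))
    return json.dumps(normalized, separators=(",", ":")).encode("utf-8")


def content_id(statements):
    """Derive an identifier for the group from its canonical form."""
    digest = hashlib.sha256(canonical_bytes(statements)).hexdigest()
    return f"urn:sha256:{digest}"  # hypothetical URN scheme


note = [("rdf:type", "as:Note"), ("as:content", "Hello")]
# Same statements in a different order, with one duplicate.
shuffled = list(reversed(note)) + [note[0]]

assert content_id(note) == content_id(shuffled)
assert content_id(note) != content_id(note + [("as:name", "Greeting")])
```

The subject is left implicit here because the identifier of the group is only known after hashing; how the actual proposal handles this is described in Content-addressable RDF.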
TODO Mutable data
Content-addressed storage solves the problem of availability. However, it imposes immutability of data.
Mutability is important for dynamic systems. Life would be very static without any changes.
Datashards (and Tahoe-LAFS) implement mutability. However, they require coordination between writers. We would like to solve this with eventual consistency and CRDTs.
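As a rough illustration of how a CRDT avoids coordination between writers, here is a sketch of a two-phase set, one of the simplest CRDTs. It is an example of the general technique, not the design chosen for openEngiadina; the `TwoPhaseSet` class and the `id-1`, `id-2` identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoPhaseSet:
    """Set CRDT: every state is immutable, changes produce new states."""

    added: frozenset = frozenset()
    removed: frozenset = frozenset()

    def add(self, item):
        return TwoPhaseSet(self.added | {item}, self.removed)

    def remove(self, item):
        # Removal is recorded as a tombstone; a removed item cannot be re-added.
        return TwoPhaseSet(self.added, self.removed | {item})

    def merge(self, other):
        # Union is commutative, associative and idempotent, so replicas
        # converge regardless of the order in which they exchange state.
        return TwoPhaseSet(self.added | other.added,
                           self.removed | other.removed)

    def value(self):
        return self.added - self.removed


base = TwoPhaseSet().add("id-1")
alice = base.add("id-2")      # Alice adds an item without asking anyone
bob = base.remove("id-1")     # Bob removes one concurrently

assert alice.merge(bob) == bob.merge(alice)
assert alice.merge(bob).value() == frozenset({"id-2"})
```

The items could be identifiers of immutable, content-addressed data, so that the mutable part of the system is reduced to a small mergeable structure.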
TODO Multi-model database
How can content be stored and indexed allowing all requirements above as well as semantic queries?
Generic Tuple Store Database
Work done in the Scheme community on how to build custom databases with multi-faceted views of the data.
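One common way to get several views of the same tuples is to store each tuple under multiple orderings in an ordered key-value store, so that every query pattern becomes a prefix scan. The sketch below assumes this approach and emulates the ordered store with sorted Python lists; the `TripleStore` class and the three chosen orderings are hypothetical and not taken from the Scheme work.

```python
import bisect

# Each ordering lists which triple position comes first, second, third.
ORDERINGS = {"spo": (0, 1, 2), "pos": (1, 2, 0), "osp": (2, 0, 1)}


class TripleStore:
    def __init__(self):
        # One sorted list per ordering stands in for an ordered key-value store.
        self._indices = {name: [] for name in ORDERINGS}

    def add(self, triple):
        for name, order in ORDERINGS.items():
            key = tuple(triple[i] for i in order)
            index = self._indices[name]
            pos = bisect.bisect_left(index, key)
            if pos == len(index) or index[pos] != key:
                index.insert(pos, key)

    def query(self, s=None, p=None, o=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        pattern = (s, p, o)
        # Pick the ordering in which the bound terms form a key prefix.
        for name, order in ORDERINGS.items():
            permuted = [pattern[i] for i in order]
            bound = [term for term in permuted if term is not None]
            if permuted[:len(bound)] == bound:
                break
        prefix = tuple(bound)
        index = self._indices[name]
        results = []
        for key in index[bisect.bisect_left(index, prefix):]:
            if key[:len(prefix)] != prefix:
                break  # left the range of keys sharing the prefix
            triple = [None] * 3
            for position, i in enumerate(order):
                triple[i] = key[position]
            results.append(tuple(triple))
        return results


store = TripleStore()
store.add(("alice", "knows", "bob"))
store.add(("bob", "knows", "carol"))
store.add(("alice", "likes", "carol"))

assert sorted(store.query(p="knows")) == [
    ("alice", "knows", "bob"), ("bob", "knows", "carol")]
assert store.query(s="alice", o="carol") == [("alice", "likes", "carol")]
```

With these three orderings every combination of bound and unbound terms in a triple pattern can be answered by a single range scan, at the cost of storing each triple three times.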
How to reason and infer over partially trustworthy data.
The goal is to make the data model directly usable on the existing Web/Fediverse as well as in completely decentralized systems. We are collaborating on the data model with P2PCollab, who are building a completely decentralized system.
A huge source of inspiration. The project is working on a Linked Data inspired data model for decentralized systems.
We aim to be more compatible with existing RDF data, maybe at the cost of consistency and elegance.
Provides a nice foundation for immutable content-addressed storage.
Provides ideas on how to put ACLs on CRDTs.