openEngiadina wiki

Data model and Data storage

As part of the NLNet project openEngiadina is conducting research into a suitable data model and data storage for the platform.

Problems

There are a couple of problems with ActivityPub and the Web in general. These are some of the problems we try to address:

Availability

On the web, the source of truth is usually a centralized host. It needs to manage all data and all state mutations. If the host goes down, the data it holds becomes unavailable.

A subtle implication is that when creating new data that references or links to existing data on a single host, the quality and availability of the newly created data depends on the availability of the existing data. This is especially problematic when working with Linked Data that relies heavily on being able to reference remote content (see the non-technical issues mentioned in A More Decentralized Vision for Linked Data (2018)).

What makes things worse is semantic querying of data (e.g. via a SPARQL endpoint), which can be very expensive for the host and so reduces availability further.

Authenticity of content

On the web, authenticity of content is usually guaranteed at the transport level: the connection to the server is authenticated, not the content itself.

It would be much nicer to have a simple and efficient way of checking the authenticity of pieces of content directly. This would also allow content to be cached or forwarded by a third party while first-party authenticity can still be proved.

Secrecy

In ActivityPub there is no way of ensuring secrecy. Content is not encrypted.

With openEngiadina we prioritize public/common data, so secrecy is less of a priority, but it is good to consider.

Goals

Research and develop a data model for openEngiadina that allows:

Work

The ingredients of the data model.

DONE Encoding for Robust Immutable Storage

Store stuff securely and immutably in a content-addressable storage.

We have made a concrete proposal on how to solve this with ERIS.
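To illustrate what content-addressable storage means, here is a minimal sketch. It is not ERIS: it uses a plain SHA-256 digest as the key, keeps everything in memory, and leaves out confidentiality entirely. The class name ContentStore and its methods are made up for this example.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressable store (illustrative, not ERIS)."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        # The key is derived from the content, so identical content always
        # gets the same key and a stored block can never be changed in place.
        key = hashlib.sha256(data).hexdigest()
        self._blocks[key] = data
        return key

    def get(self, key: str) -> bytes:
        data = self._blocks[key]
        # Re-hash on read: a block served by any host or cache can be
        # checked against the key that was requested.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("block does not match its key")
        return data


store = ContentStore()
key = store.put(b"hello openEngiadina")
```

Because the key can be recomputed from the data, it does not matter which host or cache delivers the block; the reader can always check that it received what it asked for.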

DONE Content-addressable RDF

Make RDF content-addressable. This solves the problem of availability.

Challenges include:

We have made a concrete proposal on how to do this (see Content-addressable RDF).
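As a rough illustration of the idea, and not of the method in the proposal, the sketch below derives an identifier from a set of RDF statements. It assumes the statements are ground N-Triples lines (no blank nodes) that are each already serialized consistently. The function name graph_id and the example.org IRIs are made up for this example.

```python
import hashlib


def graph_id(ntriples_lines):
    """Return a SHA-256 digest for a set of ground N-Triples statements."""
    # An RDF graph is a set of statements: drop duplicates and ignore order.
    canonical = sorted({line.strip() for line in ntriples_lines if line.strip()})
    payload = "\n".join(canonical).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


a = [
    '<http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .',
    '<http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/bob> .',
]
# Same statements in a different order, with one duplicated.
b = list(reversed(a)) + [a[0]]

same = graph_id(a) == graph_id(b)
```

Both lists describe the same graph, so they yield the same identifier. Any change to a statement yields a different one.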

TODO Mutable data

Content-addressed storage solves the problem of availability. However, it imposes immutability of data.

Mutability is important for dynamic systems. Life would be very static without any changes.

Datashards (and Tahoe-LAFS) implement mutability. However, they require coordination between writers. We would like to solve this with eventual consistency and CRDTs.
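To show how a CRDT avoids coordination between writers, here is a sketch of one of the simplest CRDTs, a last-writer-wins register. It is only an illustration and not necessarily the CRDT openEngiadina will use. The class LWWRegister, its fields and the integer timestamps are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LWWRegister:
    """Last-writer-wins register: each state is itself immutable."""

    value: str
    timestamp: int
    replica: str

    def write(self, value, timestamp, replica):
        # A write is just a merge with a freshly created state.
        return self.merge(LWWRegister(value, timestamp, replica))

    def merge(self, other):
        # Keep the state with the greatest (timestamp, replica, value).
        # The total order makes merge commutative, associative and idempotent.
        return max(self, other, key=lambda r: (r.timestamp, r.replica, r.value))


base = LWWRegister("draft", 1, "a")
alice = base.write("edited by alice", 2, "a")  # written on replica "a"
bob = base.write("edited by bob", 3, "b")      # written on replica "b"

# Replicas can exchange states in any order and still converge.
converged = alice.merge(bob) == bob.merge(alice)
```

Each state is an immutable value, so it could itself be kept in content-addressed storage. Mutability comes from merging states, not from overwriting them.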

TODO Multi-model database

How can content be stored and indexed in a way that meets all the requirements above and also supports semantic queries?

Generic Tuple Store Database

Work done in the Scheme community on how to build custom databases with multi-faceted views of the data.
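The sketch below illustrates the general idea of keeping several indexed views over the same tuples so that a pattern query can start from any position. It is written in Python for brevity and does not reproduce the API of the Scheme work. The class TripleStore and its methods are made up for this example.

```python
from collections import defaultdict


class TripleStore:
    """Toy triple store with one index per tuple position."""

    def __init__(self):
        self._triples = set()
        self._index = defaultdict(set)  # (position, value) -> set of triples

    def add(self, s, p, o):
        triple = (s, p, o)
        self._triples.add(triple)
        # Index the triple under each of its three positions.
        for position, value in enumerate(triple):
            self._index[(position, value)].add(triple)

    def match(self, s=None, p=None, o=None):
        """Return triples matching the pattern; None is a wildcard."""
        candidates = None
        for position, value in enumerate((s, p, o)):
            if value is None:
                continue
            found = self._index.get((position, value), set())
            candidates = found if candidates is None else candidates & found
        result = self._triples if candidates is None else candidates
        return sorted(result)


db = TripleStore()
db.add("alice", "knows", "bob")
db.add("alice", "name", "Alice")
db.add("bob", "knows", "carol")

who_knows_whom = db.match(p="knows")
```

A query such as db.match(s="alice", p="knows") intersects two index lookups and never scans the whole store.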

Authoritative reasoning

How to reason and infer over partially trustworthy data.

See also

P2Pcollab

The goal is to make the data model directly usable on the existing Web/Fediverse as well as in completely decentralized systems. We are collaborating on the data model with P2PCollab, who are building a completely decentralized system.

InfoCentral

A huge source of inspiration. The project is working on a Linked Data inspired data model for decentralized systems.

We aim to be more compatible with existing RDF data, maybe at the cost of consistency and elegance.

Datashards

Provides a nice foundation for immutable content-addressed storage.

Vegvisir

Provides ideas on how to put ACLs on CRDTs.