openEngiadina wiki

Encoding for Robust Immutable Storage

ERIS is a scheme for secure immutable storage.

See the published write-up for a detailed description of the scheme.

There is also a JavaScript demo illustrating how ERIS works.

Requirements

Availability

Content can be replicated, cached and authenticated.

Plausible Deniablility

Servers and other intermediaries should be able to not know the content of the data.

Use "boring" crypto

We do not trust ourselves to rolling our own crypto and we do not want crypto agility.

Use higher-level API as defined by NaCl.

Provide URIs

The stored data can be referenced with a standard URI.

This allows compatibility with existing Web and in particular with RDF

Non-requirements

Mutability

Can be implemented on-top of a immutable storage.

Background

Reliability with erasure codes

Tahoe-LAFS uses erasure codes for more increased redundancy. Is this something that is required? Can this be transparently implemented at a lower-level?

Convergent Encryption

Data is encrypted with a deterministically derived key from the hash of the content.

Now when you store the same data twice it gets the same identifier but is still encrypted.

Two known attacks on Convergent Encryption

  1. The Confirmation Of A File Attack

    If the attacker knows the data being stored, then they can check if the data is already being stored (as the id of encrypted content is known to anybody who has the data).

  2. The Learn-The-Remaining-Information

    If data is known except for a small part (entropy is low) it can be brute-forced.

    This would be impossible when using a secret key for encrypting the data which is not the hash of the data as the secret key would introduce enough entropy.

Solutions to both is to allow a secret value to be specified that is mixed with the encryption key. If the same secret value is used for the same data it is still gets the same id. This makes sense for users. Users can de-duplicate their own data but it is never de-duplicated with other users data.

Cryptosphere does this with an optional convergence secret.

Such a secret value can be easily implemented with keyed hashing as provided by NaCl (libsodium doc).

Hash tree / Merkle tree

A hash tree (or Merkle tree) is a tree where every node is labelled with a cryptographic hash. The content of the tree (and sub-trees) can be efficiently verified.

This is a basic structure used by all the projects listed below.

Block size

How should block size be chosen?

Reasons for small block size (~1kb)

Reasons for larger block size

Garbage collection

How can it be decided that content is no longer required and space be reclaimed safely?

Tahoe-LAFS has some notes on Garbage Collection.

Projects

Do other projects do secure immutable storage? How? Why can we not reuse what they do?

Cryptosphere

Splits data into blocks (variable size) and builds two parallel hash-trees. One hash tree is accessible with "Read"-capability and holds encryption keys to data blocks.

The parallel "Repair"-tree holds hash of data blocks and can verify presence of content, but can not read content.

Cryptosphere uses NaCl. They use authenticated encryption for data blocks (using crypto_secretbox), even though content can be authenticated via hash (see discussion).

See description of data model.

GNUNet

An Encoding for Censorship-Resistant Sharing (ECRS, 2008) describes the encoding scheme as implemented in the GNUnet file-sharing application.

Primary requirements are:

The scheme is based on Efficient Sharing of Encrypted Data (2002).

Data is split up into blocks of size 32 kilo bytes (called /DBlock/s). DBlocks are encrypted with the hash of the content and identified with the hash of the encrypted content.

To retrieve a DBlock the hash of the encrypted content and the key (hash of plaintext) is required. This pair is called a content hash key (CHK). Freenet uses the same scheme.

Size of hash codes is 512-bit (64bytes). Size of a CHK is 128bytes.

Multiple DBlocks are referenced from an IBlock that also has size 32kB (contains at most 256 CHKs). Resulting IBlocks are encoded in the same way as the DBlocks until there is a single IBlock left, the top IBlock.

In order to find files two additional mechanisms are implemented. However they are not strictly necessary:

While out-of-band communication of those CHK keys is cer-tainly feasible, an integrated solution is clearly desirable.

The two mechanisms for finding files are:

Tahoe-LAFS

See File Encoding for details.

Datashards

Inspired by Tahoe-LAFS.

Freenet

An Encoding for Censorship-Resistant Sharing (ECRS, 2008) describes the Freenet scheme in contrast to ECRS.

See also

ERIS Guile implementation

Git repository of Guile implementation.

ERIS JavaScript implementation

Git repository of JavaScript implementation.

Data model and Data storage

ERIS was developed as part of a larger research task into data model and data storage.