ERIS is a scheme for secure immutable storage.
See the published write-up for a detailed description of the scheme.
Content can be replicated, cached and authenticated.
Servers and other intermediaries should not need to know the content of the data.
Use "boring" crypto
We do not trust ourselves to roll our own crypto, and we do not want crypto agility.
Use the higher-level API as defined by NaCl.
The stored data can be referenced with a standard URI.
This allows compatibility with the existing Web, and in particular with RDF.
Can be implemented on top of any immutable storage.
Reliability with erasure codes
Tahoe-LAFS uses erasure codes for increased redundancy. Is this something that is required? Can this be transparently implemented at a lower level?
Data is encrypted with a key deterministically derived from the hash of the content.
Storing the same data twice thus yields the same identifier, while the data is still encrypted.
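As an illustration, convergent encryption can be sketched in a few lines of Python. This is a toy sketch only: the ad-hoc XOR keystream below stands in for a real cipher (a proper implementation would use NaCl's secretbox), and SHA-256 stands in for whatever hash the scheme actually uses.

```python
import hashlib

def convergent_encrypt(data: bytes) -> tuple[bytes, bytes, bytes]:
    """Toy convergent encryption: the key is the hash of the content.

    Illustrative only -- a real implementation would use NaCl's
    secretbox instead of this ad-hoc XOR keystream.
    """
    key = hashlib.sha256(data).digest()             # deterministic key
    # Expand the key into a keystream (counter mode over SHA-256).
    stream = b"".join(
        hashlib.sha256(key + i.to_bytes(8, "big")).digest()
        for i in range(len(data) // 32 + 1)
    )
    ciphertext = bytes(a ^ b for a, b in zip(data, stream))
    block_id = hashlib.sha256(ciphertext).digest()  # public identifier
    return key, ciphertext, block_id

# Storing the same content twice yields the same identifier:
k1, c1, id1 = convergent_encrypt(b"hello world")
k2, c2, id2 = convergent_encrypt(b"hello world")
assert id1 == id2 and c1 == c2
```

Anyone who holds the plaintext can recompute `block_id`, which is exactly what enables de-duplication and also the attacks described next.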
Two known attacks on Convergent Encryption
The Confirmation Of A File Attack
If the attacker knows the data being stored, then they can check whether the data is already stored (as the identifier of the encrypted content is known to anybody who has the data).
If the data is known except for a small part (i.e. its entropy is low), the unknown part can be brute-forced.
This would be impossible when using a secret key that is not derived from the data, as the secret key would introduce enough entropy.
A solution to both attacks is to allow a secret value to be specified that is mixed into the encryption key. If the same secret value is used for the same data, it still gets the same id. This makes sense for users: users can de-duplicate their own data, but it is never de-duplicated with other users' data.
Cryptosphere does this with an optional convergence secret.
Such a secret value can be easily implemented with keyed hashing as provided by NaCl (libsodium doc).
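A sketch of such a convergence secret using keyed BLAKE2b from Python's standard `hashlib` (the same keyed-hashing primitive libsodium exposes as `generichash`; the secret values below are placeholders):

```python
import hashlib

def derive_key(content: bytes, secret: bytes = b"") -> bytes:
    """Derive the encryption key from the content, optionally mixed
    with a convergence secret via keyed BLAKE2b."""
    return hashlib.blake2b(content, key=secret, digest_size=32).digest()

data = b"some file contents"

# Without a secret, everyone derives the same key (global de-duplication):
assert derive_key(data) == derive_key(data)

# With a per-user secret, keys (and thus identifiers) differ between
# users, while each user still de-duplicates their own data:
alice = derive_key(data, secret=b"alice's convergence secret")
bob = derive_key(data, secret=b"bob's convergence secret")
assert alice != bob
assert alice == derive_key(data, secret=b"alice's convergence secret")
```

An attacker without the secret can no longer confirm a file or brute-force a low-entropy part, because the secret contributes the missing entropy to the key.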
Hash tree / Merkle tree
A hash tree (or Merkle tree) is a tree where every node is labelled with a cryptographic hash. The content of the tree (and sub-trees) can be efficiently verified.
This is a basic structure used by all the projects listed below.
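A minimal sketch of computing such a root hash over a list of data blocks (Python's `hashlib`; real schemes typically also domain-separate leaf and inner-node hashing to prevent second-preimage attacks):

```python
import hashlib

def merkle_root(leaves: list[bytes]) -> bytes:
    """Compute the root of a binary hash tree over the given blocks."""
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:  # duplicate the last node on odd-sized levels
            level.append(level[-1])
        # Hash each pair of children into a parent node.
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

# Any change to any block changes the root, so a single trusted root
# hash suffices to verify the whole tree:
root = merkle_root([b"block 1", b"block 2", b"block 3"])
assert root != merkle_root([b"block 1", b"tampered", b"block 3"])
```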
How should block size be chosen?
Reasons for small block size (~1kb)
- authenticating every packet: Posted on the boring-crypto mailing list.
- forged packets can be detected earlier (and at a lower level)
- modern crypto has no serious performance problems with block sizes of about 1kb
- Reasons GNUnet uses 1kb block sizes
- Works well with UDP
- Encourages replication (larger block size makes it more expensive to replicate)
- Filesystems can be formatted with a block size of 1kb
- From the Named Data Networking mailing list:
- NDN uses a maximum packet size of 8800 octets. As explained on the list, the reason is to fit an entire packet into a pre-allocated buffer. If the buffer is too big, a lot of memory is wasted; if packets larger than the buffer are allowed, additional copying is required.
Reasons for larger block size
- less fragmentation, less overhead
How can it be decided that content is no longer required and space be reclaimed safely?
Tahoe-LAFS has some notes on Garbage Collection.
Do other projects do secure immutable storage? How? Why can we not reuse what they do?
Splits data into blocks (of variable size) and builds two parallel hash trees. One hash tree is accessible with the "Read"-capability and holds the encryption keys to the data blocks.
The parallel "Repair"-tree holds the hashes of the data blocks and can verify the presence of content, but cannot read it.
Cryptosphere uses NaCl. They use authenticated encryption for data blocks (using crypto_secretbox), even though content can be authenticated via its hash (see discussion).
See description of data model.
An Encoding for Censorship-Resistant Sharing (ECRS, 2008) describes the encoding scheme as implemented in the GNUnet file-sharing application.
Primary requirements are:
- plausible deniability
The scheme is based on Efficient Sharing of Encrypted Data (2002).
Data is split up into blocks of 32 kilobytes (called /DBlock/s). DBlocks are encrypted with the hash of their content and identified by the hash of the encrypted content.
To retrieve a DBlock, the hash of the encrypted content and the key (the hash of the plaintext) are required. This pair is called a content hash key (CHK). Freenet uses the same scheme.
Hash codes are 512 bits (64 bytes); a CHK is 128 bytes.
Multiple DBlocks are referenced from an IBlock, which also has size 32kB (and thus contains at most 256 CHKs). The resulting IBlocks are encoded in the same way as the DBlocks, until a single IBlock remains: the top IBlock.
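The fan-out and the maximum content size per tree depth follow directly from these numbers; a quick back-of-the-envelope check:

```python
# Sizes as given in the ECRS paper.
BLOCK_SIZE = 32 * 1024   # 32 kB DBlocks and IBlocks
CHK_SIZE = 2 * 64        # hash of ciphertext + key, 64 bytes each

FANOUT = BLOCK_SIZE // CHK_SIZE  # CHKs per IBlock
assert FANOUT == 256

# Maximum content size addressable by a tree of the given depth
# (depth 0 = a single DBlock, depth 1 = one IBlock of CHKs, ...):
for depth in range(4):
    print(depth, FANOUT ** depth * BLOCK_SIZE)
```

So one level of IBlocks already covers 256 * 32 kB = 8 MiB, two levels cover 2 GiB, and three levels cover 512 GiB.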
In order to find files, two additional mechanisms are implemented. However, they are not strictly necessary:
While out-of-band communication of those CHK keys is certainly feasible, an integrated solution is clearly desirable.
The two mechanisms for finding files are:
- Namespaces: /SBlock/s that contain the CHK of a top IBlock together with metadata belonging to the content. SBlock is encrypted with the hash of an identifier that can be used to query the content (allowing them to be read only if the identifier is known). Additionally SBlocks contain a digital signature, proving provenance.
- Keyword search: Publish metadata about some content in a KBlock. Additionally, keywords can be specified with which the KBlocks can be discovered, without revealing the queried keyword to intermediaries. This is done by deterministically generating a public-private key pair from the keyword K and some clever tricks.
See File Encoding for details.
Inspired by Tahoe-LAFS.
The ECRS paper (An Encoding for Censorship-Resistant Sharing, 2008) also describes the Freenet scheme and contrasts it with ECRS.
ERIS Guile implementation
Git repository of Guile implementation.
Data model and Data storage
ERIS was developed as part of a larger research task into data model and data storage.