Asuran Internals
1. General Format Notes
1.1. Serialization
Unless otherwise specified, all data types are serialized with rmp-serde using the compact (array) format.
1.2. Endianness
All serialized encodings of integers are in network byte order (big-endian).
1.3. Locking
Locking of a file is accomplished by writing a file.lock file containing a process-specific UUID, and then reading the file back to verify the UUID, ensuring proper ownership of the lock.
Collision management is implemented with binary exponential backoff.
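As a minimal sketch of this protocol (assuming the uuid crate; the function and parameter names are illustrative, not part of any asuran API):

use std::{fs, io::Write, path::Path, thread, time::Duration};
use uuid::Uuid; // assumes the `uuid` crate is available

/// Minimal sketch of the locking protocol described above: write `file.lock`
/// containing this process's UUID, read it back to confirm ownership, and use
/// binary exponential backoff when another process holds the lock.
/// Function and parameter names are illustrative only.
fn acquire_lock(lock_path: &Path, process_uuid: Uuid, max_attempts: u32) -> std::io::Result<bool> {
    for attempt in 0..max_attempts {
        // `create_new` fails if the lock file already exists
        if let Ok(mut file) = fs::OpenOptions::new().write(true).create_new(true).open(lock_path) {
            file.write_all(process_uuid.as_bytes())?;
            file.sync_all()?;
            // Read the file back; if it still contains our UUID, we own the lock
            let contents = fs::read(lock_path)?;
            if contents == process_uuid.as_bytes().to_vec() {
                return Ok(true);
            }
        }
        // Collision: back off for 1, 2, 4, ... milliseconds before retrying
        thread::sleep(Duration::from_millis(1u64 << attempt.min(10)));
    }
    Ok(false)
}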
1.3.1. Readlocks
A special readlocks subfolder should be used to hold files containing the process-specific UUIDs of the asuran implementations currently reading a repository. The names of these files are the string-encoded UUIDs, and the contents are the binary UUID. These files should be deleted on repository close.
1.3.2. Global lock
In the case an operation, such as a compaction, needs to ensure that nobody else is using the repository, a single lock file called “lock” may be used in the root folder of the repository.
This lock indicates that the repository may be in an inconsistent state, and no other processes may read or write to the repository while this lock file is in place.
No process may take a global lock while any process currently holds a readlock.
2. Repository
The repository is a low-level key-value store that is committed to disk. All values (referred to as “chunks”) in the repository are encrypted, compressed, and HMACed, with algorithms configurable on a per-value basis. The repository only understands keys and values, and effectively operates as content addressable storage; all other data structures are implemented on top of the repository.
The repository structure itself is storage-independent. The repository object simply views the world as a list of segments, which are themselves lists of sized cells containing values.
Repositories are not strictly required to have multiple segments, and segments are not strictly required to contain multiple chunks. This allows simple mapping of any random access storage as (possibly) a single segment, or of an object store (such as S3) as a number of segments each containing one or many chunks.
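The following is a conceptual sketch of that view; the trait and method names are assumptions for illustration, not asuran's actual backend interface:

/// Conceptual sketch only: the repository sees storage as numbered segments,
/// each holding sized cells of opaque bytes. Names here are illustrative and
/// not asuran's real backend trait.
trait SegmentStore {
    /// Read the sized cell at `offset` within segment `segment`
    fn read_cell(&self, segment: u64, offset: u64) -> std::io::Result<Vec<u8>>;
    /// Append a value to some segment, returning the (segment, offset) it landed at
    fn write_cell(&mut self, value: &[u8]) -> std::io::Result<(u64, u64)>;
}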
The repository has special methods for pulling the manifest and index out of itself, and it may or may not treat these pieces of data as special, depending on the backend implementation in use. Typically, the manifest will be stored as a normal Chunk with a special key that is all zero.
2.1. On Disk Format
Asuran is meant to support a variety of storage types, some of which may not align with the traditional file system layout, such as S3, which lacks a traditional folder structure, or RocksDB, which has no structure beyond a key-value store.
Two primary archive formats are being developed, targeted at traditional file systems (i.e. on a hard disk or SSD), and these should serve as the standards by which other storage backends are written. As much as possible, alternative backends/formats should mimic one of the file system based ones. If a tool exists for copying files to/from a type of storage (such as sftp or rclone), the end user should be able to use that utility to copy the repository from the remote storage onto their local storage verbatim and, without having to do anything else, continue using the repository via the matching local storage archive format without any issue.
From the level of the exposed API, the asuran repository acts as a key-value store. Data slices are optionally compressed and encrypted before being committed to storage, and an HMAC tag of the encrypted + compressed slice is stored alongside to allow data verification. Objects are referenced within the repository by a key that will, in almost all cases [1], be an HMAC of their plain text contents [2].
2.1.1. Chunks and Delta Chunks
A data slice in a repository can be stored in one of two containers: chunks and delta chunks.
The chunk is the “basic” data type of the repository and, for the most part, the only type the consumer of the API will interact with. Delta Chunks exist as an optimization, and are converted into regular chunk objects after being read out of the repository and before being returned by the repository methods.
2.1.1.1. Chunks
Chunks are a data structure containing an encrypted and compressed data slice. They also contain tags defining the encryption type and IV, the type of compression used, as well as the HMAC and ID tags.
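As an illustration, a chunk might be modeled roughly as follows; the field names and placeholder types are assumptions based on the description above, not the actual asuran definitions:

// Placeholder types standing in for asuran's own; names are assumptions.
pub enum Compression { None, Zstd { level: i32 } }
pub enum Encryption { None, AesCbc { iv: Vec<u8> } }
pub enum HMAC { Blake2b, Sha256 }
pub struct ChunkID(pub [u8; 32]);

/// Illustrative sketch of a chunk: the payload plus the tags described above.
pub struct Chunk {
    /// The compressed-then-encrypted data slice
    data: Vec<u8>,
    /// The compression algorithm applied to the plaintext
    compression: Compression,
    /// The encryption algorithm, carrying the IV used for this chunk
    encryption: Encryption,
    /// The HMAC algorithm used to produce `mac`
    hmac: HMAC,
    /// The HMAC tag of the compressed + encrypted data
    mac: Vec<u8>,
    /// The chunk's key, normally an HMAC of its plaintext contents
    id: ChunkID,
}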
2.1.1.2. Delta Chunks
Delta chunks are a data structure containing encrypted, compressed data intended to be used as a patch to another chunk, for delta compression with the rsync algorithm. Like regular chunks, they contain data, an HMAC tag, and an encryption type/IV tag, and have their own chunk ID. Unlike regular chunks, they cannot be directly unpacked in any meaningful sense, and thus also contain a reference to another chunk (by its ChunkID) whose data they patch.
Delta Chunk Chains
Delta chunks are allowed to reference other delta chunks. In this event, it is understood that the actual data being referenced is the fully patched and unpacked data, not the patch contained within.
Delta chunk chains can, in principle, be of any length, but as long chains seriously hamper restore performance, it is recommended that implementations go to reasonable lengths to keep them short. One suggested method, at the cost of CPU time at archive creation, is to walk up the chain from the root normal chunk, computing a delta patch at each step of the way, and to keep only the patch that is some desirable combination of closest to the root and smallest.
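The chain-walking semantics can be sketched with toy types (the store layout and the apply_patch placeholder are illustrative stand-ins, not asuran API):

use std::collections::HashMap;

// Toy stand-ins: a store maps chunk numbers to either a normal chunk's data or
// a delta (base reference + patch). Names are illustrative only.
enum Stored {
    Chunk { data: Vec<u8> },
    Delta { base: u64, patch: Vec<u8> },
}

/// Placeholder for an rsync/rdiff-style patch application.
fn apply_patch(base: &[u8], patch: &[u8]) -> Vec<u8> {
    let mut out = base.to_vec();
    out.extend_from_slice(patch);
    out
}

/// Resolving a delta chunk always patches the *fully resolved* data of the
/// chunk it references, so chains are walked recursively back to a normal chunk.
fn resolve(store: &HashMap<u64, Stored>, id: u64) -> Vec<u8> {
    match &store[&id] {
        Stored::Chunk { data } => data.clone(),
        Stored::Delta { base, patch } => {
            let base_data = resolve(store, *base);
            apply_patch(&base_data, patch)
        }
    }
}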
Chain repacking
During a repository compaction operation, the consumer application can optionally recreate the chunk chain so that the newest data is in the root normal chunk, changing the direction of the chain so that data from further back in time is further away from the chain root.
2.1.2. Repository Operations
Asuran repositories are fundamentally append-only logs of transactions. Compaction (deletion of unreachable/unneeded entries) must occur through parsing of the log, and rewriting only the entries that are still reachable and have not been explicitly deleted.
Repository formats should, but are not strictly required to, keep a separate index, such as a serialized hash table or a key-value DB (such as cask, rocksdb, or unqlite), that directly maps object keys to their locations on disk, allowing quick lookup operations without having to parse the entire repository on startup.
Reading from the repository is not synchronized and does not use the transaction mechanism directly.
2.1.2.1. Transactional operations
The majority of interactions with the repository occur through normal transactions.
Insert
An insert transaction commits a chunk to the repository. In the event that an insert is issued for a key that already exists in the repository, the newer value overwrites the previous one, and the previous value is then considered unreachable. This should be done with extreme care; the only object for which this should happen as a matter of course is the manifest.
Delete
A delete transaction marks the chunk referred to by the specified key as unreachable, and removes it from the index. The chunk itself will not be removed from the repository until a compaction or repack operation is performed.
2.1.2.2. Non-Transactional operations
Some operations on a repository are non-transactional; these are mainly maintenance operations.
Compaction
A compaction operation identifies segments containing unreachable transactions, recreates those segments without the unreachable transactions, and then updates the locations in the index.
A properly written compaction operation will not delete the old segments until the new ones have been written and finalized.
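A toy model of such a compaction pass, with illustrative types and names (a real implementation operates on the on-disk segment format described later):

use std::collections::HashMap;

// Toy model: a segment is a list of (key, value) entries, and the index maps
// keys to (segment number, position within segment). Illustrative only.
type Key = [u8; 32];
type Segment = Vec<(Key, Vec<u8>)>;

/// Rewrites one segment, keeping only reachable entries, and returns both the
/// new segment and the index updates pointing into it. The caller writes and
/// finalizes the new segment before deleting the old one.
fn compact_segment(
    old: &Segment,
    is_reachable: impl Fn(&Key) -> bool,
    new_segment_number: u64,
) -> (Segment, HashMap<Key, (u64, usize)>) {
    let mut new_segment = Segment::new();
    let mut index_updates = HashMap::new();
    for (key, value) in old {
        if is_reachable(key) {
            index_updates.insert(*key, (new_segment_number, new_segment.len()));
            new_segment.push((*key, value.clone()));
        }
    }
    (new_segment, index_updates)
}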
Repack
A repack operation iterates through all reachable transactions in a repository, and recommits them to a different backend. This function is mainly intended to create copies of a repository and for converting between different formats/backends.
Verify
A verify operation iterates through all the transactions in a repository, verifies their contents, and regenerates the index as it walks through.
2.2. Repository Formats
Asuran supports two primary on-disk formats, with different purposes in mind.
2.2.1. Flatfile
The flat file format is implemented as a single-file log of transactions, with no limit on file size. It has no real support for indexing, so the file must be scanned when loaded. The structure has been designed to facilitate this, but it may still be slow.
This file format is mainly intended to be an intermediate format for unusual conversions, or to be used for medium- to long-term archival of data that doesn't change often or at all.
While compaction operations can still occur with flat files, they are exceedingly inefficient, as the entire repository has to be rewritten to a new file. Do not use this format if you plan to modify or delete data frequently.
Flat files have a chunk size limit of u32::MAX bytes, but this shouldn't be an issue, as the chunk size will usually be in the kilobytes.
2.2.1.1. Header
Each flatfile repository has a header which is a compact message-pack representation of the following struct:
/// The header used by flatfile repositories
#[derive(Serialize, Deserialize, Debug)]
pub struct Header {
    /// The magic number, ASCII 'ASURAN_F'
    pub magic_number: [u8; 8],
    /// The UUID of the implementation that wrote this flatfile
    pub implementation_uuid: [u8; 16],
    pub semver_major: u16,
    pub semver_minor: u16,
    pub semver_patch: u16,
    pub configuration: Configuration,
}
Configuration Struct
The configuration struct is defined as follows:
/// The configuration struct used by flatfile repositories
#[derive(Serialize, Deserialize, Debug)]
pub struct Configuration {
    pub key: EncryptedKey,
    pub chunk_settings: ChunkSettings,
}
2.2.1.2. Transactions
The transactions are encoded with the following enum. Chunks are included in the transaction that defines them.
/// Represents the various types of transaction entries in a flatfile repository
#[derive(Serialize, Deserialize)]
pub enum FlatFileTransaction {
    /// An insertion of a data chunk into the repository
    Insert { id: ChunkID, chunk: Chunk },
    /// A deletion of a data chunk from the repository
    Delete { id: ChunkID },
    /// An insertion of an `Archive` into the repository
    ///
    /// The serialized `Archive` is stored as a `Chunk` containing the serialized Archive
    ManifestInsert {
        id: ChunkID,
        name: String,
        timestamp: DateTime<FixedOffset>,
    },
}
Manifest insertion transactions indicate that the following chunk contains an archive entry within the manifest; see the section on the flatfile manifest.
2.2.1.3. Manifest
Not having separate files to store things in means the flatfile repository does not have a manifest as such. Instead, the manifest is reconstructed during the scan-through phase of repository loading, based on the manifest insertion transactions.
The timestamp of last modification is reconstructed by using the timestamp in the last archive.
Chunk Settings are reconstructed by using the settings of the first chunk inserted in the repository.
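A sketch of this scan-through, assuming the Header and FlatFileTransaction definitions above, that transactions directly follow the header, and the real rmp_serde crate; index and manifest bookkeeping are left as comments:

use std::{fs::File, io::BufReader};

fn scan_flatfile(path: &str) -> std::io::Result<()> {
    let mut reader = BufReader::new(File::open(path)?);
    // The header is the first messagepack value in the file
    let header: Header = rmp_serde::from_read(&mut reader).expect("invalid header");
    assert_eq!(&header.magic_number, b"ASURAN_F");
    // Every following value is a FlatFileTransaction; replaying them in order
    // rebuilds the in-memory index and the manifest
    while let Ok(tx) = rmp_serde::from_read::<_, FlatFileTransaction>(&mut reader) {
        match tx {
            // Record where this chunk's data lives so its id can be looked up later
            FlatFileTransaction::Insert { .. } => {}
            // Drop the id from the in-memory index
            FlatFileTransaction::Delete { .. } => {}
            // The chunk with this id holds a serialized Archive for the manifest
            FlatFileTransaction::ManifestInsert { .. } => {}
        }
    }
    Ok(())
}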
2.2.2. Multi File
For all intents and purposes, this is the default backend for asuran. It stores the repository on disk in collections of segment files, which contain concatenated lists of transactions. Transactions consist of the transaction enum [3] in messagepack format, followed by, in the case of an insert, the chunk in messagepack format.
Segment files are stored in numbered directories, with a user configurable limit to the number of segment files stored in each directory.
The multi file format additionally stores a separate set of index files in an index folder, recording transactions and the locations of the associated chunks, as well as a similar manifest log folder.
2.2.2.1. Segments
Segments are stored in numbered subfolders of the main repository folder. The maximum number of segments per folder, as well as the maximum size of a segment, are user configurable.
Folder names are the number of the lowest segment they contain.
Segment numbers increment from 0, with the lowest unassigned number being used when a new one needs to be created.
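Under these rules, the directory a segment lives in can be computed by rounding its number down to a multiple of the per-folder limit; the helper below is purely illustrative and not part of any asuran API:

/// Illustrative only: folder names are the lowest segment number they contain,
/// so a segment's folder is its number rounded down to a multiple of the
/// per-folder limit.
fn segment_folder(segment_number: u64, segments_per_folder: u64) -> u64 {
    (segment_number / segments_per_folder) * segments_per_folder
}

// e.g. with 250 segments per folder, segment 612 lives in the folder named "500"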
Segment files contain a binary format header, followed by the concatenation of the encoded chunks.
Header
Each segment begins with the following header
Description | Byte Range | Value
---|---|---
Magic Number | 0..7 | ASURAN_S
Implementation UUID | 8..23 | UUID of the asuran implementation that wrote the segment. Does not change between versions.
Semver Major (u16) | 24..25 | Major version of the last asuran to write to the segment
Semver Minor (u16) | 26..27 | Minor version of the last asuran to write to the segment
Semver Patch (u16) | 28..29 | Patch version of the last asuran to write to the segment
Chunks
Chunks are encoded inside the transactions that insert them, with the transaction type enum and struct defined as follows:
#[derive(Copy, PartialEq, Eq, Clone, Serialize, Deserialize, Debug)]
pub enum TransactionType {
    Insert,
    Delete,
}

/// Struct used to store a transaction inside a segment.
///
/// TODO: Change this to an enum
#[derive(Clone, PartialEq, Eq, Debug, Serialize, Deserialize)]
pub struct Transaction {
    tx_type: TransactionType,
    id: ChunkID,
    chunk: Option<Chunk>,
}
2.2.2.2. Index
The index is a collection of numbered log files, stored in a subfolder of the repository called “index”, logging insertions but only keeping track of the locations of the chunks [4]. Deletions are not recorded in the index files, and the only way to remove an object from them is for the index to be recreated during a compaction operation.
On start up, Asuran loads all the available index files and parses through them, replaying history to build the in-memory index.
Transactions
Transactions in an index file are stored as the following struct:
/// Struct containing the various parts of a transaction
#[derive(Clone, PartialEq, Eq, Debug, Serialize, Deserialize)]
pub struct IndexTransaction {
    /// ID of the `Chunk` this transaction refers to
    pub chunk_id: ChunkID,
    /// The location of this `Chunk` on disk
    pub descriptor: SegmentDescriptor,
}
Locking
An index file must be locked when transactions are being written to it.
Locking is accomplished by writing a file called N.lock [5] in the index directory, containing a unique process UUID. The file is then read back and the UUID compared, to ensure that this instance is the true owner of the lock. The lowest numbered unlocked index file should be the one written to.
Conflict resolution
As writing the same block to the repository twice is an idempotent operation [6], inadvertently storing a block twice has no effect [7] other than a hit to deduplication, which will be cleared up by a compaction.
Implementations should consider lower-numbered index files to be more authoritative than higher-numbered index files.
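A sketch of an index replay that honors this rule, using stand-in types (the on-disk transaction struct is shown above); replaying the lowest-numbered file first and keeping the first mapping seen makes lower-numbered files win on conflict:

use std::collections::HashMap;

// Stand-in shapes for illustration; the on-disk structs are shown above.
#[derive(Clone)]
struct SegmentDescriptor { segment: u64, offset: u64 }
struct IndexTransaction { chunk_id: [u8; 32], descriptor: SegmentDescriptor }

/// Replays index files lowest-numbered first; because the first mapping seen
/// for a key is kept, lower-numbered (more authoritative) files win on conflict.
fn build_index(files_lowest_first: &[Vec<IndexTransaction>]) -> HashMap<[u8; 32], SegmentDescriptor> {
    let mut index = HashMap::new();
    for file in files_lowest_first {
        for tx in file {
            index.entry(tx.chunk_id).or_insert_with(|| tx.descriptor.clone());
        }
    }
    index
}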
2.2.2.3. Manifest
A manifest is simply a list of pointers to archive objects within the repository, accompanied by a set of default chunk settings and a last modified date. The archive pointers are stored in a collection of numbered files in the manifest subfolder of the repository, in much the same manner as the index. Reads are not synchronized, but writes are, in a manner identical to that used by the index.
Header
Transaction File

Each transaction file is prefixed with the following header:

Name | Byte Range | Value
---|---|---
Magic Number | 0..7 | ASURAN_M

Chunk Settings File

The chunk settings file is prefixed with the following header:

Name | Byte Range | Value
---|---|---
Magic Number | 0..7 | ASURAN_C
Transactions
Manifest transactions are compact messagepack encoded values of the following struct:
/// Describes a transaction in a manifest
#[derive(Clone, PartialEq, Eq, Debug, Serialize, Deserialize)]
pub struct ManifestTransaction {
    /// The HMACs of all previous branch heads in the repository that this
    /// transaction references
    previous_heads: Vec<ManifestID>,
    /// The location of the archive this transaction references within the repository
    pointer: ChunkID,
    /// The timestamp of this transaction's creation
    timestamp: DateTime<FixedOffset>,
    /// The human readable name of the archive
    name: String,
    /// A 128-bit random nonce
    ///
    /// This is canonically stored as an array of bytes, to keep the serializer and
    /// deserializer simple, while preventing issues with other platforms which may not
    /// have support for the exact same integer types as rust
    ///
    /// This value is used for ensuring uniqueness when constructing the Manifest
    /// Merkle Tree
    nonce: [u8; 16],
    /// The type of HMAC used for this transaction
    hmac: HMAC,
    /// The HMAC tag of this transaction
    ///
    /// This is calculated based off the compact (array form) messagepacked encoding of
    /// this struct with this value set to all zeros
    tag: ManifestID,
}
Authentication of Manifest transaction entries is achieved through a chain of HMAC tags tracing each archive chain back to the empty archive.
The HMAC used for this construction is specified in the chunk settings file, separately from the one used for chunk operations, though it is normally the same, and defaults to blake2b. The output of the selected HMAC function is truncated or padded as necessary to reach 256 bits in length.
The key used for this HMAC operation is the same as the repository’s hmac_key.
The HMAC tag of a transaction entry is calculated with the tag field set to all zeros and all the other fields filled out.
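A sketch of that calculation, using HMAC-SHA256 from the hmac and sha2 crates purely for illustration (the spec's default HMAC is blake2b); the input is the compact messagepack encoding of the transaction with its tag field zeroed:

use hmac::{Hmac, Mac};
use sha2::Sha256;

/// Computes a 256-bit tag over the zeroed-tag encoding of a manifest
/// transaction. HMAC-SHA256 stands in for the configured HMAC here; the
/// output is truncated or zero-padded to exactly 256 bits as described above.
fn manifest_transaction_tag(hmac_key: &[u8], zeroed_encoding: &[u8]) -> [u8; 32] {
    let mut mac = Hmac::<Sha256>::new_from_slice(hmac_key)
        .expect("HMAC can take a key of any length");
    mac.update(zeroed_encoding);
    let output = mac.finalize().into_bytes();
    let mut tag = [0u8; 32];
    let n = output.len().min(32);
    tag[..n].copy_from_slice(&output[..n]);
    tag
}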
As the manifest is effectively an append-only log, and a nonce is used to ensure uniqueness, the merge operation for the various branches is trivial. As such, a transaction entry should reference the HMACs of the heads of all the branches that the implementation is aware of and can verify, so that when the transaction is used as an entry point, it provides a merged view of all the previous transactions.
Chunk Settings
Default chunk settings are stored in a chunk.settings file in the manifest directory, containing a messagepack encoded Asuran::repository::ChunkSettings struct.
The chunk settings struct, as it should not be modified with any substantial frequency, is not transactional, and must be modified through a lock-and-replace operation.
2.2.2.4. Key
The key material for the repository is stored in a file called “key” in the root folder of the repository. It is encrypted using a key derived from a user supplied passphrase via argon2id. This is stored in a compact format messagepacked Asuran::repository::EncryptedKey struct, which has the following fields:
/// Stores the key, encrypted with another key derived from the user specified
/// password/passphrase
///
/// Uses argon2 to derive the key encryption key from the user supplied key.
///
/// Uses a 32 byte salt that is randomly generated
///
/// Currently uses semi-arbitrary defaults for some values. TODO: allow configuration of this
#[derive(Serialize, Deserialize, Clone, Debug)]
pub struct EncryptedKey {
    encrypted_bytes: Vec<u8>,
    salt: [u8; 32],
    mem_cost: u32,
    time_cost: u32,
    encryption: Encryption,
}
3. Manifest
The manifest itself is implemented as a trait supplied by the repository backend (see the multifile manifest for implementation details). It can conceptually be viewed as a simple list of pointers to archives within the repository.
This is essentially the root object of the repository; all normal interactions occur through it.
Chunks which are not pointed to by any object reachable through the manifest are considered unreachable and are candidates for removal in a compaction operation.
3.1. Archive
An archive within a repository is stored as a compact messagepacked representation of the following struct:
/// An Archive, as stored in the repository
#[derive(Serialize, Deserialize)]
pub struct Archive {
    /// The user provided name of the archive
    pub name: String,
    /// The list of objects in this archive, as well as the chunks that make them up
    pub objects: HashMap<String, Vec<ChunkLocation>>,
    /// The namespace this archive is currently viewing
    pub namespace: Vec<String>,
    /// The timestamp of the archive's creation
    pub timestamp: DateTime<FixedOffset>,
    /// The listing of objects in the repository, maintaining their relative structure,
    /// such as the layout of directories and folders.
    pub listing: Listing,
}
3.1.1. Namespaces and Object Addressing
Object references in the archive take the form of namespace::path.
3.1.1.1. Namespaces
Namespaces take the form of colon-delimited lists of strings, sorted in order of least specific to most specific. The strings may only contain alphanumeric characters, but may otherwise take any value, and their meaning is prescribed by the target implementation.
A terminating double colon is appended to the end of the namespace specifier. This marks the end of the namespace portion of the specifier, and any data after it is treated as part of the path.
The default/root namespace is simply “::”.
3.1.1.2. Paths
Path specifiers are posix style paths, with no additional restrictions imposed.
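As an illustration of the complete addressing scheme, a reference can be assembled from its parts as follows (the helper and the example values are hypothetical, not asuran API):

/// Hypothetical helper: builds a complete object reference from a namespace
/// (colon-delimited, least to most specific) and a posix-style path, joined by
/// the terminating double colon.
fn object_reference(namespace: &[&str], path: &str) -> String {
    format!("{}::{}", namespace.join(":"), path)
}

// object_reference(&["backups", "laptop"], "home/user/notes.txt")
//     == "backups:laptop::home/user/notes.txt"
// object_reference(&[], "home/user/notes.txt") == "::home/user/notes.txt" (root namespace)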
3.1.2. Object map
The objects map is stored as a map from complete object references (namespace::path) to an array of the ChunkLocations making up that object.
3.1.2.1. ChunkLocation
ChunkLocation is a struct, serialized in compact messagepack format, with the following fields:
/// A pointer to a `Chunk`, annotated with information on what part of the object it
/// makes up
#[derive(Serialize, Deserialize, Copy, Clone, Eq, PartialEq, Debug)]
pub struct ChunkLocation {
    pub id: ChunkID,
    pub start: u64,
    pub length: u64,
}
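As an example of how these fields are used, an object can be reassembled from its ChunkLocations roughly as follows (fetch_plaintext is a hypothetical stand-in for reading and unpacking a chunk from the repository):

/// Illustrative only: writes each chunk's plaintext into the object buffer at
/// the byte range described by its ChunkLocation.
fn reassemble(locations: &[ChunkLocation], fetch_plaintext: impl Fn(ChunkID) -> Vec<u8>) -> Vec<u8> {
    let total_len = locations.iter().map(|l| l.start + l.length).max().unwrap_or(0) as usize;
    let mut object = vec![0u8; total_len];
    for loc in locations {
        let data = fetch_plaintext(loc.id);
        let start = loc.start as usize;
        let end = start + loc.length as usize;
        object[start..end].copy_from_slice(&data[..loc.length as usize]);
    }
    object
}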
3.1.3. Timestamp
The creation timestamp on an archive is stored internally as a DateTime<FixedOffset> from the chrono library. It is serialized in the standard RFC 3339 format.
3.1.4. Adjacency List
The adjacency list is a graph structure describing the locations of the chunks in the archive relative to each other.
This is used to determine which chunks non-duplicate chunks are adjacent to, for smart application of delta compression.
Footnotes:
1. There is one exception to this rule: the repository manifest, which has a special key consisting of all zero bytes.
2. The key used for this HMAC operation is different from the one used for encryption verification.
3. Asuran::repository::Transaction
4. Represented by the number of the segment file containing the chunk, and the offset of the start of the chunk within the file.
5. Where N is the numerical ID of the file to be locked.
6. With the exception of the special all-zero manifest key, which is typically managed outside the normal index setup.
7. Storing two chunks with the same key but non-identical content (with the exception of the manifest) is explicitly undefined behavior.