AcornDiskWorker Storage
Filesystem-backed content-addressable storage with Bloom filter fast rejection, POSIX I/O, SHA-256 integrity verification, and state persistence.
.package(url: "https://github.com/treehauslabs/AcornDiskWorker.git", from: "1.0.0")
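For context, the dependency line above would sit inside a manifest roughly like the one below. This is a minimal sketch: the package and target name `MyApp` are hypothetical, the product name `AcornDiskWorker` is an assumption based on the repository name, and the platform floor is a guess derived from the `Duration`-typed parameters.

```swift
// swift-tools-version: 5.9
import PackageDescription

let package = Package(
    name: "MyApp", // hypothetical package name
    // Assumption: Duration requires macOS 13+; match whatever the library declares.
    platforms: [.macOS(.v13)],
    dependencies: [
        .package(url: "https://github.com/treehauslabs/AcornDiskWorker.git", from: "1.0.0")
    ],
    targets: [
        .executableTarget(
            name: "MyApp",
            dependencies: [
                // Assumption: the library product is named after the repository.
                .product(name: "AcornDiskWorker", package: "AcornDiskWorker")
            ]
        )
    ]
)
```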
DiskCASWorker
actor AcornCASWorker
Generic over FileSystemProvider for testability. Use the convenience initializer for production (defaults to DefaultFileSystem).
public actor DiskCASWorker<F: FileSystemProvider>: AcornCASWorker {
public init(
directory: URL,
capacity: Int? = nil,
maxBytes: Int? = nil,
halfLife: Duration = .seconds(300),
sampleSize: Int = 5,
timeout: Duration? = nil,
verifyReads: Bool = true
) throws
}
Parameters
Directory Layout
Files are sharded by the first two hex characters of the CID into 256 subdirectories:
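The sharding rule can be sketched as a small path helper. The function name is hypothetical, and the sketch assumes the CID is a hex string and that the file inside the shard is named by the full CID; the library's actual naming may differ.

```swift
/// Returns the shard-relative path for a CID, or nil if the CID cannot be sharded.
/// Assumption: CIDs are hex strings and files are named by the full CID.
func shardedPath(forCID cid: String) -> String? {
    let hexDigits = "0123456789abcdefABCDEF"
    let prefix = cid.prefix(2)
    // A shard name needs exactly two hex characters.
    guard prefix.count == 2, prefix.allSatisfy({ hexDigits.contains($0) }) else {
        return nil
    }
    // Two hex characters give 16 * 16 = 256 possible shard directories.
    return "\(prefix.lowercased())/\(cid)"
}
```

Spreading files over 256 directories keeps each directory listing small, which matters on filesystems where lookups slow down in very large directories.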
Methods
| Method | Behavior |
|---|---|
| has(cid:) | Bloom filter check first (~80ns for definite miss). Falls back to access() syscall on Bloom "maybe". |
| getLocal(cid:) async | Bloom filter → read file → optional SHA-256 verify → return data. Auto-deletes corrupted files. |
| storeLocal(cid:data:) async | Write to temp file → atomic rename (POSIX). Triggers LFU eviction if needed. Updates Bloom filter. |
| delete(cid:) | Remove file from disk. Update cache, size tracking, and metrics. |
| persistState() | Serialize Bloom filter to .bloom and item sizes to .sizes. Enables fast restart. |
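The text does not spell out how `halfLife` and `sampleSize` drive LFU eviction. One common reading, sketched below purely as an assumption, is sampled LFU with exponential decay: each item's access score halves every `halfLife`, and eviction picks the lowest-scoring item among `sampleSize` random candidates. `AccessRecord`, `decayedScore`, and `pickVictim` are hypothetical names, not library API.

```swift
import Foundation

/// Hypothetical per-item bookkeeping; not a type from the library.
struct AccessRecord {
    var score: Double      // access count as of `lastTouch`
    var lastTouch: Double  // seconds since an arbitrary fixed epoch
}

/// Halves the score for every `halfLife` seconds that have elapsed.
func decayedScore(_ record: AccessRecord, now: Double, halfLife: Double) -> Double {
    record.score * pow(0.5, (now - record.lastTouch) / halfLife)
}

/// Samples up to `sampleSize` random keys and returns the one with the lowest decayed score.
func pickVictim(
    records: [String: AccessRecord],
    now: Double,
    halfLife: Double,
    sampleSize: Int
) -> String? {
    let sample = records.keys.shuffled().prefix(max(0, sampleSize))
    let score: (String) -> Double = { key in
        decayedScore(records[key]!, now: now, halfLife: halfLife)
    }
    return sample.min(by: { score($0) < score($1) })
}
```

Sampling trades a little eviction accuracy for constant work per eviction, since the full item set never has to be sorted.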
Properties
| Property | Type | Description |
|---|---|---|
| metrics | CASMetrics | Hits, misses, stores, evictions, deletions, corruption detections |
| totalBytes | Int | Running total of all stored data on disk |
Bloom Filter
DiskCASWorker uses a Bloom filter to avoid unnecessary filesystem calls. For CIDs that definitely don't exist on disk, the Bloom filter returns false in ~80 nanoseconds — avoiding a ~100µs disk seek.
- False positive rate: 1% (configurable internally)
- Hash functions: Double hashing with FNV-1a + MurmurHash3
- Persistence: Serializable to disk via persistState()
- Initialization: Loaded from the .bloom file on init, or rebuilt by scanning existing files
A false positive costs one extra access() syscall, but never data loss.
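The following is a minimal sketch of a Bloom filter built with double hashing, to make the properties above concrete. It is not the library's implementation: `BloomFilterSketch` is a hypothetical type, and the second hash is a stand-in (MurmurHash3's 64-bit finalizer applied to the FNV-1a value) rather than full MurmurHash3. Sizing uses the standard formulas m = -n ln p / (ln 2)^2 and k = (m / n) ln 2.

```swift
import Foundation

/// Bloom filter sketch with double hashing: index_i = (h1 + i * h2) mod m.
struct BloomFilterSketch {
    private var bits: [UInt64]
    let bitCount: Int   // m
    let hashCount: Int  // k

    /// Sizes the filter for `expectedItems` at the target false positive rate.
    init(expectedItems n: Int, falsePositiveRate p: Double = 0.01) {
        let ln2 = log(2.0)
        let m = max(64, Int((-Double(n) * log(p) / (ln2 * ln2)).rounded(.up)))
        bitCount = m
        hashCount = max(1, Int((Double(m) / Double(max(n, 1)) * ln2).rounded()))
        bits = [UInt64](repeating: 0, count: (m + 63) / 64)
    }

    /// 64-bit FNV-1a over the key's UTF-8 bytes.
    private static func fnv1a(_ key: String) -> UInt64 {
        var h: UInt64 = 0xcbf29ce484222325
        for byte in key.utf8 {
            h ^= UInt64(byte)
            h = h &* 0x100000001b3
        }
        return h
    }

    /// Stand-in second hash: MurmurHash3's 64-bit finalizer (not full MurmurHash3).
    private static func mix(_ x: UInt64) -> UInt64 {
        var k = x
        k ^= k >> 33
        k = k &* 0xff51afd7ed558ccd
        k ^= k >> 33
        k = k &* 0xc4ceb9fe1a85ec53
        k ^= k >> 33
        return k
    }

    private func indices(for key: String) -> [Int] {
        let h1 = Self.fnv1a(key)
        let h2 = Self.mix(h1) | 1  // force a nonzero (odd) step between probes
        return (0..<hashCount).map { i in
            Int((h1 &+ UInt64(i) &* h2) % UInt64(bitCount))
        }
    }

    mutating func insert(_ key: String) {
        for i in indices(for: key) {
            bits[i / 64] |= UInt64(1) << (i % 64)
        }
    }

    /// false means "definitely absent"; true means "maybe present".
    func mightContain(_ key: String) -> Bool {
        indices(for: key).allSatisfy { bits[$0 / 64] & (UInt64(1) << ($0 % 64)) != 0 }
    }
}
```

With a 1% target rate, the sizing works out to about 9.6 bits per item and 7 probes. A `false` result lets the caller skip the filesystem entirely, while a `true` result still has to be confirmed on disk.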
Integrity Verification
When verifyReads is true (the default), every getLocal() recomputes the SHA-256 hash of the file contents and compares it to the CID. If they don't match:
- The corrupted file is deleted from disk
- metrics.corruptionDetections is incremented
- The method returns nil (as if the data doesn't exist)
This catches bit rot, incomplete writes, and filesystem corruption automatically.
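The verify-on-read flow can be sketched as follows. To keep the example dependency-free, the hash is injected as a closure. Production code would pass a SHA-256 hex digest (for example one built on CryptoKit's `SHA256.hash(data:)`). The function name and the assumption that the CID equals the hex digest of the contents are illustrative, not taken from the library.

```swift
import Foundation

/// Reads a file and checks its contents against the expected CID.
/// Assumption: the CID is the hex digest of the file contents.
func readVerified(
    at path: String,
    cid: String,
    digestHex: (Data) -> String,      // injected hash; SHA-256 in production
    corruptionDetections: inout Int   // stands in for metrics.corruptionDetections
) -> Data? {
    guard let data = FileManager.default.contents(atPath: path) else {
        return nil  // file missing: an ordinary miss
    }
    if digestHex(data) == cid {
        return data
    }
    // Mismatch: delete the corrupted file, count it, and report it as absent.
    try? FileManager.default.removeItem(atPath: path)
    corruptionDetections += 1
    return nil
}
```

Returning nil rather than throwing means callers treat corruption like a cache miss and can refetch the content from another source.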
Atomic Writes
Stores use the POSIX temp-file + rename pattern:
- Write data to a temporary file in the shard directory
- Call rename() to atomically move it to the final path
This guarantees that getLocal() never reads a partially-written file, even during crashes.
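A minimal sketch of the temp-file + rename pattern, assuming a hypothetical helper `atomicWrite` and a `.tmp-` naming scheme that are not taken from the library. The temp file is created next to the destination so that both paths are on the same filesystem, which `rename()` requires.

```swift
import Foundation

enum AtomicWriteError: Error {
    case renameFailed(errno: Int32)
}

/// Writes `data` to `finalPath` through a temp file in the same directory plus rename(2).
func atomicWrite(_ data: Data, to finalPath: String) throws {
    let finalURL = URL(fileURLWithPath: finalPath)
    // Hypothetical temp name; a UUID avoids clashes between concurrent writers.
    let tmpURL = finalURL.deletingLastPathComponent()
        .appendingPathComponent(".tmp-\(UUID().uuidString)")
    try data.write(to: tmpURL)
    // rename() replaces the destination in one step: readers see the old file or the new one.
    if rename(tmpURL.path, finalPath) != 0 {
        let code = errno
        try? FileManager.default.removeItem(at: tmpURL)  // clean up the orphaned temp file
        throw AtomicWriteError.renameFailed(errno: code)
    }
}
```

The sketch covers atomicity only. Durability across power loss would additionally need an `fsync` on the file and its directory, which the text above does not claim.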
FileSystemProvider Protocol
DiskCASWorker is generic over filesystem implementation for testability:
public protocol FileSystemProvider: Sendable {
func fileExists(atPath: String) -> Bool
func createDirectory(atPath: String) throws
func contentsOfFile(atPath: String) throws -> Data
func writeFile(_ data: Data, toPath: String) throws
func removeItem(atPath: String) throws
func contentsOfDirectory(atPath: String) throws -> [String]
func fileSize(atPath: String) -> Int?
}
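To illustrate the testability point, here is a sketch of an in-memory provider that could be substituted in unit tests. `InMemoryFileSystem` and `InMemoryFSError` are hypothetical names, not part of the library. The protocol is repeated so the sketch compiles on its own and should be dropped when building against the real module.

```swift
import Foundation

// Protocol repeated from the listing above so the sketch compiles on its own.
public protocol FileSystemProvider: Sendable {
    func fileExists(atPath: String) -> Bool
    func createDirectory(atPath: String) throws
    func contentsOfFile(atPath: String) throws -> Data
    func writeFile(_ data: Data, toPath: String) throws
    func removeItem(atPath: String) throws
    func contentsOfDirectory(atPath: String) throws -> [String]
    func fileSize(atPath: String) -> Int?
}

enum InMemoryFSError: Error { case notFound(String) }

/// Hypothetical in-memory provider for unit tests; not part of the library.
final class InMemoryFileSystem: FileSystemProvider, @unchecked Sendable {
    private let lock = NSLock()  // guards both collections below
    private var files: [String: Data] = [:]
    private var directories: Set<String> = []

    func fileExists(atPath path: String) -> Bool {
        lock.lock(); defer { lock.unlock() }
        return files[path] != nil || directories.contains(path)
    }

    func createDirectory(atPath path: String) throws {
        lock.lock(); defer { lock.unlock() }
        directories.insert(path)
    }

    func contentsOfFile(atPath path: String) throws -> Data {
        lock.lock(); defer { lock.unlock() }
        guard let data = files[path] else { throw InMemoryFSError.notFound(path) }
        return data
    }

    func writeFile(_ data: Data, toPath path: String) throws {
        lock.lock(); defer { lock.unlock() }
        files[path] = data
    }

    func removeItem(atPath path: String) throws {
        lock.lock(); defer { lock.unlock() }
        guard files.removeValue(forKey: path) != nil || directories.remove(path) != nil else {
            throw InMemoryFSError.notFound(path)
        }
    }

    func contentsOfDirectory(atPath path: String) throws -> [String] {
        lock.lock(); defer { lock.unlock() }
        guard directories.contains(path) else { throw InMemoryFSError.notFound(path) }
        let prefix = path.hasSuffix("/") ? path : path + "/"
        // Keep only direct children: names with no further "/" after the prefix.
        return files.keys
            .filter { $0.hasPrefix(prefix) }
            .map { String($0.dropFirst(prefix.count)) }
            .filter { !$0.contains("/") }
            .sorted()
    }

    func fileSize(atPath path: String) -> Int? {
        lock.lock(); defer { lock.unlock() }
        return files[path]?.count
    }
}
```

A provider like this makes it possible to exercise miss, corruption, and eviction paths without touching the real disk.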
DefaultFileSystem
The production implementation uses raw POSIX syscalls (open, read, write, rename, unlink, stat, access) for maximum performance, bypassing Foundation's file I/O overhead.
State Persistence
persistState() writes two files to the cache directory:
- .bloom: Binary-serialized Bloom filter. Avoids O(n) filesystem scan on next init.
- .sizes: JSON dictionary mapping CID strings to byte sizes. Restores totalBytes tracking.
If these files exist on init, the worker loads them with two file reads and skips the directory scan. If missing, it falls back to scanning all shard directories.
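The `.sizes` half of this scheme can be sketched with Foundation's JSON coders. The helper names are hypothetical, and the exact on-disk format the library writes is not confirmed beyond "JSON dictionary mapping CID strings to byte sizes".

```swift
import Foundation

/// Writes the CID-to-size map as JSON into `<directory>/.sizes`.
func saveSizes(_ sizes: [String: Int], in directory: URL) throws {
    let data = try JSONEncoder().encode(sizes)
    try data.write(to: directory.appendingPathComponent(".sizes"), options: .atomic)
}

/// Loads the map, or returns nil so the caller can fall back to a directory scan.
func loadSizes(from directory: URL) -> [String: Int]? {
    let url = directory.appendingPathComponent(".sizes")
    guard let data = try? Data(contentsOf: url) else { return nil }
    return try? JSONDecoder().decode([String: Int].self, from: data)
}

/// Rebuilds the running byte total from the restored map.
func totalBytes(of sizes: [String: Int]) -> Int {
    sizes.values.reduce(0, +)
}
```

Treating an unreadable or undecodable file the same as a missing one keeps startup safe: the worst case is the slower scan, never a wrong total.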
CASMetrics
public struct CASMetrics: Sendable, Equatable {
public var hits: Int
public var misses: Int
public var stores: Int
public var evictions: Int
public var deletions: Int
public var corruptionDetections: Int
}
DiskWorker's CASMetrics includes an additional corruptionDetections field not present in MemoryWorker's version.
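As a usage illustration, the counters can be combined into a hit rate. The `hitRate` property is a hypothetical extension, not part of the library. The struct is repeated (with zero defaults added for brevity) so the sketch compiles on its own.

```swift
// Struct repeated from the listing above; the `= 0` defaults are added for this sketch only.
public struct CASMetrics: Sendable, Equatable {
    public var hits: Int = 0
    public var misses: Int = 0
    public var stores: Int = 0
    public var evictions: Int = 0
    public var deletions: Int = 0
    public var corruptionDetections: Int = 0
}

extension CASMetrics {
    /// Fraction of lookups served from disk, or nil before any lookup has happened.
    var hitRate: Double? {
        let lookups = hits + misses
        return lookups == 0 ? nil : Double(hits) / Double(lookups)
    }
}
```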