TL;DR WSKDF intentionally limits key‑entropy and uses a slow, memory‑hard KDF (Argon2id) so that:
- 🔑 Deriving the key when you know the preimage is fast (one Argon2id call).
- 🛠️ Recovering the key when the preimage is lost is still feasible by brute‑force, but it takes predictable CPU time that scales with the chosen bit‑length.
- Simple secret – Small enough to jot on paper or share.
- Strong key – run that preimage through one heavy Argon2id pass. One run is quick; billions are costly.
- Recoverable – lose the preimage? Brute‑force time is predictable and set by n bits and Argon2id cost (see table). You decide whether recovery takes days, weeks or months.
- Coercion‑resistant – stash the preimage elsewhere. If forced to hand over the key, you truthfully can’t; an attacker must steal the stash or spend the compute.
Note: this is just an idea. We don't recommend this scheme: it has not been peer reviewed and it is a very advanced usage.
This two-layer scheme provides three distinct recovery paths for the final keyfile:
- Direct Access (seconds): If you have physical access to the second preimage (e.g., stored in a bank vault), you can derive the second key instantly and decrypt the final keyfile.
- Computational Recovery (days/weeks): If the second preimage is lost, you can brute-force the 24-bit search space. With sufficient computational resources, recovery is expensive but feasible.
- Time-Locked Recovery (30 days): Using the first preimage, you must wait for the 30-day derivation to complete, decrypt the second preimage, then derive the second key. This path provides guaranteed access but enforces a significant time delay, which is crucial for coercion resistance.
Under coercion, even if you provide all materials (both preimages, encrypted files, and parameters), an attacker must either:
- Wait 30 days for the time-locked path, giving authorities time to intervene
- Spend significant resources on computational recovery
- Gain physical access to wherever you've stored the second preimage
See scripts/complex-scheme.sh for implementation details.
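The flow between the two layers can be summarized as follows. This is only a schematic outline of the steps described above; the authoritative sequence of commands lives in the script.

```sh
# Schematic outline of scripts/complex-scheme.sh (not the actual commands):
#
#   preimage1 --(slow, ~30-day derivation)--> key1
#   key1 decrypts an encrypted copy of preimage2
#   preimage2 --(single ~30 s Argon2id derivation)--> key2
#   key2 decrypts the final keyfile
#
# If preimage2 is lost, two fallbacks remain: brute-force its small search
# space with find-key, or take the slow time-locked path through preimage1.
```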
If you're using rustup, it's recommended to add the musl target for static compilation, since it is the default target in this project's Cargo configuration:
$ rustup target add x86_64-unknown-linux-musl
Then build with the alkali feature flag for best performance:
$ cargo build --release -F alkali
Alternatively, you can use the provided Nix flake which automatically sets up the environment:
$ nix develop
$ cargo build --release -F alkali
This project uses Git hooks for maintaining code quality. When you enter the development environment, hooks are automatically configured:
$ nix develop
📎 Setting up Git hooks for code quality checks...
✅ Git hooks configured automatically!
• pre-commit: Checks code formatting
• pre-push: Runs formatting and clippy checks
The project includes two Git hooks that help maintain code quality:
- pre-commit: Ensures code is properly formatted before committing
- pre-push: Runs both formatting and clippy checks before pushing
These hooks are automatically configured when you enter the nix development shell. To manually configure them:
git config core.hooksPath .githooks
To disable the hooks temporarily:
git config --unset core.hooksPath
You can run the quality checks manually at any time:
# Check formatting
nix develop -c cargo fmt --check
# Fix formatting
nix develop -c cargo fmt
# Run clippy
nix develop -c cargo clippy --workspace -- -D warnings
# Run all checks
nix develop -c cargo fmt --check && nix develop -c cargo clippy --workspace -- -D warnings
Note: the salt is a hex-encoded string of 16 bytes. It's fine to generate it once and reuse it for multiple keys. You can generate one with:
$ cargo run -- generate-salt --output salt
# which is similar to:
$ openssl rand -hex 16
a228c13efadd4f6435a30d62a998d065
In these examples we will use 000102030405060708090a0b0c0d0e0f as the salt.
Note: for real-world usage we recommend using a larger bit-length (e.g. 20).
$ cargo run --release -F alkali -- output-random-key -n 4 --preimage-output preimage --key-output key --salt-input salt
$ cat preimage
000000000000000e
$ cat key
6f95db5eec10b1cd3ef6afc7e3163a2a4a935ce602375b787dbc5f0f06df50aa
# Now we can use the key to encrypt a file
$ cat key | gpg --verbose --symmetric --batch --passphrase-fd 0 --cipher-algo AES256 LICENSE
gpg: enabled compatibility flags:
gpg: using cipher AES256.CFB
gpg: writing to 'LICENSE.gpg'
$ file LICENSE.gpg
LICENSE.gpg: PGP symmetric key encrypted data - AES with 256-bit key salted & iterated - SHA256 .
Now suppose we lost both the preimage and the key. We can recover them with the find-key command, which brute-forces candidate preimages and runs an external command to test each derived key. In this example we both find the key/preimage and decrypt the file; see scripts/gpg_decrypt.sh for the implementation.
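The external command's only job is to try a candidate key and exit successfully when decryption works. Purely as an illustration, a stand-in for scripts/gpg_decrypt.sh might look like the sketch below; how find-key actually hands over the candidate key is an assumption here (stdin, mirroring the gpg invocation above), so defer to the real script for the exact interface.

```sh
#!/bin/sh
# Illustrative stand-in for scripts/gpg_decrypt.sh; the key-delivery mechanism
# (hex key on stdin) is an assumption. INPUT_FILE and OUTPUT_FILE are expected
# in the environment, as in the invocation below. The exit status tells
# find-key whether this candidate key decrypted the file.
key="$(cat)"
printf '%s' "$key" | gpg --batch --quiet --yes --passphrase-fd 0 \
  --output "$OUTPUT_FILE" --decrypt "$INPUT_FILE"
```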
$ INPUT_FILE=LICENSE.gpg OUTPUT_FILE=/tmp/LICENSE cargo run --release -F alkali -- find-key --command ./scripts/gpg_decrypt.sh -t 4 -n 4 --preimage-output found-preimage --key-output found-key --salt-input salt
Using 4 rayon threads
Starting parallel search
Deriving key for 000000000000000e
Deriving key for 000000000000000a
Deriving key for 000000000000000c
Deriving key for 0000000000000008
Trying to decrypt LICENSE.gpg with key 620522780b9448642f40e1d5f792d8902dd376e302d16c820403d571c95eda7f
Deriving key for 000000000000000b
Trying to decrypt LICENSE.gpg with key 6f95db5eec10b1cd3ef6afc7e3163a2a4a935ce602375b787dbc5f0f06df50aa
Found key!
Trying to decrypt LICENSE.gpg with key 5f1fdf16c1cbd2b559a38d3c113deed004c3ade44227cf03dbbd4dc6ddad0e2c
Trying to decrypt LICENSE.gpg with key f4443f057ebbb2649b0d4a54bb272ce9979326f360bd589584c678f2b9f1df0b
Trying to decrypt LICENSE.gpg with key 268a91ceda464f5fe70f87601a84c821b38ab2a06d796ac183da7fd5ff0ed403
$ wc LICENSE /tmp/LICENSE
21 168 1064 LICENSE
21 168 1064 /tmp/LICENSE
42 336 2128 total
$ cat found-preimage
000000000000000e
$ cat found-key
6f95db5eec10b1cd3ef6afc7e3163a2a4a935ce602375b787dbc5f0f06df50aa
All commands share the Argon2id cost flags. In release mode the defaults are:
--ops-limit <OPS_LIMIT> [default: 7] (iterations)
--mem-limit-kbytes <MEM_LIMIT_KBYTES> [default: 4194304] (4 GiB)
⚠️ The defaults make a single derivation take ~30 s on a typical desktop CPU using all cores. See the benchmark command below for better estimates on your hardware.
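Since the cost flags are shared by all commands, you can pass lower values for quick experiments. The values below are illustrative only, not a recommendation for real keys:

```sh
$ cargo run --release -F alkali -- benchmark -i 1 -t 4 --ops-limit 3 --mem-limit-kbytes 1048576
```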
Search space: For n-bit preimages, there are 2^(n-1) possible candidates (MSB always 1).
Expected trials:
- Systematic search: 2^(n-2) trials (exactly half the space)
- Random search: 2^(n-1) trials (due to replacement/duplicates)
Wall-clock time = (Expected trials × Time per trial) / Number of threads
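For example, with the default ~30 s per derivation, a 26-bit preimage and 16 threads, the expected systematic-search time works out to roughly a year (half of the 1 y 363 d worst case shown in the 26-bit row of the table below):

```sh
# Back-of-the-envelope estimate: 26-bit preimage, 16 threads, 30 s per derivation
awk 'BEGIN {
  n = 26; t = 30; threads = 16
  trials  = 2^(n - 2)                    # systematic search: half the space
  seconds = trials * t / threads
  printf "%.0f days\n", seconds / 86400  # prints "364 days"
}'
```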
The table below shows realistic scenarios:
- Desktop (16 threads): Limited by available cores
- Cluster (2048 threads): Limited by coordination overhead
Assumptions
- Preimages are drawn uniformly from [2^(n-1), 2^n), i.e. the most-significant bit is always 1. Every candidate truly has n bits; the search-space size is therefore 2^(n-1).
- Each candidate costs 30 s to evaluate (Argon2id with the default cost).
- Two hardware budgets:
  - 🖥️ 16 threads (e.g. a 16-core / 64 GB RAM desktop machine)
  - 🏭 2048 threads (e.g. 64 × 32-core / 128 GB RAM machines on some cloud provider)
Bits | 16 threads 🖥️ systematic (worst-case) | 2048 threads 🏭 systematic (worst-case) | 2048 threads 🏭 random (expected) | 2048 threads 🏭 random (99th percentile) | 2048 threads 🏭 random (99.9th percentile) |
---|---|---|---|---|---|
1-5 | 30 s | 30 s | 30 s | 2 min 18 s | 3 min 27 s |
6 | 1 min 0 s | 30 s | 30 s | 2 min 18 s | 3 min 27 s |
7 | 2 min 0 s | 30 s | 30 s | 2 min 18 s | 3 min 27 s |
8 | 4 min 0 s | 30 s | 30 s | 2 min 18 s | 3 min 27 s |
9 | 8 min 0 s | 30 s | 30 s | 2 min 18 s | 3 min 27 s |
10 | 16 min 0 s | 30 s | 30 s | 2 min 18 s | 3 min 27 s |
11 | 32 min 0 s | 30 s | 30 s | 2 min 18 s | 3 min 27 s |
12 | 1 h 4 min | 30 s | 30 s | 2 min 18 s | 3 min 27 s |
13 | 2 h 8 min | 1 min 0 s | 1 min 0 s | 4 min 36 s | 6 min 54 s |
14 | 4 h 16 min | 2 min 0 s | 2 min 0 s | 9 min 13 s | 13 min 49 s |
15 | 8 h 32 min | 4 min 0 s | 4 min 0 s | 18 min 25 s | 27 min 38 s |
16 | 17 h 4 min | 8 min 0 s | 8 min 0 s | 36 min 50 s | 55 min 16 s |
17 | 1 d 10 h | 16 min 0 s | 16 min 0 s | 1 h 14 min | 1 h 51 min |
18 | 2 d 20 h | 32 min 0 s | 32 min 0 s | 2 h 27 min | 3 h 41 min |
19 | 5 d 17 h | 1 h 4 min | 1 h 4 min | 4 h 55 min | 7 h 22 min |
20 | 11 d 9 h | 2 h 8 min | 2 h 8 min | 9 h 49 min | 14 h 44 min |
21 | 22 d 18 h | 4 h 16 min | 4 h 16 min | 19 h 39 min | 1 d 5 h |
22 | 45 d 12 h | 8 h 32 min | 8 h 32 min | 1 d 15 h | 2 d 11 h |
23 | 91 d 0 h | 17 h 4 min | 17 h 4 min | 3 d 7 h | 4 d 22 h |
24 | 182 d 0 h | 1 d 10 h | 1 d 10 h | 6 d 13 h | 9 d 20 h |
25 | 364 d 0 h | 2 d 20 h | 2 d 20 h | 13 d 2 h | 19 d 16 h |
26 | 1 y 363 d | 5 d 17 h | 5 d 17 h | 26 d 5 h | 39 d 7 h |
27 | 3 y 361 d | 11 d 9 h | 11 d 9 h | 52 d 10 h | 78 d 14 h |
28 | 7 y 358 d | 22 d 18 h | 22 d 18 h | 104 d 19 h | 157 d 5 h |
29 | 15 y 351 d | 45 d 12 h | 45 d 12 h | 209 d 14 h | 314 d 9 h |
30 | 31 y 338 d | 91 d 1 h | 91 d 1 h | 1 y 54 d | 1 y 264 d |
31 | 63 y 311 d | 182 d 1 h | 182 d 1 h | 2 y 108 d | 3 y 163 d |
32 | 127 y 257 d | 364 d 2 h | 364 d 2 h | 4 y 217 d | 6 y 325 d |
Random search follows a geometric distribution with high variance. For planning purposes, consider the 99th percentile times shown in the table above to understand worst-case scenarios.
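Concretely, for a geometric distribution the 99th percentile is about ln(100) ≈ 4.6 times the expected time and the 99.9th percentile about ln(1000) ≈ 6.9 times, which is how the last two columns relate to the expected-time column. A quick check against the 26-bit row (2048 threads, 30 s per derivation):

```sh
awk 'BEGIN {
  n = 26; threads = 2048; t = 30
  expected = 2^(n - 1) * t / threads             # seconds, random search
  printf "expected:          %.1f h\n", expected / 3600
  printf "99th percentile:   %.1f h\n", expected * log(100) / 3600
  printf "99.9th percentile: %.1f h\n", expected * log(1000) / 3600
}'
```

This prints roughly 136 h, 629 h and 943 h, i.e. the 5 d 17 h, 26 d 5 h and 39 d 7 h shown in the table.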
Interpretation
- Single machine (16 threads): Systematic search partitions the space evenly among the threads, eliminating duplicate work. Each thread searches 1/16th of the total space, and on average the key is found once half of the total space has been searched.
- Cluster (2048 threads): Random search, where threads independently select candidates. Despite occasional duplicate work, the 128× increase in threads (2048 vs 16) results in much faster wall-clock time.
- Key insight: For the same number of threads, systematic search completes in half the expected time of random search (it does no duplicate work). The table, however, compares different thread counts to reflect realistic deployment scenarios.
The following is an example output. Run this command on your own hardware to get accurate time estimates for your machine.
$ cargo run --release -F alkali -- benchmark -i 1 -t 16
Using 16 threads for benchmark
Starting benchmark with 1 iterations across 16 threads...
Benchmark results:
Threads: 16
Total time: 29.63s
Total iterations: 16
Global average time per derivation: 1851.87ms
Global derivations per second: 0.54
Thread average time per derivation: 29.63s
Thread derivations per second: 0.03
Estimated time to brute-force with measured derivation time:
Average derivation time: 29.63s
Thread count: 16
bits │ systematic-16t │ systematic-16t │ random-16t │ random-16t │ random-16t
│ (expected) │ (worst case) │ (expected)│ (99th %) │ (99.9th %)
-----┼----------------┼----------------┼-----------┼-----------┼------------
1 │ 30s │ 30s │ 30s │ 2min 16s │ 3min 25s
2 │ 30s │ 30s │ 30s │ 2min 16s │ 3min 25s
3 │ 30s │ 30s │ 30s │ 2min 16s │ 3min 25s
4 │ 30s │ 30s │ 30s │ 2min 16s │ 3min 25s
5 │ 30s │ 30s │ 30s │ 2min 16s │ 3min 25s
6 │ 30s │ 59s │ 59s │ 4min 33s │ 6min 49s
7 │ 59s │ 1min 59s │ 1min 59s │ 9min 6s │ 13min 39s
8 │ 1min 59s │ 3min 57s │ 3min 57s │ 18min 12s │ 27min 17s
9 │ 3min 57s │ 7min 54s │ 7min 54s │ 36min 23s │ 54min 35s
10 │ 7min 54s │ 15min 48s │ 15min 48s │ 1h 13min │ 1h 49min
11 │ 15min 48s │ 31min 36s │ 31min 36s │ 2h 26min │ 3h 38min
12 │ 31min 36s │ 1h 3min │ 1h 3min │ 4h 51min │ 7h 17min
13 │ 1h 3min │ 2h 6min │ 2h 6min │ 9h 42min │ 14h 33min
14 │ 2h 6min │ 4h 13min │ 4h 13min │ 19h 24min │ 1d 5h
15 │ 4h 13min │ 8h 26min │ 8h 26min │ 1d 15h │ 2d 10h
16 │ 8h 26min │ 16h 51min │ 16h 51min │ 3d 6h │ 4d 20h
17 │ 16h 51min │ 1d 10h │ 1d 10h │ 6d 11h │ 9d 17h
18 │ 1d 10h │ 2d 19h │ 2d 19h │ 12d 23h │ 19d 10h
19 │ 2d 19h │ 5d 15h │ 5d 15h │ 25d 21h │ 38d 20h
20 │ 5d 15h │ 11d 6h │ 11d 6h │ 51d 18h │ 77d 15h
21 │ 11d 6h │ 22d 11h │ 22d 11h │ 103d 12h │ 155d 6h
22 │ 22d 11h │ 44d 23h │ 44d 23h │ 207d 0h │ 310d 12h
23 │ 44d 23h │ 89d 22h │ 89d 22h │ 1y 49d │ 1y 256d
24 │ 89d 22h │ 179d 19h │ 179d 19h │ 2y 98d │ 3y 147d
25 │ 179d 19h │ 359d 14h │ 359d 14h │ 4y 196d │ 6y 294d
26 │ 359d 14h │ 1y 354d │ 1y 354d │ 9y 27d │ 13y 223d
27 │ 1y 354d │ 3y 343d │ 3y 343d │ 18y 54d │ 27y 81d
28 │ 3y 343d │ 7y 322d │ 7y 322d │ 36y 108d │ 54y 162d
29 │ 7y 322d │ 15y 279d │ 15y 279d │ 72y 216d │ 108y 324d
30 │ 15y 279d │ 31y 192d │ 31y 192d │ 145y 67d │ 217y 283d
31 │ 31y 192d │ 63y 19d │ 63y 19d │ 290y 135d │ 435y 202d
32 │ 63y 19d │ 126y 38d │ 126y 38d │ 580y 269d │ 871y 39d
Explanation:
• Systematic (expected): Average case with 16 threads, each searching half their partition
• Systematic (worst): One thread searches entire partition of 2^(n-1) / 16 candidates
• Random (expected): 16 threads with expected 2^(n-1) / 16 trials per thread
• Random (99th %): 99% chance completion is faster than this
• Random (99.9th %): 99.9% chance completion is faster than this
The scripts/complex-scheme.sh example uses these parameters for the second layer:
- Bit length: 26 bits (67,108,864 possible values, but only 33,554,432 candidates since MSB=1)
- Argon2id parameters: 7 iterations, 4GB memory
- Expected derivation time: ~30 seconds per attempt (on typical hardware)
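Generating such a second-layer preimage/key looks much like the earlier 4-bit example, only with the larger bit-length and the default cost flags spelled out. This is just a sketch (the preimage2/key2 file names are placeholders); the authoritative invocation is in scripts/complex-scheme.sh:

```sh
$ cargo run --release -F alkali -- output-random-key -n 26 \
    --ops-limit 7 --mem-limit-kbytes 4194304 \
    --preimage-output preimage2 --key-output key2 --salt-input salt
```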
From the cost table above (assuming ~30 s per derivation), brute-forcing a 26-bit preimage requires roughly:
- 16 threads (desktop): 1 year 363 days worst-case, 364 days expected
- 128 threads (single large instance): ~91 days expected
- 2048 threads (distributed cluster): 5.7 days expected, 26 days at 99th percentile
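The 128-thread figure is not in the table; it is the same wall-clock formula applied to the full 2^25-candidate space (the random-search expectation, or equivalently the systematic worst case):

```sh
awk 'BEGIN { printf "%.0f days\n", 2^25 * 30 / 128 / 86400 }'   # prints "91 days"
```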
For memory-hard operations requiring 4 GB per thread, here are some AWS options:
Option 1: High-Memory Instances
- Instance type: r6i.32xlarge (128 vCPUs, 1024 GB RAM); can run 128 parallel threads (1 per vCPU, each using 4 GB RAM)
- Cost: ~$8.06/hour
- Time needed: ~91 days (with 128 threads)
- Total cost: ~$17,600
Option 2: Compute-Optimized Cluster
- Instance type: c6i.4xlarge (16 vCPUs, 32 GB RAM); can run 8 parallel threads (limited by RAM: 32 GB / 4 GB = 8)
- Cost: ~$0.68/hour per instance
- Need 256 instances for 2048 threads
- Time needed: ~5.7 days expected
- Total cost: ~$23,800 (expected case)
- Total cost: ~$108,500 (99th percentile, 26 days)
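As a quick sanity check, Option 2's expected-case figure follows directly from the numbers above (256 instances at ~$0.68/hour for ~5.7 days):

```sh
awk 'BEGIN { printf "~$%.0f\n", 256 * 0.68 * 24 * 5.7 }'   # prints "~$23814"
```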
Option 3: Spot Instances
- Using spot pricing can reduce costs by 60-90%
- Less reliable, may be interrupted
- Estimated cost: $4,000-$40,000 depending on availability
Google Cloud Platform
- Instance type: n2-highmem-128 (128 vCPUs, 864 GB RAM); can run 128 parallel threads (1 per vCPU)
- Cost: ~$6.74/hour
- Time needed: ~91 days
- Total cost: ~$14,700
Local Hardware Investment
- 64-core AMD Threadripper: ~$4,000
- 256GB RAM: ~$1,000
- Can run 64 threads continuously
- Time: ~182 days (6 months)
- One-time cost: ~$5,000 (reusable hardware)
- CPU vs Memory Constraints: Argon2id is CPU-intensive; you can only run one thread per CPU core effectively
- Memory Requirements: Each thread needs 4GB RAM, which can limit thread count on lower-memory instances
- Spot vs On-Demand: Spot instances can reduce costs by 60-90% but may be interrupted
- Coordination Overhead: Managing 2048 threads across 256+ machines requires significant orchestration
- Electricity Costs: For local hardware, add ~$800-2,000 for 182 days of operation
Realistic cost estimates for brute-forcing a 26-bit preimage:
- Budget approach: $5,000-$20,000 using local hardware or spot instances
- Fast approach: $24,000-$110,000 for on-demand cloud computing
- Worst case: Higher costs if extremely unlucky (99.9th percentile)
These costs make brute-force recovery feasible for high-value assets while remaining prohibitively expensive for casual attackers. The actual cost depends heavily on:
- Current cloud pricing
- Luck in finding the preimage
- Available optimization techniques
- Whether time or money is the primary constraint