From 88421763386acdd660a064ea728a75d088030655 Mon Sep 17 00:00:00 2001 From: Mosh <1306020+mishmosh@users.noreply.github.com> Date: Thu, 3 Apr 2025 10:02:35 -0400 Subject: [PATCH 1/4] Create ipip-0000.md --- src/ipips/ipip-0000.md | 102 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 102 insertions(+) create mode 100644 src/ipips/ipip-0000.md diff --git a/src/ipips/ipip-0000.md b/src/ipips/ipip-0000.md new file mode 100644 index 00000000..68fb29b8 --- /dev/null +++ b/src/ipips/ipip-0000.md @@ -0,0 +1,102 @@ +--- +# IPIP number should match its pull request number. After you open a PR, +# please update title and update the filename to `ipip0000`. +title: "IPIP-0000: CID Profiles" +date: 2025-04-03 +ipip: proposal +editors: + - name: Michelle Lee +relatedIssues: + - n/a +order: 0000 +tags: ['ipips'] +--- + +## Summary + + +This proposal introduces profiles for IPFS CIDs. Profiles explicitly define CID version, hash algorithm, chunk size, DAG width, layout, and other parameters. + +## Motivation + +Currently, CIDs can be generated with a variety of settings and optimizations for chunking, DAG width, and more. This means the same file can yield multiple, different CIDs depending on which tools and settings are used, and it is not possible to reliably reproduce or verify the CID. With profiles, following the same profile will produce identical CIDs for identical content, which makes verification possible regardless of implementation. + +## Detailed design + +We introduce a profile naming system, + +Each profile must specify the following characteristics: + +1. CID version (CIDv0 or CIDv1) +2. Hash algorithm +3. Chunk size +4. DAG width +5. DAG layout +6. Required + +Additional profiles can be added at a future date. Profile names may be chosen from the names of any botanical tree with compound leaves. 
+ +| | Helia default | Kubo default | Storacha default | "test-cid-v1" profile | DASL | +|-------------|---------------|-----------------------------|------------------|-----------------------|---------------| +| CID version | CIDv1 | CIDv1 | CIDv1 | CIDv1 | CIDv1 | +| Hash Algo | sha-256 | sha-256 | sha-256 | sha-256 | sha-256 | +| Chunk size | 1MiB | 256KiB | 1MiB | 1MiB | not specified | +| DAG width | 1024 | 174 (but it's complicated*) | 1024 | 174 | not specified | +| DAG layout | balanced | balanced | balanced | balanced | not specified | + + + +This would be specified as a table in (forthcoming UnixFS spec). + + + +## Design rationale + +The profile names are chosen to be easy to pronounce. + +Here is a summary table of current defaults, thanks to input & clarifications from @2color @achingbrain @lidel: + +| | Helia default | Kubo default | Storacha default | "test-cid-v1" profile | DASL | +|-------------|---------------|-----------------------------|------------------|-----------------------|---------------| +| CID version | CIDv1 | CIDv1 | CIDv1 | CIDv1 | CIDv1 | +| Hash Algo | sha-256 | sha-256 | sha-256 | sha-256 | sha-256 | +| Chunk size | 1MiB | 256KiB | 1MiB | 1MiB | not specified | +| DAG width | 1024 | 174 (but it's complicated*) | 1024 | 174 | not specified | +| DAG layout | balanced | balanced | balanced | balanced | not specified | + +* Kubo has 2 different default DAG widths: + * For HAMT-sharded directories, the `DefaultShardWidth` [here](https://github.com/ipfs/boxo/blob/f1d5312e3be45d151bb9c8f11c9283820687bea3/ipld/unixfs/io/directory.go#L30) is 256. + * For files, `DefaultLinksPerBlock` [here](https://github.com/ipfs/boxo/blob/v0.29.0/ipld/unixfs/importer/helpers/helpers.go#L30) is ~174 + +See related discussion at https://discuss.ipfs.tech/t/should-we-profile-cids/18507/ + +### User benefit + +Reliable, deterministic CIDs allow independent verification of content across tools and implementations. 
+ +### Compatibility + +Implementations will need to (1) make CID generation settings configurable and (2) support user setting of profiles. + +Kubo currently has no CLI / RPC / Config option to control DAG width in Kubo. https://github.com/ipfs/kubo/issues/10751 is the starting point to add that ability. + +### Security + +TODO + +### Alternatives + +Another approach could be to name profiles based on the key UnixFS/CID parameters, e.g. v1-sha256-balanced-1mib-1024w-raw. This is longer and more convoluted. + +## Test fixtures + +TODO + +List relevant CIDs. Describe how implementations can use them to determine +specification compliance. This section can be skipped if IPIP does not deal +with the way IPFS handles content-addressed data, or the modified specification +file already includes this information. + +### Copyright + +Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/). From 4ba68f030e067ea3acaba5514e5d97ba87d535f5 Mon Sep 17 00:00:00 2001 From: Mosh <1306020+mishmosh@users.noreply.github.com> Date: Thu, 3 Apr 2025 10:03:29 -0400 Subject: [PATCH 2/4] Update and rename ipip-0000.md to ipip-0499.md --- src/ipips/{ipip-0000.md => ipip-0499.md} | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) rename src/ipips/{ipip-0000.md => ipip-0499.md} (99%) diff --git a/src/ipips/ipip-0000.md b/src/ipips/ipip-0499.md similarity index 99% rename from src/ipips/ipip-0000.md rename to src/ipips/ipip-0499.md index 68fb29b8..d1947e2d 100644 --- a/src/ipips/ipip-0000.md +++ b/src/ipips/ipip-0499.md @@ -1,7 +1,7 @@ --- # IPIP number should match its pull request number. After you open a PR, # please update title and update the filename to `ipip0000`. 
-title: "IPIP-0000: CID Profiles" +title: "IPIP-0499: CID Profiles" date: 2025-04-03 ipip: proposal editors: From 6cc64cb765aaab872793b2bd3b49c7f02c8f14b2 Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Tue, 15 Apr 2025 23:41:17 +0200 Subject: [PATCH 3/4] add extra attributes proposed in review Co-authored-by: Bumblefudge --- src/ipips/ipip-0499.md | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/src/ipips/ipip-0499.md b/src/ipips/ipip-0499.md index d1947e2d..7f75d728 100644 --- a/src/ipips/ipip-0499.md +++ b/src/ipips/ipip-0499.md @@ -27,11 +27,15 @@ We introduce a profile naming system, Each profile must specify the following characteristics: -1. CID version (CIDv0 or CIDv1) +1. CID version (currently only CIDv0 or CIDv1) 2. Hash algorithm -3. Chunk size -4. DAG width -5. DAG layout +3. UnixFS Chunk size (explicitly set, not contextual/reactive to input) +4. UnixFS directory DAG width +5. UnixFS directory DAG layout +6. HAMT directory DAG threshold +7. HAMT directory DAG width +8. Leaf Envelope (historically dag-pb, now none/raw) +9. Allow empty directories 6. Required Additional profiles can be added at a future date. Profile names may be chosen from the names of any botanical tree with compound leaves. @@ -43,7 +47,10 @@ Additional profiles can be added at a future date. Profile names may be chosen f | Chunk size | 1MiB | 256KiB | 1MiB | 1MiB | not specified | | DAG width | 1024 | 174 (but it's complicated*) | 1024 | 174 | not specified | | DAG layout | balanced | balanced | balanced | balanced | not specified | - +| HAMT threshold | 256KiB (est) | 256KiB (est) | 1000 **links** | 256KiB | not specified | +| HAMT width | 256 blocks | 256 blocks | 256 blocks | 256 blocks | not specified | +| Leaves | raw | raw | raw | raw | not specified | +| EmptyDirs | allowed | allowed | disallowed | allowed | not specified | This would be specified as a table in (forthcoming UnixFS spec). 
From d8b83891fdef2e104278a05c085faf8c568b258f Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Wed, 16 Apr 2025 00:29:09 +0200 Subject: [PATCH 4/4] incorporate kubo#10774 Import.* config params for controlling DAG width were added in: https://github.com/ipfs/kubo/pull/10774 --- src/ipips/ipip-0499.md | 82 +++++++++++++++++++++--------------------- 1 file changed, 40 insertions(+), 42 deletions(-) diff --git a/src/ipips/ipip-0499.md b/src/ipips/ipip-0499.md index 7f75d728..648a48ed 100644 --- a/src/ipips/ipip-0499.md +++ b/src/ipips/ipip-0499.md @@ -6,16 +6,19 @@ date: 2025-04-03 ipip: proposal editors: - name: Michelle Lee + github: mishmosh + affiliation: + name: IPFS Foundation relatedIssues: - - n/a -order: 0000 + - https://discuss.ipfs.tech/t/should-we-profile-cids/18507 +order: 0499 tags: ['ipips'] --- ## Summary -This proposal introduces profiles for IPFS CIDs. Profiles explicitly define CID version, hash algorithm, chunk size, DAG width, layout, and other parameters. +This proposal introduces profiles for IPFS CIDs. Profiles explicitly define CID version, hash algorithm, chunk size, DAG width, layout, and other parameters. ## Motivation @@ -23,57 +26,43 @@ Currently, CIDs can be generated with a variety of settings and optimizations fo ## Detailed design -We introduce a profile naming system, +We introduce a profile naming system, Each profile must specify the following characteristics: 1. CID version (currently only CIDv0 or CIDv1) -2. Hash algorithm -3. UnixFS Chunk size (explicitly set, not contextual/reactive to input) -4. UnixFS directory DAG width -5. UnixFS directory DAG layout -6. HAMT directory DAG threshold -7. HAMT directory DAG width -8. Leaf Envelope (historically dag-pb, now none/raw) -9. Allow empty directories -6. Required +1. Hash algorithm +1. UnixFS Chunk algorithm (e.g. size-based or content-based) +1. UnixFS directory DAG layout (e.g. balanced, trickle) +1. UnixFS file DAG width (max number of links per `File` node) +1. 
UnixFS directory DAG width (max number of links per basic `Directory` node) +1. UnixFS HAMT directory DAG threshold (max `Directory` size before switching to `HAMTDirectory`) +1. HAMT directory DAG width (max number of fanout links per internal HAMTDirectory node) +1. Leaf Envelope (historically `dag-pb`, CIDv1 introduced `raw` leaves) +1. Empty directories (informative suggestion) Additional profiles can be added at a future date. Profile names may be chosen from the names of any botanical tree with compound leaves. -| | Helia default | Kubo default | Storacha default | "test-cid-v1" profile | DASL | -|-------------|---------------|-----------------------------|------------------|-----------------------|---------------| -| CID version | CIDv1 | CIDv1 | CIDv1 | CIDv1 | CIDv1 | -| Hash Algo | sha-256 | sha-256 | sha-256 | sha-256 | sha-256 | -| Chunk size | 1MiB | 256KiB | 1MiB | 1MiB | not specified | -| DAG width | 1024 | 174 (but it's complicated*) | 1024 | 174 | not specified | -| DAG layout | balanced | balanced | balanced | balanced | not specified | -| HAMT threshold | 256KiB (est) | 256KiB (est) | 1000 **links** | 256KiB | not specified | -| HAMT width | 256 blocks | 256 blocks | 256 blocks | 256 blocks | not specified | -| Leaves | raw | raw | raw | raw | not specified | -| EmptyDirs | allowed | allowed | disallowed | allowed | not specified | - - This would be specified as a table in (forthcoming UnixFS spec). - - ## Design rationale -The profile names are chosen to be easy to pronounce. - -Here is a summary table of current defaults, thanks to input & clarifications from @2color @achingbrain @lidel: +The profile names are chosen to be easy to pronounce. 
-| | Helia default | Kubo default | Storacha default | "test-cid-v1" profile | DASL | -|-------------|---------------|-----------------------------|------------------|-----------------------|---------------| -| CID version | CIDv1 | CIDv1 | CIDv1 | CIDv1 | CIDv1 | -| Hash Algo | sha-256 | sha-256 | sha-256 | sha-256 | sha-256 | -| Chunk size | 1MiB | 256KiB | 1MiB | 1MiB | not specified | -| DAG width | 1024 | 174 (but it's complicated*) | 1024 | 174 | not specified | -| DAG layout | balanced | balanced | balanced | balanced | not specified | +Here is a summary table of current (2025-Q2) defaults, thanks to input & clarifications from @2color @achingbrain @lidel: -* Kubo has 2 different default DAG widths: - * For HAMT-sharded directories, the `DefaultShardWidth` [here](https://github.com/ipfs/boxo/blob/f1d5312e3be45d151bb9c8f11c9283820687bea3/ipld/unixfs/io/directory.go#L30) is 256. - * For files, `DefaultLinksPerBlock` [here](https://github.com/ipfs/boxo/blob/v0.29.0/ipld/unixfs/importer/helpers/helpers.go#L30) is ~174 +| | Helia default | Kubo `legacy-cid-v0` (default) | Storacha default | Kubo `test-cid-v1` | Kubo `test-cid-v1-wide` | DASL | +|---------------------------------|---------------|-----------------------------------|------------------|--------------------|---------------------------|---------------| +| CID version | CIDv1 | CIDv0 | CIDv1 | CIDv1 | CIDv1 | CIDv1 | +| Hash Algo | sha-256 | sha-256 | sha-256 | sha-256 | sha-256 | sha-256 | +| Chunk size | 1MiB | 256KiB | 1MiB | 1MiB | 1MiB | not specified | +| Max links `File` node | 1024 | 174 | 1024 | 174 | **1024** | not specified | +| Max links `Directory` node | ? | 0 | ? | 0 | 0 | ? 
| +| Max fanout `HAMTDirectory` node | 256 blocks | 256 blocks | 256 blocks | 256 blocks | **1024** | not specified | +| `HAMTDirectory` threshold | 256KiB (est) | 256KiB (est:links[name+cid]) | 1000 **links** | 256KiB | **1MiB** | not specified | +| DAG layout | balanced | balanced | balanced | balanced | balanced | not specified | +| Leaves | raw | raw | raw | raw | raw | not specified | +| Empty directories | allowed | allowed | disallowed | allowed | allowed | not specified | See related discussion at https://discuss.ipfs.tech/t/should-we-profile-cids/18507/ @@ -85,7 +74,7 @@ Reliable, deterministic CIDs allow independent verification of content across to Implementations will need to (1) make CID generation settings configurable and (2) support user setting of profiles. -Kubo currently has no CLI / RPC / Config option to control DAG width in Kubo. https://github.com/ipfs/kubo/issues/10751 is the starting point to add that ability. +Kubo 0.35 will have [`Import.*` configuration](https://github.com/ipfs/kubo/blob/master/docs/config.md#import) options to control DAG width. ### Security @@ -95,6 +84,15 @@ TODO Another approach could be to name profiles based on the key UnixFS/CID parameters, e.g. v1-sha256-balanced-1mib-1024w-raw. This is longer and more convoluted. + +#### Empty directories + +The decision of whether empty directories should be included is out of scope. + +Tools can apply arbitrary filtering before passing filesystem entries +to be converted into a DAG, thus for 1:1 CID reproducibility one should +run without any prefilters, or ensure the same prefilters are applied. + ## Test fixtures TODO