---
title: "IPIP-0445: Option to Skip Raw Blocks in Gateway Responses"
date: 2023-10-09
ipip: open
editors:
  - name: Hugo Valtier
    github: Jorropo
    url: https://jorropo.net/
    affiliation:
      name: Protocol Labs
      url: https://protocol.ai/
  - name: Marcin Rataj
    github: lidel
    url: https://lidel.org/
    affiliation:
      name: Protocol Labs
      url: https://protocol.ai/
relatedIssues:
  - https://github.com/ipfs/specs/issues/444
order: 445
tags: ['ipips']
---

## Summary

Introduce a `skip-raw-blocks` flag for the :cite[trustless-gateway].

## Motivation

Allow clients to read a stream that contains only the proofs of a
bottom-heavy DAG, that is, every block except the leaves, when those leaves
use the `raw` codec.

This is useful for UnixFS features like webseeds
([ipfs/specs#444](https://github.com/ipfs/specs/issues/444)), where metadata
about a DAG is fetched from a trustless gateway, but the actual raw data can be
fetched from any source that supports either the trustless gateway
specification or plain HTTP Range Requests. This allows trustless, verifiable
data retrieval from plain HTTP (non-IPFS) data sources.

## Detailed design

The `skip-raw-blocks` URL query parameter on the :cite[trustless-gateway]
allows clients to download an entity without the blocks whose CIDs use the
`raw` multicodec (`0x55`).

- When set to `y`, the parameter instructs the gateway not to transmit
  blocks referenced by a CID with the `raw` multicodec.
- When set to `n`, or left unspecified, `raw` blocks receive no special
  handling (the existing default behavior remains the same).

Importantly, unless `skip-raw-blocks` is explicitly set to `y`, the gateway
MUST operate as if its value were `n`.

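A minimal Python sketch of how a gateway might interpret the parameter, for illustration only. The function names `skip_raw_blocks_requested` and `should_send_block`, the host `gateway.example`, and the placeholder `EXAMPLE_CID` are hypothetical, and reading only the first occurrence of a repeated parameter is an assumption this IPIP does not specify. Anything other than an explicit `y` falls back to `n`, as the MUST above requires.

```python
from urllib.parse import parse_qs, urlsplit

RAW_CODEC = 0x55  # multicodec code for `raw`


def skip_raw_blocks_requested(url: str) -> bool:
    """True only when the request explicitly carries skip-raw-blocks=y.

    Missing, empty, `n`, or any other value means the default (`n`).
    If the parameter is repeated, this sketch looks only at the first
    occurrence (an assumption; the IPIP does not say).
    """
    params = parse_qs(urlsplit(url).query)
    return params.get("skip-raw-blocks", ["n"])[0] == "y"


def should_send_block(codec: int, skip_raw: bool) -> bool:
    """Decide from the CID's codec alone whether a block goes into the response."""
    return not (skip_raw and codec == RAW_CODEC)


if __name__ == "__main__":
    url = "https://gateway.example/ipfs/EXAMPLE_CID?format=car&skip-raw-blocks=y"
    skip = skip_raw_blocks_requested(url)
    print(skip, should_send_block(0x70, skip), should_send_block(RAW_CODEC, skip))
```

Keeping the decision a pure function of the CID's codec is what lets a gateway skip a block without fetching it.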
## Design rationale

### User Benefit

Implementing the `skip-raw-blocks` parameter offers several benefits to users:

1. **Verification Flexibility:** Clients can verify files received out-of-band
   (OOB) in their deserialized form without requiring the gateway to transmit
   the raw blocks.

2. **Incremental Download:** Clients can incrementally download files in
   deserialized form from non-IPFS servers, allowing applications to share the
   same distribution for IPFS and non-IPFS clients.

3. **Efficient Block Discovery:** With the `skip-raw-blocks` option enabled,
   clients can quickly discover numerous candidate blocks without being
   bottlenecked by the gateway's transmission of raw blocks.

4. **Non-IPFS HTTP Mirrors Become Useful:** Legacy data that is already exposed
   over HTTP in deserialized form can now act as a source for specific block
   byte ranges, without having to support any IPFS-specific APIs. Plain HTTP
   Range Requests can be used to fetch the remaining raw block data, and the
   metadata read via `skip-raw-blocks=y` is enough for a client to verify that
   the raw block byte ranges fetched from a non-IPFS system match the expected
   CIDs.

> **Review comment** (on lines +72 to +78): I was trying to hint at that in 1
> (Verification Flexibility); this text goes into much more detail, and I think
> it should be merged with the first entry.

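The client-side check in item 4 can be sketched in Python as follows. This assumes the raw leaves are CIDv1 with a sha2-256 multihash, and that the client has already learned each leaf's CID and length, in file order, from the proof blocks returned with `skip-raw-blocks=y`; the `expected_leaves` structure and both function names are hypothetical. The `fetch_range` callback stands in for a plain HTTP Range Request to a non-IPFS mirror, and the demo uses an in-memory file instead of a network.

```python
import hashlib

# CIDv1 prefix for the `raw` codec with a sha2-256 multihash. Each code is
# below 0x80, so each unsigned varint fits in a single byte.
RAW_CIDV1_SHA256_PREFIX = bytes([0x01, 0x55, 0x12, 0x20])


def raw_cid(data: bytes) -> bytes:
    """Binary CIDv1 (raw codec, sha2-256 multihash) for a leaf's bytes."""
    return RAW_CIDV1_SHA256_PREFIX + hashlib.sha256(data).digest()


def verify_ranges(expected_leaves, fetch_range) -> bytes:
    """Check byte ranges fetched from an untrusted mirror against leaf CIDs.

    expected_leaves: list of (binary_cid, length) pairs in file order
        (hypothetical shape; a real client reads this from UnixFS proofs).
    fetch_range: callable(first, last) -> bytes, standing in for an HTTP
        request with `Range: bytes=first-last` (inclusive offsets).
    Returns the verified bytes, or raises ValueError on the first mismatch.
    """
    out = bytearray()
    offset = 0
    for cid, length in expected_leaves:
        chunk = fetch_range(offset, offset + length - 1)
        if raw_cid(chunk) != cid:
            raise ValueError(f"leaf at offset {offset} does not match its CID")
        out += chunk
        offset += length
    return bytes(out)


if __name__ == "__main__":
    data = b"hello, webseed!"
    chunks = [data[i:i + 4] for i in range(0, len(data), 4)]
    leaves = [(raw_cid(c), len(c)) for c in chunks]
    print(verify_ranges(leaves, lambda first, last: data[first:last + 1]))
```

A mirror that returns altered bytes for any leaf causes `verify_ranges` to fail, so the mirror needs no trust, only availability.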
### Compatibility

Setting the default value of the `skip-raw-blocks` parameter to `n` ensures
backward compatibility with existing clients and systems that are unaware
of this new flag.

### Alternatives

> **Review comment:** ℹ️ updated this section with paths not taken, lmk if
> anything requires more elaboration.

An alternative approach would be to request blocks individually.
However, this adds extra round trips and more per-request HTTP overhead,
and is thus undesirable.

#### Why not `dag-scope=skip-raw-blocks`?

The existing `dag-scope` parameter determines the overall range of blocks to
retrieve, while `skip-raw-blocks` selectively filters specific blocks across
all scopes and ranges. Combining them under one parameter would make them
mutually exclusive, so a client could no longer pick a scope and skip `raw`
blocks at the same time.

For example:
- When a client streaming a video from a webseed needs to follow the user
  seeking through the video, it would send
  `dag-scope=entity&entity-bytes=42:1337` with `skip-raw-blocks=y` to download
  the proofs for the required section of the video, and then fetch the
  remaining raw data byte ranges from a faster CDN.
- When a client is verifying an OOB-transferred directory in deserialized form,
  `dag-scope=all` with `skip-raw-blocks=y` makes sense.

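The video example can be sketched as two request plans. This Python sketch assumes the gateway accepts `format=car` as a query parameter for CAR responses, that the raw leaf sizes have already been read from the proof response, and that whole leaves are fetched from the CDN because a partial leaf cannot be checked against its CID. `EXAMPLE_CID`, `proof_url`, and `leaf_ranges` are placeholders. Both `entity-bytes` and the HTTP `Range` header use inclusive end offsets.

```python
from urllib.parse import urlencode


def proof_url(cid: str, start: int, end: int) -> str:
    """Gateway URL for the proofs covering bytes start..end (inclusive)."""
    query = urlencode(
        {
            "format": "car",  # assumption: CAR requested via query parameter
            "dag-scope": "entity",
            "entity-bytes": f"{start}:{end}",
            "skip-raw-blocks": "y",
        },
        safe=":",  # keep "42:1337" readable instead of "42%3A1337"
    )
    return f"/ipfs/{cid}?{query}"


def leaf_ranges(start: int, end: int, leaf_sizes: list[int]) -> list[str]:
    """HTTP Range header values for every whole raw leaf overlapping start..end.

    leaf_sizes are the raw leaf lengths in file order, as read from the proofs.
    """
    headers = []
    offset = 0
    for size in leaf_sizes:
        first, last = offset, offset + size - 1
        if first <= end and last >= start:  # leaf overlaps the requested span
            headers.append(f"bytes={first}-{last}")
        offset += size
    return headers


if __name__ == "__main__":
    print(proof_url("EXAMPLE_CID", 42, 1337))
    print(leaf_ranges(42, 1337, [1024, 1024, 1024]))
```

Adjacent ranges could be coalesced into a single request; the sketch keeps one range per leaf so each response maps to exactly one CID check.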
#### Why not a CAR content type parameter?

Optional parameters of the CAR content type
([application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car)),
like `order` and `dups`, affect how data is represented when returned as a CAR
stream, but do not modify the scope of the data itself: they neither add nor
subtract data from the response.

The scope of the data is controlled by the URL content path and the optional
`dag-scope` and `entity-bytes` URL parameters. This is where `skip-raw-blocks`
belongs.

This is not just a matter of aesthetics: URL paths and query parameters allow
different subsets of a DAG to be cached in a way that is interoperable with
existing HTTP tools and clients, and they minimize the risk of caching an
incomplete DAG response due to HTTP cache misconfiguration. Because
`skip-raw-blocks` is in the URL query, CAR responses without `raw` blocks will
be cached under a different key than full responses (just like the existing
`dag-scope` and `entity-bytes` parameters).

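A hedged illustration of the caching argument: if a cache keys responses on path plus query (normalized here by sorting the parameters, which is an assumption; many caches use the raw URL as-is), responses with and without `skip-raw-blocks=y` cannot be confused. A parameter carried in the `Accept` header would only separate cached responses if the server sends, and the cache honors, `Vary: Accept`. The `cache_key` function is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def cache_key(url: str) -> str:
    """Hypothetical cache key: URL path plus query parameters in sorted order."""
    parts = urlsplit(url)
    query = urlencode(sorted(parse_qsl(parts.query)), safe=":")
    return f"{parts.path}?{query}" if query else parts.path


if __name__ == "__main__":
    full = cache_key("/ipfs/EXAMPLE_CID?format=car&dag-scope=all")
    proofs = cache_key("/ipfs/EXAMPLE_CID?format=car&dag-scope=all&skip-raw-blocks=y")
    print(full)
    print(proofs)
```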
#### Why not a generic `skip-leaves` that skips all leaves, not just `raw` blocks?

Prevention of amplification attacks, and efficient server operation.

By using the `raw` (`0x55`) codec, servers can trivially determine from the
CID alone whether to fetch or skip a block, without having to fetch the block
to learn anything new.

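The codec check can be done on the CID bytes alone. This Python sketch decodes a binary CID that has already been multibase-decoded (that step is out of scope): a CIDv0 is a bare 34-byte sha2-256 multihash and always implies `dag-pb` (`0x70`), while a CIDv1 starts with two unsigned varints, the version and then the multicodec. The function names are illustrative and not taken from any IPFS library; the sketch also does not enforce the multiformats limit on varint length.

```python
RAW = 0x55
DAG_PB = 0x70


def read_uvarint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 varint; return (value, next_position)."""
    value, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        value |= (byte & 0x7F) << shift
        pos += 1
        if byte < 0x80:  # high bit clear: last byte of this varint
            return value, pos
        shift += 7


def cid_codec(cid: bytes) -> int:
    """Return the multicodec of a binary CID without fetching its block."""
    # CIDv0: sha2-256 multihash (0x12, 0x20, 32-byte digest), implicitly dag-pb.
    if len(cid) == 34 and cid[0] == 0x12 and cid[1] == 0x20:
        return DAG_PB
    version, pos = read_uvarint(cid, 0)
    if version != 1:
        raise ValueError(f"unsupported CID version {version}")
    codec, _ = read_uvarint(cid, pos)
    return codec


def is_raw(cid: bytes) -> bool:
    return cid_codec(cid) == RAW
```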
If we framed this feature around skipping all leaf nodes, the server would
have to fetch each leaf to learn whether it has any child nodes. This would
force the server to fetch data that is never returned to the client.

Although `skip-raw-blocks` is more limited and cannot handle UnixFS files
chunked without the `--raw-leaves` option, it allows both the client and the
server to trivially verify that a block must not be fetched. This prevents
amplification, where a server could need to fetch orders of magnitude more data
than it returns to the client when executing the request.

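A toy model of the amplification argument, in Python. CIDs are modeled as hypothetical `(codec, name)` pairs so the codec is readable without a fetch, as it is in a real CID, and `CountingStore` is an invented stand-in for a gateway's block source. Both traversals return the same non-leaf blocks, but only the `skip-leaves` variant has to fetch every leaf to discover that it has no links.

```python
RAW, DAG_PB = 0x55, 0x70

Cid = tuple[int, str]  # (multicodec, name): a stand-in for a real CID


class CountingStore:
    """Hypothetical block source that counts fetches (each one costs I/O)."""

    def __init__(self, links: dict[Cid, list[Cid]]):
        self.links = links
        self.fetches = 0

    def get_links(self, cid: Cid) -> list[Cid]:
        self.fetches += 1
        return self.links[cid]


def traverse_skip_raw(store: CountingStore, root: Cid) -> list[Cid]:
    sent, stack = [], [root]
    while stack:
        cid = stack.pop()
        if cid[0] == RAW:
            continue  # decided from the CID alone; never fetched
        links = store.get_links(cid)
        sent.append(cid)
        stack.extend(reversed(links))
    return sent


def traverse_skip_leaves(store: CountingStore, root: Cid) -> list[Cid]:
    sent, stack = [], [root]
    while stack:
        cid = stack.pop()
        links = store.get_links(cid)  # must fetch to learn if it is a leaf
        if links:
            sent.append(cid)
        stack.extend(reversed(links))
    return sent


def example_dag() -> tuple[Cid, dict[Cid, list[Cid]]]:
    """Root -> 2 dag-pb nodes -> 3 raw leaves each (9 blocks in total)."""
    leaves = [(RAW, f"leaf{i}") for i in range(6)]
    mids = [(DAG_PB, "mid0"), (DAG_PB, "mid1")]
    root = (DAG_PB, "root")
    links = {root: mids, mids[0]: leaves[:3], mids[1]: leaves[3:]}
    links.update({leaf: [] for leaf in leaves})
    return root, links
```

In this DAG, `skip-raw-blocks` needs 3 fetches and a leaf-based filter needs 9. With real UnixFS data the gap is typically larger in bytes than in block count, since raw leaves usually carry the bulk of the file.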
## Security

This IPIP does not impact the security model of the trustless gateway.

## Test fixtures

:::issue

TODO: update the section below with CIDs or CARs from conformance tests

Scenarios we should check:
- [ ] a request for `/ipfs/cid` with `skip-raw-blocks=y`, where the CID has the
  `raw` codec, MUST return HTTP 400 (Bad Request)
- [ ] reuse an existing UnixFS DAG that has raw leaves, request it with
  `skip-raw-blocks=n`, and confirm the response includes the expected raw
  leaves' CIDs
- [ ] create a new CAR fixture that only has non-raw blocks. Request it with
  `skip-raw-blocks=y`, and confirm the response includes the expected CIDs and
  does not include the raw blocks referenced by parents.
  - the important part is creating the CAR fixture by hand and ensuring the raw
    blocks are NEVER announced anywhere (generate the fixture with random data,
    add it to IPFS with the raw-leaves option, then export the DAG without `raw`
    blocks using go-car's
    [`filter`](https://github.com/ipld/go-car/tree/master/cmd/car#readme) or
    similar)
  - Why? This goes the extra mile, but ensures every conformant gateway
    implementation avoids the useless work of fetching raw blocks that are not
    required to fulfill `skip-raw-blocks=y` requests. We did a similar thing
    for `entity-bytes`, and it was the only way we could show bugs in the
    Saturn project's cache implementation at the time.

:::

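A hedged Python sketch of the assertions a conformance test could make for the scenarios above, once a CAR response has been decoded into block CIDs (CAR parsing is out of scope). CIDs use a hypothetical `(codec, name)` model, the helper names are invented, and the sketch assumes the HTTP 400 in the first scenario applies only when `skip-raw-blocks=y` is set.

```python
RAW = 0x55

Cid = tuple[int, str]  # (multicodec, name): a stand-in for a real CID


def expected_status(root: Cid, skip_raw_blocks: bool) -> int:
    """Scenario 1 (assumed to apply only with skip-raw-blocks=y)."""
    return 400 if skip_raw_blocks and root[0] == RAW else 200


def check_blocks(
    received: list[Cid], expected: list[Cid], skip_raw_blocks: bool
) -> list[str]:
    """Scenarios 2 and 3: return failure messages (empty list means pass)."""
    failures = []
    present = set(received)
    missing = [cid for cid in expected if cid not in present]
    if missing:
        failures.append(f"missing expected blocks: {missing}")
    if skip_raw_blocks:
        raw = [cid for cid in received if cid[0] == RAW]
        if raw:
            failures.append(f"raw blocks returned despite skip-raw-blocks=y: {raw}")
    return failures
```

The never-announced raw blocks from the third scenario are not modeled here; in a real test, a gateway that tries to fetch them would likely surface as a timeout or error rather than as an extra block in the response.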
> **Review comment** (on lines +150 to +172): @Jorropo this is the minimal set
> of tests I've identified for this IPIP, lmk if you think it is sufficient, or
> if we need more. The way we did this in the past was to update the test
> fixtures section at the very end:

### Copyright

Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).