Hash changes if we change our metadata #1152

Spawned from https://github.com/ipfs/go-ipfs/issues/8974.

**tl;dr** The CID is not the hash of your file; do not rely on it as such. The usual learning path can leave you with the mistaken impression that a given piece of user data is always represented by the same CID/hash.

Brief outline:

- New users are introduced to the IPFS world through the content-based paradigm: forget where the data is stored; all that counts is the data itself, which we identify by its hash. Unlike a location, your (the user's) data doesn't change, so neither will its hash.
- New users experiment with this paradigm by adding files to IPFS (CLI, HTTP, web, whatever) and get a CID/hash in return.
- There is now a discrepancy in what "data" means:
  - In the theory/docs, users picture a block (string of bits) of _their_ data: what was contained in the filesystem file they're adding, nothing more.
  - In practice, through the UnixFS abstraction, the file is formatted as a DAG of many chunks (blocks) of the user's data. The DAG structure is described by IPFS (not user) metadata, which is also part of the blocks being hashed and therefore affects the resulting CID.
- The metadata leaks into the CID, whether the user cares about it or not. The same file added with different parameters (e.g. `--chunker`, `--cid-version`, `--raw-leaves`), or even with the same parameters but a newer IPFS version with different defaults, may be represented by different CIDs/hashes.
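To make the points above concrete, here is a toy model. It is **not** the real UnixFS/dag-pb encoding: the node layout, the JSON serialization, the `layout_version` field and the `toy_root` helper are all hypothetical, chosen only to show that once chunk links and layout metadata are part of what gets hashed, the same bytes can end up with different root hashes.

```python
import hashlib
import json


def sha256_hex(b: bytes) -> str:
    return hashlib.sha256(b).hexdigest()


def toy_root(data: bytes, chunk_size: int, layout_version: int = 1) -> str:
    """Root hash of a hypothetical one-level DAG (NOT the real UnixFS encoding)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Split the user's data into fixed-size chunks (the last one may be shorter).
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    # The root node holds only metadata: links to the chunks plus layout info.
    node = {
        "version": layout_version,  # hypothetical format version field
        "links": [sha256_hex(c) for c in chunks],
        "sizes": [len(c) for c in chunks],
    }
    # Serialize deterministically and hash the node, not the raw file bytes.
    encoded = json.dumps(node, sort_keys=True).encode("utf-8")
    return sha256_hex(encoded)


if __name__ == "__main__":
    data = b"hello ipfs " * 100  # 1100 bytes of "user data"
    print("file hash     :", sha256_hex(data))
    print("chunks of 256 :", toy_root(data, 256))
    print("chunks of 512 :", toy_root(data, 512))
    print("layout v2     :", toy_root(data, 256, layout_version=2))
```

With the same `data`, changing only the chunk size or the (hypothetical) layout version changes the root hash, while the plain SHA-256 of the file never changes. Real `ipfs add` behaves analogously when, for example, the chunker changes, although the actual CIDs depend on its real encoding.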

I think this happens to a lot of people (myself included). The simplest example, a "neutral" block of my data, is what I first think of when immutability comes up, and at some point we silently jump from that single block to a file without mentioning UnixFS. That jump is ugly, and I get why it isn't in the foreground, but you naturally map that neutral/single/your block onto your file, and therefore read the immutability of the data as the immutability of its tag (the CID) too. I'm not sure when, but at some point we need to break it to you, maybe without even mentioning UnixFS, just the generic _metadata_: we process your data and add some of our own to better organize and transmit it, and even though that metadata is also immutable, we may (_very_ rarely) change our minds about what the best organization is. When that happens, you'll see a different hash reflecting it. Kind of sucks, but that's life, and it's still much better than httping all the time. (We can omit this last remark :grimacing:.)
