Skip to content

Option for array order independent hashing #51

@about-code

Description

@about-code

I am currently looking for object hashing strategies to quickly tell by comparison of the hashsum of two objects o1 and o2 whether they have identical contents. One important requirement is that the order of items of an array shouldn't matter since a particular order can't be guaranteed or isn't important. So when hashing o1and o2 where o1 and o2 are defined as

var o1 = ["foo", {"bar": ["baz", null, 1.0, 1.5, 0.0001, 1000.0, 2.0, -23.1234, 2.0]}]
var o2 = ["foo", {"bar": [null, "baz", 1.0, 1.5, 0.0001, 1000.0, 2.0, -23.1234, 2.0]}]

then the hash for o1 and o2 should be equal even though the item oder of bar differs. The hashsum should remain to be distinct from o3 with

var o3 = [{"bar": ["baz", null, 1.0, 1.5, 0.0001, 1000.0, 2.0, -23.1234, 2.0]}]

which misses the item "foo".

Use cases I have in mind are change detection and caching strategies. E.g. clients with a REST-API could decide upon comparing a hashsum against an E-Tag HTTP-Header whether to update a client-side cache.

Array order independence could be achieved by sorting item hashes in the array prior to calculating the hashsum over the items (before updating the digester, respectively) - similar to how it is already implemented for hashObject(). By sorting the hashes of array items rather than sorting raw values we can sort lexicographically and gain reproducable hash sums independent of whether the raw item values are sortable.

I've implemented a demo and test cases for objecthash-js by @emschwartz (diff). It adds a new option to tweak the algorithm.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions