[performance] beating git in git verify-pack ✅ 🚀 #1

Closed
@Byron

Notes from trying to beat git verify-pack.

Attempt 1

I remembered cold-cache timings of around 5:50 min for git to run verify-pack on the Linux kernel pack. However, it turns out that in a more controlled environment, git is still considerably faster than us, even though we use an LRU cache and make quite efficient use of multiple cores.

[Image: hard-to-beat-the-king]

Observation

Git uses a streaming pack approach that is optimized for applying deltas in reverse, starting from each base object and moving on to the deltas that depend on it. It works by

  • decompressing all deltas
  • applying all deltas that depend on a base, recursively (and thus avoiding having to decompress deltas multiple times)
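
The base-first traversal described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not git's actual implementation: the Entry type and resolve_all function are hypothetical names, and the "delta" is a toy that only appends bytes, whereas real pack deltas consist of copy and insert instructions.

```rust
use std::collections::HashMap;

/// A toy pack entry (hypothetical): either a full object or a delta against
/// another entry, identified by its index in the slice.
/// The "delta" here only appends bytes; real git deltas are more involved.
enum Entry {
    Base(Vec<u8>),
    Delta { base: usize, suffix: Vec<u8> },
}

/// Resolve all entries base-first: each object is built exactly once and then
/// handed to the deltas that depend on it, instead of re-walking the whole
/// delta chain for every object.
/// Deltas whose base never resolves (missing base, cycle) stay empty.
fn resolve_all(entries: &[Entry]) -> Vec<Vec<u8>> {
    // Index the deltas by their base so the tree can be walked top-down.
    let mut children: HashMap<usize, Vec<usize>> = HashMap::new();
    // Work list of (entry index, fully resolved object data).
    let mut stack: Vec<(usize, Vec<u8>)> = Vec::new();
    for (idx, entry) in entries.iter().enumerate() {
        match entry {
            Entry::Base(data) => stack.push((idx, data.clone())),
            Entry::Delta { base, .. } => children.entry(*base).or_default().push(idx),
        }
    }

    let mut out: Vec<Vec<u8>> = vec![Vec::new(); entries.len()];
    while let Some((idx, data)) = stack.pop() {
        if let Some(kids) = children.get(&idx) {
            for &kid in kids {
                if let Entry::Delta { suffix, .. } = &entries[kid] {
                    // Apply the toy delta to the already resolved base.
                    let mut resolved = data.clone();
                    resolved.extend_from_slice(suffix);
                    stack.push((kid, resolved));
                }
            }
        }
        out[idx] = data;
    }
    out
}
```

With a chain of depth n, resolving each object independently would apply on the order of n² deltas in total, while this traversal applies each delta once. The cost is that a resolved base has to stay in memory until all of its children have been handled.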

We use a memory-mapped file, which is optimized for random access but won't be very fast for this kind of workload.

How to fix

  • Wait until we have implemented a streaming pack as well and try again, with the same algorithmic benefits, possibly paired with more efficient memory handling.
  • Git for some reason limits delta application to 3 threads, whereas we do benefit from having more threads, so we could be faster just because of this.
  • The streaming (indexing) phase of reading a pack can be parallelised if we have the pack on disk, and it should be easy to implement if the index data structure itself is thread-safe (but it might not be worth the complexity or memory overhead, let's see).
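
To illustrate the point about not capping the thread count, here is a small sketch of spreading per-entry work over a configurable number of threads using only the standard library. It is an assumption-laden toy rather than a design for the actual pack code: process_entry and process_parallel are hypothetical names, and the per-entry work is a trivial byte sum standing in for decompressing and hashing an object.

```rust
use std::thread;

/// Hypothetical stand-in for per-entry work, such as decompressing and
/// hashing a single object. Here it just sums the bytes.
fn process_entry(data: &[u8]) -> u64 {
    data.iter().map(|&b| b as u64).sum()
}

/// Process all entries on up to `threads` worker threads and return the
/// results in input order. The thread count is a parameter, not a fixed cap.
fn process_parallel(entries: &[Vec<u8>], threads: usize) -> Vec<u64> {
    let threads = threads.max(1);
    // Ceiling division so that every entry lands in exactly one chunk;
    // at least 1 because `chunks` panics on a chunk size of zero.
    let chunk_size = ((entries.len() + threads - 1) / threads).max(1);
    thread::scope(|scope| {
        // Spawn one scoped thread per chunk; scoped threads may borrow `entries`.
        let handles: Vec<_> = entries
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk.iter().map(|e| process_entry(e)).collect::<Vec<u64>>()
                })
            })
            .collect();
        // Join in spawn order so the output order matches the input order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

The thread count could be taken from std::thread::available_parallelism() instead of a constant. Fixed-size chunks keep the sketch short, but they balance poorly when entry sizes vary a lot, which is likely for real packs.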
