[RFC] feature: use reflinks for extent sharing between initramfs source and archive data #1141

Closed
@ddiss

Description

Proposal

I think we could speed up initramfs generation for some common (Btrfs / XFS)
setups by having dracut make heavier use of reflinks / COW clones. I'd guess
>95% of an uncompressed+unstripped initramfs image duplicates data already
present elsewhere on disk, which really shouldn't need to be shuffled
around when everything is on the same COW-clone-capable filesystem.

Dracut already uses cp --reflink=auto when shuffling most things
into the /var/tmp staging area, so it should "just" be a matter of
making the cpio archive generation process clone-range aware
and dropping compression altogether.
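For illustration, the "reflink if possible, otherwise copy" behaviour that cp --reflink=auto provides can be sketched in C with the Linux FICLONE ioctl. This is a hypothetical helper (reflink_or_copy is not a dracut or coreutils function), and the set of errno values treated as "cloning not possible here" is an assumption:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>       /* open(), O_* flags for callers */
#include <linux/fs.h>    /* FICLONE */
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: make dst_fd a copy of src_fd, sharing extents
 * (reflink) when the filesystem supports it, else copying the bytes.
 * Returns 0 on success, -1 on error with errno set. */
static int reflink_or_copy(int src_fd, int dst_fd)
{
    assert(src_fd >= 0 && dst_fd >= 0);

    /* Fast path: clone the whole file; only metadata is written. */
    if (ioctl(dst_fd, FICLONE, src_fd) == 0)
        return 0;

    /* Assumed "cloning not possible here" errors, e.g. different
     * filesystems (EXDEV) or no reflink support (EOPNOTSUPP). */
    if (errno != EOPNOTSUPP && errno != EXDEV && errno != EINVAL &&
        errno != ENOTTY)
        return -1;

    /* Fallback: plain read/write copy. */
    char buf[65536];
    off_t off = 0;
    for (;;) {
        ssize_t n = pread(src_fd, buf, sizeof(buf), off);
        if (n < 0)
            return -1;
        if (n == 0)
            return ftruncate(dst_fd, off);  /* EOF: drop any stale tail */
        for (ssize_t done = 0; done < n;) {
            ssize_t w = pwrite(dst_fd, buf + done, (size_t)(n - done),
                               off + done);
            if (w < 0)
                return -1;
            done += w;
        }
        off += n;
    }
}
```

On Btrfs or a reflink-enabled XFS the ioctl succeeds and no file data is rewritten; on other filesystems the same call transparently degrades to a byte copy.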

This should allow for:

  • improved space efficiency
    • initramfs contents wouldn't be duplicated on disk
  • improved performance
    • initramfs image needn't be stripped / compressed / decompressed
    • initramfs generation would mostly perform metadata I/O
    • there may be some drawbacks due to fragmentation, but these would hopefully be offset by the removal of compression / decompression

The following conditions would need to be met for dracut to successfully use reflinks (otherwise it would fall back to read/write):

  • root, boot and the dracut staging area (/var/tmp) reside on the same Btrfs or XFS filesystem
  • paths don't have nocow flags set
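A minimal sketch of how a tool could probe these conditions up front on Linux. The helper names are hypothetical, and the checks are deliberately conservative assumptions: comparing st_dev treats separate Btrfs subvolumes as different filesystems, and an XFS filesystem only supports reflinks if it was created with reflink=1, which statfs() alone cannot tell.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/fs.h>     /* FS_IOC_GETFLAGS, FS_NOCOW_FL */
#include <linux/magic.h>  /* BTRFS_SUPER_MAGIC, XFS_SUPER_MAGIC */
#include <stdbool.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/vfs.h>      /* statfs() */
#include <unistd.h>

/* Conservative: Btrfs subvolumes report different st_dev values even on
 * the same filesystem, so this may say "no" where a reflink would work. */
static bool same_filesystem(const char *a, const char *b)
{
    struct stat sa, sb;
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return false;
    return sa.st_dev == sb.st_dev;
}

/* True if path is on Btrfs or XFS (XFS additionally needs reflink=1). */
static bool is_btrfs_or_xfs(const char *path)
{
    struct statfs sf;
    if (statfs(path, &sf) != 0)
        return false;
    return (unsigned long)sf.f_type == BTRFS_SUPER_MAGIC ||
           (unsigned long)sf.f_type == XFS_SUPER_MAGIC;
}

/* True if the nocow attribute (chattr +C) is set. Filesystems without
 * inode flag support are treated as "not nocow". */
static bool has_nocow(int fd)
{
    int flags = 0;
    if (ioctl(fd, FS_IOC_GETFLAGS, &flags) != 0)
        return false;
    return (flags & FS_NOCOW_FL) != 0;
}

/* Hypothetical pre-flight check for reflinking src into a new file in
 * dst_dir. The directory is checked too, because on Btrfs new files
 * inherit the nocow flag from their parent directory. */
static bool reflink_preconditions_met(const char *src, const char *dst_dir)
{
    assert(src != NULL && dst_dir != NULL);
    if (!same_filesystem(src, dst_dir) || !is_btrfs_or_xfs(src))
        return false;

    int sfd = open(src, O_RDONLY);
    int dfd = open(dst_dir, O_RDONLY | O_DIRECTORY);
    bool ok = sfd >= 0 && dfd >= 0 && !has_nocow(sfd) && !has_nocow(dfd);
    if (sfd >= 0)
        close(sfd);
    if (dfd >= 0)
        close(dfd);
    return ok;
}
```

Failing any of these checks would simply mean taking the read/write fallback, so false negatives cost performance but not correctness.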

Work-in-progress implementation

Luis and I made some changes to GNU cpio to copy data between source files and the archive via the copy_file_range syscall. I've pushed this patchset to https://github.com/ddiss/cpio/tree/copy_file_range_2_13
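Roughly, the idea is to write each file's data into the archive at the current archive offset with copy_file_range(2) instead of read/write, letting the kernel reflink where it can. The sketch below is not taken from the patchset: copy_into_archive is a hypothetical helper, and falling back to pread/pwrite on ENOSYS, EXDEV, EINVAL, EOPNOTSUPP or EPERM is an assumption about sensible error handling.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>       /* open(), O_* flags for callers */
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>      /* copy_file_range() (glibc >= 2.27), pread/pwrite */

/* Hypothetical helper: copy len bytes of src_fd (from offset 0) into
 * archive_fd at *archive_off and advance *archive_off. archive_fd must be
 * writable and not opened with O_APPEND (copy_file_range rejects that). */
static int copy_into_archive(int src_fd, int archive_fd, off_t *archive_off,
                             size_t len)
{
    assert(archive_off != NULL);
    off_t src_off = 0;
    size_t left = len;
    bool use_cfr = true;

    while (left > 0) {
        if (use_cfr) {
            /* On Btrfs/XFS the kernel may satisfy this with a reflink
             * when both offsets are block aligned. */
            ssize_t n = copy_file_range(src_fd, &src_off, archive_fd,
                                        archive_off, left, 0);
            if (n > 0) {
                left -= (size_t)n;      /* kernel advanced both offsets */
                continue;
            }
            if (n == 0)
                return -1;              /* source shorter than len */
            /* EPERM included since seccomp filters commonly return it. */
            if (errno != ENOSYS && errno != EXDEV && errno != EINVAL &&
                errno != EOPNOTSUPP && errno != EPERM)
                return -1;
            use_cfr = false;            /* fall back to read/write */
        }

        char buf[65536];
        size_t want = left < sizeof(buf) ? left : sizeof(buf);
        ssize_t r = pread(src_fd, buf, want, src_off);
        if (r <= 0)
            return -1;
        for (ssize_t done = 0; done < r;) {
            ssize_t w = pwrite(archive_fd, buf + done, (size_t)(r - done),
                               *archive_off + done);
            if (w < 0)
                return -1;
            done += w;
        }
        src_off += r;
        *archive_off += r;
        left -= (size_t)r;
    }
    return 0;
}
```

Passing explicit offset pointers means the archive writer tracks its own position and never depends on the file descriptors' seek offsets.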
Both XFS and Btrfs require file data within the archive to be aligned to filesystem block boundaries for copy_file_range to actually result in extent sharing. To achieve this, I worked on a Dracut padcpio binary, which inserts dummy pad files into the initramfs cpio archive. The new binary, as well as Dracut logic to call cpio with the new parameters, can be found at https://github.com/ddiss/dracut/tree/cpio_cfr_align .
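To illustrate the alignment arithmetic, assume the SVR4 newc format (cpio -H newc) and a 4 KiB filesystem block size. Each newc entry is a 110-byte ASCII header followed by the NUL-terminated name, padded to a 4-byte boundary, then the file data, padded again to 4 bytes. A pad entry inserted before a file can therefore shift that file's data onto a block boundary. The helpers below are hypothetical and not taken from padcpio:

```c
#include <assert.h>
#include <stdint.h>

/* newc: "070701" magic plus 13 eight-digit hex fields = 110 bytes. */
#define NEWC_HDR_LEN 110u

static uint64_t align4(uint64_t x)
{
    return (x + 3) & ~(uint64_t)3;
}

/* Offset of a file's data, given the offset of its newc entry and its
 * namesize (name length including the trailing NUL). Header + name are
 * padded to a 4-byte boundary before the data starts. */
static uint64_t newc_data_offset(uint64_t entry_off, uint32_t namesize)
{
    return entry_off + align4(NEWC_HDR_LEN + namesize);
}

/* Hypothetical calculation: how many data bytes a pad entry inserted at
 * entry_off (name size pad_namesize) must carry so that the data of the
 * following entry (name size namesize) starts on a block_size boundary.
 * The result is a multiple of 4, so the pad data needs no newc padding. */
static uint64_t pad_data_len(uint64_t entry_off, uint32_t pad_namesize,
                             uint32_t namesize, uint64_t block_size)
{
    assert(entry_off % 4 == 0 && block_size > 0 && block_size % 4 == 0);
    /* The pad entry's data would begin here... */
    uint64_t pad_data = newc_data_offset(entry_off, pad_namesize);
    /* ...and the target's data here, if the pad carried 0 bytes. */
    uint64_t target_data = newc_data_offset(pad_data, namesize);
    return (block_size - target_data % block_size) % block_size;
}
```

For example, with a pad entry named "pad" (namesize 4) at archive offset 0, followed by usr/lib/foo (namesize 12), pad_data_len(0, 4, 12, 4096) returns 3856, which puts the file's data at offset 116 + 3856 + 124 = 4096.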

Needless to say, both repos are WIP and may result in data corruption or other disasters. At this stage I'm interested in feedback on the approach. I've done some initial benchmarks atop Btrfs, with positive results in terms of both runtime and space efficiency. I'll try to post actual numbers in the coming days.
