Description
Proposal
I think we could speed up initramfs generation for some common setups (Btrfs / XFS)
by having dracut make heavier use of reflinks / COW clones. I'd guess that >95% of an
uncompressed, unstripped initramfs image is data duplicated from files already present
on the root filesystem, which really shouldn't need to be copied around when
everything lives on the same COW-clone-capable filesystem.
Dracut already uses cp --reflink=auto when copying most files
into the /var/tmp staging area, so it should "just" be a matter of
making the cpio archive generation process clone-range aware
and dropping compression altogether.
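As a rough illustration of what cp --reflink=auto does, here's a minimal Python sketch that tries the Linux FICLONE ioctl (so the destination shares the source's extents) and falls back to a plain copy when the filesystem can't clone. The FICLONE value is hard-coded from its Linux _IOW(0x94, 9, int) definition; the function name reflink_or_copy and the set of fallback errnos are assumptions for illustration, not dracut code.

```python
import errno
import fcntl
import os
import shutil

# Linux FICLONE ioctl request number: _IOW(0x94, 9, int)
FICLONE = 0x40049409

# Assumed set of errnos meaning "cloning isn't possible here", after which we copy instead
_NO_CLONE = {errno.EOPNOTSUPP, errno.EXDEV, errno.EINVAL, errno.ENOTTY, errno.ENOSYS}


def reflink_or_copy(src_path: str, dst_path: str) -> bool:
    """Clone src_path to dst_path if the filesystem supports it, else copy it.

    Returns True if the data was shared via FICLONE, False if it was copied.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        try:
            # The ioctl is issued on the destination fd; its argument is the source fd.
            fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
            return True
        except OSError as e:
            if e.errno not in _NO_CLONE:
                raise
        # Clone not possible (e.g. ext4, tmpfs, cross-filesystem): fall back to read/write.
        shutil.copyfileobj(src, dst)
        return False
```

On Btrfs or XFS this would typically return True without duplicating any file data; elsewhere the result is an ordinary copy with identical contents.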
This should allow for:
- improved space efficiency
  - initramfs contents wouldn't be duplicated on disk
- improved performance
  - the initramfs image needn't be stripped / compressed / decompressed
  - initramfs generation would mostly perform metadata I/O
  - there may be some drawbacks due to fragmentation, but these would hopefully be outweighed by the removal of compression / decompression
The following conditions would need to be met for dracut to use reflinks successfully (otherwise it falls back to regular read/write copies):
- root, boot and the dracut staging area (/var/tmp) reside on the same Btrfs or XFS filesystem
- the paths involved don't have the NOCOW attribute (chattr +C) set
Work-in-progress implementation
Luis and I made some changes to GNU cpio so that it copies file data from the source files into the archive via the copy_file_range
syscall. I've pushed this patchset to https://github.com/ddiss/cpio/tree/copy_file_range_2_13
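The core idea of the cpio change is to let the kernel move data between file descriptors rather than reading it into userspace. A minimal Python sketch of that, assuming Linux and Python 3.8+ (for os.copy_file_range): copy a byte range from a source file into the archive at a given offset, and fall back to pread/pwrite when copy_file_range isn't usable, matching the read/write fallback described above. On Btrfs and XFS the kernel may satisfy a suitably aligned request by sharing extents instead of copying. copy_range and its errno set are hypothetical, not taken from the cpio patchset.

```python
import errno
import os

# Assumed set of errnos after which we give up on copy_file_range and use read/write
_FALLBACK_ERRNOS = {errno.EXDEV, errno.ENOSYS, errno.EOPNOTSUPP, errno.EINVAL}


def copy_range(src_fd: int, dst_fd: int, count: int,
               src_off: int = 0, dst_off: int = 0, chunk: int = 1 << 20) -> bool:
    """Copy count bytes from src_fd@src_off to dst_fd@dst_off.

    Returns True if copy_file_range handled the whole range, False if the
    read/write fallback was used for some or all of it.
    """
    cfr = getattr(os, "copy_file_range", None)  # absent on non-Linux platforms
    done = 0
    if cfr is not None:
        try:
            while done < count:
                n = cfr(src_fd, dst_fd, count - done, src_off + done, dst_off + done)
                if n == 0:  # source hit EOF before count bytes
                    return True
                done += n
            return True
        except OSError as e:
            if e.errno not in _FALLBACK_ERRNOS:
                raise
    # Fallback: plain pread/pwrite for whatever remains.
    while done < count:
        buf = os.pread(src_fd, min(chunk, count - done), src_off + done)
        if not buf:
            break
        # A short write is fine: the unwritten tail is re-read on the next pass.
        done += os.pwrite(dst_fd, buf, dst_off + done)
    return False
```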
Both XFS and Btrfs require proper alignment for copy_file_range to actually
result in extent sharing: each file's data in the archive needs to start on a
filesystem block boundary, just as it does in the source file. To achieve this I
worked on a Dracut padcpio binary which inserts dummy padding files into the initramfs cpio archive. The new binary, as well as Dracut logic to call cpio with the new parameters, can be found at https://github.com/ddiss/dracut/tree/cpio_cfr_align .
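To show why pad files help, here's a small Python sketch of the alignment arithmetic, assuming the newc cpio format (as produced by cpio -H newc): each entry is a 110-byte header plus a NUL-terminated name, padded to a multiple of 4 bytes, followed by the file data, also padded to 4 bytes. The helper computes how much data a dummy pad entry needs so the next file's data lands on a 4096-byte block boundary. The function names, the pad file name and the 4096-byte block size are assumptions; this is not padcpio's actual implementation.

```python
NEWC_HEADER_LEN = 110  # "070701" magic + 13 eight-digit hex fields


def align4(n: int) -> int:
    """Round n up to the next multiple of 4 (newc padding unit)."""
    return (n + 3) & ~3


def data_offset(entry_off: int, name: str) -> int:
    """Archive offset where file data begins for an entry starting at entry_off."""
    namesize = len(name.encode()) + 1  # newc namesize includes the trailing NUL
    return align4(entry_off + NEWC_HEADER_LEN + namesize)


def end_offset(entry_off: int, name: str, size: int) -> int:
    """Archive offset where the next entry starts after an entry with `size` data bytes."""
    return align4(data_offset(entry_off, name) + size)


def pad_entry_size(entry_off: int, pad_name: str, next_name: str,
                   block_size: int = 4096):
    """Data size for a pad entry at entry_off so next_name's data is block aligned.

    Returns None if next_name's data would already be aligned without padding.
    Assumes entry_off is 4-byte aligned and block_size is a multiple of 4.
    """
    if data_offset(entry_off, next_name) % block_size == 0:
        return None
    pad_data_off = data_offset(entry_off, pad_name)
    hdr = data_offset(0, next_name)  # 4-aligned header+name length of the next entry
    # Choose the pad size so that pad_data_off + pad + hdr is a multiple of block_size.
    return -(pad_data_off + hdr) % block_size
```

For example, placing a pad entry named "pad" at offset 0 before "usr/bin/foo" yields 3856 bytes of padding, which puts that file's data at offset 4096.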
Needless to say, both repos are WIP and may cause data corruption or other disasters. At this stage I'm interested in some feedback on the approach. I've done some initial benchmarks on Btrfs, with positive results for both runtime and space efficiency. I'll try to post some actual numbers in the coming days.