-
Notifications
You must be signed in to change notification settings - Fork 140
aes: add VAES support #491
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
49a52de
to
a49e9e0
Compare
I had to change the cfg flag feature gating to be opt-in rather than opt-out in order for the CI tests to pass without bumping the MSRV. |
Maybe for cleaner history it's worth to revert #482, rebase this PR to new master, and merge it after v0.9.0 release? It would also allow us to publish a smaller v0.9.0. |
a49e9e0
to
1ba6590
Compare
One reason to consider keeping the original in history is because it serves as an explanation for why par block sizes of 30 for VAES256 and 64 for VAES512 were chosen, i.e., due to the register configuration. This detail is absent from the intrinsics implementation. |
This better be described in a code comment, than to be left for git history. |
True. I can add a comment about that. |
d9e12be
to
230c5f8
Compare
I added these comment about block sizes: #[cfg(all(target_arch = "x86_64", any(aes_avx256, aes_avx512)))]
impl<'a> ParBlocksSizeUser for $name_backend::Vaes256<'a> {
// Block size of 30 is chosen based on AVX2's 16 YMM registers.
//
// * 1 register holds 2 keys per round (loads interleaved with rounds)
// * 15 registers hold 2 data blocks
//
// This gives (16 <total> - 1 <round key>) * 2 <data> = 30 <data>.
type ParBlocksSize = U30;
}
#[cfg(all(target_arch = "x86_64", aes_avx512))]
impl<'a> ParBlocksSizeUser for $name_backend::Vaes512<'a> {
// Block size of 64 is chosen based on AVX512's 32 ZMM registers.
//
// * 11, 13, 15 registers for keys, correspond to AES-128, AES-192, AES-256
// * 11, 13, 15 registers hold 4 keys each (no interleaved loading like VAES256)
// * 16 registers hold 4 data blocks
// * 1-4 registers remain unused (could use them but probably not worth it)
//
// This gives (32 <total> - 15 <AES-256 round keys> - 1 <unused>) * 4 <data> = 64 <data>.
type ParBlocksSize = U64;
} |
@newpavlov what's the path forward here?
|
I would prefer to do the following:
|
Again, we can use |
I would prefer to have a simpler v0.9.0 release. Since v0.9.1 will be probably released shortly after v0.9.0, I think it's fine to postpone VAES support a bit. |
What is the drawback of an off-by-default feature which requires |
5k LoC is not a small amount. It also would require additional work to make it experimental. It will be much easier to revert the old PR and merge this PR with a proper intrinsics-based implementation and autodetection enabled by default. If we are to encounter any issues with the VAES backend we also will be able to quickly yank v0.9.1. |
I'm talking about shipping this PR, |
This reverts commit ad83428. This implementation uses assembly, but the relevant intrinsics will be stable in Rust 1.89, and we have a PR open to use them: #491 For a cleaner history, this reverts the assembly implementation so the intrinsics-based implementation can be cleanly applied to an ASM-free codebase, rather than as a replacement for the ASM.
This reverts commit ad83428. This implementation uses assembly, but the relevant intrinsics will be stable in Rust 1.89, and we have a PR open to use them: #491 For a cleaner history, this reverts the assembly implementation so the intrinsics-based implementation can be cleanly applied to an ASM-free codebase, rather than as a replacement for the ASM.
@silvanshade now that #496 has been merged, can you rebase/merge? |
230c5f8
to
22dd9e3
Compare
@silvanshade it looks like the CI config got lost in the rebase (also as of tomorrow you'll be able to use Rust 1.89) |
43acd15
to
2743559
Compare
2743559
to
e14fdff
Compare
e14fdff
to
d33373e
Compare
d33373e
to
f655326
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reapproving with the following notes:
- MSRV is untouched, since the
--cfg aes_avx256
or--cfg aes_avx512
options are required to enable the feature, as we requested - This should likewise have no impact on anyone who does not explicitly enable those
autodetect.rs
is improved/simplified despite a new backend, though the complexity has moved tox86.rs
. I still consider that a general win.- In a future release, we can potentially bump MSRV and enable this by default, but having a
cfg
at first allows an initial set of users to experiment with it and help us gather data if that's a good idea or not. I think at least for now this is a niche feature for users with Intel Xeon servers?
#[allow(unused)] // TODO: remove once cfg flags are removed | ||
pub(crate) features: Features, | ||
pub(crate) keys: &'a Simd128RoundKeys<$rounds>, | ||
pub(crate) simd_256_keys: OnceCell<Simd256RoundKeys<$rounds>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to cache broadcasted keys. It makes the struct bigger and it especially will be a problem when the backends will be enabled by default. I think we should just broadcast the keys during encryption/decryption and let the compiler to optimize it out when possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think changes to this behavior should be accompanied by benchmarks.
|
||
let backend = &$name_backend::Vaes256::from(backend); | ||
if backend.features.has_vaes256() { | ||
while rem >= backend.par_blocks() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think VAES256 availability should be guaranteed if VAES512 available, no? If it's not guaranteed for some reason, we can be conservative and check VAES256 support in has_vaes512
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeap. VAES256 intrinsics are gated on the vaes
target feature, while VAES512 intrinsics are gated on vaes + avx512f
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one probably does make sense to change. I don't technically know if the ISA spec guarantees that VAES256 should be available if VAES512 is available (one would assume so), but especially if the compiler and LLVM believes that to be the case, we might as well follow that convention.
backend.encrypt_par_blocks(blocks); | ||
rem -= backend.par_blocks(); | ||
iptr = unsafe { iptr.add(backend.par_blocks()) }; | ||
optr = unsafe { optr.add(backend.par_blocks()) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to write this code using safe code. See the InOutBuf::into_chunks
method.
Regarding the reviews from @newpavlov, I don't have time to work on this again and I'm not sure when I will. But even if I did, I also don't think it's reasonable to add a bunch of new requested changes at this point, especially given the long history of this code. I won't object if you want to make those changes yourself but I don't plan to do it. |
@silvanshade can you address this one in particular? #491 (comment) |
I can push fixes later into this branch (I also have a bunch of other nitpicky changes which I would like to add), but it probably will happen after v0.9.0 release. |
@newpavlov is there a particular reason you want to block merging this on "nitpicky changes"? Those sound like they could all be done in a completely incremental fashion at any point in time, since they aren't externally-facing. |
I did not mean that the merge is blocked on them, but that I plan to work on them together with fixing the comments above. At the very least before merging I would like to see removal of the I don't see why we should rush addition of VAES support into v0.9.0. I think it's fine to add it in v0.9.1 together with other new backends. |
I agree it would be good to get rid of |
I'd also say there appears to be a time-memory tradeoff here. We've had another request to support a lower memory profile at the cost of performance (#191) and I'm wondering if we should perhaps expose a That said, my expectation would be that performance is the top priority for anyone who goes out of their way to enable |
The broadcast should be extremely efficient cycle-wise, cache-friendly, and trivial for the compiler to move outside of the loop. |
This migrates the VAES implementation to the stabilized intrinsics for the upcoming
1.89
release.CC @tarcieri
Related:
Benchmarks:
The numbers are basically on par with #482
VAES512
VAES256
AES-NI