Commit e3d37a8

Enable TensorPrimitives to perform in-place operations (#92820)
Some operations would produce incorrect results if the same span was passed as both an input and an output. When vectorization was employed but the span's length wasn't an exact multiple of the vector width, we'd do the standard trick of performing one last operation on the last vector's worth of data, which overlaps elements that were already processed. That trick relies on the operation being idempotent. If the same memory is used for input and output, a previous operation has already overwritten those inputs with new values, and some operations won't be idempotent. This fixes that by masking off the already processed elements. It adds tests to validate that in-place use works, and it updates the docs to carve out this valid overlapping.
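To make the failure mode concrete, here is a minimal sketch in Python that simulates the tail handling with plain lists. It is not the code from this commit: the vector width `WIDTH = 4`, the `x + 1` operation, and the function names `add_one_naive` and `add_one_masked` are all assumptions chosen for illustration, and the "mask" is modeled as a per-lane check rather than a real SIMD select.

```python
WIDTH = 4  # hypothetical vector width in elements (assumption for illustration)


def _main_loop(src, dst):
    """Process full WIDTH-sized blocks; return the index of the first unprocessed element."""
    i = 0
    while i + WIDTH <= len(src):
        block = [v + 1 for v in src[i:i + WIDTH]]  # "load" one vector and compute
        dst[i:i + WIDTH] = block                   # "store" the whole vector
        i += WIDTH
    return i


def add_one_naive(src, dst):
    """Tail trick without masking: redo the last WIDTH elements and store all lanes."""
    n = len(src)
    if n < WIDTH:  # too short for a vector: scalar fallback
        for k in range(n):
            dst[k] = src[k] + 1
        return
    i = _main_loop(src, dst)
    if i < n:
        start = n - WIDTH  # the final vector overlaps elements already processed
        dst[start:n] = [v + 1 for v in src[start:n]]


def add_one_masked(src, dst):
    """Same tail trick, but only lanes that were not yet processed are stored."""
    n = len(src)
    if n < WIDTH:
        for k in range(n):
            dst[k] = src[k] + 1
        return
    i = _main_loop(src, dst)
    if i < n:
        start = n - WIDTH
        block = [v + 1 for v in src[start:n]]
        for lane in range(WIDTH):
            if start + lane >= i:  # mask off lanes the main loop already wrote
                dst[start + lane] = block[lane]


separate = [0] * 6
add_one_naive([0, 1, 2, 3, 4, 5], separate)   # distinct buffers: tail redo is harmless

in_place_naive = [0, 1, 2, 3, 4, 5]
add_one_naive(in_place_naive, in_place_naive)  # elements 2 and 3 get incremented twice

in_place_masked = [0, 1, 2, 3, 4, 5]
add_one_masked(in_place_masked, in_place_masked)
```

With distinct buffers the unmasked version is fine, because recomputing the overlapped lanes from untouched input yields the same values. In place, the overlapped lanes are read after they were already overwritten, so they are transformed a second time; storing only the unprocessed lanes avoids that.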
1 parent 56251ec commit e3d37a8

4 files changed, +740 -96 lines changed
