Open
Description
CUDA has vector loads/stores, which reduce the number of instructions (3 to 1). That helps with the memory latency issues we've been seeing in our functions (based on profiling).
Since NumPy defaults to row-major ordering, this change makes our data handling a lot easier: I no longer have to convert everything to column-major format. It also simplifies slicing grid points, which is a nice bonus for the efficiency of our calculations.
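For illustration, here is a minimal sketch of the row-major point. The array name, shape (4 grid points with 3 components each), and dtype are assumptions made up for the example, not taken from our code:

```python
import numpy as np

# Hypothetical grid: 4 grid points, 3 float32 components per point.
# NumPy allocates this in row-major (C) order by default.
grid = np.arange(12, dtype=np.float32).reshape(4, 3)

print(grid.flags["C_CONTIGUOUS"])  # default layout is row-major
print(grid.strides)                # bytes to step per axis

# Slicing a range of grid points is a contiguous view, no copy needed.
block = grid[1:3]
print(block.flags["C_CONTIGUOUS"], np.shares_memory(block, grid))

# The column-major route needs an explicit conversion (a copy),
# and the same slice of grid points is no longer contiguous.
col_major = np.asfortranarray(grid)
col_block = col_major[1:3]
print(np.shares_memory(col_major, grid))
print(col_block.flags["C_CONTIGUOUS"], col_block.flags["F_CONTIGUOUS"])
```

With the row-major layout, the three components of each grid point sit next to each other in memory, so a slice of grid points can be handed over as-is.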