
Using Vector Loads #9

@Ali-Tehrani


CUDA has vector loads/stores, which reduce the number of load instructions (from 3 to 1). That helps with the memory-latency bottlenecks that profiling has shown in our functions.
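Here is a minimal sketch of the 3-to-1 reduction. It assumes single-precision grid points padded to four components (x, y, z, unused) in a row-major array. That layout is a hypothetical choice for illustration, made because 128 bits (one `float4`) is the widest single vector load. The kernel names `norm2_scalar` and `norm2_vector` are also hypothetical. Both kernels compute x² + y² + z² per point. The scalar one typically issues three 32-bit loads, because the compiler cannot assume a plain `float*` is 16-byte aligned. The vector one issues a single 128-bit load.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical layout: n points, each stored as 4 floats (x, y, z, pad),
// row-major, so every point occupies one aligned 16-byte slot.

// Scalar version: typically three separate 32-bit loads per point.
__global__ void norm2_scalar(const float* pts, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = pts[4 * i + 0];
        float y = pts[4 * i + 1];
        float z = pts[4 * i + 2];
        out[i] = x * x + y * y + z * z;
    }
}

// Vector version: one 128-bit load per point. float4 needs 16-byte
// alignment, which pointers returned by cudaMalloc satisfy.
__global__ void norm2_vector(const float4* pts, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float4 p = pts[i];
        out[i] = p.x * p.x + p.y * p.y + p.z * p.z;
    }
}

int main() {
    const int n = 1000;
    std::vector<float> h(4 * n);
    for (int i = 0; i < n; ++i) {
        h[4 * i + 0] = float(i % 7);  // small integers keep the sums exact
        h[4 * i + 1] = float(i % 5);
        h[4 * i + 2] = float(i % 3);
        h[4 * i + 3] = 0.0f;          // padding, never read
    }

    float *d_pts, *d_a, *d_b;
    cudaMalloc(&d_pts, h.size() * sizeof(float));
    cudaMalloc(&d_a, n * sizeof(float));
    cudaMalloc(&d_b, n * sizeof(float));
    cudaMemcpy(d_pts, h.data(), h.size() * sizeof(float), cudaMemcpyHostToDevice);

    const int threads = 256, blocks = (n + threads - 1) / threads;
    norm2_scalar<<<blocks, threads>>>(d_pts, d_a, n);
    norm2_vector<<<blocks, threads>>>(reinterpret_cast<const float4*>(d_pts), d_b, n);

    std::vector<float> a(n), b(n);
    cudaMemcpy(a.data(), d_a, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(b.data(), d_b, n * sizeof(float), cudaMemcpyDeviceToHost);
    if (cudaGetLastError() != cudaSuccess) { printf("CUDA error\n"); return 1; }

    // Both kernels must agree with each other and with the host result.
    bool ok = true;
    for (int i = 0; i < n; ++i) {
        float x = h[4 * i], y = h[4 * i + 1], z = h[4 * i + 2];
        ok = ok && a[i] == b[i] && b[i] == x * x + y * y + z * z;
    }
    printf("%s\n", ok ? "ok" : "mismatch");

    cudaFree(d_pts); cudaFree(d_a); cudaFree(d_b);
    return ok ? 0 : 1;
}
```

With double-precision coordinates the widest single load is a `double2`, so three doubles would still need two loads. That is why the sketch assumes padded floats.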
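Vector loads need the components of each grid point to sit next to each other in memory, which is exactly what a row-major array gives us. Since NumPy defaults to row-major (C) ordering, this also makes our data handling a lot easier: I no longer have to convert everything to column-major format. Row-major storage also simplifies slicing grid points, because a contiguous range of points is a single contiguous block of memory. That is a nice bonus for efficiency.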
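To make the layout argument concrete, here is a small sketch comparing the two orderings for an `(n, 3)` array of double-precision grid points. The helper and kernel names (`at_row_major`, `at_col_major`, `coord_sum`) are hypothetical. In row-major order point `i` lives at `pts[3*i + k]`. In column-major order it lives at `pts[k*n + i]`, spread over three places. The sketch then slices points `[start, start + count)` out of the row-major array with one pointer offset and one `cudaMemcpy`.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Row-major (C order, NumPy's default), shape (n, 3):
// the x, y, z of point i are adjacent.
__host__ __device__ inline double at_row_major(const double* pts, int i, int k) {
    return pts[3 * i + k];
}

// Column-major (Fortran order), shape (n, 3):
// all x's, then all y's, then all z's.
__host__ __device__ inline double at_col_major(const double* pts, int n, int i, int k) {
    return pts[k * n + i];
}

// Sum of the three coordinates of each point in a row-major slice.
__global__ void coord_sum(const double* pts, double* out, int count) {
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j < count) {
        out[j] = at_row_major(pts, j, 0) + at_row_major(pts, j, 1) + at_row_major(pts, j, 2);
    }
}

int main() {
    const int n = 10, start = 4, count = 3;
    std::vector<double> row(3 * n), col(3 * n);
    for (int i = 0; i < n; ++i)
        for (int k = 0; k < 3; ++k) {
            row[3 * i + k] = 3 * i + k;       // point i = (3i, 3i+1, 3i+2)
            col[k * n + i] = row[3 * i + k];  // same data in Fortran order
        }

    // Both index mappings describe the same points.
    bool ok = true;
    for (int i = 0; i < n; ++i)
        for (int k = 0; k < 3; ++k)
            ok = ok && at_row_major(row.data(), i, k) == at_col_major(col.data(), n, i, k);

    // Row-major slice: one contiguous block, so a pointer offset plus a
    // single copy. In column-major it would be three separate pieces
    // (one per coordinate), n elements apart.
    double *d_pts, *d_out;
    cudaMalloc(&d_pts, 3 * count * sizeof(double));
    cudaMalloc(&d_out, count * sizeof(double));
    cudaMemcpy(d_pts, row.data() + 3 * start, 3 * count * sizeof(double),
               cudaMemcpyHostToDevice);

    coord_sum<<<1, 32>>>(d_pts, d_out, count);
    std::vector<double> out(count);
    cudaMemcpy(out.data(), d_out, count * sizeof(double), cudaMemcpyDeviceToHost);
    if (cudaGetLastError() != cudaSuccess) { printf("CUDA error\n"); return 1; }

    // (3i) + (3i+1) + (3i+2) = 9i + 3 for each sliced point i.
    for (int j = 0; j < count; ++j)
        ok = ok && out[j] == 9.0 * (start + j) + 3.0;
    printf("%s\n", ok ? "ok" : "mismatch");

    cudaFree(d_pts); cudaFree(d_out);
    return ok ? 0 : 1;
}
```

On the NumPy side, an array created with the default `order='C'` already has this row-major layout. A slice of consecutive rows such as `points[start:start + count]` stays C-contiguous, so it can be handed to the device without a reordering step.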
