-
Notifications
You must be signed in to change notification settings - Fork 769
[SYCL] Optimize NDRDescT by removing sycl::range, sycl::id and padding #18851
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: sycl
Are you sure you want to change the base?
Conversation
sycl::range and sycl::id perform validity checks every time setting them. Use std::array instead as dimensions should already be valid. In addition, remove explicitly padding dimensions smaller than 3 and get number of dimensions from template argument instead of function argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove the throw
from these SYCL classes instead?
@@ -3154,13 +3162,11 @@ _ZN4sycl3_V15queue20wait_and_throw_proxyERKNS0_6detail13code_locationE | |||
_ZN4sycl3_V15queue22memcpyFromDeviceGlobalEPvPKvbmmRKSt6vectorINS0_5eventESaIS6_EE | |||
_ZN4sycl3_V15queue22submit_with_event_implERKNS0_6detail19type_erased_cgfo_tyERKNS2_14SubmissionInfoERKNS2_13code_locationEb | |||
_ZN4sycl3_V15queue22submit_with_event_implERKNS0_6detail19type_erased_cgfo_tyERKNS2_2v114SubmissionInfoERKNS2_13code_locationEb | |||
_ZNK4sycl3_V15queue22submit_with_event_implERKNS0_6detail19type_erased_cgfo_tyERKNS2_2v114SubmissionInfoERKNS2_13code_locationEb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it wasn't you who messed up the sorting here, but please either remove unnecessary changes or clean it up with a preceding PR to just restore the sorting.
NDRDescT(sycl::range<Dims_> N, bool SetNumWorkGroups) : Dims{size_t(Dims_)} { | ||
if (SetNumWorkGroups) { | ||
for (size_t I = 0; I < Dims_; ++I) { | ||
NumWorkGroups[I] = N[I]; | ||
} | ||
} else { | ||
for (size_t I = 0; I < Dims_; ++I) { | ||
GlobalSize[I] = N[I]; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really weird to me. I know you didn't introduce this SetNumWorkGroups
thing, but it's odd.
From a quick glance, it looks like:
- We always store the
range
passed to the constructor, but potentially in different places. NumWorkGroups
is only used by hierarchical parallelism (parallel_for_work_group
, specifically).
Could we flip the logic here, so that the constructor always unconditionally stores into GlobalSize
, and the parallel_for_work_group
code knows to read GlobalSize
instead of NumWorkGroups
?
sycl::range and sycl::id perform validity checks every time setting them. Use std::array instead as dimensions should already be valid. In addition, remove explicitly padding dimensions smaller than 3 and get number of dimensions from template argument instead of function argument.