[SYCL] Optimize NDRDescT by removing sycl::range, sycl::id and padding #18851

Open: wants to merge 2 commits into base branch sycl

@DBDuncan (Contributor) commented Jun 6, 2025

sycl::range and sycl::id perform validity checks every time they are set. Use std::array instead, since the dimensions should already be valid at this point. In addition, stop explicitly padding dimensions smaller than 3, and take the number of dimensions from the template argument instead of a function argument.
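
To illustrate the description above, here is a minimal sketch, not the actual NDRDescT code. `CheckedRange` is a hypothetical stand-in for a type that validates on every write (the check shown is invented and is not what sycl::range does), and `NDRDescSketch` is a simplified descriptor. It assumes unused trailing dimensions simply keep a default of 0, which may differ from the real patch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-in for a type that validates on every write.
// This is NOT sycl::range; the check is invented for illustration.
template <int Dims> class CheckedRange {
public:
  void set(std::size_t I, std::size_t V) {
    if (I >= static_cast<std::size_t>(Dims))
      throw std::out_of_range("dimension index out of range");
    Data[I] = V;
  }
  std::size_t operator[](std::size_t I) const { return Data[I]; }

private:
  std::array<std::size_t, Dims> Data{};
};

// Simplified descriptor: plain std::array storage, no per-write checks.
struct NDRDescSketch {
  std::array<std::size_t, 3> GlobalSize{}; // assumed default: all zeros
  std::size_t Dims = 0;

  // The dimension count is deduced from the template argument, so the
  // caller does not pass it separately. Only the first Dims_ entries are
  // written; the rest keep their default instead of being padded.
  template <int Dims_>
  explicit NDRDescSketch(const CheckedRange<Dims_> &N)
      : Dims{static_cast<std::size_t>(Dims_)} {
    static_assert(Dims_ >= 1 && Dims_ <= 3, "1 to 3 dimensions supported");
    constexpr std::size_t Count = static_cast<std::size_t>(Dims_);
    for (std::size_t I = 0; I < Count; ++I)
      GlobalSize[I] = N[I];
  }
};

// Usage: build a 2D descriptor from an already-validated range.
inline NDRDescSketch makeExample() {
  CheckedRange<2> R;
  R.set(0, 8);
  R.set(1, 4);
  NDRDescSketch D(R);
  assert(D.Dims == 2);
  return D;
}
```

The validation cost is paid once when the user-facing range is built; copying into the descriptor is then a plain loop over `std::array` elements.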

@DBDuncan DBDuncan requested a review from a team as a code owner June 6, 2025 15:29
@DBDuncan DBDuncan requested a review from aelovikov-intel June 6, 2025 15:29

@aelovikov-intel (Contributor) left a comment

Can we remove the throw from these SYCL classes instead?

@@ -3154,13 +3162,11 @@ _ZN4sycl3_V15queue20wait_and_throw_proxyERKNS0_6detail13code_locationE
_ZN4sycl3_V15queue22memcpyFromDeviceGlobalEPvPKvbmmRKSt6vectorINS0_5eventESaIS6_EE
_ZN4sycl3_V15queue22submit_with_event_implERKNS0_6detail19type_erased_cgfo_tyERKNS2_14SubmissionInfoERKNS2_13code_locationEb
_ZN4sycl3_V15queue22submit_with_event_implERKNS0_6detail19type_erased_cgfo_tyERKNS2_2v114SubmissionInfoERKNS2_13code_locationEb
_ZNK4sycl3_V15queue22submit_with_event_implERKNS0_6detail19type_erased_cgfo_tyERKNS2_2v114SubmissionInfoERKNS2_13code_locationEb

I know it wasn't you who messed up the sorting here, but please either drop the unnecessary changes or fix it in a preceding PR that only restores the sorting.

Comment on lines +73 to +82
NDRDescT(sycl::range<Dims_> N, bool SetNumWorkGroups) : Dims{size_t(Dims_)} {
  if (SetNumWorkGroups) {
    for (size_t I = 0; I < Dims_; ++I) {
      NumWorkGroups[I] = N[I];
    }
  } else {
    for (size_t I = 0; I < Dims_; ++I) {
      GlobalSize[I] = N[I];
    }
  }

This looks really weird to me. I know you didn't introduce this SetNumWorkGroups thing, but it's odd.

From a quick glance, it looks like:

  • We always store the range passed to the constructor, but potentially in different places.
  • NumWorkGroups is only used by hierarchical parallelism (parallel_for_work_group, specifically).

Could we flip the logic here, so that the constructor always unconditionally stores into GlobalSize, and the parallel_for_work_group code knows to read GlobalSize instead of NumWorkGroups?
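
A rough sketch of what the flipped logic could look like, using simplified, hypothetical types rather than the real NDRDescT. `NDRDescFlipped` and `totalWorkGroups` are invented names. The constructor has no `SetNumWorkGroups` flag and always stores into `GlobalSize`; the consumer standing in for the parallel_for_work_group path is the one that interprets those values as work-group counts.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Simplified, hypothetical descriptor with a single storage slot for the
// range passed to the constructor.
struct NDRDescFlipped {
  std::array<std::size_t, 3> GlobalSize{};
  std::size_t Dims = 0;

  // No boolean flag: the range is always stored in GlobalSize.
  template <std::size_t Dims_>
  explicit NDRDescFlipped(const std::array<std::size_t, Dims_> &N)
      : Dims{Dims_} {
    static_assert(Dims_ >= 1 && Dims_ <= 3, "1 to 3 dimensions supported");
    for (std::size_t I = 0; I < Dims_; ++I)
      GlobalSize[I] = N[I];
  }
};

// Hypothetical consumer standing in for the parallel_for_work_group path.
// It knows that, for hierarchical parallelism, GlobalSize holds the number
// of work-groups per dimension, and reads it accordingly.
inline std::size_t totalWorkGroups(const NDRDescFlipped &Desc) {
  assert(Desc.Dims >= 1 && Desc.Dims <= 3);
  std::size_t Total = 1;
  for (std::size_t I = 0; I < Desc.Dims; ++I)
    Total *= Desc.GlobalSize[I];
  return Total;
}
```

With this shape the interpretation of the stored range lives at the single call site that needs it, at the cost of `GlobalSize` meaning different things depending on the kind of launch.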

3 participants