-
Notifications
You must be signed in to change notification settings - Fork 73
Description
Specification Version
SYCL 2020 (Revision 10)
Section Number(s)
https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:interfaces.bundles
https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#_defining_kernels
https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:naming.kernels
Issue Description
SYCL 2020 implementations that compile source code separately for the host and each device (and thus predefine the __SYCL_DEVICE_ONLY__ macro during device compilation and do not predefine the __SYCL_SINGLE_SOURCE__ macro) are required to correlate kernel name types used in calls to SYCL kernel invocation functions with kernel symbols (e.g., a kernel entry point function or the call operator member of the kernel function object type) emitted during device compilation. This implementation strategy produces multiple translation units for each primary source file; one for the host compilation and one for each device compilation.
Types with internal linkage are local to a translation unit and cannot, in general, be correlated across translation units. Implementations typically emit symbols that involve types with internal linkage with visibility settings that make them ineligible for reference by other translation units and therefore exempt from globally unique naming requirements. An allowance for kernel name types with internal linkage creates a significant implementation challenge since 1) such types may have names that are used in multiple translation units, but denote distinct types and thus distinct kernels, 2) the SYCL implementation might need to dynamically resolve kernel symbols (e.g., to select a separately compiled device image or to JIT generate a device image for a run-time selected device; there is a 1-N correspondence of a kernel name type known during host compilation to the kernels produced during device compilation), and 3) the SYCL implementation might not be able to correlate host and device translation units in order to form a "translation unit bundle" that restricts which device translation units need to be searched to resolve a host kernel name type (the implementation strategy used by DPC++ is to map the kernel name type to a string identifier in the (instantiated) SYCL kernel invocation function and then pass it to internal support functions present in a linked library to resolve it to a kernel symbol by searching all known device images).
Consider the following example:
k1.cpp:#include <sycl/sycl.hpp> namespace { struct KN; } void k1(sycl::handler &h) { sycl::stream sout(1024, 256, h); h.single_task<KN>([=]{ sout << "Hello from k1()\n"; }); }k2.cpp:#include <sycl/sycl.hpp> namespace { struct KN; } void k2(sycl::handler &h) { sycl::stream sout(1024, 256, h); h.single_task<KN>([=]{ sout << "Hello from k2()\n"; }); }main.cpp:#include <sycl/sycl.hpp> void k1(sycl::handler &h); void k2(sycl::handler &h); int main(int argc, const char **argv) { sycl::queue q; q.submit([](sycl::handler &h) { k1(h); }).wait(); q.submit([](sycl::handler &h) { k2(h); }).wait(); }
When compiled and run using DPC++, the following output is produced.
Hello from k1()
Hello from k1()
What happened is that, for both k1.cpp and k2.cpp, the (distinct) KN kernel name types were mapped to the same kernel symbol name (_ZTSN12_GLOBAL__N_12KNE; DPC++ uses the typeinfo mangled name from the Itanium ABI for the kernel name type to identify the kernel; since the kernel name type has internal linkage, the name is out of scope for the Itanium ABI and DPC++ nests it inside of a _GLOBAL__N_1 "module" namespace). At run-time, the kernel symbol name was resolved to the first device image found that contained that name; which happened to be the one associated with k1.cpp (DPC++ appears to search translation units in the order they were linked).
The issue demonstrated above is applicable to (unnamed) lambda kernels as well. If the example is changed such that k1() and k2() instead defer the kernel invocation to a static function with the same name in each translation unit and that invokes an unnamed lamdba (such that the lambda closure type is given the same (nested) name in each translation unit), DPC++ produces the same behavior.
There are several implementation strategies that can be deployed to support programs like the one above. For example:
- A kernel name type that has internal linkage can be associated with another symbol that has external linkage. For example, the use of
_GLOBAL__N_1as a "module" identifier in the kernel symbol name can be replaced with thek1()andk2()function names (which have external linkage) to produce distinct symbol names like_ZTSN2k12KNEand_ZTSN2k22KNEfor each translation unit (with some additional means to avoid conflicts with actual nested symbols of the same name). This works as long as the translation unit defines at least one symbol with external linkage or definesmain(). This strategy breaks down for translation units that do not define any symbols with external linkage (which is rare and generally implies a translation unit that defines functions with the GNUconstructorordestructorattributes or the WindowsDllMain()entry point function. - Assuming the primary source file has a name, a (portion of the file path and) the file name can be incorporated into the kernel symbol name to produce distinct symbol names like
_ZTSN6k1.cpp2KNEand_ZTSN6k2.cpp2KNE. This works as long as any translation units that use kernel name types with internal linkage have distinct names. This approach breaks down for translation units that are unnamed (perhaps piped through stdin) or where the primary source file is compiled multiple times with different macros defined (such that multiple definition errors are avoided). - A UUID (GUID) can be allocated for the set of host and device translation units produced by a (single) SYCL compilation and incorporated into the kernel symbol names. This approach breaks down if coordination of the UUID (GUID) is not possible; such as when implementations allow a device compilation to be performed at a time unrelated to the host compilation time.
The question to the SYCL WG are:
- whether the above program is (intended to be) well-formed (and the DPC++ behavior is non-conforming),
- whether the above program is (intended to be) ill-formed no diagnostic required or have undefined behavior (and the DPC++ behavior is conforming), and
- whether any updates, clarifications, or elaborations to the SYCL specification are warranted (for example, whether a defined symbol with external linkage or a definition of
main()is required to be present in each translation unit that uses a type with internal linkage as a implicit or explicit kernel name type).
Code Example (Optional)
No response