Describe the bug
With the Intel oneAPI DPC++ compiler 2025.3.0 (or 2025.3.1) binary release + Unified Runtime v6.3.0-rc1 (built from source with the CUDA adapter), `sycl-ls` fails with an invalid device error on a system with Nvidia GPUs.
To reproduce
- Install oneAPI DPC++ compiler/toolkit 2025.3.1.
- Source `setvars.sh` from the above installed oneAPI toolkit.
- Make sure you have an appropriate CUDA toolkit version (e.g. 12.3.0) in your environment.
- Clone this repo and check out tag v6.3.0-rc1.
- `cd /path/to/llvm/unified-runtime`
- I had to modify the `CMakeLists.txt` as follows in order to avoid a linking error and force position-independent code:
  ```diff
  diff --git a/unified-runtime/CMakeLists.txt b/unified-runtime/CMakeLists.txt
  index 6e9d8bae95..d2b14bc12c 100644
  --- a/unified-runtime/CMakeLists.txt
  +++ b/unified-runtime/CMakeLists.txt
  @@ -15,9 +15,9 @@ endif()
   # Ubuntu's gcc uses --enable-default-pie. For the sake of a consistent build
   # across different gcc versions, set it globally for all targets
   # https://wiki.ubuntu.com/ToolChain/CompilerFlags#A-fPIE
  -if(CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
  +# if(CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
   set(CMAKE_POSITION_INDEPENDENT_CODE ON)
  -endif()
  +# endif()
   
   include(GNUInstallDirs)
   include(CheckCXXSourceCompiles)
  ```
- Configure build:
  ```
  cmake -S . -B build -DUR_BUILD_TESTS=OFF -DUR_BUILD_ADAPTER_CUDA=ON -DCMAKE_BUILD_TYPE=RelWithDebugInfo -DCMAKE_INSTALL_PREFIX=/path/to/intel-unified-runtime-6.3.0-rc1
  ```
- Build:
  ```
  cmake --build build -j
  ```
- Install:
  ```
  cmake --install build
  ```
- Set search paths appropriately:
  ```
  export UR_ADAPTERS_SEARCH_PATH=/path/to/intel-unified-runtime-6.3.0-rc1/lib64
  ```
- Run `sycl-ls --verbose`
- Observe the following error:
  ```
  <CUDA>[ERROR]:
  UR CUDA ERROR:
  Value: 101
  Name: CUDA_ERROR_INVALID_DEVICE
  Description: invalid device ordinal
  Function: urDeviceGetInfo
  Source Location: /path/to/llvm/unified-runtime/source/adapters/cuda/device.cpp:949
  SYCL Exception encountered: cuda backend failed with error: 19 (UR_RESULT_ERROR_INVALID_DEVICE)
  free(): invalid pointer
  Aborted
  ```
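
For anyone trying to reproduce this: the self-built CUDA adapter is only picked up through `UR_ADAPTERS_SEARCH_PATH`, so it may be worth confirming that the directory really contains it before running `sycl-ls`. This is just a convenience sketch. The helper name `check_cuda_adapter` is made up, and it assumes the adapter is installed as `libur_adapter_cuda.so` (possibly with a version suffix).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of UR or oneAPI): print the path of the CUDA
# adapter found in DIR, or fail if there is none.
# Assumes the adapter file is named libur_adapter_cuda.so[.N...].
check_cuda_adapter() {
  local dir="${1:?usage: check_cuda_adapter DIR}"
  local lib
  for lib in "$dir"/libur_adapter_cuda.so*; do
    # Without a match the glob stays literal, so check the file really exists.
    if [ -e "$lib" ]; then
      echo "$lib"
      return 0
    fi
  done
  echo "no libur_adapter_cuda.so* in $dir" >&2
  return 1
}
```

Usage: `check_cuda_adapter "$UR_ADAPTERS_SEARCH_PATH" && sycl-ls --verbose`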
Expected behaviour
Here's the output I get with DPC++ 2025.2.2 + UR v6.2.1 on the same system:
```
[cuda:gpu][cuda:0] NVIDIA CUDA BACKEND, NVIDIA A100-SXM4-80GB 8.0 [CUDA 12.8]
[opencl:cpu][opencl:0] Intel(R) OpenCL, AMD EPYC 7763 64-Core Processor OpenCL 3.0 (Build 0) [2025.20.8.0.06_160000]
Platforms: 2
Platform [#1]:
Version : CUDA 12.8
Name : NVIDIA CUDA BACKEND
Vendor : NVIDIA Corporation
Devices : 1
Device [#0]:
Type : gpu
Version : 8.0
Name : NVIDIA A100-SXM4-80GB
Vendor : NVIDIA Corporation
Driver : CUDA 12.8
UUID : 43621878532191167133138125853824012515250
DeviceID : 0
Num SubDevices : 0
Num SubSubDevices : 0
Aspects : gpu fp16 fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations ext_intel_pci_address usm_atomic_shared_allocations atomic64 ext_intel_device_info_uuid ext_oneapi_native_assert ext_oneapi_cuda_async_barrier ext_intel_free_memory ext_intel_device_id ext_intel_memory_clock_rate ext_intel_memory_bus_width
Images are not fully supported by the CUDA BE, their support is disabled by default. Their partial support can be activated by setting SYCL_UR_CUDA_ENABLE_IMAGE_SUPPORT environment variable at runtime.
ext_oneapi_bindless_images ext_oneapi_bindless_images_shared_usm ext_oneapi_bindless_images_1d_usm ext_oneapi_bindless_images_2d_usm ext_oneapi_external_memory_import ext_oneapi_external_semaphore_import ext_oneapi_mipmap ext_oneapi_mipmap_anisotropy ext_oneapi_mipmap_level_reference ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group ext_oneapi_graph ext_oneapi_limited_graph ext_oneapi_cubemap ext_oneapi_cubemap_seamless_filtering ext_oneapi_bindless_sampled_image_fetch_1d_usm ext_oneapi_bindless_sampled_image_fetch_2d_usm ext_oneapi_bindless_sampled_image_fetch_2d ext_oneapi_bindless_sampled_image_fetch_3d ext_oneapi_queue_profiling_tag ext_oneapi_virtual_mem ext_oneapi_image_array ext_oneapi_unique_addressing_per_dim ext_oneapi_bindless_images_sample_2d_usm ext_oneapi_bindless_images_gather ext_intel_current_clock_throttle_reasons
<CUDA>[ERROR]:
UR NVML ERROR:
Value: 3
Description: Not Supported
Function: urDeviceGetInfo
Source Location: /path/to/llvm/unified-runtime/source/adapters/cuda/device.cpp:1149
ext_intel_power_limits
info::device::sub_group_sizes: 32
Architecture: nvidia_gpu_sm_80
Platform [#2]:
Version : OpenCL 3.0 LINUX
Name : Intel(R) OpenCL
Vendor : Intel(R) Corporation
Devices : 1
Device [#0]:
Type : cpu
Version : OpenCL 3.0 (Build 0)
Name : AMD EPYC 7763 64-Core Processor
Vendor : Intel(R) Corporation
Driver : 2025.20.8.0.06_160000
UUID : 34161715160011502501262552511392300
DeviceID : 10489617
Num SubDevices : 2
Num SubSubDevices : 0
Aspects : cpu fp16 fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations usm_system_allocations ext_intel_gpu_slices ext_intel_gpu_subslices_per_slice ext_intel_gpu_eu_count_per_subslice usm_atomic_host_allocations usm_atomic_shared_allocations atomic64 ext_intel_device_info_uuid ext_oneapi_srgb ext_oneapi_native_assert ext_intel_gpu_hw_threads_per_eu ext_oneapi_cuda_async_barrier ext_intel_device_id ext_intel_legacy_image ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group ext_oneapi_tangle_group ext_oneapi_limited_graph ext_oneapi_private_alloca ext_oneapi_atomic16 ext_oneapi_virtual_functions
info::device::sub_group_sizes: 4 8 16 32 64
Architecture: x86_64
default_selector() : gpu, NVIDIA CUDA BACKEND, NVIDIA A100-SXM4-80GB 8.0 [CUDA 12.8]
accelerator_selector() : No device of requested type available. Please chec...
cpu_selector() : cpu, Intel(R) OpenCL, AMD EPYC 7763 64-Core Processor OpenCL 3.0 (Build 0) [2025.20.8.0.06_160000]
gpu_selector() : gpu, NVIDIA CUDA BACKEND, NVIDIA A100-SXM4-80GB 8.0 [CUDA 12.8]
custom_selector(gpu) : gpu, NVIDIA CUDA BACKEND, NVIDIA A100-SXM4-80GB 8.0 [CUDA 12.8]
custom_selector(cpu) : cpu, Intel(R) OpenCL, AMD EPYC 7763 64-Core Processor OpenCL 3.0 (Build 0) [2025.20.8.0.06_160000]
custom_selector(acc) : No device of requested type available. Please chec...
```
Environment
- OS:
  - System 1: Red Hat Enterprise Linux 8.10
  - System 2: Rocky Linux 8.10
- Target device and vendor: Nvidia A100-SXM4-80GB (both systems have 4 of these per node)
- DPC++ version:
  - System 1: Intel(R) oneAPI DPC++/C++ Compiler 2025.3.0 (2025.3.0.20251010)
  - System 2: Intel(R) oneAPI DPC++/C++ Compiler 2025.3.1 (2025.3.1.20251023)
- Unified Runtime version: v6.3.0-rc1
- Dependencies version:
  - System 1: Nvidia driver version 560.35.03 (CUDA version 12.6), CUDA toolkit version 12.3.0
    ```
    Fri Dec 19 15:53:58 2025
    +-----------------------------------------------------------------------------------------+
    | NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
    |-----------------------------------------+------------------------+----------------------+
    | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
    |                                         |                        |               MIG M. |
    |=========================================+========================+======================|
    |   0  NVIDIA A100-SXM4-80GB          On  |   00000000:03:00.0 Off |                    0 |
    | N/A   28C    P0             60W /  500W |       1MiB /  81920MiB |      0%      Default |
    |                                         |                        |             Disabled |
    +-----------------------------------------+------------------------+----------------------+
    |   1  NVIDIA A100-SXM4-80GB          On  |   00000000:44:00.0 Off |                    0 |
    | N/A   28C    P0             62W /  500W |       1MiB /  81920MiB |      0%      Default |
    |                                         |                        |             Disabled |
    +-----------------------------------------+------------------------+----------------------+
    |   2  NVIDIA A100-SXM4-80GB          On  |   00000000:84:00.0 Off |                    0 |
    | N/A   28C    P0             63W /  500W |       1MiB /  81920MiB |      0%      Default |
    |                                         |                        |             Disabled |
    +-----------------------------------------+------------------------+----------------------+
    |   3  NVIDIA A100-SXM4-80GB          On  |   00000000:C4:00.0 Off |                    0 |
    | N/A   28C    P0             62W /  500W |       1MiB /  81920MiB |      0%      Default |
    |                                         |                        |             Disabled |
    +-----------------------------------------+------------------------+----------------------+

    +-----------------------------------------------------------------------------------------+
    | Processes:                                                                              |
    |  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
    |        ID   ID                                                               Usage      |
    |=========================================================================================|
    |  No running processes found                                                             |
    +-----------------------------------------------------------------------------------------+
    ```
  - System 2: Nvidia driver version 570.195.03 (CUDA version 12.8), CUDA toolkit version 12.8.1
Additional context
The oneAPI 2025.3 release notes mention using UR v6.2.0, but that gives me segfaults, presumably because oneAPI 2025.3 uses UMF v1.0 while UR v6.2.0 uses UMF v0.11. That is why I switched to v6.3.0-rc1.
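
One way to check the UMF theory might be to compare which `libumf` the self-built CUDA adapter resolves to with the one the oneAPI runtime resolves to. A minimal sketch, assuming the UMF shared library's file name starts with `libumf.so`; the helper name `umf_sonames` is hypothetical and just filters `ldd` output:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of UR, UMF or oneAPI): read `ldd` output on
# stdin and print each distinct libumf.so* file name it mentions.
# Assumes the UMF shared library is named libumf.so[.N...].
umf_sonames() {
  # -o prints only the matched text; sort -u collapses the duplicate that
  # appears once as the soname and once at the end of the resolved path.
  # `|| true` keeps a no-match result from counting as an error under pipefail.
  grep -o 'libumf\.so[.0-9]*' | sort -u || true
}
```

For example, `ldd "$UR_ADAPTERS_SEARCH_PATH/libur_adapter_cuda.so" | umf_sonames` (the adapter file name is an assumption), then the same pipeline on the UR loader library shipped with oneAPI. Different UMF major versions would be consistent with the mismatch explanation. If UMF is linked statically, nothing shows up and this check says little.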
As explained above, oneAPI 2025.2.2 works fine with UR v6.2.1 (built in the same way).
I'm not very familiar with this codebase, so take my suggestion with a pinch of salt, but I wouldn't be surprised if #19287 is the culprit.