
[UR][CUDA] sycl-ls invalid device error/invalid pointer #20945

@mirenradia

Description

Describe the bug

With Intel oneAPI DPC++ compiler 2025.3.0 (or 2025.3.1) binary release + Unified Runtime v6.3.0-rc1 (built from source with the CUDA adapter), I get an invalid device error when I use sycl-ls on a system with Nvidia GPUs.
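
The failing call in the trace below is a device-info query (urDeviceGetInfo), so a plain SYCL program that enumerates platforms and devices may hit the same path as sycl-ls. The following is a minimal, illustrative sketch (not taken from the report), assuming a working DPC++ environment:

    // list_devices.cpp -- hypothetical minimal reproducer; compile with
    // something like: icpx -fsycl list_devices.cpp -o list_devices
    #include <sycl/sycl.hpp>
    #include <iostream>

    int main() {
      // sycl-ls performs essentially this traversal, issuing device-info
      // queries (urDeviceGetInfo in the CUDA adapter) for each device.
      for (const auto &plt : sycl::platform::get_platforms()) {
        std::cout << plt.get_info<sycl::info::platform::name>() << "\n";
        for (const auto &dev : plt.get_devices()) {
          std::cout << "  " << dev.get_info<sycl::info::device::name>() << " (driver "
                    << dev.get_info<sycl::info::device::driver_version>() << ")\n";
        }
      }
      return 0;
    }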

To reproduce

  1. Install oneAPI DPC++ compiler/toolkit 2025.3.1.
  2. Source setvars.sh from the oneAPI toolkit installed above.
  3. Make sure you have an appropriate CUDA toolkit version (e.g. 12.3.0) in your environment.
  4. Clone this repo and check out tag v6.3.0-rc1.
  5. cd /path/to/llvm/unified-runtime
  6. I had to modify CMakeLists.txt as follows to avoid a linking error by forcing position-independent code (an alternative configure-time approach is sketched after this list):
    diff --git a/unified-runtime/CMakeLists.txt b/unified-runtime/CMakeLists.txt
    index 6e9d8bae95..d2b14bc12c 100644
    --- a/unified-runtime/CMakeLists.txt
    +++ b/unified-runtime/CMakeLists.txt
    @@ -15,9 +15,9 @@ endif()
     # Ubuntu's gcc uses --enable-default-pie. For the sake of a consistent build
     # across different gcc versions, set it globally for all targets
     # https://wiki.ubuntu.com/ToolChain/CompilerFlags#A-fPIE
    -if(CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
    +# if(CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
         set(CMAKE_POSITION_INDEPENDENT_CODE ON)
    -endif()
    +# endif()
     
     include(GNUInstallDirs)
     include(CheckCXXSourceCompiles)
  7. Configure the build:
    cmake -S . -B build -DUR_BUILD_TESTS=OFF -DUR_BUILD_ADAPTER_CUDA=ON -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_INSTALL_PREFIX=/path/to/intel-unified-runtime-6.3.0-rc1
    
  8. Build:
    cmake --build build -j
    
  9. Install:
    cmake --install build
    
  10. Point the UR adapter search path at the installed adapters:
    export UR_ADAPTERS_SEARCH_PATH=/path/to/intel-unified-runtime-6.3.0-rc1/lib64
    
  11. Run sycl-ls --verbose.
  12. Observe the following error:
    <CUDA>[ERROR]:
    UR CUDA ERROR:
            Value:           101
            Name:            CUDA_ERROR_INVALID_DEVICE
            Description:     invalid device ordinal
            Function:        urDeviceGetInfo
            Source Location: /path/to/llvm/unified-runtime/source/adapters/cuda/device.cpp:949
    
    SYCL Exception encountered: cuda backend failed with error: 19 (UR_RESULT_ERROR_INVALID_DEVICE)
    
    free(): invalid pointer
    Aborted
    
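Regarding step 6: instead of patching CMakeLists.txt, requesting position-independent code at configure time may achieve the same effect; this is an untested sketch of that alternative:

    # Hypothetical alternative to the step-6 patch: pass the PIC cache variable
    # on the configure line so it applies to all targets, GNU or not.
    cmake -S . -B build \
      -DUR_BUILD_TESTS=OFF \
      -DUR_BUILD_ADAPTER_CUDA=ON \
      -DCMAKE_BUILD_TYPE=RelWithDebInfo \
      -DCMAKE_POSITION_INDEPENDENT_CODE=ON \
      -DCMAKE_INSTALL_PREFIX=/path/to/intel-unified-runtime-6.3.0-rc1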

Expected behaviour

Here's the output I get with DPC++ 2025.2.2 + UR v6.2.1 on the same system:

[cuda:gpu][cuda:0] NVIDIA CUDA BACKEND, NVIDIA A100-SXM4-80GB 8.0 [CUDA 12.8]
[opencl:cpu][opencl:0] Intel(R) OpenCL, AMD EPYC 7763 64-Core Processor                 OpenCL 3.0 (Build 0) [2025.20.8.0.06_160000]

Platforms: 2
Platform [#1]:
    Version  : CUDA 12.8
    Name     : NVIDIA CUDA BACKEND
    Vendor   : NVIDIA Corporation
    Devices  : 1
        Device [#0]:
        Type              : gpu
        Version           : 8.0
        Name              : NVIDIA A100-SXM4-80GB
        Vendor            : NVIDIA Corporation
        Driver            : CUDA 12.8
        UUID              : 43621878532191167133138125853824012515250
        DeviceID          : 0
        Num SubDevices    : 0
        Num SubSubDevices : 0
        Aspects           : gpu fp16 fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations ext_intel_pci_address usm_atomic_shared_allocations atomic64 ext_intel_device_info_uuid ext_oneapi_native_assert ext_oneapi_cuda_async_barrier ext_intel_free_memory ext_intel_device_id ext_intel_memory_clock_rate ext_intel_memory_bus_widthImages are not fully supported by the CUDA BE, their support is disabled by default. Their partial support can be activated by setting SYCL_UR_CUDA_ENABLE_IMAGE_SUPPORT environment variable at runtime.              
 ext_oneapi_bindless_images ext_oneapi_bindless_images_shared_usm ext_oneapi_bindless_images_1d_usm ext_oneapi_bindless_images_2d_usm ext_oneapi_external_memory_import ext_oneapi_external_semaphore_import ext_oneapi_mipmap ext_oneapi_mipmap_anisotropy ext_oneapi_mipmap_level_reference ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group ext_oneapi_graph ext_oneapi_limited_graph ext_oneapi_cubemap ext_oneapi_cubemap_seamless_filtering ext_oneapi_bindless_sampled_image_fetch_1d_usm ext_oneapi_bindless_sampled_image_fetch_2d_usm ext_oneapi_bindless_sampled_image_fetch_2d ext_oneapi_bindless_sampled_image_fetch_3d ext_oneapi_queue_profiling_tag ext_oneapi_virtual_mem ext_oneapi_image_array ext_oneapi_unique_addressing_per_dim ext_oneapi_bindless_images_sample_2d_usm ext_oneapi_bindless_images_gather ext_intel_current_clock_throttle_reasons<CUDA>[ERROR]:                       
UR NVML ERROR:
        Value:           3
        Description:     Not Supported
        Function:        urDeviceGetInfo
        Source Location: /path/to/llvm/unified-runtime/source/adapters/cuda/device.cpp:1149

 ext_intel_power_limits
        info::device::sub_group_sizes: 32
        Architecture: nvidia_gpu_sm_80
Platform [#2]:
    Version  : OpenCL 3.0 LINUX
    Name     : Intel(R) OpenCL
    Vendor   : Intel(R) Corporation
    Devices  : 1
        Device [#0]:
        Type              : cpu
        Version           : OpenCL 3.0 (Build 0)
        Name              : AMD EPYC 7763 64-Core Processor
        Vendor            : Intel(R) Corporation
        Driver            : 2025.20.8.0.06_160000
        UUID              : 34161715160011502501262552511392300
        DeviceID          : 10489617
        Num SubDevices    : 2
        Num SubSubDevices : 0
        Aspects           : cpu fp16 fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations usm_system_allocations ext_intel_gpu_slices ext_intel_gpu_subslices_per_slice ext_intel_gpu_eu_count_per_subslice usm_atomic_host_allocations usm_atomic_shared_allocations atomic64 ext_intel_device_info_uuid ext_oneapi_srgb ext_oneapi_native_assert ext_intel_gpu_hw_threads_per_eu ext_oneapi_cuda_async_barrier ext_intel_device_id ext_intel_legacy_image ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group ext_oneapi_tangle_group ext_oneapi_limited_graph ext_oneapi_private_alloca ext_oneapi_atomic16 ext_oneapi_virtual_functions
        info::device::sub_group_sizes: 4 8 16 32 64
        Architecture: x86_64
default_selector()      : gpu, NVIDIA CUDA BACKEND, NVIDIA A100-SXM4-80GB 8.0 [CUDA 12.8]
accelerator_selector()  : No device of requested type available. Please chec...
cpu_selector()          : cpu, Intel(R) OpenCL, AMD EPYC 7763 64-Core Processor                 OpenCL 3.0 (Build 0) [2025.20.8.0.06_160000]
gpu_selector()          : gpu, NVIDIA CUDA BACKEND, NVIDIA A100-SXM4-80GB 8.0 [CUDA 12.8]
custom_selector(gpu)    : gpu, NVIDIA CUDA BACKEND, NVIDIA A100-SXM4-80GB 8.0 [CUDA 12.8]
custom_selector(cpu)    : cpu, Intel(R) OpenCL, AMD EPYC 7763 64-Core Processor                 OpenCL 3.0 (Build 0) [2025.20.8.0.06_160000]
custom_selector(acc)    : No device of requested type available. Please chec...

Environment

  • OS:

    • System 1: Red Hat Enterprise Linux 8.10
    • System 2: Rocky Linux 8.10
  • Target device and vendor: Nvidia A100-SXM4-80GB (both systems have 4 of these per node)

  • DPC++ version:

    • System 1: Intel(R) oneAPI DPC++/C++ Compiler 2025.3.0 (2025.3.0.20251010)
    • System 2: Intel(R) oneAPI DPC++/C++ Compiler 2025.3.1 (2025.3.1.20251023)
  • Unified Runtime version: v6.3.0-rc1

  • Dependencies version:

    • System 1: Nvidia driver version 560.35.03 (CUDA version 12.6), CUDA toolkit version 12.3.0
    Full nvidia-smi output:
    Fri Dec 19 15:53:58 2025                                                                   
     +-----------------------------------------------------------------------------------------+
     | NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
     |-----------------------------------------+------------------------+----------------------+
     | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
     | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
     |                                         |                        |               MIG M. |
     |=========================================+========================+======================|
     |   0  NVIDIA A100-SXM4-80GB          On  |   00000000:03:00.0 Off |                    0 |
     | N/A   28C    P0             60W /  500W |       1MiB /  81920MiB |      0%      Default |
     |                                         |                        |             Disabled |
     +-----------------------------------------+------------------------+----------------------+
     |   1  NVIDIA A100-SXM4-80GB          On  |   00000000:44:00.0 Off |                    0 |
     | N/A   28C    P0             62W /  500W |       1MiB /  81920MiB |      0%      Default |
     |                                         |                        |             Disabled |
     +-----------------------------------------+------------------------+----------------------+
     |   2  NVIDIA A100-SXM4-80GB          On  |   00000000:84:00.0 Off |                    0 |
     | N/A   28C    P0             63W /  500W |       1MiB /  81920MiB |      0%      Default |
     |                                         |                        |             Disabled |
     +-----------------------------------------+------------------------+----------------------+
     |   3  NVIDIA A100-SXM4-80GB          On  |   00000000:C4:00.0 Off |                    0 |
     | N/A   28C    P0             62W /  500W |       1MiB /  81920MiB |      0%      Default |
     |                                         |                        |             Disabled |
     +-----------------------------------------+------------------------+----------------------+
                                                                                                
     +-----------------------------------------------------------------------------------------+
     | Processes:                                                                              |
     |  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
     |        ID   ID                                                               Usage      |
     |=========================================================================================|
     |  No running processes found                                                             |
     +-----------------------------------------------------------------------------------------+
    
    • System 2: Nvidia driver version 570.195.03 (CUDA version 12.8), CUDA toolkit version 12.8.1

Additional context

The release notes for oneAPI 2025.3 mention using UR v6.2.0, but that gives me segfaults, presumably because oneAPI 2025.3 uses UMF v1.0 while UR v6.2.0 uses UMF v0.11; that is why I switched to v6.3.0-rc1.

As explained above, oneAPI 2025.2.2 works fine with UR v6.2.1 (built in the same way).

I'm not very familiar with this codebase, so take my suggestion with a pinch of salt, but I wouldn't be surprised if #19287 is the culprit.

Metadata

    Labels

    bug (Something isn't working), cuda (CUDA back-end)
