
[UR][CUDA] sycl-ls invalid device error/invalid pointer #20945

@mirenradia

Description

Describe the bug

With Intel oneAPI DPC++ compiler 2025.3.0 (or 2025.3.1) binary release + Unified Runtime v6.3.0-rc1 (built from source with the CUDA adapter), I get an invalid device error when I use sycl-ls on a system with Nvidia GPUs.
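
The failing call in the trace below is a device-info query (urDeviceGetInfo), so a plain SYCL program that enumerates platforms and devices may hit the same path as sycl-ls. The following is a minimal, illustrative sketch (not taken from the report), assuming a working DPC++ environment:

    // list_devices.cpp -- hypothetical minimal reproducer; compile with
    // something like: icpx -fsycl list_devices.cpp -o list_devices
    #include <sycl/sycl.hpp>
    #include <iostream>

    int main() {
      // sycl-ls performs essentially this traversal, issuing device-info
      // queries (urDeviceGetInfo in the CUDA adapter) for each device.
      for (const auto &plt : sycl::platform::get_platforms()) {
        std::cout << plt.get_info<sycl::info::platform::name>() << "\n";
        for (const auto &dev : plt.get_devices()) {
          std::cout << "  " << dev.get_info<sycl::info::device::name>() << " (driver "
                    << dev.get_info<sycl::info::device::driver_version>() << ")\n";
        }
      }
      return 0;
    }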

To reproduce

  1. Install oneAPI DPC++ compiler/toolkit 2025.3.1.
  2. Source setvars.sh from the oneAPI toolkit installed above.
  3. Make sure you have an appropriate CUDA toolkit version (e.g. 12.3.0) in your environment.
  4. Clone this repo and check out tag v6.3.0-rc1.
  5. cd /path/to/llvm/unified-runtime
  6. I had to modify CMakeLists.txt as follows to avoid a linking error by forcing position-independent code (an alternative configure-time approach is sketched after this list):
    diff --git a/unified-runtime/CMakeLists.txt b/unified-runtime/CMakeLists.txt
    index 6e9d8bae95..d2b14bc12c 100644
    --- a/unified-runtime/CMakeLists.txt
    +++ b/unified-runtime/CMakeLists.txt
    @@ -15,9 +15,9 @@ endif()
     # Ubuntu's gcc uses --enable-default-pie. For the sake of a consistent build
     # across different gcc versions, set it globally for all targets
     # https://wiki.ubuntu.com/ToolChain/CompilerFlags#A-fPIE
    -if(CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
    +# if(CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
         set(CMAKE_POSITION_INDEPENDENT_CODE ON)
    -endif()
    +# endif()
     
     include(GNUInstallDirs)
     include(CheckCXXSourceCompiles)
  7. Configure the build:
    cmake -S . -B build -DUR_BUILD_TESTS=OFF -DUR_BUILD_ADAPTER_CUDA=ON -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_INSTALL_PREFIX=/path/to/intel-unified-runtime-6.3.0-rc1
    
  8. Build:
    cmake --build build -j
    
  9. Install:
    cmake --install build
    
  10. Point the UR adapter search path at the installed adapters:
    export UR_ADAPTERS_SEARCH_PATH=/path/to/intel-unified-runtime-6.3.0-rc1/lib64
    
  11. Run sycl-ls --verbose.
  12. Observe the following error:
    <CUDA>[ERROR]:
    UR CUDA ERROR:
            Value:           101
            Name:            CUDA_ERROR_INVALID_DEVICE
            Description:     invalid device ordinal
            Function:        urDeviceGetInfo
            Source Location: /path/to/llvm/unified-runtime/source/adapters/cuda/device.cpp:949
    
    SYCL Exception encountered: cuda backend failed with error: 19 (UR_RESULT_ERROR_INVALID_DEVICE)
    
    free(): invalid pointer
    Aborted
    
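Regarding step 6: instead of patching CMakeLists.txt, requesting position-independent code at configure time may achieve the same effect; this is an untested sketch of that alternative:

    # Hypothetical alternative to the step-6 patch: pass the PIC cache variable
    # on the configure line so it applies to all targets, GNU or not.
    cmake -S . -B build \
      -DUR_BUILD_TESTS=OFF \
      -DUR_BUILD_ADAPTER_CUDA=ON \
      -DCMAKE_BUILD_TYPE=RelWithDebInfo \
      -DCMAKE_POSITION_INDEPENDENT_CODE=ON \
      -DCMAKE_INSTALL_PREFIX=/path/to/intel-unified-runtime-6.3.0-rc1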

Expected behaviour

Here's the output I get with DPC++ 2025.2.2 + UR v6.2.1 on the same system:

[cuda:gpu][cuda:0] NVIDIA CUDA BACKEND, NVIDIA A100-SXM4-80GB 8.0 [CUDA 12.8]
[opencl:cpu][opencl:0] Intel(R) OpenCL, AMD EPYC 7763 64-Core Processor                 OpenCL 3.0 (Build 0) [2025.20.8.0.06_160000]

Platforms: 2
Platform [#1]:
    Version  : CUDA 12.8
    Name     : NVIDIA CUDA BACKEND
    Vendor   : NVIDIA Corporation
    Devices  : 1
        Device [#0]:
        Type              : gpu
        Version           : 8.0
        Name              : NVIDIA A100-SXM4-80GB
        Vendor            : NVIDIA Corporation
        Driver            : CUDA 12.8
        UUID              : 43621878532191167133138125853824012515250
        DeviceID          : 0
        Num SubDevices    : 0
        Num SubSubDevices : 0
        Aspects           : gpu fp16 fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations ext_intel_pci_address usm_atomic_shared_allocations atomic64 ext_intel_device_info_uuid ext_oneapi_native_assert ext_oneapi_cuda_async_barrier ext_intel_free_memory ext_intel_device_id ext_intel_memory_clock_rate ext_intel_memory_bus_widthImages are not fully supported by the CUDA BE, their support is disabled by default. Their partial support can be activated by setting SYCL_UR_CUDA_ENABLE_IMAGE_SUPPORT environment variable at runtime.              
 ext_oneapi_bindless_images ext_oneapi_bindless_images_shared_usm ext_oneapi_bindless_images_1d_usm ext_oneapi_bindless_images_2d_usm ext_oneapi_external_memory_import ext_oneapi_external_semaphore_import ext_oneapi_mipmap ext_oneapi_mipmap_anisotropy ext_oneapi_mipmap_level_reference ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group ext_oneapi_graph ext_oneapi_limited_graph ext_oneapi_cubemap ext_oneapi_cubemap_seamless_filtering ext_oneapi_bindless_sampled_image_fetch_1d_usm ext_oneapi_bindless_sampled_image_fetch_2d_usm ext_oneapi_bindless_sampled_image_fetch_2d ext_oneapi_bindless_sampled_image_fetch_3d ext_oneapi_queue_profiling_tag ext_oneapi_virtual_mem ext_oneapi_image_array ext_oneapi_unique_addressing_per_dim ext_oneapi_bindless_images_sample_2d_usm ext_oneapi_bindless_images_gather ext_intel_current_clock_throttle_reasons<CUDA>[ERROR]:                       
UR NVML ERROR:
        Value:           3
        Description:     Not Supported
        Function:        urDeviceGetInfo
        Source Location: /path/to/llvm/unified-runtime/source/adapters/cuda/device.cpp:1149

 ext_intel_power_limits
        info::device::sub_group_sizes: 32
        Architecture: nvidia_gpu_sm_80
Platform [#2]:
    Version  : OpenCL 3.0 LINUX
    Name     : Intel(R) OpenCL
    Vendor   : Intel(R) Corporation
    Devices  : 1
        Device [#0]:
        Type              : cpu
        Version           : OpenCL 3.0 (Build 0)
        Name              : AMD EPYC 7763 64-Core Processor
        Vendor            : Intel(R) Corporation
        Driver            : 2025.20.8.0.06_160000
        UUID              : 34161715160011502501262552511392300
        DeviceID          : 10489617
        Num SubDevices    : 2
        Num SubSubDevices : 0
        Aspects           : cpu fp16 fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations usm_system_allocations ext_intel_gpu_slices ext_intel_gpu_subslices_per_slice ext_intel_gpu_eu_count_per_subslice usm_atomic_host_allocations usm_atomic_shared_allocations atomic64 ext_intel_device_info_uuid ext_oneapi_srgb ext_oneapi_native_assert ext_intel_gpu_hw_threads_per_eu ext_oneapi_cuda_async_barrier ext_intel_device_id ext_intel_legacy_image ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group ext_oneapi_tangle_group ext_oneapi_limited_graph ext_oneapi_private_alloca ext_oneapi_atomic16 ext_oneapi_virtual_functions
        info::device::sub_group_sizes: 4 8 16 32 64
        Architecture: x86_64
default_selector()      : gpu, NVIDIA CUDA BACKEND, NVIDIA A100-SXM4-80GB 8.0 [CUDA 12.8]
accelerator_selector()  : No device of requested type available. Please chec...
cpu_selector()          : cpu, Intel(R) OpenCL, AMD EPYC 7763 64-Core Processor                 OpenCL 3.0 (Build 0) [2025.20.8.0.06_160000]
gpu_selector()          : gpu, NVIDIA CUDA BACKEND, NVIDIA A100-SXM4-80GB 8.0 [CUDA 12.8]
custom_selector(gpu)    : gpu, NVIDIA CUDA BACKEND, NVIDIA A100-SXM4-80GB 8.0 [CUDA 12.8]
custom_selector(cpu)    : cpu, Intel(R) OpenCL, AMD EPYC 7763 64-Core Processor                 OpenCL 3.0 (Build 0) [2025.20.8.0.06_160000]
custom_selector(acc)    : No device of requested type available. Please chec...

Environment

  • OS:

    • System 1: Red Hat Enterprise Linux 8.10
    • System 2: Rocky Linux 8.10
  • Target device and vendor: Nvidia A100-SXM4-80GB (both systems have 4 of these per node)

  • DPC++ version:

    • System 1: Intel(R) oneAPI DPC++/C++ Compiler 2025.3.0 (2025.3.0.20251010)
    • System 2: Intel(R) oneAPI DPC++/C++ Compiler 2025.3.1 (2025.3.1.20251023)
  • Unified Runtime version: v6.3.0-rc1

  • Dependencies version:

    • System 1: Nvidia driver version 560.35.03 (CUDA version 12.6), CUDA toolkit version 12.3.0
    Full nvidia-smi output:
    Fri Dec 19 15:53:58 2025                                                                   
     +-----------------------------------------------------------------------------------------+
     | NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
     |-----------------------------------------+------------------------+----------------------+
     | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
     | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
     |                                         |                        |               MIG M. |
     |=========================================+========================+======================|
     |   0  NVIDIA A100-SXM4-80GB          On  |   00000000:03:00.0 Off |                    0 |
     | N/A   28C    P0             60W /  500W |       1MiB /  81920MiB |      0%      Default |
     |                                         |                        |             Disabled |
     +-----------------------------------------+------------------------+----------------------+
     |   1  NVIDIA A100-SXM4-80GB          On  |   00000000:44:00.0 Off |                    0 |
     | N/A   28C    P0             62W /  500W |       1MiB /  81920MiB |      0%      Default |
     |                                         |                        |             Disabled |
     +-----------------------------------------+------------------------+----------------------+
     |   2  NVIDIA A100-SXM4-80GB          On  |   00000000:84:00.0 Off |                    0 |
     | N/A   28C    P0             63W /  500W |       1MiB /  81920MiB |      0%      Default |
     |                                         |                        |             Disabled |
     +-----------------------------------------+------------------------+----------------------+
     |   3  NVIDIA A100-SXM4-80GB          On  |   00000000:C4:00.0 Off |                    0 |
     | N/A   28C    P0             62W /  500W |       1MiB /  81920MiB |      0%      Default |
     |                                         |                        |             Disabled |
     +-----------------------------------------+------------------------+----------------------+
                                                                                                
     +-----------------------------------------------------------------------------------------+
     | Processes:                                                                              |
     |  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
     |        ID   ID                                                               Usage      |
     |=========================================================================================|
     |  No running processes found                                                             |
     +-----------------------------------------------------------------------------------------+
    
    • System 2: Nvidia driver version 570.195.03 (CUDA version 12.8), CUDA toolkit version 12.8.1

Additional context

The release notes for oneAPI 2025.3 mention using UR v6.2.0, but that gives me segfaults, presumably because oneAPI 2025.3 uses UMF v1.0 while UR v6.2.0 uses UMF v0.11; that is why I switched to v6.3.0-rc1.

As explained above, oneAPI 2025.2.2 works fine with UR v6.2.1 (built in the same way).

I'm not very familiar with this codebase, so take my suggestion with a pinch of salt, but I wouldn't be surprised if #19287 is the culprit.

Metadata

    Labels

    bug (Something isn't working), cuda (CUDA back-end)
