Update GPU Compute Capability support to match TensorFlow

When trying to test on GPU (Linux) with 0.3.0-SNAPSHOT, initialization takes a while and then fails with:
```
2021-01-29 19:15:28.239332: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-01-29 19:15:28.239537: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Exception in thread "main" org.tensorflow.exceptions.TensorFlowException: CUDA runtime implicit initialization on GPU:0 failed. Status: device kernel image is invalid
	at org.tensorflow.internal.c_api.AbstractTF_Status.throwExceptionIfNotOK(AbstractTF_Status.java:101)
	at org.tensorflow.EagerSession.allocate(EagerSession.java:357)
	at org.tensorflow.EagerSession.<init>(EagerSession.java:327)
```

This is on a GTX 1070 (compute capability 6.1), which was successfully recognized earlier in the log:
```
2021-01-29 19:15:28.229419: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 1070 computeCapability: 6.1
coreClock: 1.797GHz coreCount: 15 deviceMemorySize: 7.93GiB deviceMemoryBandwidth: 238.66GiB/s
```


After some digging, I found https://github.com/tensorflow/tensorflow/issues/41990, https://github.com/tensorflow/tensorflow/issues/41132#issuecomment-693543570, and https://github.com/tensorflow/tensorflow/issues/41892#issuecomment-667452483.

The last two in particular suggest that the problem is that our binaries aren't built with support for compute capability 6.1, and sure enough, they aren't: https://github.com/tensorflow/java/blob/master/tensorflow-core/tensorflow-core-api/build.sh#L25-L32

```bash
export TF_CUDA_COMPUTE_CAPABILITIES="3.5,7.0"
```
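
For context on why a 6.1 device fails with this list: under CUDA's compatibility rules, a cubin built for compute capability X.y runs only on devices with the same major version X and a minor version of at least y, and PTX can be JIT-compiled for newer architectures but never older ones. With `3.5,7.0`, neither target shares the 1070's major version 6, and 7.0 code can't run on a 6.1 device, which is consistent with the `device kernel image is invalid` error (assuming no PTX for an older architecture is embedded). The sketch below checks a device against a `TF_CUDA_COMPUTE_CAPABILITIES`-style list using only the cubin rule; `covers_device` is a hypothetical helper, not something in `build.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: does a comma-separated capability list (as used by
# TF_CUDA_COMPUTE_CAPABILITIES) contain a cubin target that runs on this device?
# Assumption: only CUDA's cubin rule is modeled (same major version,
# target minor <= device minor). PTX JIT compilation is deliberately ignored.
covers_device() {
  local device="$1" list="$2"
  local dev_major="${device%%.*}" dev_minor="${device#*.}"
  local cap major minor
  local -a caps
  IFS=',' read -ra caps <<< "$list"
  for cap in "${caps[@]}"; do
    cap="${cap// /}"   # tolerate "3.5, 7.0" style spacing
    major="${cap%%.*}"
    minor="${cap#*.}"
    if [[ "$major" == "$dev_major" ]] && (( minor <= dev_minor )); then
      return 0
    fi
  done
  return 1
}

# GTX 1070 against the list currently exported by build.sh
covers_device 6.1 "3.5,7.0" && echo "covered" || echo "not covered"
```

With the list from the TensorFlow install page, the 6.0 target would cover the 1070, since a 6.0 cubin runs on any 6.x device.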

According to the 2nd and 3rd links, and https://www.tensorflow.org/install/gpu#hardware_requirements, the other TensorFlow binaries (Python, C, C++, etc.) support `3.5, 5.0, 6.0, 7.0, 7.5, 8.0 and higher than 8.0`. IMO, we should do the same, ideally in a way that doesn't need updating whenever that list changes. Would simply not exporting the variable work? The defaults are specified in https://github.com/tensorflow/tensorflow/blob/master/.bazelrc#L600. This will likely increase build times, though, which I think are already a problem for us.
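
One way to match the documented list while still letting CI or a local build override it is a default-value expansion in `build.sh`. This is only a sketch: the default string assumes the install-page list maps to `3.5,5.0,6.0,7.0,7.5,8.0` (how "higher than 8.0" gets covered, e.g. via PTX, depends on TensorFlow's own build defaults and isn't shown here), and it doesn't settle whether omitting the variable entirely would pick up the `.bazelrc` defaults.

```shell
#!/usr/bin/env bash
# Sketch for build.sh: build for the capabilities listed on the TensorFlow
# install page unless the caller already set TF_CUDA_COMPUTE_CAPABILITIES.
# Assumption: this default list is taken from the docs, not read from .bazelrc.
export TF_CUDA_COMPUTE_CAPABILITIES="${TF_CUDA_COMPUTE_CAPABILITIES:-3.5,5.0,6.0,7.0,7.5,8.0}"
echo "Building for compute capabilities: ${TF_CUDA_COMPUTE_CAPABILITIES}"
```

Because the caller's value wins, a developer worried about build times could still run something like `TF_CUDA_COMPUTE_CAPABILITIES=6.1 ./build.sh` to target only their own GPU.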

Update GPU Compute Capacity support to match tensorflow #200
