TensorFlow libdevice not found. Why is it not found in the searched paths?

Posted 2024-09-30 20:25:28


Win 10 64-bit 21H1; TF 2.5, CUDA 11 installed in the environment (Python 3.9.5, Xeus)

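I'm not the only one seeing this error; see also the (unanswered) questions here and here. The issue is obscure, and the proposed resolutions are unclear or don't seem to work (see, e.g., here).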

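Problem: Using the TF Linear_Mixed_Effects_Models.ipynb example (downloaded from the TensorFlow GitHub here), execution reaches the point where it runs the "warm-up stage" and then throws the error: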

InternalError: libdevice not found at ./libdevice.10.bc [Op:__inference_one_e_step_2806]

The console output shows that TF finds the GPU, but XLA initialization cannot find the libdevice file that does exist (!) in the specified path:

2021-08-01 22:04:36.691300: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9623 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2021-08-01 22:04:37.080007: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
2021-08-01 22:04:54.122528: I tensorflow/compiler/xla/service/service.cc:169] XLA service 0x1d724940130 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-08-01 22:04:54.127766: I tensorflow/compiler/xla/service/service.cc:177]   StreamExecutor device (0): NVIDIA GeForce GTX 1080 Ti, Compute Capability 6.1
2021-08-01 22:04:54.215072: W tensorflow/compiler/tf2xla/kernels/random_ops.cc:241] Warning: Using tf.random.uniform with XLA compilation will ignore seeds; consider using tf.random.stateless_uniform instead if reproducible behavior is desired.
2021-08-01 22:04:55.506464: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:73] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
2021-08-01 22:04:55.512876: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:74] Searched for CUDA in the following directories:
2021-08-01 22:04:55.517387: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   C:/Users/Julian/anaconda3/envs/TF250_PY395_xeus/Library/bin
2021-08-01 22:04:55.520773: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2
2021-08-01 22:04:55.524125: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   .
2021-08-01 22:04:55.526349: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:79] You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions.  For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.

Now, the interesting part is that the searched paths include "C:/Users/Julian/anaconda3/envs/TF250_PY395_xeus/Library/bin".

The contents of that folder include all the DLLs (which load successfully at TF startup), including cudart64_110.dll, cudnn64_8.dll... and, of course, libdevice.10.bc.

Question: Since TF says it is searching this location for this file, and the file exists at this location, what is the problem and how do I fix it?

(Note that C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2 does not exist... CUDA is installed in the environment; this path must be TF's best guess at an OS-level install.)
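The first warning line names what XLA is actually after: the directory `${CUDA_DIR}/nvvm/libdevice`, not the file `libdevice.10.bc`. Below is a minimal Python sketch of that lookup. It assumes, from the warning text alone and not from TensorFlow's source, that XLA treats each searched directory as `${CUDA_DIR}` and checks for an `nvvm/libdevice` subdirectory beneath it. `find_libdevice_dir` is a hypothetical helper.

```python
from pathlib import Path
from typing import Iterable, Optional

# Subdirectory named in the warning "${CUDA_DIR}/nvvm/libdevice".
LIBDEVICE_SUBDIR = Path("nvvm") / "libdevice"


def find_libdevice_dir(candidates: Iterable[str]) -> Optional[Path]:
    """Return the first <candidate>/nvvm/libdevice directory that exists, else None.

    Simplified model of the search described by the XLA warning; this is
    an illustration, not TensorFlow's actual implementation.
    """
    for candidate in candidates:
        libdevice_dir = Path(candidate) / LIBDEVICE_SUBDIR
        if libdevice_dir.is_dir():
            return libdevice_dir
    return None


if __name__ == "__main__":
    # The three directories listed in the console log above.
    searched = [
        "C:/Users/Julian/anaconda3/envs/TF250_PY395_xeus/Library/bin",
        "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2",
        ".",
    ]
    for d in searched:
        has_file = (Path(d) / "libdevice.10.bc").is_file()
        print(f"{d}: libdevice.10.bc directly inside? {has_file}")
    print("nvvm/libdevice directory found at:", find_libdevice_dir(searched))
```

Going by the question, on the asker's machine this would report the file sitting directly in `Library/bin`, yet find no `nvvm/libdevice` directory under any candidate. That is the mismatch behind the warning.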

Info: I am setting the path with

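import os

# XLA reads XLA_FLAGS when it first parses its flags, so set it before any XLA compilation (safest: before importing tensorflow)
aPath = '--xla_gpu_cuda_data_dir=C:/Users/Julian/anaconda3/envs/TF250_PY395_xeus/Library/bin'
print(aPath)
os.environ['XLA_FLAGS'] = aPath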

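But I also set the OS environment variable XLA_FLAGS to the same string value... I don't yet know which one is actually taking effect, but the fact that the console output says it searched the expected path is good enough.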
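On the question of which setting takes effect: assigning to `os.environ` inside the notebook replaces, for that process, whatever XLA_FLAGS value was inherited from the OS environment. The in-script value should therefore be the one XLA sees, provided it is set before XLA first reads its flags. The sketch below makes that override visible. `set_xla_cuda_data_dir` is a hypothetical helper, and deriving the path from `sys.prefix` assumes a conda environment on Windows whose CUDA files live under `<env>/Library/bin`, as in this question.

```python
import os
import sys
from pathlib import Path


def set_xla_cuda_data_dir(data_dir: Path) -> str:
    """Set XLA_FLAGS for this process and return the flag string.

    Overwrites any inherited XLA_FLAGS value, which also drops any other
    XLA flags it contained.
    """
    flag = f"--xla_gpu_cuda_data_dir={data_dir.as_posix()}"
    inherited = os.environ.get("XLA_FLAGS")
    if inherited and inherited != flag:
        print(f"Overriding inherited XLA_FLAGS={inherited!r}")
    os.environ["XLA_FLAGS"] = flag
    return flag


if __name__ == "__main__":
    # Assumption: conda on Windows keeps the CUDA runtime files in <env>/Library/bin.
    env_bin = Path(sys.prefix) / "Library" / "bin"
    print(set_xla_cuda_data_dir(env_bin))
    # import tensorflow as tf  # import only after XLA_FLAGS is set
```

`as_posix()` turns a Windows path such as `C:\Users\...` into the forward-slash form used in the question's flag string.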


1 answer
User
#1 · posted 2024-09-30 20:25:28

The diagnostic messages are unclear and therefore unhelpful; however, there is a resolution.

The problem was solved by providing the file (as a copy) at this path:

C:\Users\Julian\anaconda3\envs\TF250_PY395_xeus\Library\bin\nvvm\libdevice\
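The workaround is a single copy into a new `nvvm\libdevice` folder under the directory already given to XLA_FLAGS. Because it has to be repeated for each new environment, here is a short Python sketch of that copy. `install_libdevice` is a hypothetical helper, and the `sys.prefix` / `Library/bin` location is an assumption based on the conda layout in this thread.

```python
import shutil
import sys
from pathlib import Path


def install_libdevice(bin_dir: Path, filename: str = "libdevice.10.bc") -> Path:
    """Copy <bin_dir>/<filename> into <bin_dir>/nvvm/libdevice/ and return the copy's path."""
    source = bin_dir / filename
    if not source.is_file():
        raise FileNotFoundError(source)
    target_dir = bin_dir / "nvvm" / "libdevice"
    target_dir.mkdir(parents=True, exist_ok=True)  # safe to re-run
    target = target_dir / filename
    shutil.copy2(source, target)  # copies contents and metadata; overwrites an older copy
    return target


if __name__ == "__main__":
    # Assumption: run from inside the conda env whose CUDA files live in <env>/Library/bin.
    env_bin = Path(sys.prefix) / "Library" / "bin"
    if (env_bin / "libdevice.10.bc").is_file():
        print("Copied to", install_libdevice(env_bin))
    else:
        print("libdevice.10.bc not found in", env_bin)
```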

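Note that C:\Users\Julian\anaconda3\envs\TF250_PY395_xeus\Library\bin is the path given to XLA_FLAGS, but XLA does not appear to look for the libdevice file itself in that directory; it looks for an \nvvm\libdevice\ subdirectory beneath it. This means I cannot simply set XLA_FLAGS to a different value that points at the folder where the libdevice file actually is, because, in a word, the file is not (just) what it is looking for.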
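Put differently, the value of `--xla_gpu_cuda_data_dir` has to be the directory two levels above the file, and it only works if those two levels are named `nvvm` and `libdevice`. The hypothetical helper below encodes that rule. It mirrors the behaviour described in this answer and in the warning text; it is not TensorFlow code.

```python
from pathlib import Path


def cuda_data_dir_for(libdevice_file: Path) -> Path:
    """Return the directory to pass to --xla_gpu_cuda_data_dir for this file.

    XLA appends nvvm/libdevice to the data dir, so the file must already sit
    in <root>/nvvm/libdevice/; the folder that merely contains the file is
    not a valid value.
    """
    libdevice_dir = libdevice_file.parent
    nvvm_dir = libdevice_dir.parent
    if libdevice_dir.name != "libdevice" or nvvm_dir.name != "nvvm":
        raise ValueError(
            f"{libdevice_file} is not inside an nvvm/libdevice directory; "
            "no value of --xla_gpu_cuda_data_dir points XLA at it"
        )
    return nvvm_dir.parent
```

For the layout in this answer, the copied file under `Library\bin\nvvm\libdevice\` maps back to `C:\Users\Julian\anaconda3\envs\TF250_PY395_xeus\Library\bin`, the path that was already in XLA_FLAGS. The original `Library\bin\libdevice.10.bc` raises instead.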

Earlier debug info:

2021-08-05 08:38:52.889213: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:73] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
2021-08-05 08:38:52.896033: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:74] Searched for CUDA in the following directories:
2021-08-05 08:38:52.899128: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   C:/Users/Julian/anaconda3/envs/TF250_PY395_xeus/Library/bin
2021-08-05 08:38:52.902510: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2
2021-08-05 08:38:52.905815: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   .

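The "Searched for CUDA" message is incorrect, as there is no "CUDA" in the search path. FWIW, I think it should give a different error when searching C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2, since no such folder exists (there is an old v10.0 folder there, but no OS-level install of CUDA 11).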

Unless TensorFlow improves its path handling, this kind of file-structure manipulation will be needed in every new (Anaconda) Python environment.

Full thread on the TensorFlow forum here
