fatal error: cuda_runtime_api.h: No such file or directory when trying to use CUDA in Docker



I am trying to build a Docker image for a Python script that I want to deploy. This is my first time using Docker, so I am probably doing something wrong, but I cannot tell what.

My system:
<pre><code>OS: Ubuntu 20.04
docker version: 19.03.8
</code></pre>
I am using this Dockerfile:
<pre><code># Dockerfile
FROM nvidia/cuda:11.0-base

COPY . /SingleModelTest

WORKDIR /SingleModelTest

RUN nvidia-smi

RUN set -xe \ #these are just to make sure pip and git are installed to install the requirements
    && apt-get update \
    && apt-get install python3-pip -y \
    && apt-get install git -y
RUN pip3 install --upgrade pip

RUN pip3 install -r requirements/requirements1.txt
RUN pip3 install -r requirements/requirements2.txt    #this is where it fails

ENTRYPOINT ["python"]

CMD ["TabNetAPI.py"]
</code></pre>
The output of nvidia-smi is as expected:
<pre><code>+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 1050    Off  | 00000000:01:00.0  On |                  N/A |
|  0%   54C    P0    N/A /  90W |   1983MiB /  1995MiB |     18%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
</code></pre>
So CUDA does work, but when I try to install the required packages from the requirements files, this happens:
<pre><code>     command: /usr/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/SingleModelTest/src/mmdet/setup.py'"'"'; __file__='"'"'/SingleModelTest/src/mmdet/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' develop --no-deps
         cwd: /SingleModelTest/src/mmdet/
    Complete output (24 lines):
    running develop
    running egg_info
    creating mmdet.egg-info
    writing mmdet.egg-info/PKG-INFO
    writing dependency_links to mmdet.egg-info/dependency_links.txt
    writing requirements to mmdet.egg-info/requires.txt
    writing top-level names to mmdet.egg-info/top_level.txt
    writing manifest file 'mmdet.egg-info/SOURCES.txt'
    reading manifest file 'mmdet.egg-info/SOURCES.txt'
    writing manifest file 'mmdet.egg-info/SOURCES.txt'
    running build_ext
    building 'mmdet.ops.utils.compiling_info' extension
    creating build
    creating build/temp.linux-x86_64-3.8
    creating build/temp.linux-x86_64-3.8/mmdet
    creating build/temp.linux-x86_64-3.8/mmdet/ops
    creating build/temp.linux-x86_64-3.8/mmdet/ops/utils
    creating build/temp.linux-x86_64-3.8/mmdet/ops/utils/src
    x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DWITH_CUDA -I/usr/local/lib/python3.8/dist-packages/torch/include -I/usr/local/lib/python3.8/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.8/dist-packages/torch/include/TH -I/usr/local/lib/python3.8/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.8 -c mmdet/ops/utils/src/compiling_info.cpp -o build/temp.linux-x86_64-3.8/mmdet/ops/utils/src/compiling_info.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=compiling_info -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
    mmdet/ops/utils/src/compiling_info.cpp:3:10: fatal error: cuda_runtime_api.h: No such file or directory
        3 | #include <cuda_runtime_api.h>
          |          ^~~~~~~~~~~~~~~~~~~~
    compilation terminated.
    error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
    ----------------------------------------
ERROR: Command errored out with exit status 1: /usr/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/SingleModelTest/src/mmdet/setup.py'"'"'; __file__='"'"'/SingleModelTest/src/mmdet/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' develop --no-deps Check the logs for full command output.
</code></pre>
The package that fails is mmdetection. I use two separate requirements files so that some packages are installed before the others, to prevent dependency failures.

requirements1.txt:
<pre><code>torch==1.4.0+cu100 -f https://download.pytorch.org/whl/torch_stable.html
torchvision==0.5.0+cu100 -f https://download.pytorch.org/whl/torch_stable.html
numpy==1.19.2
</code></pre>
requirements2.txt:
<pre><code>addict==2.3.0
albumentations==0.5.0
appdirs==1.4.4
asynctest==0.13.0
attrs==20.2.0
certifi==2020.6.20
chardet==3.0.4
cityscapesScripts==2.1.7
click==7.1.2
codecov==2.1.10
coloredlogs==14.0
coverage==5.3
cycler==0.10.0
Cython==0.29.21
decorator==4.4.2
flake8==3.8.4
Flask==1.1.2
humanfriendly==8.2
idna==2.10
imagecorruptions==1.1.0
imageio==2.9.0
imgaug==0.4.0
iniconfig==1.1.1
isort==5.6.4
itsdangerous==1.1.0
Jinja2==2.11.2
kiwisolver==1.2.0
kwarray==0.5.9
MarkupSafe==1.1.1
matplotlib==3.3.2
mccabe==0.6.1
mmcv==0.4.3
-e git+https://github.com/open-mmlab/mmdetection.git@0f33c08d8d46eba8165715a0995841a975badfd4#egg=mmdet
networkx==2.5
opencv-python==4.4.0.44
opencv-python-headless==4.4.0.44
ordered-set==4.0.2
packaging==20.4
pandas==1.1.3
Pillow==6.2.2
pluggy==0.13.1
py==1.9.0
pycocotools==2.0.2
pycodestyle==2.6.0
pyflakes==2.2.0
pyparsing==2.4.7
pyquaternion==0.9.9
pytesseract==0.3.6
pytest==6.1.1
pytest-cov==2.10.1
pytest-runner==5.2
python-dateutil==2.8.1
pytz==2020.1
PyWavelets==1.1.1
PyYAML==5.3.1
requests==2.24.0
scikit-image==0.17.2
scipy==1.5.3
Shapely==1.7.1
six==1.15.0
terminaltables==3.1.0
tifffile==2020.9.3
toml==0.10.1
tqdm==4.50.2
typing==3.7.4.3
ubelt==0.9.2
urllib3==1.25.11
Werkzeug==1.0.1
xdoctest==0.15.0
yapf==0.30.0
</code></pre>
The command I use to (try to) build the image: <code>nvidia-docker build -t firstdockertestsinglemodel:latest</code>

Things I have tried:
<ul>
<li>Setting the CUDA environment variables such as CUDA_HOME, LIBRARY_PATH and LD_LIBRARY_PATH. I am not sure I did it correctly, because I cannot check the paths I set: I cannot see them in the Ubuntu Files app.</li>
</ul>
I would really appreciate any help anyone can offer. If I need to provide more information, I will be happy to do so.


1 Answer

Anonymous, 1 day ago


Edit: this answer only shows how to verify what is happening inside your Docker image. Unfortunately, I was not able to work out why it happens.

How to check

At each step of the docker build you can see the layers being generated. You can use a layer's ID to start a temporary container and inspect what is going on, e.g.
<pre class="lang-sh prettyprint-override"><code>docker build -t my_bonk_example .
[...]
Removing intermediate container xxxxxxxxxxxxx
 ---> 57778e7c9788
Step 19/31 : RUN mkdir -p /tmp/spark-events
 ---> Running in afd21d853bcb
Removing intermediate container xxxxxxxxxxxxx
 ---> 33b26e1a2286    <-- let's use this ID
[ failure happens ]

docker run -it --rm --name bonk_container_before_failure 33b26e1a2286 bash
# now you're in the container

echo $LD_LIBRARY_PATH
ls /usr/local/cuda
</code></pre>
<hr/>
A side note about the Dockerfile:

You can shorten future builds by changing the order of the instructions in your Dockerfile. Docker uses a build cache that is invalidated as soon as it finds something that differs from the previous build. I expect you to change your code more often than the requirements of your Docker image, so it makes sense to move the COPY after the apt instructions, e.g.
<pre><code># Dockerfile
FROM nvidia/cuda:10.2-base

RUN set -xe \
    && apt-get update \
    && apt-get install python3-pip -y \
    && apt-get install git -y
RUN pip3 install --upgrade pip

WORKDIR /SingleModelTest

COPY requirements /SingleModelTest/requirements

RUN pip3 install -r requirements/requirements1.txt
RUN pip3 install -r requirements/requirements2.txt

COPY . /SingleModelTest

RUN nvidia-smi

ENTRYPOINT ["python"]

CMD ["TabNetAPI.py"]
</code></pre>
Note: this is just an example.
<hr/>
As for why the image does not build: I found that PyTorch 1.4 does not support CUDA 11.0 (<a href="https://discuss.pytorch.org/t/pytorch-with-cuda-11-compatibility/89254" rel="nofollow noreferrer">https://discuss.pytorch.org/t/pytorch-with-cuda-11-compatibility/89254</a>), but using an earlier version of CUDA does not solve the problem either.
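The inspection step in this answer can be narrowed to the file the compiler complains about. The gcc command in the question's log passes <code>-I/usr/local/cuda/include</code>, so that is where <code>cuda_runtime_api.h</code> would need to be. The sketch below is only an illustration: <code>check_cuda_header</code> is a hypothetical helper name, not part of Docker or CUDA, and it assumes a POSIX shell is available inside the temporary container.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool): report whether a CUDA header
# exists under <cuda_root>/include. Defaults match the paths in the question's log.
check_cuda_header() {
    cuda_root="${1:-/usr/local/cuda}"
    header="${2:-cuda_runtime_api.h}"
    if [ -f "$cuda_root/include/$header" ]; then
        echo "found: $cuda_root/include/$header"
        return 0
    fi
    echo "missing: $cuda_root/include/$header"
    return 1
}

# Run inside the temporary container started from the last good layer.
# "|| true" keeps the shell session going when the header is absent.
check_cuda_header /usr/local/cuda cuda_runtime_api.h || true
```

If the header turns out to be missing, one possibility worth testing (an assumption, not something confirmed in this thread) is that the <code>-base</code> image does not ship the CUDA development headers. NVIDIA also publishes <code>-devel</code> variants of the <code>nvidia/cuda</code> image that include the headers and compiler toolchain.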
