Fatal error: cuda_runtime_api.h: No such file or directory when trying to use CUDA in Docker

# Dockerfile
FROM nvidia/cuda:11.0-base

COPY . /SingleModelTest
WORKDIR /SingleModelTest

RUN nvidia-smi

# these are just to make sure pip and git are installed to install the requirements
RUN set -xe \
    && apt-get update \
    && apt-get install python3-pip -y \
    && apt-get install git -y
RUN pip3 install --upgrade pip

RUN pip3 install -r requirements/requirements1.txt
# this is where it fails
RUN pip3 install -r requirements/requirements2.txt

ENTRYPOINT ["python"]
CMD ["TabNetAPI.py"]

command: /usr/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/SingleModelTest/src/mmdet/setup.py'"'"'; __file__='"'"'/SingleModelTest/src/mmdet/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' develop --no-deps
cwd: /SingleModelTest/src/mmdet/
Complete output (24 lines):
running develop
running egg_info
creating mmdet.egg-info
writing mmdet.egg-info/PKG-INFO
writing dependency_links to mmdet.egg-info/dependency_links.txt
writing requirements to mmdet.egg-info/requires.txt
writing top-level names to mmdet.egg-info/top_level.txt
writing manifest file 'mmdet.egg-info/SOURCES.txt'
reading manifest file 'mmdet.egg-info/SOURCES.txt'
writing manifest file 'mmdet.egg-info/SOURCES.txt'
running build_ext
building 'mmdet.ops.utils.compiling_info' extension
creating build
creating build/temp.linux-x86_64-3.8
creating build/temp.linux-x86_64-3.8/mmdet
creating build/temp.linux-x86_64-3.8/mmdet/ops
creating build/temp.linux-x86_64-3.8/mmdet/ops/utils
creating build/temp.linux-x86_64-3.8/mmdet/ops/utils/src
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DWITH_CUDA -I/usr/local/lib/python3.8/dist-packages/torch/include -I/usr/local/lib/python3.8/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.8/dist-packages/torch/include/TH -I/usr/local/lib/python3.8/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.8 -c mmdet/ops/utils/src/compiling_info.cpp -o build/temp.linux-x86_64-3.8/mmdet/ops/utils/src/compiling_info.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=compiling_info -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
mmdet/ops/utils/src/compiling_info.cpp:3:10: fatal error: cuda_runtime_api.h: No such file or directory
    3 | #include <cuda_runtime_api.h>
      |          ^~~~~~~~~~~~~~~~~~~~
compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
----------------------------------------
ERROR: Command errored out with exit status 1: /usr/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/SingleModelTest/src/mmdet/setup.py'"'"'; __file__='"'"'/SingleModelTest/src/mmdet/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' develop --no-deps Check the logs for full command output.

addict==2.3.0
albumentations==0.5.0
appdirs==1.4.4
asynctest==0.13.0
attrs==20.2.0
certifi==2020.6.20
chardet==3.0.4
cityscapesScripts==2.1.7
click==7.1.2
codecov==2.1.10
coloredlogs==14.0
coverage==5.3
cycler==0.10.0
Cython==0.29.21
decorator==4.4.2
flake8==3.8.4
Flask==1.1.2
humanfriendly==8.2
idna==2.10
imagecorruptions==1.1.0
imageio==2.9.0
imgaug==0.4.0
iniconfig==1.1.1
isort==5.6.4
itsdangerous==1.1.0
Jinja2==2.11.2
kiwisolver==1.2.0
kwarray==0.5.9
MarkupSafe==1.1.1
matplotlib==3.3.2
mccabe==0.6.1
mmcv==0.4.3
-e git+https://github.com/open-mmlab/mmdetection.git@0f33c08d8d46eba8165715a0995841a975badfd4#egg=mmdet
networkx==2.5
opencv-python==4.4.0.44
opencv-python-headless==4.4.0.44
ordered-set==4.0.2
packaging==20.4
pandas==1.1.3
Pillow==6.2.2
pluggy==0.13.1
py==1.9.0
pycocotools==2.0.2
pycodestyle==2.6.0
pyflakes==2.2.0
pyparsing==2.4.7
pyquaternion==0.9.9
pytesseract==0.3.6
pytest==6.1.1
pytest-cov==2.10.1
pytest-runner==5.2
python-dateutil==2.8.1
pytz==2020.1
PyWavelets==1.1.1
PyYAML==5.3.1
requests==2.24.0
scikit-image==0.17.2
scipy==1.5.3
Shapely==1.7.1
six==1.15.0
terminaltables==3.1.0
tifffile==2020.9.3
toml==0.10.1
tqdm==4.50.2
typing==3.7.4.3
ubelt==0.9.2
urllib3==1.25.11
Werkzeug==1.0.1
xdoctest==0.15.0
yapf==0.30.0

2 Answers

User

#1 · Edited 2024-09-29 21:34:10

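Thanks to @Robert Crovella, I solved my problem. It turned out I just needed to use nvidia/cuda:10.0-devel as the base image instead of nvidia/cuda:10.0-base. The -devel images ship the CUDA headers (including cuda_runtime_api.h) and the compiler toolchain, which the -base images leave out.

So my Dockerfile is now: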

# Dockerfile
FROM nvidia/cuda:10.0-devel

RUN nvidia-smi

RUN set -xe \
    && apt-get update \
    && apt-get install python3-pip -y \
    && apt-get install git -y 
RUN pip3 install --upgrade pip

WORKDIR /SingleModelTest

COPY requirements /SingleModelTest/requirements

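# ENV persists across build steps and into the running container;
# a variable exported inside a RUN would only last for that one step
ENV LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64:${LD_LIBRARY_PATH}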

RUN pip3 install -r requirements/requirements1.txt
RUN pip3 install -r requirements/requirements2.txt


COPY . /SingleModelTest

ENTRYPOINT ["python"]

CMD ["TabNetAPI.py"]
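
A side note on setting LD_LIBRARY_PATH in a Dockerfile: every RUN instruction executes in its own shell, so a variable set with export inside one RUN is not visible to later instructions, whereas ENV is. The sketch below simulates this outside Docker, using one `sh -c` process per build step. DEMO_LIB_PATH is a made-up variable name for illustration only, and this is an analogy for how build steps behave, not Docker itself.

```shell
#!/bin/sh
# Each `sh -c` below stands in for one RUN instruction: a separate shell process.
# DEMO_LIB_PATH is a hypothetical variable used only for this illustration.
unset DEMO_LIB_PATH

# Like "RUN export DEMO_LIB_PATH=...": the export ends with this child shell.
sh -c 'export DEMO_LIB_PATH=/usr/local/cuda-10.0/lib64'

# Like the next "RUN ...": a fresh shell, where the variable is not set.
next_step_sees="$(sh -c 'echo "${DEMO_LIB_PATH:-unset}"')"
echo "after RUN export: $next_step_sees"

# Like "ENV DEMO_LIB_PATH=...": set in the parent environment,
# so every later child process inherits it.
export DEMO_LIB_PATH=/usr/local/cuda-10.0/lib64
with_env_sees="$(sh -c 'echo "${DEMO_LIB_PATH:-unset}"')"
echo "after ENV:        $with_env_sees"
```

The first line prints `unset` and the second prints the path, which is why a value that later steps or the running container need should be set with ENV.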

User

#2 · Edited 2024-09-29 21:34:10

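Edit: this answer only shows how to verify what is happening inside the Docker image. Unfortunately, I could not work out why it happens.

How to check

At each step of the docker build you can see the ID of the layer that was generated. You can start a temporary container from that ID to inspect what is going on, e.g.: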

docker build -t my_bonk_example .
[...]
Removing intermediate container xxxxxxxxxxxxx
 ---> 57778e7c9788
Step 19/31 : RUN mkdir -p /tmp/spark-events
 ---> Running in afd21d853bcb
Removing intermediate container xxxxxxxxxxxxx
 ---> 33b26e1a2286 <-- let's use this ID
[ failure happens ]

docker run -it --rm --name bonk_container_before_failure 33b26e1a2286 bash
# now you're in the container

echo $LD_LIBRARY_PATH
ls /usr/local/cuda
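
Once inside the container, the most direct check is whether the header the compiler complained about is actually there. Below is a minimal sketch, assuming the default CUDA location /usr/local/cuda (the same directory that appears as -I/usr/local/cuda/include in the failing gcc command). `has_cuda_headers` is a hypothetical helper written for this example, not a Docker or CUDA command.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the CUDA runtime header exists under the
# given CUDA root directory (defaults to /usr/local/cuda).
has_cuda_headers() {
    cuda_root="${1:-/usr/local/cuda}"
    [ -f "$cuda_root/include/cuda_runtime_api.h" ]
}

if has_cuda_headers /usr/local/cuda; then
    echo "cuda_runtime_api.h found: extensions that include it should be able to compile"
else
    echo "cuda_runtime_api.h missing: the image likely lacks the CUDA development files"
fi
```

If the header is missing, switching to a -devel tag of the same CUDA version is the usual remedy, as the other answer found.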

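A side note about the Dockerfile:

Changing the order of the instructions in the Dockerfile can shorten future build times. Docker caches each layer and invalidates the cache from the first instruction whose inputs differ from the previous build. I expect you change your code more often than the image's requirements, so it makes sense to move the COPY after the apt and pip instructions, e.g.: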

# Dockerfile
FROM nvidia/cuda:10.2-base

RUN set -xe \
    && apt-get update \
    && apt-get install python3-pip -y \
    && apt-get install git -y 

RUN pip3 install --upgrade pip

WORKDIR /SingleModelTest

COPY requirements /SingleModelTest/requirements

RUN pip3 install -r requirements/requirements1.txt
RUN pip3 install -r requirements/requirements2.txt

COPY . /SingleModelTest

RUN nvidia-smi

ENTRYPOINT ["python"]
CMD ["TabNetAPI.py"]

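Note: this is just an example.

As for why the image does not build, I found that PyTorch 1.4 does not support CUDA 11.0 (https://discuss.pytorch.org/t/pytorch-with-cuda-11-compatibility/89254), but using an earlier version of CUDA does not fix the problem either.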
