RuntimeError: cuda runtime error (48) : no kernel image is available for execution on the device at mmdet/ops/roi_align/src/roi_align_kernel.cu:139

Posted 2024-09-29 21:36:12


I'm having some trouble running my code on a Google Compute Engine VM.

I'm trying to run a small Flask API that detects tables in images. Initializing the detector model works, but when I try to detect tables, I get the following error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.5/dist-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "ElvyCascadeTabNetAPI.py", line 36, in detect_tables
    result = inference_detector(model, "temp.jpg")
  File "/SingleModelTest/src/mmdet/mmdet/apis/inference.py", line 86, in inference_detector
    result = model(return_loss=False, rescale=True, **data)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/SingleModelTest/src/mmdet/mmdet/core/fp16/decorators.py", line 49, in new_func
    return old_func(*args, **kwargs)
  File "/SingleModelTest/src/mmdet/mmdet/models/detectors/base.py", line 149, in forward
    return self.forward_test(img, img_metas, **kwargs)
  File "/SingleModelTest/src/mmdet/mmdet/models/detectors/base.py", line 130, in forward_test
    return self.simple_test(imgs[0], img_metas[0], **kwargs)
  File "/SingleModelTest/src/mmdet/mmdet/models/detectors/cascade_rcnn.py", line 342, in simple_test
    x[:len(bbox_roi_extractor.featmap_strides)], rois)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/SingleModelTest/src/mmdet/mmdet/core/fp16/decorators.py", line 127, in new_func
    return old_func(*args, **kwargs)
  File "/SingleModelTest/src/mmdet/mmdet/models/roi_extractors/single_level.py", line 105, in forward
    roi_feats_t = self.roi_layers[i](feats[i], rois_)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/SingleModelTest/src/mmdet/mmdet/ops/roi_align/roi_align.py", line 144, in forward
    self.sample_num, self.aligned)
  File "/SingleModelTest/src/mmdet/mmdet/ops/roi_align/roi_align.py", line 36, in forward
    spatial_scale, sample_num, output)
RuntimeError: cuda runtime error (48) : no kernel image is available for execution on the device at mmdet/ops/roi_align/src/roi_align_kernel.cu:139

While searching for possible solutions, I came across two Stack Overflow questions where the cause was an old, unsupported GPU, so I changed the GPU on my Google Compute Engine VM from an Nvidia Tesla K80 to an Nvidia Tesla T4. The K80 has CUDA compute capability 3.7, while the new T4 has 7.5, so I thought this would fix the problem, but it did not.
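For context, "no kernel image is available" means the compiled binary (here, mmdet's custom ops) contains neither native machine code (SASS) for the current GPU's compute capability nor PTX that the driver could JIT-compile for it. A toy sketch of that rule, purely as an illustration (not the actual CUDA runtime logic):

```python
def has_kernel_image(compiled_archs, device_cc):
    """Illustrative model of CUDA's kernel-image lookup.

    A binary can run on a device if it embeds SASS for that exact
    compute capability, or PTX for an equal-or-lower capability
    (which the driver can JIT-compile for the device).
    compiled_archs: list of ("sass" | "ptx", (major, minor)) tuples.
    device_cc: the device's (major, minor) compute capability.
    """
    for kind, cc in compiled_archs:
        if kind == "sass" and cc == device_cc:
            return True
        if kind == "ptx" and cc <= device_cc:
            return True
    return False

# Ops built only for the K80 (sm_37, no PTX) cannot run on a T4 (sm_75):
print(has_kernel_image([("sass", (3, 7))], (7, 5)))  # False -> error 48
# Rebuilt with PTX for 3.7 (or SASS for 7.5), they can:
print(has_kernel_image([("ptx", (3, 7))], (7, 5)))   # True
```

So swapping the GPU alone does not help if the extension was compiled for the old architecture; the ops have to be rebuilt for the new one.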

Output of nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   72C    P8    12W /  70W |    106MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       918      G   /usr/lib/xorg/Xorg                 95MiB |
|    0   N/A  N/A       974      G   /usr/bin/gnome-shell                9MiB |
+-----------------------------------------------------------------------------+

nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

torch version: 1.4.0+cu100, torchvision version: 0.5.0+cu100
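Note that the `+cu100` suffix means these wheels were built against CUDA 10.0, while `nvcc` on the machine reports release 10.1; that toolkit mismatch matters when compiling mmdet's CUDA extensions against these wheels. A small hypothetical helper (`wheel_cuda_version` is not part of any library, just an illustration of what the tag encodes):

```python
def wheel_cuda_version(torch_version: str) -> str:
    """Parse the CUDA toolkit tag out of a PyTorch wheel version
    string, e.g. '1.4.0+cu100' -> '10.0'. Hypothetical helper."""
    if "+cu" not in torch_version:
        return "cpu-only"
    tag = torch_version.split("+cu")[1]  # e.g. '100'
    return f"{tag[:-1]}.{tag[-1]}"       # '100' -> '10.0'

print(wheel_cuda_version("1.4.0+cu100"))  # 10.0
print(wheel_cuda_version("1.4.0"))        # cpu-only
```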

I'm running the API in a docker container. The Dockerfile:

# Dockerfile
FROM nvidia/cuda:10.0-devel

RUN nvidia-smi

RUN set -xe \
    && apt-get update \
    && apt-get install python3-pip -y \
    && apt-get install git -y \
    && apt-get install libgl1-mesa-glx -y
RUN pip3 install --upgrade pip

WORKDIR /SingleModelTest

COPY requirements /SingleModelTest/requirements

# Note: a plain `RUN export ...` would not persist across layers; ENV does
ENV LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64

RUN pip3 install -r requirements/requirements1.txt
RUN pip3 install -r requirements/requirements2.txt


COPY . /SingleModelTest

ENTRYPOINT ["python3"]

CMD ["TabNetAPI.py"]
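One direction worth checking (a sketch, not a confirmed fix for this setup): if mmdet's CUDA ops were compiled without T4 support, rebuilding them inside the image with `TORCH_CUDA_ARCH_LIST` set, which is PyTorch's standard build-time variable for selecting target architectures, should produce kernels the T4 can run. The path below assumes the mmdet checkout location shown in the traceback:

```shell
# Dockerfile fragment (sketch): rebuild mmdet's CUDA ops for both the
# old K80 (3.7) and the T4 (7.5); "+PTX" also embeds PTX so future
# GPUs can JIT-compile the kernels.
ENV TORCH_CUDA_ARCH_LIST="3.7;7.5+PTX"
RUN pip3 uninstall -y mmdet \
    && pip3 install -v -e /SingleModelTest/src/mmdet
```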

Edit: I was confused by the output of nvidia-smi, because the CUDA version it shows is higher than the one I installed, but according to https://medium.com/@brianhourigan/if-different-cuda-versions-are-shown-by-nvcc-and-nvidia-smi-its-necessarily-not-a-problem-and-311eda26856c this is normal (nvidia-smi reports the highest CUDA version the driver supports, not the installed toolkit).

If anyone has a solution, I would really appreciate it. If I need to provide more information, I'm happy to.

Thanks in advance.


