Please provide the following information when requesting support.
• Hardware (T4)
• Network Type (Yolo_v4)
• TLT Version (TAO 5.0.0)
• Training spec file (if you have one, please share it here)
• How to reproduce the issue? (For errors, please share the command line and the detailed log here.)
Hi, some background info on my issue:
I am trying to use NVIDIA TAO 5.0.0 to train a YOLOv4 model. I am running a VM on Google Cloud with an NVIDIA T4 GPU.
I followed the setup steps in this guide: https://docs.nvidia.com/tao/tao-toolkit/text/running_in_cloud/running_tao_toolkit_on_gcp.html
I start Jupyter from the terminal using this command:
andrewh@us-west4-t4:~$ jupyter notebook --ip 0.0.0.0 --port 8888 --allow-root --NotebookApp.token='password'
I get to step 2.3 and run the following command:
!tao model yolo_v4 dataset_convert -d $SPECS_DIR/yolo_v4_tfrecords_kitti_train.txt \
-o $DATA_DOWNLOAD_DIR/yolo_v4/tfrecords/train \
-r $USER_EXPERIMENT_DIR/
I then get the following output:
2024-08-12 18:37:14,420 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2024-08-12 18:37:14,513 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 360: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5
2024-08-12 18:37:14,560 [TAO Toolkit] [WARNING] nvidia_tao_cli.components.docker_handler.docker_handler 288:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/andrewh/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2024-08-12 18:37:14,560 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 301: Printing tty value True
Using TensorFlow backend.
2024-08-12 18:37:17.451564: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12
2024-08-12 18:37:17,786 [TAO Toolkit] [WARNING] tensorflow 40: Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
2024-08-12 18:37:21,340 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable TF_ALLOW_IOLIBS=1.
2024-08-12 18:37:21,470 [TAO Toolkit] [WARNING] tensorflow 42: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable TF_ALLOW_IOLIBS=1.
2024-08-12 18:37:21,489 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable TF_ALLOW_IOLIBS=1.
2024-08-12 18:37:25,637 [TAO Toolkit] [INFO] matplotlib.font_manager 1633: generated new fontManager
2024-08-12 18:37:26,844 [TAO Toolkit] [WARNING] nvidia_tao_tf1.cv.common.export.keras_exporter 36: Failed to import TensorRT package, exporting TLT to a TensorRT engine will not be available.
Traceback (most recent call last):
  File "/usr/local/bin/yolo_v4", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/entrypoint/yolo_v4.py", line 12, in main
    launch_job(nvidia_tao_tf1.cv.yolo_v4.scripts, "yolo_v4", sys.argv[1:])
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/entrypoint/entrypoint.py", line 276, in launch_job
    modules = get_modules(package)
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/entrypoint/entrypoint.py", line 47, in get_modules
    module = importlib.import_module(module_name)
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 848, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/scripts/export.py", line 21, in <module>
    from nvidia_tao_tf1.cv.yolo_v4.export.yolov4_exporter import YOLOv4Exporter as Exporter
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/export/yolov4_exporter.py", line 42, in <module>
    from nvidia_tao_tf1.cv.common.export.keras_exporter import KerasExporter as Exporter
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/export/keras_exporter.py", line 46, in <module>
    from nvidia_tao_tf1.core.export.app import get_model_input_dtype
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/core/export/app.py", line 40, in <module>
    from nvidia_tao_tf1.core.export._tensorrt import keras_to_tensorrt
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/core/export/_tensorrt.py", line 39, in <module>
    import pycuda.autoinit # noqa pylint: disable=W0611
  File "/usr/local/lib/python3.8/dist-packages/pycuda/autoinit.py", line 5, in <module>
    cuda.init()
pycuda._driver.RuntimeError: cuInit failed: no CUDA-capable device is detected
2024-08-12 18:37:28,159 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 363: Stopping container.
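One thing I noticed in the log above is the warning about adding "user":"UID:GID" to the DockerOptions portion of /home/andrewh/.tao_mounts.json. Based on my reading of the docs, I believe that file should look roughly like this (the mount paths here are placeholders for my actual directories, and 1000:1000 stands in for whatever `id -u` and `id -g` return):

{
    "Mounts": [
        {
            "source": "/home/andrewh/tao-experiments",
            "destination": "/workspace/tao-experiments"
        }
    ],
    "DockerOptions": {
        "user": "1000:1000"
    }
}

Could something in this file (or missing from it) prevent the container from seeing the GPU?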
Here is the output of nvidia-smi on the host VM:
nvidia-smi
Mon Aug 12 20:15:05 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.256.02 Driver Version: 470.256.02 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |
| N/A 57C P0 28W / 70W | 514MiB / 15109MiB | 1% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1070 G /usr/lib/xorg/Xorg 67MiB |
| 0 N/A N/A 1926 G /usr/lib/xorg/Xorg 131MiB |
| 0 N/A N/A 2053 G /usr/bin/gnome-shell 27MiB |
| 0 N/A N/A 2456 C /usr/NX/bin/nxnode.bin 132MiB |
| 0 N/A N/A 4758 G /usr/lib/firefox/firefox 141MiB |
+-----------------------------------------------------------------------------+
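So nvidia-smi works on the host, but the container reports that no CUDA-capable device is detected, which makes me wonder whether Docker itself can see the GPU. If it helps, I could run a check like the following, reusing the same container image the TAO launcher pulls (I am assuming Docker's --gpus flag is the right way to expose the T4 here):

docker run --rm --gpus all nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5 nvidia-smi

If that command also fails, I assume the problem is in my Docker/GPU setup rather than in TAO itself.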
And here are the CUDA-related packages installed on the host:
dpkg -l | grep cuda
ii libcudart10.1:amd64 10.1.243-3 amd64 NVIDIA CUDA Runtime Library
ii nvidia-cuda-dev 10.1.243-3 amd64 NVIDIA CUDA development files
ii nvidia-cuda-doc 10.1.243-3 all NVIDIA CUDA and OpenCL documentation
ii nvidia-cuda-gdb 10.1.243-3 amd64 NVIDIA CUDA Debugger (GDB)
ii nvidia-cuda-toolkit 10.1.243-3 amd64 NVIDIA CUDA development toolkit
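I also notice that dpkg only lists CUDA 10.1 packages, while the driver above reports CUDA 11.4; I don't know whether that matters inside the container. To check whether the Docker GPU integration is installed at all, I believe a command like this should list it (I am assuming nvidia-container-toolkit is the relevant package name):

dpkg -l | grep -i nvidia-container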
I’ve read this forum post describing a similar issue: No CUDA-capable device is detected on tao detectnet_v2 dataset convert - #4 by NilsAI
However, I am unsure whether it applies, since I am running TAO in a different way than the author of that post.
Any advice on how to proceed would be much appreciated. I apologize in advance: I am very new to Linux, so some things that may be obvious or simple may not be for me. If any more info is needed, please let me know. I am running Ubuntu 20.04.6, 64-bit.
Thanks,
Andrew