Channel: TAO Toolkit - NVIDIA Developer Forums

No CUDA-capable device is detected


Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc)
• Network Type: Yolo_v4
• TLT Version:
Configuration of the TAO Toolkit Instance
task_group: ['model', 'dataset', 'deploy']
format_version: 3.0
toolkit_version: 5.5.0
published_date: 08/26/2024

• Training spec file(If have, please share here)
• How to reproduce the issue ?
Hi, to add some context: I'm editing the yolo_v4 notebook from the tao_launcher_starter_kit to use my own custom data. I'm having trouble with the cell just before section 2.3:

# If you use your own dataset, you will need to run the code below to generate the best anchor shape

!tao model yolo_v4 kmeans \
    -l $DATA_DOWNLOAD_DIR/kitti_split/training/label \
    -i $DATA_DOWNLOAD_DIR/kitti_split/training/image \
    -n 9 \
    -x 960 \
    -y 544 \
    -e nvcr.io/nvidia/tao/tao-toolkit:v5.5.0

# The anchor shape generated by this script is sorted. Write the first 3 into small_anchor_shape in the config
# file. Write middle 3 into mid_anchor_shape. Write last 3 into big_anchor_shape.
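The grouping described in that notebook comment can be sketched in Python. This is only an illustration: it assumes the nine shapes are already in the sorted order printed by kmeans, the function name `split_anchors` is hypothetical, and the example values are made up rather than taken from any real run.

```python
from typing import Dict, List, Tuple

Anchor = Tuple[float, float]  # (width, height) in pixels


def split_anchors(anchors: List[Anchor]) -> Dict[str, List[Anchor]]:
    """Slice 9 already-sorted anchors into the three groups named in the notebook comment."""
    if len(anchors) != 9:
        raise ValueError(f"expected 9 anchors, got {len(anchors)}")
    return {
        "small_anchor_shape": anchors[0:3],  # first 3
        "mid_anchor_shape": anchors[3:6],    # middle 3
        "big_anchor_shape": anchors[6:9],    # last 3
    }


# Made-up values for illustration; replace with the shapes printed by kmeans.
example = [
    (10.0, 14.0), (18.0, 30.0), (33.0, 22.0),
    (40.0, 60.0), (62.0, 45.0), (80.0, 110.0),
    (120.0, 90.0), (160.0, 200.0), (300.0, 260.0),
]

for name, shapes in split_anchors(example).items():
    print(name, shapes)
```

The function deliberately does not re-sort the input, so the order produced by the script is preserved as-is.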

I get this error:
2025-02-14 09:32:17,859 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2025-02-14 09:32:17,952 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 360: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5
2025-02-14 09:32:18,066 [TAO Toolkit] [WARNING] nvidia_tao_cli.components.docker_handler.docker_handler 292:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/pcia2/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2025-02-14 09:32:18,066 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 305: Printing tty value True
Using TensorFlow backend.
2025-02-14 08:32:19.391648: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12
2025-02-14 08:32:19,497 [TAO Toolkit] [WARNING] tensorflow 40: Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
2025-02-14 08:32:20,541 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable TF_ALLOW_IOLIBS=1.
2025-02-14 08:32:20,573 [TAO Toolkit] [WARNING] tensorflow 42: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable TF_ALLOW_IOLIBS=1.
2025-02-14 08:32:20,580 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable TF_ALLOW_IOLIBS=1.
2025-02-14 08:32:21,874 [TAO Toolkit] [INFO] matplotlib.font_manager 1633: generated new fontManager
2025-02-14 08:32:22,247 [TAO Toolkit] [WARNING] nvidia_tao_tf1.cv.common.export.keras_exporter 36: Failed to import TensorRT package, exporting TLT to a TensorRT engine will not be available.
Traceback (most recent call last):
  File "/usr/local/bin/yolo_v4", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/entrypoint/yolo_v4.py", line 12, in main
    launch_job(nvidia_tao_tf1.cv.yolo_v4.scripts, "yolo_v4", sys.argv[1:])
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/entrypoint/entrypoint.py", line 276, in launch_job
    modules = get_modules(package)
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/entrypoint/entrypoint.py", line 47, in get_modules
    module = importlib.import_module(module_name)
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 848, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/scripts/export.py", line 21, in <module>
    from nvidia_tao_tf1.cv.yolo_v4.export.yolov4_exporter import YOLOv4Exporter as Exporter
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/export/yolov4_exporter.py", line 42, in <module>
    from nvidia_tao_tf1.cv.common.export.keras_exporter import KerasExporter as Exporter
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/export/keras_exporter.py", line 46, in <module>
    from nvidia_tao_tf1.core.export.app import get_model_input_dtype
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/core/export/app.py", line 40, in <module>
    from nvidia_tao_tf1.core.export._tensorrt import keras_to_tensorrt
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/core/export/_tensorrt.py", line 39, in <module>
    import pycuda.autoinit  # noqa pylint: disable=W0611
  File "/usr/local/lib/python3.8/dist-packages/pycuda/autoinit.py", line 5, in <module>
    cuda.init()
pycuda._driver.RuntimeError: cuInit failed: no CUDA-capable device is detected
2025-02-14 09:32:22,769 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 367: Stopping container.

nvidia-smi
NVIDIA-SMI 535.230.02 Driver Version: 535.230.02 CUDA Version: 12.2

I also tried following this topic, since its author seemed to have a similar issue, but I still get errors:
No CUDA-capable device is detected - yolov4 - #8 by ahaselhan

I tried:
> Could you open a terminal in the VM and run below?
(I removed the link since I'm limited as a new user; see the post linked above to get it.)
> Then, run python.
> # python
> >>> import pycuda
> >>> import pycuda.driver as cuda
> >>> cuda.init()

and:
> Seems that no gpu is found.
> Can you reboot it and retry?
> More, can you try another docker?
(I removed the link since I'm limited as a new user; see the post linked above to get it.)

And I get exactly the same result as he did.
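For anyone repeating the check quoted above, the same interpreter session can be run as a small script that reports the outcome instead of stopping at a traceback. This is only a sketch: it assumes pycuda is installed in the environment where it runs (for example inside the container), and `check_cuda` is a hypothetical helper name, not part of TAO or pycuda.

```python
def check_cuda():
    """Return (ok, message) instead of raising, so the result is easy to paste into a reply."""
    try:
        import pycuda.driver as cuda
    except ImportError as exc:
        return False, f"pycuda is not importable: {exc}"

    try:
        cuda.init()  # the call that fails with "cuInit failed" in the traceback above
        count = cuda.Device.count()
        names = [cuda.Device(i).name() for i in range(count)]
    except Exception as exc:  # pycuda raises its own error types here
        return False, f"CUDA initialisation failed: {exc}"

    if count == 0:
        return False, "CUDA initialised but reports 0 devices"
    return True, f"{count} device(s): {', '.join(names)}"


ok, message = check_cuda()
print("OK" if ok else "FAILED", "-", message)
```

Running it both on the host and inside the container may help show whether the GPU is missing everywhere or only inside Docker.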

I would greatly appreciate any advice on how to proceed. I apologize in advance: I'm quite new here, so what may seem simple or obvious to others might not be clear to me. If you need any additional information, please feel free to ask. I am currently running Ubuntu 22.04.5 LTS.

Thanks,
Valentin
