Please provide the following information when requesting support.
• Hardware (T4/V100/Xavier/Nano/etc) RTX 3080 Ti
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc) DINO
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here) 5.3.0
• Training spec file(If have, please share here)
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)
I am trying to train DINO on my custom dataset, following the documentation from NGC and the TAO docs. After spending a whole day on it, I still get the errors below. Please help me check what is going wrong.
Specs
train:
  num_gpus: 1
  num_nodes: 1
  validation_interval: 1
  optim:
    lr_backbone: 2e-05
    lr: 2e-4
    lr_steps: [11]
    momentum: 0.9
  num_epochs: 12
dataset:
  train_data_sources:
    - image_dir: /ws/tao_trainer/data/dino/train/images
      json_file: /ws/tao_trainer/data/dino/train/train.json
  val_data_sources:
    - image_dir: /ws/tao_trainer/data/dino/valid/images
      json_file: /ws/tao_trainer/data/dino/valid/valid.json
  num_classes: 6
  batch_size: 4
  workers: 8
  augmentation:
    fixed_padding: False
model:
  backbone: fan_small
  train_backbone: True
  pretrained_backbone_path: /ws/tao_trainer/dino/fan_small_hybrid_nvimagenet.pth
  num_feature_levels: 4
  dec_layers: 6
  enc_layers: 6
  num_queries: 300
  num_select: 100
  dropout_ratio: 0.0
  dim_feedforward: 2048
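The Hydra error in the log below says "Key 'results_dir' is not in struct" when the -r flag tries to set results_dir. My guess, based only on the hint in the log ("To append to your config use +results_dir=..."), is that the spec may also need a top-level results_dir entry, something like:

results_dir: /ws/tao_trainer/dino/training_models

I have not confirmed that this is the intended fix, so please correct me if the -r flag alone should be enough.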
Reproduce
docker run -it --rm --gpus all -v /home/tmp/Documents:/ws nvcr.io/nvidia/tao/tao-toolkit:5.3.0-pyt dino train -e /ws/tao_trainer/dino/train.yml -r /ws/tao_trainer/dino/training_models -k threat_detection --gpus 1
===========================
=== TAO Toolkit PyTorch ===
===========================
NVIDIA Release 5.3.0-PyT (build 76438008)
TAO Toolkit Version 5.3.0
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the TAO Toolkit End User License Agreement.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/tao-toolkit-software-license-agreement
WARNING: CUDA Minor Version Compatibility mode ENABLED.
Using driver version 530.41.03 which has support for CUDA 12.1. This container
was built with CUDA 12.3 and will be run in Minor Version Compatibility mode.
CUDA Forward Compatibility is preferred over Minor Version Compatibility for use
with this container but was unavailable:
[[System has unsupported display driver / cuda driver combination (CUDA_ERROR_SYSTEM_DRIVER_MISMATCH) cuInit()=803]]
See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.
NOTE: The SHMEM allocation limit is set to the default of 64MB. This may be
insufficient for TAO Toolkit. NVIDIA recommends the use of the following flags:
docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 ...
/usr/local/lib/python3.10/dist-packages/hydra/plugins/config_source.py:124: UserWarning: Support for .yml files is deprecated. Use .yaml extension for Hydra config files
deprecation_warning(
Could not override 'results_dir'.
To append to your config use +results_dir=/ws/tao_trainer/dino/training_models
Key 'results_dir' is not in struct
full_key: results_dir
object_type=dict
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
Execution status: FAIL
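For what it is worth, if I follow the NOTE in the log about the SHMEM limit, I assume the command should be rerun with the recommended flags added (everything else unchanged; I have not verified that this changes the result):

docker run -it --rm --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 -v /home/tmp/Documents:/ws nvcr.io/nvidia/tao/tao-toolkit:5.3.0-pyt dino train -e /ws/tao_trainer/dino/train.yml -r /ws/tao_trainer/dino/training_models -k threat_detection --gpus 1

I also see the Hydra warning about the .yml extension being deprecated, so I plan to rename train.yml to train.yaml as well.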