
Fine Tuning DINO Retail Object detector - errors out as it expects unspecified/unknown configurations

Previous thread: Fine Tuning Retail Object Detection Models provided in NGC - #6 by Morganh, where we are attempting to fine-tune the DINO Retail Object detector with TAO 5.5.

I am getting the following error when trying to train the model in TAO 5.5.
It is looking for the configuration read by cudnn.benchmark = cfg["train"]["cudnn"]["benchmark"], but I cannot find any such configuration in the TAO DINO documentation.

 tao model dino train \
-e  /workspace/tao-experiments/specs/train.yml
2024-11-22 03:25:19,278 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2024-11-22 03:25:19,368 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 360: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt
2024-11-22 03:25:19,382 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 301: Printing tty value True
[2024-11-22 03:25:27,199 - TAO Toolkit - matplotlib.font_manager - INFO] generated new fontManager
/usr/local/lib/python3.10/dist-packages/hydra/plugins/config_source.py:124: UserWarning: Support for .yml files is deprecated. Use .yaml extension for Hydra config files
  deprecation_warning(
/usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/next/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  ret = run_job(
/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/core/loggers/api_logging.py:236: UserWarning: Log file already exists at /workspace/tao-experiments/results/trainings/training1/status.json
  rank_zero_warn(
Seed set to 1234
Train results will be saved at: /workspace/tao-experiments/results/trainings/training1
Error executing job with overrides: []
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/core/decorators/workflow.py", line 69, in _func
    raise e
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/core/decorators/workflow.py", line 48, in _func
    runner(cfg, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/cv/dino/scripts/train.py", line 146, in main
    run_experiment(experiment_config=cfg,
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/cv/dino/scripts/train.py", line 36, in run_experiment
    results_dir, resume_ckpt, gpus, ptl_loggers = initialize_train_experiment(experiment_config, key)
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/core/initialize_experiments.py", line 56, in initialize_train_experiment
    cudnn.benchmark = cfg["train"]["cudnn"]["benchmark"]
omegaconf.errors.ConfigKeyError: Key 'cudnn' is not in struct
    full_key: train.cudnn
    object_type=dict

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
[2024-11-22 03:25:35,916 - TAO Toolkit - root - INFO] Sending telemetry data.
[2024-11-22 03:25:35,916 - TAO Toolkit - root - INFO] ================> Start Reporting Telemetry <================
[2024-11-22 03:25:35,916 - TAO Toolkit - root - INFO] Sending {'version': '5.5.0', 'action': 'train', 'network': 'dino', 'gpu': ['Tesla-V100-SXM2-16GB'], 'success': False, 'time_lapsed': 8} to https://api.tao.ngc.nvidia.com.
[2024-11-22 03:25:37,147 - TAO Toolkit - root - INFO] Telemetry sent successfully.
[2024-11-22 03:25:37,148 - TAO Toolkit - root - INFO] ================> End Reporting Telemetry <================
[2024-11-22 03:25:37,148 - TAO Toolkit - root - WARNING] Execution status: FAIL
2024-11-22 03:25:38,297 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 363: Stopping container.
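
If I read the traceback correctly, this is the usual OmegaConf struct-mode behaviour: reading a key that was never declared (neither in my spec nor in some default) raises ConfigKeyError instead of falling back to a default value. A minimal reproduction outside TAO (my own sketch, not TAO code) fails the same way:

from omegaconf import OmegaConf

# Minimal reproduction of the failure (plain OmegaConf, outside TAO):
# in struct mode, reading a key that was never declared raises
# ConfigKeyError instead of returning a default value.
cfg = OmegaConf.create({"train": {"num_gpus": 1, "num_epochs": 12}})
OmegaConf.set_struct(cfg, True)

try:
    benchmark = cfg["train"]["cudnn"]["benchmark"]
except Exception as err:  # omegaconf.errors.ConfigKeyError
    print(type(err).__name__, err)  # -> ConfigKeyError: Key 'cudnn' is not in struct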

And the following is the configuration file:

train:
  freeze: ['backbone', 'transformer.encoder']
  pretrained_model_path: /workspace/tao-experiments/models/retail_object_detection_vtrainable_retail_object_detection_binary_v2.2.2.3/dino_model_epoch011.pth
  num_gpus: 1
  num_nodes: 1
  validation_interval: 1
  checkpoint_interval: 1
  seed: 1234
  results_dir: /workspace/tao-experiments/results/trainings/training1
  optim:
    lr_backbone: 1e-6
    lr: 1e-5
    lr_steps: [11]
    momentum: 0.9
  num_epochs: 12
dataset:
  train_data_sources:
    - image_dir: /workspace/tao-experiments/data/dataset_2024-22-11T0942_1732228936/train
      json_file: /workspace/tao-experiments/data/dataset_2024-22-11T0942_1732228936/annotations/instances_train.json
  val_data_sources:
    - image_dir: /workspace/tao-experiments/data/dataset_2024-22-11T0942_1732228936/test
      json_file: /workspace/tao-experiments/data/dataset_2024-22-11T0942_1732228936/annotations/instances_test.json
  num_classes: 2
  batch_size: 4
  workers: 8
  augmentation:
    fixed_padding: False
model:
  backbone: fan_base
  num_feature_levels: 4
  dec_layers: 6
  enc_layers: 6
  num_queries: 900
  num_select: 100
  dropout_ratio: 0.0
  dim_feedforward: 2048
results_dir: /workspace/tao-experiments/results/trainings/training1
encryption_key: nvidia_tao

Based on the PyTorch repo, it seems it is also looking for other configurations such as cfg["train"]["cudnn"]["deterministic"] and cfg["train"]["cudnn"]["benchmark"], which are not defined in the documentation.
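
My guess is that the fix is to add a cudnn block under train in the spec file, something like the sketch below. The two values are assumptions on my part (the plain PyTorch cuDNN boolean flags), not documented defaults:

train:
  # ... existing train keys unchanged ...
  cudnn:
    benchmark: false      # assumption: leave the cuDNN autotuner off
    deterministic: false  # assumption: allow non-deterministic kernels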

  1. Can you please explain why I am getting these errors? (Don't they have default values specified?)
  2. If I am supposed to specify values, can you let me know the values for the above two configurations? Thanks.


