• Hardware (Tesla P40)
• Network Type (Classification)
• nvidia-tao version: 5.2.0.1
I am running the classification_tf2 example from v5.1.0, and my command is:
tao model classification_tf2 train -e path/to/spec/bind/mount
but I am getting this error:
2024-01-22 18:48:31,242 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2024-01-22 18:48:31,502 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 360: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf2.11.0
2024-01-22 18:48:33,158 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 301: Printing tty value True
2024-01-22 13:18:35.096853: I tensorflow/core/platform/cpu_feature_guard.cc:194] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Error executing job with overrides: []
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 211, in run_and_report
return func()
File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 368, in <lambda>
lambda: hydra.run(
File "/usr/local/lib/python3.8/dist-packages/clearml/binding/hydra_bind.py", line 88, in _patched_hydra_run
return PatchHydra._original_hydra_run(self, config_name, task_function, overrides, *args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/hydra.py", line 110, in run
_ = ret.return_value
File "/usr/local/lib/python3.8/dist-packages/hydra/core/utils.py", line 233, in return_value
raise self._return_value
File "/usr/local/lib/python3.8/dist-packages/hydra/core/utils.py", line 160, in run_job
ret.return_value = task_function(task_cfg)
File "/usr/local/lib/python3.8/dist-packages/clearml/binding/hydra_bind.py", line 170, in _patched_task_function
return task_function(a_config, *a_args, **a_kwargs)
File "<frozen cv.classification.scripts.train>", line 215, in main
File "<frozen common.utils>", line 62, in update_results_dir
File "/usr/local/lib/python3.8/dist-packages/omegaconf/dictconfig.py", line 369, in __getitem__
self._format_and_raise(
File "/usr/local/lib/python3.8/dist-packages/omegaconf/base.py", line 190, in _format_and_raise
format_and_raise(
File "/usr/local/lib/python3.8/dist-packages/omegaconf/_utils.py", line 741, in format_and_raise
_raise(ex, cause)
File "/usr/local/lib/python3.8/dist-packages/omegaconf/_utils.py", line 719, in _raise
raise ex.with_traceback(sys.exc_info()[2]) # set end OC_CAUSE=1 for full backtrace
File "/usr/local/lib/python3.8/dist-packages/omegaconf/dictconfig.py", line 367, in __getitem__
return self._get_impl(key=key, default_value=_DEFAULT_MARKER_)
File "/usr/local/lib/python3.8/dist-packages/omegaconf/dictconfig.py", line 438, in _get_impl
node = self._get_node(key=key, throw_on_missing_key=True)
File "/usr/local/lib/python3.8/dist-packages/omegaconf/dictconfig.py", line 465, in _get_node
self._validate_get(key)
File "/usr/local/lib/python3.8/dist-packages/omegaconf/dictconfig.py", line 166, in _validate_get
self._format_and_raise(
File "/usr/local/lib/python3.8/dist-packages/omegaconf/base.py", line 190, in _format_and_raise
format_and_raise(
File "/usr/local/lib/python3.8/dist-packages/omegaconf/_utils.py", line 821, in format_and_raise
_raise(ex, cause)
File "/usr/local/lib/python3.8/dist-packages/omegaconf/_utils.py", line 719, in _raise
raise ex.with_traceback(sys.exc_info()[2]) # set end OC_CAUSE=1 for full backtrace
omegaconf.errors.ConfigKeyError: Key 'results_dir' is not in struct
full_key: train.results_dir
object_type=dict
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "</usr/local/lib/python3.8/dist-packages/nvidia_tao_tf2/cv/classification/scripts/train.py>", line 3, in <module>
File "<frozen cv.classification.scripts.train>", line 221, in <module>
File "<frozen common.hydra.hydra_runner>", line 99, in wrapper
File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 367, in _run_hydra
run_and_report(
File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 251, in run_and_report
assert mdl is not None
AssertionError
Sending telemetry data.
Execution status: FAIL
2024-01-22 18:48:54,045 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 363: Stopping container.
I have put the key results_dir in the spec file; here it is:
results_dir: '/workspace/'
encryption_key: 'nvidia_tlt'
dataset:
  train_dataset_path: "/workspace/tao-experiments/data/split/train"
  val_dataset_path: "/workspace/tao-experiments/data/split/val"
  preprocess_mode: 'torch'
  num_classes: 2
  augmentation:
    enable_color_augmentation: True
    enable_center_crop: True
train:
  qat: False
  checkpoint: ''
  batch_size_per_gpu: 64
  num_epochs: 5
  optim_config:
    optimizer: 'sgd'
  lr_config:
    scheduler: 'cosine'
    learning_rate: 0.05
    soft_start: 0.05
  reg_config:
    type: 'L2'
    scope: ['conv2d', 'dense']
    weight_decay: 0.00005
model:
  backbone: 'byom'
  input_width: 227
  input_height: 227
  input_channels: 3
  input_image_depth: 8
  byom_model: '/workspace/tao-experiments/gender_net.tltb'
evaluate:
  dataset_path: "/workspace/tao-experiments/data/split/test"
  checkpoint: "/workspace/tao-experiments/class_net.tltb"
  top_k: 3
  batch_size: 256
  n_workers: 8
prune:
  checkpoint: '/workspace/tao-experiments/class_net.tltb'
  threshold: 0.68
  byom_model_path: '/workspace/tao-experiments/class_net.tltb'
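To illustrate what I think the error means (this is just my reading of the traceback, not TAO internals): OmegaConf refuses the lookup of train.results_dir because the struct-locked config's train section has no results_dir key, even though I set one at the top level. A minimal, standalone reproduction of that OmegaConf behaviour, using a made-up config layout:

    from omegaconf import OmegaConf
    from omegaconf.errors import ConfigKeyError

    # Hypothetical layout for illustration only: results_dir exists at the
    # top level but not inside the 'train' section.
    cfg = OmegaConf.create({
        "results_dir": "/workspace/",
        "train": {"num_epochs": 5},
    })
    OmegaConf.set_struct(cfg, True)  # struct mode forbids access to missing keys

    try:
        _ = cfg["train"]["results_dir"]  # same item access the traceback shows
    except ConfigKeyError as err:
        print(err)  # Key 'results_dir' is not in struct; full_key: train.results_dir

If that matches what update_results_dir does internally, it would suggest the schema in the 5.0.0-tf2.11.0 container expects results_dir per task section rather than only at the top level, but I can't confirm that from the log alone.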
Any idea what’s causing the issue?