Channel: TAO Toolkit - NVIDIA Developer Forums

TAO DINO training TensorBoard image visualization not working


Please provide the following information when requesting support.

• Hardware (T4)
• Network Type : dino
• Training spec file(If have, please share here)

train:
  num_gpus: 1
  num_nodes: 1
  validation_interval: 1
  optim:
    lr_backbone: 2e-05
    lr: 2e-4
    lr_steps: [11]
    momentum: 0.9
  num_epochs: 12
  precision: fp16
  checkpoint_interval: 1
  activation_checkpoint: True
  pretrained_model_path: /workspace/tao-experiments/dino/dino_model_epoch=003.pth
dataset:
  train_data_sources:
    - image_dir: /data/images/train/
      json_file: /data/train/annotations.json
  val_data_sources:
    - image_dir: /data/images/valid/
      json_file: /data/valid/annotations.json
  num_classes: 6
  batch_size: 8
  workers: 2
  augmentation:
    fixed_padding: True
model:
  backbone: fan_large
  train_backbone: False
  num_feature_levels: 4
  dec_layers: 6
  enc_layers: 6
  num_queries: 900
  num_select: 100
  dropout_ratio: 0.0
  dim_feedforward: 2048

• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)
After running the training, I start TensorBoard as follows:

tensorboard --logdir_spec=exp01:<result directory> --host 0.0.0.0 --port 8080

After running this, I get the scalar graphs in TensorBoard, such as loss and validation metrics,
but I can't see any images with bounding boxes as they are passed to the model.

I saw that there is a `visualizer` setting that can be added to the spec file, so I added it as shown below:

train:
  num_gpus: 1
  num_nodes: 1
  validation_interval: 1
  optim:
    lr_backbone: 2e-05
    lr: 2e-4
    lr_steps: [11]
    momentum: 0.9
  num_epochs: 12
  precision: fp16
  checkpoint_interval: 1
  activation_checkpoint: True
  pretrained_model_path: /workspace/tao-experiments/dino/dino_model_epoch=003.pth
  visualizer{
    enabled: true
  }
dataset:
  train_data_sources:
    - image_dir: /data/images/train/
      json_file: /data/train/annotations.json
  val_data_sources:
    - image_dir: /data/images/valid/
      json_file: /data/valid/annotations.json
  num_classes: 6
  batch_size: 8
  workers: 2
  augmentation:
    fixed_padding: True
model:
  backbone: fan_large
  train_backbone: False
  num_feature_levels: 4
  dec_layers: 6
  enc_layers: 6
  num_queries: 900
  num_select: 100
  dropout_ratio: 0.0
  dim_feedforward: 2048


However, the `visualizer` config option is not listed in the DINO training spec file documentation, and when I run with this configuration I get the error below.

Error

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 213, in run_and_report
    return func()
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 453, in <lambda>
    lambda: hydra.run(
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py", line 105, in run
    cfg = self.compose_config(
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py", line 594, in compose_config
    cfg = self.config_loader.load_configuration(
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/config_loader_impl.py", line 141, in load_configuration
    return self._load_configuration_impl(
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/config_loader_impl.py", line 235, in _load_configuration_impl
    self._process_config_searchpath(config_name, parsed_overrides, caching_repo)
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/config_loader_impl.py", line 158, in _process_config_searchpath
    loaded = repo.load_config(config_path=config_name)
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/config_repository.py", line 349, in load_config
    ret = self.delegate.load_config(config_path=config_path)
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/config_repository.py", line 92, in load_config
    ret = source.load_config(config_path=config_path)
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/core_plugins/file_config_source.py", line 31, in load_config
    cfg = OmegaConf.load(f)
  File "/usr/local/lib/python3.10/dist-packages/omegaconf/omegaconf.py", line 192, in load
    obj = yaml.load(file_, Loader=get_yaml_loader())
  File "/usr/local/lib/python3.10/dist-packages/yaml/__init__.py", line 81, in load
    return loader.get_single_data()
  File "/usr/local/lib/python3.10/dist-packages/yaml/constructor.py", line 49, in get_single_data
    node = self.get_single_node()
  File "/usr/local/lib/python3.10/dist-packages/yaml/composer.py", line 36, in get_single_node
    document = self.compose_document()
  File "/usr/local/lib/python3.10/dist-packages/yaml/composer.py", line 55, in compose_document
    node = self.compose_node(None, None)
  File "/usr/local/lib/python3.10/dist-packages/yaml/composer.py", line 84, in compose_node
    node = self.compose_mapping_node(anchor)
  File "/usr/local/lib/python3.10/dist-packages/yaml/composer.py", line 133, in compose_mapping_node
    item_value = self.compose_node(node, item_key)
  File "/usr/local/lib/python3.10/dist-packages/yaml/composer.py", line 84, in compose_node
    node = self.compose_mapping_node(anchor)
  File "/usr/local/lib/python3.10/dist-packages/yaml/composer.py", line 127, in compose_mapping_node
    while not self.check_event(MappingEndEvent):
  File "/usr/local/lib/python3.10/dist-packages/yaml/parser.py", line 98, in check_event
    self.current_event = self.state()
  File "/usr/local/lib/python3.10/dist-packages/yaml/parser.py", line 428, in parse_block_mapping_key
    if self.check_token(KeyToken):
  File "/usr/local/lib/python3.10/dist-packages/yaml/scanner.py", line 115, in check_token
    while self.need_more_tokens():
  File "/usr/local/lib/python3.10/dist-packages/yaml/scanner.py", line 152, in need_more_tokens
    self.stale_possible_simple_keys()
  File "/usr/local/lib/python3.10/dist-packages/yaml/scanner.py", line 291, in stale_possible_simple_keys
    raise ScannerError("while scanning a simple key", key.mark,
yaml.scanner.ScannerError: while scanning a simple key
  in "/specs/train.yaml", line 15, column 3
could not find expected ':'
  in "/specs/train.yaml", line 16, column 12
Execution status: FAIL
2024-07-08 06:23:50,828 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 363: Stopping container.

Can you please advise on how to get image visualization working in TensorBoard?
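
A note on the parse error itself: the ScannerError points at line 15, column 3 of /specs/train.yaml, which is the `visualizer{` line. The `name{ ... }` form looks like protobuf-text syntax rather than a YAML mapping, so PyYAML fails while reading the file, before the trainer ever inspects the key. A minimal sketch of the same setting written as a YAML mapping is shown here. It assumes the `visualizer` key stays under `train` as in the spec above. Whether the DINO trainer accepts this key at all is not confirmed by anything in this post, and the config loader may still reject it as an unknown field.

```yaml
train:
  num_epochs: 12
  checkpoint_interval: 1
  # Assumption: `visualizer` is kept here only to show valid YAML nesting.
  # It is not confirmed to be a supported DINO training option.
  visualizer:
    enabled: true
```

Written this way the file should at least get past the YAML scanner, which separates the syntax problem from the question of whether the option is supported.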
