Please provide the following information when requesting support.
• Hardware: T4
• Network Type: DINO
• Training spec file (if you have one, please share it here):
train:
  num_gpus: 1
  num_nodes: 1
  validation_interval: 1
  optim:
    lr_backbone: 2e-05
    lr: 2e-4
    lr_steps: [11]
    momentum: 0.9
  num_epochs: 12
  precision: fp16
  checkpoint_interval: 1
  activation_checkpoint: True
  pretrained_model_path: /workspace/tao-experiments/dino/dino_model_epoch=003.pth
dataset:
  train_data_sources:
    - image_dir: /data/images/train/
      json_file: /data/train/annotations.json
  val_data_sources:
    - image_dir: /data/images/valid/
      json_file: /data/valid/annotations.json
  num_classes: 6
  batch_size: 8
  workers: 2
  augmentation:
    fixed_padding: True
model:
  backbone: fan_large
  train_backbone: False
  num_feature_levels: 4
  dec_layers: 6
  enc_layers: 6
  num_queries: 900
  num_select: 100
  dropout_ratio: 0.0
  dim_feedforward: 2048
• How to reproduce the issue? (This is for errors. Please share the command line and the detailed log here.)
After running the training, I start TensorBoard as follows:
tensorboard --logdir_spec=exp01:<result directory> --host 0.0.0.0 --port 8080
This shows the scalar graphs in TensorBoard, such as loss and validation metrics, but I can't see any images with bounding boxes as they are passed to the model.
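A quick way to tell whether this is a display problem or a logging problem is to check what was actually written to the result directory. The sketch below is a diagnostic under stated assumptions: the helper names `find_event_files` and `image_tags` are hypothetical, the layout of the TAO result directory is not shown in this post (so it searches recursively), and the optional tag check relies on the `EventAccumulator` class shipped inside the `tensorboard` package. If the image tag list comes back empty, no image summaries were written, which points at the training config rather than at TensorBoard.

```python
from pathlib import Path


def find_event_files(result_dir):
    """Hypothetical helper: list TensorBoard event files under result_dir.

    TensorBoard event files are named 'events.out.tfevents.*'; the search is
    recursive because the exact TAO output layout is an assumption here.
    """
    return sorted(
        p for p in Path(result_dir).rglob("events.out.tfevents.*") if p.is_file()
    )


def image_tags(event_file):
    """Hypothetical helper: return image-summary tags found in one event file.

    Returns None if the tensorboard package is not importable. Images logged
    through a different summary type may appear under another tag category
    (for example 'tensors') instead of 'images'.
    """
    try:
        from tensorboard.backend.event_processing.event_accumulator import (
            EventAccumulator,
        )
    except ImportError:
        return None
    acc = EventAccumulator(str(event_file))
    acc.Reload()  # reads the whole file; may be slow for large logs
    return acc.Tags().get("images", [])


if __name__ == "__main__":
    import sys

    # Pass the same <result directory> that was given to --logdir_spec.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for event_file in find_event_files(root):
        print(event_file, image_tags(event_file))
```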
I saw that there is a setting that can be added to the spec file, so I added it as follows:
train:
  num_gpus: 1
  num_nodes: 1
  validation_interval: 1
  optim:
    lr_backbone: 2e-05
    lr: 2e-4
    lr_steps: [11]
    momentum: 0.9
  num_epochs: 12
  precision: fp16
  checkpoint_interval: 1
  activation_checkpoint: True
  pretrained_model_path: /workspace/tao-experiments/dino/dino_model_epoch=003.pth
  visualizer{
    enabled: true
  }
dataset:
  train_data_sources:
    - image_dir: /data/images/train/
      json_file: /data/train/annotations.json
  val_data_sources:
    - image_dir: /data/images/valid/
      json_file: /data/valid/annotations.json
  num_classes: 6
  batch_size: 8
  workers: 2
  augmentation:
    fixed_padding: True
model:
  backbone: fan_large
  train_backbone: False
  num_feature_levels: 4
  dec_layers: 6
  enc_layers: 6
  num_queries: 900
  num_select: 100
  dropout_ratio: 0.0
  dim_feedforward: 2048
The setting is the `visualizer` config, but this option is not listed in the DINO training spec file documentation, and when I run with this configuration I also get an error:
Error
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 213, in run_and_report
    return func()
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 453, in <lambda>
    lambda: hydra.run(
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py", line 105, in run
    cfg = self.compose_config(
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py", line 594, in compose_config
    cfg = self.config_loader.load_configuration(
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/config_loader_impl.py", line 141, in load_configuration
    return self._load_configuration_impl(
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/config_loader_impl.py", line 235, in _load_configuration_impl
    self._process_config_searchpath(config_name, parsed_overrides, caching_repo)
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/config_loader_impl.py", line 158, in _process_config_searchpath
    loaded = repo.load_config(config_path=config_name)
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/config_repository.py", line 349, in load_config
    ret = self.delegate.load_config(config_path=config_path)
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/config_repository.py", line 92, in load_config
    ret = source.load_config(config_path=config_path)
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/core_plugins/file_config_source.py", line 31, in load_config
    cfg = OmegaConf.load(f)
  File "/usr/local/lib/python3.10/dist-packages/omegaconf/omegaconf.py", line 192, in load
    obj = yaml.load(file_, Loader=get_yaml_loader())
  File "/usr/local/lib/python3.10/dist-packages/yaml/__init__.py", line 81, in load
    return loader.get_single_data()
  File "/usr/local/lib/python3.10/dist-packages/yaml/constructor.py", line 49, in get_single_data
    node = self.get_single_node()
  File "/usr/local/lib/python3.10/dist-packages/yaml/composer.py", line 36, in get_single_node
    document = self.compose_document()
  File "/usr/local/lib/python3.10/dist-packages/yaml/composer.py", line 55, in compose_document
    node = self.compose_node(None, None)
  File "/usr/local/lib/python3.10/dist-packages/yaml/composer.py", line 84, in compose_node
    node = self.compose_mapping_node(anchor)
  File "/usr/local/lib/python3.10/dist-packages/yaml/composer.py", line 133, in compose_mapping_node
    item_value = self.compose_node(node, item_key)
  File "/usr/local/lib/python3.10/dist-packages/yaml/composer.py", line 84, in compose_node
    node = self.compose_mapping_node(anchor)
  File "/usr/local/lib/python3.10/dist-packages/yaml/composer.py", line 127, in compose_mapping_node
    while not self.check_event(MappingEndEvent):
  File "/usr/local/lib/python3.10/dist-packages/yaml/parser.py", line 98, in check_event
    self.current_event = self.state()
  File "/usr/local/lib/python3.10/dist-packages/yaml/parser.py", line 428, in parse_block_mapping_key
    if self.check_token(KeyToken):
  File "/usr/local/lib/python3.10/dist-packages/yaml/scanner.py", line 115, in check_token
    while self.need_more_tokens():
  File "/usr/local/lib/python3.10/dist-packages/yaml/scanner.py", line 152, in need_more_tokens
    self.stale_possible_simple_keys()
  File "/usr/local/lib/python3.10/dist-packages/yaml/scanner.py", line 291, in stale_possible_simple_keys
    raise ScannerError("while scanning a simple key", key.mark,
yaml.scanner.ScannerError: while scanning a simple key
  in "/specs/train.yaml", line 15, column 3
could not find expected ':'
  in "/specs/train.yaml", line 16, column 12
Execution status: FAIL
2024-07-08 06:23:50,828 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 363: Stopping container.
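The error is a YAML syntax error, raised before any DINO option is checked. The spec is loaded through Hydra/OmegaConf, which uses a YAML parser, but `visualizer{ ... }` is the brace-block style of protobuf text format (the spec style used by the older TF1-based TAO networks), not YAML. The parser reads `visualizer{` as the start of a key, finds no `:` on that line, and reports the failure at line 15, column 3, which is exactly where the indented `visualizer{` sits. The sketch below locates such lines and rewrites them into YAML nesting. The helper names `find_brace_blocks` and `braces_to_yaml` are hypothetical, and the conversion only makes the file parse as YAML: whether the DINO config schema accepts a `train.visualizer` key at all is not confirmed here (the post notes it is not in the spec documentation), so a later unknown-key error is still possible.

```python
import re

# Matches a line such as "  visualizer{": an identifier followed by "{"
# with no ":" (protobuf text-format style, which YAML cannot parse).
_BRACE_OPEN = re.compile(r"^(\s*)([A-Za-z_][A-Za-z0-9_]*)\s*\{\s*$")
# Matches a line containing only "}" (the end of such a block).
_BRACE_CLOSE = re.compile(r"^\s*\}\s*$")


def find_brace_blocks(spec_text):
    """Hypothetical helper: return 1-based (line, column) of each 'name{' line.

    The column is where the key starts, matching how PyYAML reports marks.
    """
    hits = []
    for lineno, line in enumerate(spec_text.splitlines(), start=1):
        match = _BRACE_OPEN.match(line)
        if match:
            hits.append((lineno, len(match.group(1)) + 1))
    return hits


def braces_to_yaml(spec_text):
    """Hypothetical helper: rewrite 'name{' as 'name:' and drop lone '}' lines.

    Assumes the block body is already indented deeper than the opening line,
    as in the spec above; one-line blocks like 'a{ b: 1 }' are not handled.
    """
    out = []
    for line in spec_text.splitlines():
        match = _BRACE_OPEN.match(line)
        if match:
            out.append(f"{match.group(1)}{match.group(2)}:")
        elif _BRACE_CLOSE.match(line):
            continue  # closing brace has no YAML equivalent
        else:
            out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    spec = "train:\n  num_gpus: 1\n  visualizer{\n    enabled: true\n  }\n"
    print(find_brace_blocks(spec))  # [(3, 3)]
    print(braces_to_yaml(spec))
```

Applied to the full spec above, `find_brace_blocks` would report line 15, column 3, the same position as the `ScannerError`.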
Can you please advise on how to get image visualization in TensorBoard?