Please provide the following information when requesting support.
• Hardware (T4/V100/Xavier/Nano/etc) A4000
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc) N/A
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here) toolkit_version: 5.2.0
• Training spec file(If have, please share here)
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)
As the latest TAO doesn’t support rotation/shear online augmentation, I have to do offline augmentation. There are two issues:
- The offline augmentation page linked from the 5.2 docs uses a deprecated example command, “tao augment”: Offline Data Augmentation - NVIDIA Docs. It took me a while to figure out that “tao dataset augmentation generate” is the correct one. Please update this.
- Even with “tao dataset augmentation generate”, if I run “tao dataset augmentation generate --help”, the args are very different from what is documented here: Offline Data Augmentation - NVIDIA Docs. Again, this is very misleading.
- After spending many hours, I figured out a working command but cannot get the YAML parsed correctly.
  - There is no example provided in Offline Data Augmentation - NVIDIA Docs.
  - The example provided in Offline Data Augmentation - NVIDIA Docs doesn’t work.
Below is what I tried, which didn’t work:
spatial_aug:
  rotation:
    angle: 5
    units: degrees
  shear:
    shear_ratio_x: 0.3
data:
  dataset_type: coco
  image_dir: /workspace/tao/data/images
  anno_path: /workspace/tao/data/output.json
  output_dataset: /workspace/tao/data/out
  batch_size: 8
  include_masks: false
It throws weird errors:
2024-01-04 02:35:24,610 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2024-01-04 02:35:24,691 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 360: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.2.0-data-services
2024-01-04 02:35:25,298 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 301: Printing tty value True
sys:1: UserWarning:
'offline_data_augment.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
sys:1: UserWarning:
'offline_data_augment.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
/usr/local/lib/python3.10/dist-packages/nvidia_tao_ds/core/hydra/hydra_runner.py:105: UserWarning:
'offline_data_augment.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
_run_hydra(
/usr/local/lib/python3.10/dist-packages/nvidia_tao_ds/core/hydra/hydra_runner.py:105: UserWarning:
'offline_data_augment.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
_run_hydra(
Error merging 'offline_data_augment.yaml' with schema
Invalid value assigned: AnyNode is not a ListConfig, list or tuple.
full_key: spatial_aug.rotation.angle
reference_type=RotationConfig
object_type=RotationConfig
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
Error merging 'offline_data_augment.yaml' with schema
Invalid value assigned: AnyNode is not a ListConfig, list or tuple.
full_key: spatial_aug.rotation.angle
reference_type=RotationConfig
object_type=RotationConfig
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[740,1],0]
Exit code: 1
--------------------------------------------------------------------------
Sending telemetry data.
Execution status: FAIL
2024-01-04 02:35:31,674 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 363: Stopping container.
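The error points at spatial_aug.rotation.angle and says the value is not a list, so the schema presumably expects a list of angles rather than a single number. A minimal sketch of the spec with that one change follows. That a one-element list is accepted is an assumption taken from the error text, not something confirmed against the 5.2 schema, and other numeric fields such as shear_ratio_x may need the same treatment.

```yaml
# Sketch only: same spec as above, with rotation.angle written as a list.
# Assumption: the schema accepts a one-element list here (inferred from
# "AnyNode is not a ListConfig, list or tuple"); not verified against TAO 5.2.
spatial_aug:
  rotation:
    angle: [5]          # was: angle: 5
    units: degrees
  shear:
    shear_ratio_x: 0.3  # unchanged; may also need to be a list (unverified)
data:
  dataset_type: coco
  image_dir: /workspace/tao/data/images
  anno_path: /workspace/tao/data/output.json
  output_dataset: /workspace/tao/data/out
  batch_size: 8
  include_masks: false
```

If the merge then fails on a different key, the full_key line in the new error should name the next field the schema rejects.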