Please provide the following information when requesting support.
• Hardware (T4/V100/Xavier/Nano/etc): NVIDIA A40
• Network Type: DetectNet_v2
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here):
toolkit_version: 5.3.0, nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5
• Training spec file (if you have one, please share it here):
detectnet_v2_train_resnet34_kitti.txt (3.5 KB)
• How to reproduce the issue? (This is for errors. Please share the command line and the detailed log here.)
I am able to train with resnet18 and resnet34, but when I try resnet34_peoplenet.tlt, training does not start.
The TFRecords are created.
I get the output below. It doesn't look like an error, but I don't know why training does not start. Please help.
!tao model detectnet_v2 train -e $SPECS_DIR/detectnet_v2_train_resnet34_kitti.txt -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned4 -k nvidia_tlt
Converting Tfrecords for kitti trainval dataset
2024-04-04 16:36:07,976 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2024-04-04 16:36:08,073 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 360: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5
2024-04-04 16:36:08,178 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 301: Printing tty value True
2024-04-04 11:06:14.796200: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12
2024-04-04 11:06:15,578 [TAO Toolkit] [WARNING] tensorflow 40: Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
2024-04-04 11:06:23,096 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable TF_ALLOW_IOLIBS=1.
2024-04-04 11:06:23,304 [TAO Toolkit] [WARNING] tensorflow 42: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable TF_ALLOW_IOLIBS=1.
2024-04-04 11:06:23,340 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable TF_ALLOW_IOLIBS=1.
2024-04-04 11:06:32,578 [TAO Toolkit] [WARNING] matplotlib 500: Matplotlib created a temporary config/cache directory at /tmp/matplotlib-19saayvo because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
2024-04-04 11:06:33,304 [TAO Toolkit] [INFO] matplotlib.font_manager 1633: generated new fontManager
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
WARNING:tensorflow:TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable TF_ALLOW_IOLIBS=1.
2024-04-04 11:06:35,548 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable TF_ALLOW_IOLIBS=1.
2024-04-04 11:06:35,590 [TAO Toolkit] [WARNING] tensorflow 42: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable TF_ALLOW_IOLIBS=1.
2024-04-04 11:06:35,595 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable TF_ALLOW_IOLIBS=1.
2024-04-04 11:06:36,210 [TAO Toolkit] [INFO] root 2102: Starting Object Detection Dataset Convert.
2024-04-04 11:06:36,211 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.build_converter 87: Instantiating a kitti converter
2024-04-04 11:06:36,211 [TAO Toolkit] [INFO] root 2102: Instantiating a kitti converter
2024-04-04 11:06:36,211 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 71: Creating output directory /workspace/tao-experiments/data/tfrecords/kitti_trainval
2024-04-04 11:06:36,211 [TAO Toolkit] [INFO] root 2102: Generating partitions
2024-04-04 11:06:36,211 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.kitti_converter_lib 176: Num images in
Train: 14 Val: 6
2024-04-04 11:06:36,211 [TAO Toolkit] [INFO] root 2102: Num images in
Train: 14 Val: 6
2024-04-04 11:06:36,211 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.kitti_converter_lib 197: Validation data in partition 0. Hence, while choosing the validationset during training choose validation_fold 0.
2024-04-04 11:06:36,211 [TAO Toolkit] [INFO] root 2102: Validation data in partition 0. Hence, while choosing the validationset during training choose validation_fold 0.
2024-04-04 11:06:36,211 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 166: Writing partition 0, shard 0
2024-04-04 11:06:36,211 [TAO Toolkit] [INFO] root 2102: Writing partition 0, shard 0
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/dataio/dataset_converter_lib.py:181: The name tf.python_io.TFRecordWriter is deprecated. Please use tf.io.TFRecordWriter instead.
2024-04-04 11:06:36,212 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/dataio/dataset_converter_lib.py:181: The name tf.python_io.TFRecordWriter is deprecated. Please use tf.io.TFRecordWriter instead.
2024-04-04 11:06:36,353 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 250:
Wrote the following numbers of objects:
b'person': 14
2024-04-04 11:06:36,353 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 166: Writing partition 1, shard 0
2024-04-04 11:06:36,353 [TAO Toolkit] [INFO] root 2102: Writing partition 1, shard 0
2024-04-04 11:06:36,359 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 250:
Wrote the following numbers of objects:
b'person': 45
2024-04-04 11:06:36,359 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 89: Cumulative object statistics
2024-04-04 11:06:36,360 [TAO Toolkit] [INFO] root 2102: Cumulative object statistics
2024-04-04 11:06:36,360 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 250:
Wrote the following numbers of objects:
b'person': 59
2024-04-04 11:06:36,360 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 105: Class map.
Label in GT: Label in tfrecords file
b'Person': b'person'
2024-04-04 11:06:36,360 [TAO Toolkit] [INFO] root 2102: Class map.
Label in GT: Label in tfrecords file
b'Person': b'person'
For the dataset_config in the experiment_spec, please use labels in the tfrecords file, while writing the classmap.
2024-04-04 11:06:36,360 [TAO Toolkit] [INFO] root 2102: For the dataset_config in the experiment_spec, please use labels in the tfrecords file, while writing the classmap.
2024-04-04 11:06:36,360 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 114: Tfrecords generation complete.
2024-04-04 11:06:36,360 [TAO Toolkit] [INFO] root 2102: TFRecords generation complete.
2024-04-04 11:06:36,360 [TAO Toolkit] [INFO] root 2102: Dataset convert finished successfully.
Execution status: PASS
2024-04-04 16:36:43,357 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 363: Stopping container.
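The converter output above names two values the training spec's dataset_config has to agree with: validation_fold 0, and the tfrecords label person (lowercase, not the ground-truth Person) as the target_class_mapping key. The Python helper below is a minimal sketch that renders such a fragment. render_dataset_config is a hypothetical name, not part of TAO. It assumes the TFRecords are matched with a `/*` glob under the output directory shown in the log and that each class maps to itself. Other fields the real spec needs, such as image_directory_path and image_extension, are deliberately left out.

```python
# Minimal sketch (assumption): render a DetectNet_v2 dataset_config fragment
# from values reported by the converter log. Not a TAO API; the glob pattern
# and the identity class mapping are illustrative assumptions.

def render_dataset_config(tfrecords_glob, class_map, validation_fold):
    """Return a dataset_config text block.

    class_map maps ground-truth labels to the labels stored in the
    tfrecords (e.g. {"Person": "person"}). Only the tfrecords labels are
    used as target_class_mapping keys, as the converter log instructs.
    """
    lines = [
        "dataset_config {",
        "  data_sources {",
        f'    tfrecords_path: "{tfrecords_glob}"',
        "  }",
    ]
    # Deduplicate while keeping order: several GT labels may share one tfrecords label.
    for label in dict.fromkeys(class_map.values()):
        lines += [
            "  target_class_mapping {",
            f'    key: "{label}"',
            f'    value: "{label}"',  # assumption: output class name equals input label
            "  }",
        ]
    lines.append(f"  validation_fold: {validation_fold}")
    lines.append("}")
    return "\n".join(lines)


fragment = render_dataset_config(
    "/workspace/tao-experiments/data/tfrecords/kitti_trainval/*",
    {"Person": "person"},
    validation_fold=0,
)
print(fragment)
```

The generated fragment uses the lowercase key "person" and validation_fold 0, matching what the converter reported. The rest of dataset_config still has to come from the actual spec file.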