I'm trying to get TAO Toolkit training tasks to appear in ClearML, but I get a warning that the ClearML task could not be initialized because it can't access an S3 bucket.
From the tutorials, it seems that a ~/clearml.conf file isn't needed, and I'm not sure where this S3 bucket was ever configured as the upload destination.
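For context: the ClearML SDK can take its server settings from environment variables instead of a clearml.conf, which may be why the tutorials skip the file. Those variables do not carry S3 keys, though; storage credentials are read from the sdk.aws.s3 section of clearml.conf. The upload destination itself is an "output URI", which can be set with `Task.init(output_uri=...)`, with `sdk.development.default_output_uri` in clearml.conf, or possibly as a default output destination on the ClearML project in the web UI (an assumption, not verified here). The minimal sketch below only reports which of the standard ClearML server variables are set; it is meant to be run in the environment where training actually executes.

```python
import os
from typing import Dict, List, Mapping

# Environment variables the ClearML SDK reads for *server* access.
# None of these carry S3 credentials; those live under sdk.aws.s3 in clearml.conf.
CLEARML_SERVER_VARS: List[str] = [
    "CLEARML_WEB_HOST",
    "CLEARML_API_HOST",
    "CLEARML_FILES_HOST",
    "CLEARML_API_ACCESS_KEY",
    "CLEARML_API_SECRET_KEY",
]


def clearml_env_report(env: Mapping[str, str] = os.environ) -> Dict[str, bool]:
    """Return, for each ClearML server variable, whether it is set to a non-empty value."""
    return {name: bool(env.get(name)) for name in CLEARML_SERVER_VARS}


if __name__ == "__main__":
    for name, present in clearml_env_report().items():
        print(f"{name:24} {'set' if present else 'MISSING'}")
```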
• Hardware: A5000
• Network Type: Faster-RCNN
• TLT Version: 5.0.0-tf1.15.5
• Training spec file:
random_seed: 42
verbose: True
model_config: {
input_image_config: {
image_type: RGB,
image_channel_order: 'bgr',
size_height_width: {
height: 640,
width: 1024
},
image_channel_mean: {
key: 'b',
value: 103.939
},
image_channel_mean: {
key: 'g',
value: 116.779
},
image_channel_mean: {
key: 'r',
value: 123.68
},
image_scaling_factor: 1.0,
max_objects_num_per_image: 1000
},
arch: "resnet:50",
anchor_box_config: {
scale: 64.0,
scale: 128.0,
scale: 256.0,
ratio: 1.0,
ratio: 0.5,
ratio: 2.0
},
freeze_bn: True,
freeze_blocks: 0,
freeze_blocks: 1,
roi_mini_batch: 256,
rpn_stride: 16,
use_bias: False,
roi_pooling_config: {
pool_size: 7,
pool_size_2x: False
},
all_projections: True,
use_pooling: False
}
dataset_config: {
data_sources: {
tfrecords_path: "/workspace/tao-experiments/data/tfrecords/coco_trainval/coco_trainval-fold-000-of-001-shard-0000[0-9]-of-00020",
image_directory_path: "/workspace/tao-experiments/train"
},
validation_data_source: {
tfrecords_path: "/workspace/tao-experiments/data/tfrecords/coco_trainval/coco_trainval-fold-000-of-001-shard-0001[0-9]-of-00020",
image_directory_path: "/workspace/tao-experiments/train"
},
image_extension: 'jpg',
target_class_mapping: {
key: 'person',
value: 'person'
},
# validation_fold: 0
}
augmentation_config: {
preprocessing: {
output_image_width: 1024,
output_image_height: 640,
output_image_channel: 3,
min_bbox_width: 1.0,
min_bbox_height: 1.0,
enable_auto_resize: True
},
spatial_augmentation: {
hflip_probability: 0.5,
vflip_probability: 0.0,
zoom_min: 1.0,
zoom_max: 1.0,
translate_max_x: 0,
translate_max_y: 0
},
color_augmentation: {
hue_rotation_max: 0.0,
saturation_shift_max: 0.0,
contrast_scale_max: 0.0,
contrast_center: 0.5
}
}
training_config: {
enable_augmentation: True,
enable_qat: True,
batch_size_per_gpu: 4,
num_epochs: 15,
pretrained_weights: "/workspace/tao-experiments/faster_rcnn/resnet_50.hdf5",
# resume_from_model: "/data/TAO_TOOLKIT/tao_poc/frcnn_training/model/resnet_50.hdf5",
rpn_min_overlap: 0.3,
rpn_max_overlap: 0.7,
classifier_min_overlap: 0.0,
classifier_max_overlap: 0.5,
gt_as_roi: False,
std_scaling: 1.0,
classifier_regr_std: {
key: 'x',
value: 10.0
},
classifier_regr_std: {
key: 'y',
value: 10.0
},
classifier_regr_std: {
key: 'w',
value: 5.0
},
classifier_regr_std: {
key: 'h',
value: 5.0
},
rpn_mini_batch: 256,
rpn_pre_nms_top_N: 12000,
rpn_nms_max_boxes: 2000,
rpn_nms_overlap_threshold: 0.7,
regularizer: {
type: L2,
weight: 1e-4
},
optimizer: {
sgd: {
lr: 0.02,
momentum: 0.9,
decay: 0.0,
nesterov: False
}
},
visualizer: {
enabled: true
clearml_config{
project: "training"
tags: "resnet50"
tags: "tao_toolkit"
tags: "unpruned"
task: "taokit_test"
}
},
learning_rate: {
soft_start: {
base_lr: 0.02,
start_lr: 0.002,
soft_start: 0.1,
annealing_points: 0.8,
annealing_points: 0.9,
annealing_divider: 10.0
}
},
lambda_rpn_regr: 1.0,
lambda_rpn_class: 1.0,
lambda_cls_regr: 1.0,
lambda_cls_class: 1.0
}
inference_config: {
images_dir: "/workspace/tao-experiments/test",
model: '/workspace/tao-experiments/faster_rcnn/frcnn_coco_resnet50.epoch_15.hdf5', # Update this with final model
batch_size: 1,
detection_image_output_dir: '/workspace/tao-experiments/faster_rcnn/inference_results_imgs',
labels_dump_dir: '/workspace/tao-experiments/faster_rcnn/inference_dump_labels',
rpn_pre_nms_top_N: 6000,
rpn_nms_max_boxes: 300,
rpn_nms_overlap_threshold: 0.7,
object_confidence_thres: 0.0001,
bbox_visualize_threshold: 0.6,
classifier_nms_max_boxes: 100,
classifier_nms_overlap_threshold: 0.3
}
evaluation_config: {
model: '/workspace/tao-experiments/faster_rcnn/frcnn_coco_resnet50.epoch_15.hdf5', # Update this with final model
batch_size: 1,
validation_period_during_training: 1,
rpn_pre_nms_top_N: 6000,
rpn_nms_max_boxes: 300,
rpn_nms_overlap_threshold: 0.7,
classifier_nms_max_boxes: 100,
classifier_nms_overlap_threshold: 0.3,
object_confidence_thres: 0.0001,
use_voc07_11point_metric: False,
gt_matching_iou_threshold: 0.5
}
• Command Line:
!tao model faster_rcnn train \
--gpus 4 \
--gpu_index 0 1 2 3 \
-e $SPECS_DIR/spec_resnet50.yaml \
-r /workspace/tao-experiments/faster_rcnn
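The `tao` launcher does not run the job on the host: as the `Running command in container` log line below shows, it starts the nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5 container, so a ~/clearml.conf created on the host is not visible to the training process unless it is mounted in. The launcher takes its bind mounts and environment variables from ~/.tao_mounts.json (`Mounts` entries with `source`/`destination`, `Envs` entries with `variable`/`value`). The sketch below adds a mount for the config file and points `CLEARML_CONFIG_FILE` at it. The container path /workspace/clearml.conf and the helper name `add_clearml_conf` are hypothetical choices, and the script prints the updated JSON instead of overwriting your file.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Hypothetical location for the config file inside the container.
CONTAINER_CONF = "/workspace/clearml.conf"


def add_clearml_conf(mounts: Dict[str, Any], host_conf: str,
                     container_conf: str = CONTAINER_CONF) -> Dict[str, Any]:
    """Add a clearml.conf bind mount and a CLEARML_CONFIG_FILE entry to a tao_mounts dict."""
    mount = {"source": host_conf, "destination": container_conf}
    mount_list = mounts.setdefault("Mounts", [])
    env_list = mounts.setdefault("Envs", [])
    if mount not in mount_list:
        mount_list.append(mount)
    # Drop any earlier CLEARML_CONFIG_FILE entry so exactly one remains.
    env_list[:] = [e for e in env_list if e.get("variable") != "CLEARML_CONFIG_FILE"]
    env_list.append({"variable": "CLEARML_CONFIG_FILE", "value": container_conf})
    return mounts


if __name__ == "__main__":
    path = Path.home() / ".tao_mounts.json"
    data = json.loads(path.read_text()) if path.exists() else {}
    add_clearml_conf(data, str(Path.home() / "clearml.conf"))
    # Print only; review the result before saving it to ~/.tao_mounts.json.
    print(json.dumps(data, indent=4))
```

After saving the reviewed output to ~/.tao_mounts.json, rerunning the same `tao model faster_rcnn train` command should start the container with the config file available at the mounted path.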
• Logs (after the ClearML warning at the end, training proceeds as expected):
2023-12-06 12:38:45,522 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2023-12-06 12:38:45,614 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 361: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5
2023-12-06 12:38:46,825 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 275: Printing tty value True
Using TensorFlow backend.
2023-12-06 17:38:47.744034: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12
2023-12-06 17:38:47,784 [TAO Toolkit] [WARNING] tensorflow 40: Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
2023-12-06 17:38:48,760 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable TF_ALLOW_IOLIBS=1.
2023-12-06 17:38:48,786 [TAO Toolkit] [WARNING] tensorflow 42: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable TF_ALLOW_IOLIBS=1.
2023-12-06 17:38:48,789 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable TF_ALLOW_IOLIBS=1.
2023-12-06 17:38:49,761 [TAO Toolkit] [WARNING] matplotlib 500: Matplotlib created a temporary config/cache directory at /tmp/matplotlib-n28otjum because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
2023-12-06 17:38:49,979 [TAO Toolkit] [INFO] matplotlib.font_manager 1633: generated new fontManager
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
WARNING:tensorflow:TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable TF_ALLOW_IOLIBS=1.
WARNING: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable TF_ALLOW_IOLIBS=1.
WARNING: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable TF_ALLOW_IOLIBS=1.
WARNING: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable TF_ALLOW_IOLIBS=1.
WARNING: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable TF_ALLOW_IOLIBS=1.
WARNING: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable TF_ALLOW_IOLIBS=1.
WARNING: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable TF_ALLOW_IOLIBS=1.
WARNING: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable TF_ALLOW_IOLIBS=1.
WARNING: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable TF_ALLOW_IOLIBS=1.
WARNING: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable TF_ALLOW_IOLIBS=1.
WARNING: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable TF_ALLOW_IOLIBS=1.
WARNING: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable TF_ALLOW_IOLIBS=1.
WARNING: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable TF_ALLOW_IOLIBS=1.
INFO: Loading experiment spec at /workspace/tao-experiments/faster_rcnn/specs/spec_resnet50.yaml.
INFO: Loading experiment spec at /workspace/tao-experiments/faster_rcnn/specs/spec_resnet50.yaml.
INFO: Loading experiment spec at /workspace/tao-experiments/faster_rcnn/specs/spec_resnet50.yaml.
INFO: Loading experiment spec at /workspace/tao-experiments/faster_rcnn/specs/spec_resnet50.yaml.
INFO: Log file already exists at /workspace/tao-experiments/faster_rcnn/status.json
INFO: Log file already exists at /workspace/tao-experiments/faster_rcnn/status.json
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/scripts/train.py:87: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
WARNING: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/scripts/train.py:87: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/scripts/train.py:87: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
WARNING: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/scripts/train.py:87: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/scripts/train.py:96: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
WARNING: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/scripts/train.py:96: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/scripts/train.py:96: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
WARNING: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/scripts/train.py:96: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
INFO: Log file already exists at /workspace/tao-experiments/faster_rcnn/status.json
INFO: Log file already exists at /workspace/tao-experiments/faster_rcnn/status.json
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/scripts/train.py:87: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
WARNING: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/scripts/train.py:87: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/scripts/train.py:87: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/scripts/train.py:96: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
WARNING: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/scripts/train.py:87: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
WARNING: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/scripts/train.py:96: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/scripts/train.py:96: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
WARNING: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/scripts/train.py:96: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.
WARNING: From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/utils/utils.py:419: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.
WARNING: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/utils/utils.py:419: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.
WARNING: From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/utils/utils.py:419: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.
WARNING: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/utils/utils.py:419: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.
WARNING: From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/utils/utils.py:419: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.
WARNING: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/utils/utils.py:419: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.
WARNING: From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/utils/utils.py:419: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.
WARNING: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/utils/utils.py:419: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.
INFO: Sampling mode of the dataloader was set to user_defined.
INFO: Serial augmentation enabled = False
INFO: Pseudo sharding enabled = False
INFO: Max Image Dimensions (all sources): (0, 0)
INFO: number of cpus: 64, io threads: 128, compute threads: 64, buffered batches: 4
INFO: total dataset size 1470, number of sources: 1, batch size per gpu: 4, steps: 368
INFO: Sampling mode of the dataloader was set to user_defined.
INFO: Serial augmentation enabled = False
INFO: Pseudo sharding enabled = False
INFO: Max Image Dimensions (all sources): (0, 0)
INFO: number of cpus: 64, io threads: 128, compute threads: 64, buffered batches: 4
INFO: total dataset size 1470, number of sources: 1, batch size per gpu: 4, steps: 368
INFO: Sampling mode of the dataloader was set to user_defined.
INFO: Serial augmentation enabled = False
INFO: Pseudo sharding enabled = False
INFO: Max Image Dimensions (all sources): (0, 0)
INFO: number of cpus: 64, io threads: 128, compute threads: 64, buffered batches: 4
INFO: total dataset size 1470, number of sources: 1, batch size per gpu: 4, steps: 368
ClearML Task: created new task id=a28908cfc49142b0be69c22d3fb63fac
2023-12-06 17:38:54,755 - clearml.Task - INFO - No repository found, storing script code instead
2023-12-06 17:38:54,787 - clearml.storage - ERROR - Failed creating storage object s3://training-clearml-artifacts Reason: Missing key and secret for S3 storage access (s3://training-clearml-artifacts)
WARNING: ClearML task init failed with error Could not get access credentials for 's3://training-clearml-artifacts' , check configuration file ~/clearml.conf
WARNING: Training will still continue.
............
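The two ClearML lines near the end of the log point at the same cause: the SDK resolved the output destination to s3://training-clearml-artifacts but found no key/secret for it. In clearml.conf, S3 credentials go either in the default `sdk.aws.s3.key` / `sdk.aws.s3.secret` fields or in a per-bucket entry of the `sdk.aws.s3.credentials` list. A minimal sketch that extracts the bucket from the URI in the error and renders a per-bucket entry; the helper names are hypothetical and the key/secret values are placeholders.

```python
from urllib.parse import urlparse


def s3_bucket(uri: str) -> str:
    """Return the bucket name from an s3:// URI such as the one in the ClearML error."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    return parsed.netloc


def bucket_credentials_block(uri: str, key: str, secret: str) -> str:
    """Render a clearml.conf (HOCON) snippet with a per-bucket S3 credentials entry."""
    bucket = s3_bucket(uri)
    return "\n".join([
        "sdk {",
        "  aws {",
        "    s3 {",
        "      credentials: [",
        "        {",
        f'          bucket: "{bucket}"',
        f'          key: "{key}"',
        f'          secret: "{secret}"',
        "        }",
        "      ]",
        "    }",
        "  }",
        "}",
    ])


if __name__ == "__main__":
    # MY_ACCESS_KEY / MY_SECRET_KEY are placeholders, not real values.
    print(bucket_credentials_block("s3://training-clearml-artifacts",
                                   "MY_ACCESS_KEY", "MY_SECRET_KEY"))
```

HOCON merges objects that share a path, so this block can be appended to an existing clearml.conf; a second `credentials` list, however, replaces an earlier one rather than extending it.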
Even after generating a ~/clearml.conf file and adding my access keys to it, I still get this error, and I'm not sure why.
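One way to narrow this down is to check, from inside the TAO container rather than on the host, where a config file would actually be found. The matplotlib warning in the log (default path /.config/matplotlib) suggests that the home directory resolves to / inside the container, in which case ~/clearml.conf means /clearml.conf there. The sketch below is a simplified approximation of the SDK's lookup, not ClearML's actual code: `CLEARML_CONFIG_FILE` if set, otherwise clearml.conf in the home directory. The function name `find_clearml_conf` is hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_clearml_conf(env: Mapping[str, str] = os.environ,
                      home: Optional[Path] = None) -> Optional[Path]:
    """Approximate where a ClearML config would be loaded from.

    Simplified rule: CLEARML_CONFIG_FILE wins if set, otherwise <home>/clearml.conf.
    Returns the path if that file exists, else None.
    """
    override = env.get("CLEARML_CONFIG_FILE")
    if override:
        candidate = Path(override)
    else:
        candidate = (home if home is not None else Path.home()) / "clearml.conf"
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    print("HOME =", os.environ.get("HOME"))
    found = find_clearml_conf()
    print("clearml.conf visible at:", found if found else "NOT FOUND")
```

If this prints NOT FOUND inside the container while the file exists on the host, the keys are simply never reaching the training process.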
Thanks for any help!