Channel: TAO Toolkit - NVIDIA Developer Forums

NVIDIA TAO Toolkit - ClearML task init failed, S3 access credentials


I’m trying to get TAO Toolkit training tasks to appear on ClearML. The log reports that the ClearML task could not be initialized because it can’t access an S3 bucket.

From the tutorials, it seems like a ~/clearml.conf file isn’t needed. I’m also not sure where the task was configured to write to this S3 bucket in the first place.

• Hardware: A5000
• Network Type: Faster-RCNN
• TLT Version: 5.0.0-tf1.15.5
• Training spec file:

random_seed: 42

verbose: True
model_config: {
  input_image_config: {
    image_type: RGB,
    image_channel_order: 'bgr',
    size_height_width: {
      height: 640,
      width: 1024
      },
    image_channel_mean: {
        key: 'b',
        value: 103.939
      },
    image_channel_mean: {
        key: 'g',
        value: 116.779
      },
    image_channel_mean: {
        key: 'r',
        value: 123.68
      },
    image_scaling_factor: 1.0,
    max_objects_num_per_image: 1000
    },
  arch: "resnet:50",
  anchor_box_config: {
    scale: 64.0,
    scale: 128.0,
    scale: 256.0,
    ratio: 1.0,
    ratio: 0.5,
    ratio: 2.0
    },
  freeze_bn: True,
  freeze_blocks: 0,
  freeze_blocks: 1,
  roi_mini_batch: 256,
  rpn_stride: 16,
  use_bias: False,
  roi_pooling_config: {
    pool_size: 7,
    pool_size_2x: False
    },
  all_projections: True,
  use_pooling: False
}

dataset_config: {
  data_sources: {
    tfrecords_path: "/workspace/tao-experiments/data/tfrecords/coco_trainval/coco_trainval-fold-000-of-001-shard-0000[0-9]-of-00020",
    image_directory_path: "/workspace/tao-experiments/train"
  },
  validation_data_source: {
    tfrecords_path: "/workspace/tao-experiments/data/tfrecords/coco_trainval/coco_trainval-fold-000-of-001-shard-0001[0-9]-of-00020",
    image_directory_path: "/workspace/tao-experiments/train"
  },
  image_extension: 'jpg',
  target_class_mapping: {
    key: 'person',
    value: 'person'
    },
  # validation_fold: 0
}

augmentation_config: {
  preprocessing: {
    output_image_width: 1024,
    output_image_height: 640,
    output_image_channel: 3,
    min_bbox_width: 1.0,
    min_bbox_height: 1.0,
    enable_auto_resize: True
    },
  spatial_augmentation: {
    hflip_probability: 0.5,
    vflip_probability: 0.0,
    zoom_min: 1.0,
    zoom_max: 1.0,
    translate_max_x: 0,
    translate_max_y: 0
    },
  color_augmentation: {
    hue_rotation_max: 0.0,
    saturation_shift_max: 0.0,
    contrast_scale_max: 0.0,
    contrast_center: 0.5
    }
}

training_config: {
  enable_augmentation: True,
  enable_qat: True,
  batch_size_per_gpu: 4,
  num_epochs: 15,
  pretrained_weights: "/workspace/tao-experiments/faster_rcnn/resnet_50.hdf5",
  # resume_from_model: "/data/TAO_TOOLKIT/tao_poc/frcnn_training/model/resnet_50.hdf5",
  rpn_min_overlap: 0.3,
  rpn_max_overlap: 0.7,
  classifier_min_overlap: 0.0,
  classifier_max_overlap: 0.5,
  gt_as_roi: False,
  std_scaling: 1.0,
  classifier_regr_std: {
    key: 'x',
    value: 10.0
    },
  classifier_regr_std: {
    key: 'y',
    value: 10.0
    },
  classifier_regr_std: {
    key: 'w',
    value: 5.0
    },
  classifier_regr_std: {
    key: 'h',
    value: 5.0
    },
  rpn_mini_batch: 256,
  rpn_pre_nms_top_N: 12000,
  rpn_nms_max_boxes: 2000,
  rpn_nms_overlap_threshold: 0.7,
  regularizer: {
    type: L2,
    weight: 1e-4
    },
  optimizer: {
    sgd: {
      lr: 0.02,
      momentum: 0.9,
      decay: 0.0,
      nesterov: False
      }
    },
  visualizer: {
    enabled: true
    clearml_config{
        project: "training"
        tags: "resnet50"
        tags: "tao_toolkit"
        tags: "unpruned"
        task: "taokit_test"
    }
  },
  learning_rate: {
    soft_start: {
      base_lr: 0.02,
      start_lr: 0.002,
      soft_start: 0.1,
      annealing_points: 0.8,
      annealing_points: 0.9,
      annealing_divider: 10.0
      }
    },
  lambda_rpn_regr: 1.0,
  lambda_rpn_class: 1.0,
  lambda_cls_regr: 1.0,
  lambda_cls_class: 1.0
}

inference_config: {
  images_dir: "/workspace/tao-experiments/test",
  model: '/workspace/tao-experiments/faster_rcnn/frcnn_coco_resnet50.epoch_15.hdf5', # Update this with final model
  batch_size: 1,
  detection_image_output_dir: '/workspace/tao-experiments/faster_rcnn/inference_results_imgs',
  labels_dump_dir: '/workspace/tao-experiments/faster_rcnn/inference_dump_labels',
  rpn_pre_nms_top_N: 6000,
  rpn_nms_max_boxes: 300,
  rpn_nms_overlap_threshold: 0.7,
  object_confidence_thres: 0.0001,
  bbox_visualize_threshold: 0.6,
  classifier_nms_max_boxes: 100,
  classifier_nms_overlap_threshold: 0.3
}

evaluation_config: {
  model: '/workspace/tao-experiments/faster_rcnn/frcnn_coco_resnet50.epoch_15.hdf5', # Update this with final model
  batch_size: 1,
  validation_period_during_training: 1,
  rpn_pre_nms_top_N: 6000,
  rpn_nms_max_boxes: 300,
  rpn_nms_overlap_threshold: 0.7,
  classifier_nms_max_boxes: 100,
  classifier_nms_overlap_threshold: 0.3,
  object_confidence_thres: 0.0001,
  use_voc07_11point_metric: False,
  gt_matching_iou_threshold: 0.5
}

• Command Line:

!tao model faster_rcnn train \
        --gpus 4 \
        --gpu_index 0 1 2 3 \
        -e $SPECS_DIR/spec_resnet50.yaml \
        -r /workspace/tao-experiments/faster_rcnn

• Logs (after this, training just proceeds as expected):

2023-12-06 12:38:45,522 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2023-12-06 12:38:45,614 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 361: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5
2023-12-06 12:38:46,825 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 275: Printing tty value True
Using TensorFlow backend.
2023-12-06 17:38:47.744034: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12
2023-12-06 17:38:47,784 [TAO Toolkit] [WARNING] tensorflow 40: Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
2023-12-06 17:38:48,760 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable  TF_ALLOW_IOLIBS=1.
2023-12-06 17:38:48,786 [TAO Toolkit] [WARNING] tensorflow 42: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable  TF_ALLOW_IOLIBS=1.
2023-12-06 17:38:48,789 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable  TF_ALLOW_IOLIBS=1.
2023-12-06 17:38:49,761 [TAO Toolkit] [WARNING] matplotlib 500: Matplotlib created a temporary config/cache directory at /tmp/matplotlib-n28otjum because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
2023-12-06 17:38:49,979 [TAO Toolkit] [INFO] matplotlib.font_manager 1633: generated new fontManager
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
WARNING:tensorflow:TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable  TF_ALLOW_IOLIBS=1.
WARNING: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable  TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable  TF_ALLOW_IOLIBS=1.
WARNING: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable  TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable  TF_ALLOW_IOLIBS=1.
WARNING: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable  TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable  TF_ALLOW_IOLIBS=1.
WARNING: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable  TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable  TF_ALLOW_IOLIBS=1.
WARNING: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable  TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable  TF_ALLOW_IOLIBS=1.
WARNING: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable  TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable  TF_ALLOW_IOLIBS=1.
WARNING: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable  TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable  TF_ALLOW_IOLIBS=1.
WARNING: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable  TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable  TF_ALLOW_IOLIBS=1.
WARNING: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable  TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable  TF_ALLOW_IOLIBS=1.
WARNING: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable  TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable  TF_ALLOW_IOLIBS=1.
WARNING: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable  TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable  TF_ALLOW_IOLIBS=1.
WARNING: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable  TF_ALLOW_IOLIBS=1.
INFO: Loading experiment spec at /workspace/tao-experiments/faster_rcnn/specs/spec_resnet50.yaml.
INFO: Loading experiment spec at /workspace/tao-experiments/faster_rcnn/specs/spec_resnet50.yaml.
INFO: Loading experiment spec at /workspace/tao-experiments/faster_rcnn/specs/spec_resnet50.yaml.
INFO: Loading experiment spec at /workspace/tao-experiments/faster_rcnn/specs/spec_resnet50.yaml.
INFO: Log file already exists at /workspace/tao-experiments/faster_rcnn/status.json
INFO: Log file already exists at /workspace/tao-experiments/faster_rcnn/status.json
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/scripts/train.py:87: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/scripts/train.py:87: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/scripts/train.py:87: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/scripts/train.py:87: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/scripts/train.py:96: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

WARNING: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/scripts/train.py:96: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/scripts/train.py:96: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

WARNING: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/scripts/train.py:96: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

INFO: Log file already exists at /workspace/tao-experiments/faster_rcnn/status.json
INFO: Log file already exists at /workspace/tao-experiments/faster_rcnn/status.json
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/scripts/train.py:87: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/scripts/train.py:87: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/scripts/train.py:87: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/scripts/train.py:96: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

WARNING: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/scripts/train.py:87: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/scripts/train.py:96: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/scripts/train.py:96: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

WARNING: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/scripts/train.py:96: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

WARNING: From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/utils/utils.py:419: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.

WARNING: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/utils/utils.py:419: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

WARNING: From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/utils/utils.py:419: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.

WARNING: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/utils/utils.py:419: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

WARNING: From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/utils/utils.py:419: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.

WARNING: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/utils/utils.py:419: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

WARNING: From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/utils/utils.py:419: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.

WARNING: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/faster_rcnn/utils/utils.py:419: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.

INFO: Sampling mode of the dataloader was set to user_defined.
INFO: Serial augmentation enabled = False
INFO: Pseudo sharding enabled = False
INFO: Max Image Dimensions (all sources): (0, 0)
INFO: number of cpus: 64, io threads: 128, compute threads: 64, buffered batches: 4
INFO: total dataset size 1470, number of sources: 1, batch size per gpu: 4, steps: 368
INFO: Sampling mode of the dataloader was set to user_defined.
INFO: Serial augmentation enabled = False
INFO: Pseudo sharding enabled = False
INFO: Max Image Dimensions (all sources): (0, 0)
INFO: number of cpus: 64, io threads: 128, compute threads: 64, buffered batches: 4
INFO: total dataset size 1470, number of sources: 1, batch size per gpu: 4, steps: 368
INFO: Sampling mode of the dataloader was set to user_defined.
INFO: Serial augmentation enabled = False
INFO: Pseudo sharding enabled = False
INFO: Max Image Dimensions (all sources): (0, 0)
INFO: number of cpus: 64, io threads: 128, compute threads: 64, buffered batches: 4
INFO: total dataset size 1470, number of sources: 1, batch size per gpu: 4, steps: 368
ClearML Task: created new task id=a28908cfc49142b0be69c22d3fb63fac
2023-12-06 17:38:54,755 - clearml.Task - INFO - No repository found, storing script code instead
2023-12-06 17:38:54,787 - clearml.storage - ERROR - Failed creating storage object s3://training-clearml-artifacts Reason: Missing key and secret for S3 storage access (s3://training-clearml-artifacts)
WARNING: ClearML task init failed with error Could not get access credentials for 's3://training-clearml-artifacts' , check configuration file ~/clearml.conf
WARNING: Training will still continue.
............

Even after generating a ~/clearml.conf file and adding my access keys, I’m still getting this error. I’m unsure why.
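
One thing I still need to rule out is whether the config file and keys are visible from inside the container at all, since the log shows the training command running in nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5 rather than directly on the host. Below is a minimal diagnostic sketch for that check, meant to be run in the same container environment as the training. The function name `clearml_visibility_report` is hypothetical, and the list of environment variable names is my assumption based on the ClearML and AWS documentation, not something confirmed by the TAO logs above. It only reports whether each item is present and never prints the secret values.

```python
import os
from pathlib import Path

# Assumed variable names (from ClearML / AWS docs); adjust if your setup differs.
CANDIDATE_VARS = (
    "CLEARML_CONFIG_FILE",
    "CLEARML_API_HOST",
    "CLEARML_API_ACCESS_KEY",
    "CLEARML_API_SECRET_KEY",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
)


def clearml_visibility_report(env=None, home=None):
    """Report which ClearML/S3 settings are visible to the current process.

    Returns a dict of booleans (plus the config path that was checked),
    so no credential values are ever exposed.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    # Use CLEARML_CONFIG_FILE if set, otherwise fall back to ~/clearml.conf.
    config_path = Path(env.get("CLEARML_CONFIG_FILE") or home / "clearml.conf")

    report = {
        "config_path": str(config_path),
        "config_exists": config_path.is_file(),
    }
    for name in CANDIDATE_VARS:
        # True only if the variable is set to a non-empty value.
        report[name] = bool(env.get(name))
    return report


if __name__ == "__main__":
    for key, value in clearml_visibility_report().items():
        print(f"{key}: {value}")
```

If `config_exists` comes back False inside the container while the file exists on the host, that would suggest the host's home directory is not being mounted or passed through, which might explain why editing ~/clearml.conf made no difference.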

Thanks for any help!
