Channel: TAO Toolkit - NVIDIA Developer Forums

EfficientDet TF2 Evaluation gives 0 accuracy

Please provide the following information when requesting support.

• Hardware: NVIDIA A10G
• Network Type: EfficientDet-d0
• TLT Version: 5.5.0-tf2

I have successfully trained EfficientDet on my custom dataset using:

docker run -d  --rm --gpus all -v /mnt/rod_efs/:/workspace/tao-experiments nvcr.io/nvidia/tao/tao-toolkit:5.5.0-tf2 efficientdet_tf2 train -e /workspace/tao-experiments/tao/specs/train.yaml results_dir=/workspace/tao-experiments/tao/results/training num_gpus=4

The accuracy I got at the final epoch (200) is the following:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.364
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.720
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.327
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.006
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.324
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.405
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.107
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.442
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.538
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.350
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.495
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.581

However, when I run the command below to evaluate the model, I get near-zero accuracy on all metrics.

tao model efficientdet_tf2 evaluate -e /workspace/tao-experiments/tao/specs/train.yaml results_dir=/workspace/tao-experiments/tao/results/training evaluate.checkpoint=/workspace/tao-experiments/tao/results/training/train/efficientdet-d0_200.tlt

2025-04-04 05:38:36,139 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2025-04-04 05:38:36,213 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 360: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.5.0-tf2
2025-04-04 05:38:36,240 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 301: Printing tty value True
2025-04-04 05:38:37.417230: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9373] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-04-04 05:38:37.417299: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-04-04 05:38:37.418948: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1534] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-04-04 05:38:37.426734: I tensorflow/core/platform/cpu_feature_guard.cc:183] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX, in other operations, rebuild TensorFlow with the appropriate compiler flags.
[2025-04-04 05:38:40,820 - TAO Toolkit - matplotlib - WARNING] Matplotlib created a temporary cache directory at /tmp/matplotlib-ierd380j because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
[2025-04-04 05:38:41,010 - TAO Toolkit - matplotlib.font_manager - INFO] generated new fontManager
/usr/local/lib/python3.10/dist-packages/tensorflow_addons/utils/tfa_eol_msg.py:23: UserWarning:

TensorFlow Addons (TFA) has ended development and introduction of new features.
TFA has entered a minimal maintenance and release mode until a planned end of life in May 2024.
Please modify downstream libraries to take dependencies from other repositories in our TensorFlow community (e.g. Keras, Keras-CV, and Keras-NLP).

For more information see: https://github.com/tensorflow/addons/issues/2807

  warnings.warn(
/usr/local/lib/python3.10/dist-packages/tensorflow_addons/utils/ensure_tf_install.py:53: UserWarning: Tensorflow Addons supports using Python ops for all Tensorflow versions above or equal to 2.12.0 and strictly below 2.15.0 (nightly versions are not supported).
 The versions of TensorFlow you are currently using is 2.15.0 and is not supported.
Some things might work, some things might not.
If you were to encounter a bug, do not file an issue.
If you want to make sure you're using a tested and supported configuration, either change the TensorFlow version or the TensorFlow Addons's version.
You can find the compatibility matrix in TensorFlow Addon's readme:
https://github.com/tensorflow/addons
  warnings.warn(
sys:1: UserWarning:
'train38.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
<frozen common.hydra.hydra_runner>:-1: UserWarning:
'train38.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
/usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/next/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  ret = run_job(
Evaluate results will be saved at: /workspace/tao-experiments/tao/results/retail_object_detection/training38_2/evaluate
Starting efficientdet evaluation.
WARNING:tensorflow:AutoGraph could not transform <function CocoDataset.__call__.<locals>._prefetch_dataset at 0x77887d832830> and will run it as-is.
Cause: Unable to locate the source code of <function CocoDataset.__call__.<locals>._prefetch_dataset at 0x77887d832830>. Note that functions defined in certain environments, like the interactive Python shell, do not expose their source code. If that is the case, you should define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.experimental.do_not_convert. Original error: could not get source code
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
AutoGraph could not transform <function CocoDataset.__call__.<locals>._prefetch_dataset at 0x77887d832830> and will run it as-is.
Cause: Unable to locate the source code of <function CocoDataset.__call__.<locals>._prefetch_dataset at 0x77887d832830>. Note that functions defined in certain environments, like the interactive Python shell, do not expose their source code. If that is the case, you should define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.experimental.do_not_convert. Original error: could not get source code
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:AutoGraph could not transform <function CocoDataset.__call__.<locals>.<lambda> at 0x77887d8328c0> and will run it as-is.
Cause: could not parse the source code of <function CocoDataset.__call__.<locals>.<lambda> at 0x77887d8328c0>: no matching AST found among candidates:

To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
AutoGraph could not transform <function CocoDataset.__call__.<locals>.<lambda> at 0x77887d8328c0> and will run it as-is.
Cause: could not parse the source code of <function CocoDataset.__call__.<locals>.<lambda> at 0x77887d8328c0>: no matching AST found among candidates:

To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:AutoGraph could not transform <function CocoDataset.__call__.<locals>.<lambda> at 0x77887d833400> and will run it as-is.
Cause: could not parse the source code of <function CocoDataset.__call__.<locals>.<lambda> at 0x77887d833400>: no matching AST found among candidates:

To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
AutoGraph could not transform <function CocoDataset.__call__.<locals>.<lambda> at 0x77887d833400> and will run it as-is.
Cause: could not parse the source code of <function CocoDataset.__call__.<locals>.<lambda> at 0x77887d833400>: no matching AST found among candidates:

To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:AutoGraph could not transform <bound method ImageResizeLayer.call of <nvidia_tao_tf2.cv.efficientdet.layers.image_resize_layer.ImageResizeLayer object at 0x77887d367730>> and will run it as-is.
Cause: Unable to locate the source code of <bound method ImageResizeLayer.call of <nvidia_tao_tf2.cv.efficientdet.layers.image_resize_layer.ImageResizeLayer object at 0x77887d367730>>. Note that functions defined in certain environments, like the interactive Python shell, do not expose their source code. If that is the case, you should define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.experimental.do_not_convert. Original error: could not get source code
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
AutoGraph could not transform <bound method ImageResizeLayer.call of <nvidia_tao_tf2.cv.efficientdet.layers.image_resize_layer.ImageResizeLayer object at 0x77887d367730>> and will run it as-is.
Cause: Unable to locate the source code of <bound method ImageResizeLayer.call of <nvidia_tao_tf2.cv.efficientdet.layers.image_resize_layer.ImageResizeLayer object at 0x77887d367730>>. Note that functions defined in certain environments, like the interactive Python shell, do not expose their source code. If that is the case, you should define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.experimental.do_not_convert. Original error: could not get source code
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:From /usr/local/lib/python3.10/dist-packages/tensorflow/python/util/dispatch.py:1260: resize_nearest_neighbor (from tensorflow.python.ops.image_ops_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.image.resize(...method=ResizeMethod.NEAREST_NEIGHBOR...)` instead.
From /usr/local/lib/python3.10/dist-packages/tensorflow/python/util/dispatch.py:1260: resize_nearest_neighbor (from tensorflow.python.ops.image_ops_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.image.resize(...method=ResizeMethod.NEAREST_NEIGHBOR...)` instead.
WARNING:tensorflow:AutoGraph could not transform <bound method WeightedFusion.call of <nvidia_tao_tf2.cv.efficientdet.layers.weighted_fusion_layer.WeightedFusion object at 0x77887d8d8fa0>> and will run it as-is.
Cause: Unable to locate the source code of <bound method WeightedFusion.call of <nvidia_tao_tf2.cv.efficientdet.layers.weighted_fusion_layer.WeightedFusion object at 0x77887d8d8fa0>>. Note that functions defined in certain environments, like the interactive Python shell, do not expose their source code. If that is the case, you should define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.experimental.do_not_convert. Original error: could not get source code
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
AutoGraph could not transform <bound method WeightedFusion.call of <nvidia_tao_tf2.cv.efficientdet.layers.weighted_fusion_layer.WeightedFusion object at 0x77887d8d8fa0>> and will run it as-is.
Cause: Unable to locate the source code of <bound method WeightedFusion.call of <nvidia_tao_tf2.cv.efficientdet.layers.weighted_fusion_layer.WeightedFusion object at 0x77887d8d8fa0>>. Note that functions defined in certain environments, like the interactive Python shell, do not expose their source code. If that is the case, you should define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.experimental.do_not_convert. Original error: could not get source code
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:AutoGraph could not transform <function run_experiment.<locals>.eval_model_fn at 0x77887bb2c670> and will run it as-is.
Cause: Unable to locate the source code of <function run_experiment.<locals>.eval_model_fn at 0x77887bb2c670>. Note that functions defined in certain environments, like the interactive Python shell, do not expose their source code. If that is the case, you should define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.experimental.do_not_convert. Original error: could not get source code
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
AutoGraph could not transform <function run_experiment.<locals>.eval_model_fn at 0x77887bb2c670> and will run it as-is.
Cause: Unable to locate the source code of <function run_experiment.<locals>.eval_model_fn at 0x77887bb2c670>. Note that functions defined in certain environments, like the interactive Python shell, do not expose their source code. If that is the case, you should define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.experimental.do_not_convert. Original error: could not get source code
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
use max_nms_inputs for pre-nms topk.

15/16 [===========================>..] - ETA: 0s
loading annotations into memory...
Done (t=0.02s)
creating index...
index created!
Loading and preparing results...
Converting ndarray to lists...
(12800, 7)
0/12800
DONE (t=0.03s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=1.27s).
Accumulating evaluation results...
DONE (t=0.07s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.003
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.001
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.001
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.004
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.007
Evaluation finished successfully.
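
To line up the training-time validation summary with the standalone evaluate summary, the COCO-style lines can be parsed into a dictionary and compared key by key. This is only a minimal sketch: `parse_coco_summary` is a hypothetical helper, not part of TAO, and the regular expression assumes the summary format shown in the logs above. The sample strings are two lines copied from each log.

```python
import re

# Matches one COCO summary line as printed in the logs above, e.g.
# " Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.364"
SUMMARY_RE = re.compile(
    r"Average (?:Precision|Recall)\s+\((AP|AR)\) @\[ IoU=([\d.:]+)\s*\| "
    r"area=\s*(\w+) \| maxDets=\s*(\d+) \] = (-?\d+\.\d+)"
)


def parse_coco_summary(log_text):
    """Map (metric, IoU range, area, maxDets) to the reported value."""
    metrics = {}
    for kind, iou, area, max_dets, value in SUMMARY_RE.findall(log_text):
        metrics[(kind, iou, area, int(max_dets))] = float(value)
    return metrics


# Two lines copied from the training log (epoch 200).
train_log = """
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.364
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.720
"""

# The same two lines copied from the evaluate log.
eval_log = """
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.003
"""

train_metrics = parse_coco_summary(train_log)
eval_metrics = parse_coco_summary(eval_log)

# Print each metric from training next to its value from evaluate.
for key, train_value in train_metrics.items():
    print(key, "train:", train_value, "evaluate:", eval_metrics.get(key))
```

Feeding in the full log files instead of the sample strings gives all twelve metrics side by side.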

The paths to the test tfrecords and annotations are the same in both cases because I am using the same specs file. I have seen other topics describing a similar issue but did not find a solution there. I have also tried evaluating checkpoints from different epochs.
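
Since both runs read the same spec, one cheap sanity check is whether the validation tfrecords and the annotation JSON actually sit under the same dataset directory. The sketch below copies the two paths verbatim from the spec further down in this post. As pasted, `val_json_file` has no separator between `dataset` and `dataset_2025-...`, which may be nothing more than a copy-paste slip. `shared_root` is a hypothetical helper, and the "same `dataset_*` directory" expectation is an assumption about this layout, not something TAO enforces.

```python
import posixpath

# Paths copied verbatim from the spec as pasted in this post.
VAL_TFRECORDS = "/workspace/tao-experiments/tao/dataset/dataset_2025-26-03T1647_1742968074/tfrecords/test"
VAL_JSON_FILE = "/workspace/tao-experiments/tao/datasetdataset_2025-26-03T1647_1742968074/coco/annotations/instances_test.json"


def shared_root(*paths):
    """Return the deepest directory that all of the given POSIX paths share."""
    return posixpath.commonpath(paths)


root = shared_root(VAL_TFRECORDS, VAL_JSON_FILE)
print("shared root:", root)

# Assumed expectation: both validation inputs live under one dataset_* directory.
same_dataset_dir = posixpath.basename(root).startswith("dataset_")
print("same dataset directory:", same_dataset_dir)
```

With the paths as pasted, the shared root is `/workspace/tao-experiments/tao`, one level above any `dataset_*` directory, so it is worth confirming that the real spec file on disk has the intended path.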

Any ideas what differs in the implementation of the evaluate function compared with the validation run during training?

My specs file:

dataset:
  loader:
    prefetch_size: 4
    shuffle_file: False
    shuffle_buffer: 10000
    cycle_length: 32
    block_length: 16
  max_instances_per_image: 100
  skip_crowd_during_training: True
  use_fake_data: False
  num_classes: 2
  train_tfrecords:
    -  "/workspace/tao-experiments/tao/dataset/dataset_2025-26-03T1647_1742968074/tfrecords/train"
  val_tfrecords:
    -  "/workspace/tao-experiments/tao/dataset/dataset_2025-26-03T1647_1742968074/tfrecords/test"
  val_json_file: "/workspace/tao-experiments/tao/datasetdataset_2025-26-03T1647_1742968074/coco/annotations/instances_test.json"
  augmentation:
    rand_hflip: True
    random_crop_min_scale: 0.1
    random_crop_max_scale: 2
    auto_color_distortion: False
    auto_translate_xy: False
train:
  optimizer:
    name: 'sgd'
    momentum: 0.9
  lr_schedule:
    name: 'cosine'
    warmup_epoch: 1
    warmup_init: 0.0001
    learning_rate: 0.2
    annealing_epoch: 10
  amp: False
  num_examples_per_epoch: 106
  checkpoint: "/workspace/tao-experiments/tao/models/pre-trained_object_detection/pretrained_efficientdet_tf2_efficientnet_b0/"
  #checkpoint: "/workspace/tao-experiments/tao/models/retail_object_detection/retail_object_detection_trainable_binary_v1.1/" #pre-trained retail
  #checkpoint: "/workspace/tao-experiments/tao/models/pre-trained_object_detection/pretrained_efficientdet_vefficientnet_b0/efficientnet_b0.hdf5"
  moving_average_decay: 0.999
  batch_size: 8
  checkpoint_interval: 10
  l2_weight_decay: 0.00004
  l1_weight_decay: 0.0
  clip_gradients_norm: 10.0
  image_preview: False
  qat: False
  random_seed: 42
  pruned_model_path: ''
  num_epochs: 200
  label_smoothing: 0.0
  box_loss_weight: 50.0
  iou_loss_type: 'giou'
  iou_loss_weight: 1.0

model:
  name: 'efficientdet-d0'
  aspect_ratios: '[(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)]'
  anchor_scale: 4
  min_level: 3
  max_level: 7
  num_scales: 3
  freeze_bn: False
  freeze_blocks: []
  input_width: 800
  input_height: 608
evaluate:
  batch_size: 8
  num_samples: 128
  max_detections_per_image: 100
  checkpoint: ''
encryption_key: 'nvidia_tlt'
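
The shapes in the evaluate log can be cross-checked against the `evaluate` section of this spec. This is a minimal sketch under two assumptions about TAO that are not confirmed from its source: the evaluator runs `ceil(num_samples / batch_size)` steps, and it emits exactly `max_detections_per_image` rows per image. `expected_eval_shape` is a hypothetical helper.

```python
import math


def expected_eval_shape(num_samples, batch_size, max_detections_per_image):
    """Return (steps, detection_rows) under the assumptions stated above."""
    steps = math.ceil(num_samples / batch_size)
    # Assumption: every image contributes a fixed number of detection rows.
    detection_rows = steps * batch_size * max_detections_per_image
    return steps, detection_rows


# Values taken from the `evaluate` section of the spec above.
steps, detection_rows = expected_eval_shape(
    num_samples=128, batch_size=8, max_detections_per_image=100
)
print("steps:", steps)
print("detection rows:", detection_rows)
```

This gives 16 steps and 12800 rows, which matches the 16-step progress bar and the `(12800, 7)` array in the log. So the evaluator appears to process the expected number of images, and the near-zero scores would have to come from the contents of the detections or their matching to the annotations, not from missing samples. As far as I know, the seven columns in the pycocotools ndarray result format are image id, x, y, width, height, score, and category id.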

Thanks
