
Fine-tuned TAO ClassificationTF2 Accuracy Drop after Compiling to TensorRT

Setup information
• Hardware Platform (Jetson / GPU) : Jetson Orin Nano
• DeepStream Version : DeepStream 6.3
• JetPack Version (valid for Jetson only) : Jetpack 5.1.3
• TensorRT Version : TensorRT 8.5.2.2
• NVIDIA GPU Driver Version (valid for GPU only) : N/A
• Issue Type( questions, new requirements, bugs) : Bugs
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing) : See below for configurations.
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description) : N/A


Issue:
Our fine-tuned TAO ClassificationTF2 TLT model (EfficientNet-B0 backbone) gives high inference accuracy, but the accuracy drops after converting the model to a TensorRT engine and running inference in DeepStream as an SGIE.

Evaluation Method:
Results were compared on the same video.
This is how we compared the TLT and TensorRT models:

  1. We used the same PGIE (PeopleNet) and tracker to perform detection and tracking.
  2. We cropped the objects out of the video frames based on the bounding boxes in the KITTI tracker output file (see the sketch after this list).
  3. We ran tao model classification_tf2 inference and evaluated the results of the TLT model.
  4. We ran inference on the same video in DeepStream and with trtexec, and manually compared the results.
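
For reference, the cropping in step 2 was done roughly along the lines below. This is a minimal sketch, not our exact script: it assumes the standard DeepStream KITTI dump layout (object label in the first field, bounding box as left/top/right/bottom in fields 5-8), and the paths and per-frame file naming are placeholders.

import os
import cv2

# Crop tracked objects out of the video frames using the KITTI files written
# by deepstream-app (kitti-track-output-dir). Paths and the per-frame file
# naming pattern are placeholders; adjust them to the actual dump.
video_path = "file.mp4"
kitti_dir = "tracker_output_folder"
out_dir = "crops"
os.makedirs(out_dir, exist_ok=True)

cap = cv2.VideoCapture(video_path)
frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    kitti_file = os.path.join(kitti_dir, f"00_000_{frame_idx:06d}.txt")
    if os.path.isfile(kitti_file):
        with open(kitti_file) as f:
            for obj_idx, line in enumerate(f):
                fields = line.split()
                label = fields[0]
                left, top, right, bottom = (int(float(v)) for v in fields[4:8])
                crop = frame[max(top, 0):bottom, max(left, 0):right]
                if crop.size:
                    cv2.imwrite(os.path.join(out_dir, f"{frame_idx:06d}_{obj_idx}_{label}.jpg"), crop)
    frame_idx += 1
cap.release()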

Accuracy Comparison:

Object     Ground Truth Class   TLT Accuracy   TensorRT Accuracy
Object_1   Class_1              96.01%         41.46%
Object_2   Class_1              97.60%         9.36%
Object_3   Class_1              100%           18.00%
Object_4   Class_2              100%           100%
Object_5   Class_2              100%           100%

Notes:

  • The TensorRT accuracy was obtained by running TRT inference with trtexec. We also manually inspected the DeepStream inference overlay output video, and the trtexec results appear to align with it.
  • The accuracy drop seems to affect only Class_1, not Class_2.

TAO ClassificationTF2 Configuration

dataset:
  train_dataset_path: "/workspace/tao-experiments/data/train"
  val_dataset_path: "/workspace/tao-experiments/data/val"
  preprocess_mode: 'torch'
  num_classes: 2
  augmentation:
    enable_color_augmentation: True
train:
  checkpoint: '/workspace/tao-experiments/pretrained_classification_tf2_vefficientnet_b0'
  batch_size_per_gpu: 32
  num_epochs: 100
  optim_config:
    optimizer: 'sgd'
  lr_config:
    scheduler: 'cosine'
    learning_rate: 0.0005
    soft_start: 0.05
  reg_config:
    type: 'L2'
    scope: ['conv2d', 'dense']
    weight_decay: 0.00005
  results_dir: '/workspace/tao-experiments/results/train'
model:
  backbone: 'efficientnet-b0'
  input_width: 128
  input_height: 128
  input_channels: 3
  dropout: 0.12
evaluate:
  dataset_path: "/workspace/tao-experiments/data/test"
  checkpoint: "/workspace/tao-experiments/results/train/efficientnet-b0_100.tlt"
  top_k: 1
  batch_size: 16
  n_workers: 8
  results_dir: '/workspace/tao-experiments/results/val'
inference:
  image_dir: "/workspace/tao-experiments/data/test_images"
  checkpoint: "/workspace/tao-experiments/results/train/efficientnet-b0_100.tlt"
  results_dir: '/workspace/tao-experiments/results/inference'
  classmap: "/workspace/tao-experiments/results/train/classmap.json"
export:
  checkpoint: "/workspace/tao-experiments/results/train/efficientnet-b0_100.tlt"
  onnx_file: '/workspace/tao-experiments/results/export/efficientnet-b0.onnx'
  results_dir: '/workspace/tao-experiments/results/export'

How we converted the model from TLT to a TensorRT engine:

  1. Convert from TLT to ONNX using tao model classification_tf2 export. This step was not performed on the Jetson device; we used an NVIDIA GeForce RTX 4090 GPU for model training and export.
  2. Convert from ONNX to TensorRT on the Jetson Orin Nano device. We tried two methods: (i) deploy the exported model to DeepStream directly and let DeepStream handle the TRT conversion implicitly; and (ii) use trtexec to build the TensorRT engine. Both methods gave the same (bad) inference results. A quick cross-check of the exported ONNX against the TLT results is sketched below.
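
To isolate whether the drop already appears at the ONNX stage or only after TensorRT conversion, one option is to run the exported ONNX directly with onnxruntime on the same crops and compare the scores with the tao model classification_tf2 inference results. A minimal sketch follows; the NCHW RGB layout, the 3x128x128 input size, and the ImageNet mean/std used for 'torch'-mode normalization are assumptions taken from our training config, not verified against the exported graph.

import cv2
import numpy as np
import onnxruntime as ort

# Assumed 'torch'-mode preprocessing: scale to [0, 1], then per-channel
# ImageNet mean/std normalization. Verify against the TAO export before use.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(path, size=(128, 128)):
    img = cv2.imread(path)                                   # BGR, HWC, uint8
    img = cv2.resize(img, size)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
    img = (img - MEAN) / STD
    return img.transpose(2, 0, 1)[None]                      # 1x3xHxW, float32

sess = ort.InferenceSession("efficientnet-b0.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
scores = sess.run(None, {input_name: preprocess("crops/000123_0_person.jpg")})[0]  # placeholder crop path
print("predicted class index:", int(np.argmax(scores)), "scores:", scores.ravel())

If the ONNX scores match the TLT results on these crops, the mismatch is introduced on the TensorRT/DeepStream side rather than by the export step.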

DeepStream App Configuration Files:

[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5
#kitti-track-output-dir=tracker_output_folder

[tiled-display]
enable=1
rows=1
columns=1
width=1920
height=1080
gpu-id=0

[source0]
enable=1
type=2
num-sources=1
uri=file:///path/to/test/video/file.mp4
gpu-id=0

[streammux]
gpu-id=0
batch-size=1
batched-push-timeout=33333
width=1920
height=1080

[sink0]
enable=1
type=3
container=1
codec=1
enc-type=1
sync=0
bitrate=3000000
profile=0
output-file=/path/to/inference/overlay/video.mp4
source-id=0

[osd]
enable=1
gpu-id=0
border-width=3
text-size=15
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Arial

[primary-gie]
enable=1
plugin-type=0
gpu-id=0
batch-size=1
bbox-border-color0=1;0;0;1
bbox-border-color1=0;1;1;1
bbox-border-color2=0;0;1;1
bbox-border-color3=0;1;0;1
gie-unique-id=1
config-file=/opt/nvidia/deepstream/deepstream-6.3/samples/configs/tao_pretrained_models/nvinfer/config_infer_primary_peoplenet.txt

[secondary-gie0]
enable=1
plugin-type=0
gpu-id=0
batch-size=1
gie-unique-id=3
operate-on-gie-id=1
operate-on-class-ids=0
config-file=/path/to/config_infer_secondary_classificationtf2.txt

[tracker]
enable=1
tracker-width=480
tracker-height=288
ll-lib-file=/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
ll-config-file=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_tracker_NvDCF_perf.yml
gpu-id=0
display-tracking-id=1

[tests]
file-loop=0

config_infer_secondary_classificationtf2.txt
(we followed this guide)

[property]
gpu-id=0
# preprocess_mode: 'torch' (see the preprocessing check sketched after this config)
net-scale-factor=0.017507
offsets=123.675;116.280;103.53
model-color-format=0

# model config
onnx-file=/path/to/efficientnet-b0.onnx
model-engine-file=/path/to/efficientnet-b0.onnx_b1_gpu0_fp32.engine
labelfile-path=/path/to/labels.txt
classifier-threshold=0.5
operate-on-class-ids=0
batch-size=1

network-mode=0
network-type=1
process-mode=2

secondary-reinfer-interval=0
gie-unique-id=3
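
Since preprocess_mode: 'torch' in the TAO spec normalizes with per-channel ImageNet mean/std while nvinfer applies y = net-scale-factor * (x - offset) with a single scalar scale, we also want to double-check that the values above reproduce the training-time preprocessing. The small sketch below compares the two transforms on a few pixels; the ImageNet constants are our assumption about what 'torch' mode does, and the net-scale-factor/offsets are copied from the config above.

import numpy as np

# Assumed TAO 'torch'-mode constants (ImageNet mean/std applied to [0, 1] pixels).
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

# Values from config_infer_secondary_classificationtf2.txt above.
net_scale_factor = 0.017507
offsets = np.array([123.675, 116.280, 103.53])

x = np.random.randint(0, 256, size=(4, 3)).astype(np.float64)   # a few RGB pixels

torch_mode = (x / 255.0 - mean) / std            # per-channel scale: 1 / (255 * std)
deepstream = net_scale_factor * (x - offsets)    # one scale shared by all channels

print("per-channel scales in 'torch' mode :", 1.0 / (255.0 * std))
print("single scale used by nvinfer       :", net_scale_factor)
print("max abs difference on sample pixels:", np.abs(torch_mode - deepstream).max())

The per-channel scales come out close to, but not identical with, the single net-scale-factor, which is inherent to nvinfer's scalar scaling; whether that small mismatch (or something else, e.g. the resize or color-format path) explains the gap we see is part of what we would like to confirm.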

We would like to know: (i) what causes this degradation in model accuracy, and (ii) how we can minimize the accuracy gap between the TLT model and the TensorRT engine.
