
Fine-tuned TAO ClassificationTF2 Accuracy Drop after Compiling to TensorRT

Setup information
• Hardware Platform (Jetson / GPU) : Jetson Orin Nano
• DeepStream Version : DeepStream 6.3
• JetPack Version (valid for Jetson only) : Jetpack 5.1.3
• TensorRT Version : TensorRT 8.5.2.2
• NVIDIA GPU Driver Version (valid for GPU only) : N/A
• Issue Type( questions, new requirements, bugs) : Bugs
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing) : See below for configurations.
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description) : N/A


Issue:
Our fine-tuned TAO ClassificationTF2 TLT model (EfficientNet-B0 backbone) gives high inference accuracy, but the accuracy drops after converting the model to a TensorRT engine and running inference in DeepStream as an SGIE.

Evaluation Method:
Results were compared on the same video.
This is how we compared the TLT and TensorRT models:

  1. We used the same PGIE (PeopleNet) and tracker to perform detection and tracking.
  2. We cropped the objects out of the video frames based on the bounding boxes in the KITTI tracker output file (see the sketch after this list).
  3. We ran tao model classification_tf2 inference and evaluated the results of the TLT model.
  4. We ran inference on the same video in DeepStream and with trtexec, and manually compared the results.
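
For reference, the cropping in step 2 was done roughly along the lines below. This is a minimal sketch, not our exact script: it assumes the standard DeepStream KITTI dump layout (object label in the first field, bounding box as left/top/right/bottom in fields 5-8), and the paths and per-frame file naming are placeholders.

import os
import cv2

# Crop tracked objects out of the video frames using the KITTI files written
# by deepstream-app (kitti-track-output-dir). Paths and the per-frame file
# naming pattern are placeholders; adjust them to the actual dump.
video_path = "file.mp4"
kitti_dir = "tracker_output_folder"
out_dir = "crops"
os.makedirs(out_dir, exist_ok=True)

cap = cv2.VideoCapture(video_path)
frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    kitti_file = os.path.join(kitti_dir, f"00_000_{frame_idx:06d}.txt")
    if os.path.isfile(kitti_file):
        with open(kitti_file) as f:
            for obj_idx, line in enumerate(f):
                fields = line.split()
                label = fields[0]
                left, top, right, bottom = (int(float(v)) for v in fields[4:8])
                crop = frame[max(top, 0):bottom, max(left, 0):right]
                if crop.size:
                    cv2.imwrite(os.path.join(out_dir, f"{frame_idx:06d}_{obj_idx}_{label}.jpg"), crop)
    frame_idx += 1
cap.release()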

Accuracy Comparison:

Object     Ground Truth Class   TLT Accuracy   TensorRT Accuracy
Object_1   Class_1              96.01%         41.46%
Object_2   Class_1              97.60%         9.36%
Object_3   Class_1              100%           18.00%
Object_4   Class_2              100%           100%
Object_5   Class_2              100%           100%

Notes:

  • The TensorRT accuracy was obtained by running TRT inference with trtexec. We also manually inspected the DeepStream inference overlay output video, and the trtexec results appear to align with it.
  • The accuracy drop seems to affect only Class_1, not Class_2.

TAO ClassificationTF2 Configuration

dataset:
  train_dataset_path: "/workspace/tao-experiments/data/train"
  val_dataset_path: "/workspace/tao-experiments/data/val"
  preprocess_mode: 'torch'
  num_classes: 2
  augmentation:
    enable_color_augmentation: True
train:
  checkpoint: '/workspace/tao-experiments/pretrained_classification_tf2_vefficientnet_b0'
  batch_size_per_gpu: 32
  num_epochs: 100
  optim_config:
    optimizer: 'sgd'
  lr_config:
    scheduler: 'cosine'
    learning_rate: 0.0005
    soft_start: 0.05
  reg_config:
    type: 'L2'
    scope: ['conv2d', 'dense']
    weight_decay: 0.00005
  results_dir: '/workspace/tao-experiments/results/train'
model:
  backbone: 'efficientnet-b0'
  input_width: 128
  input_height: 128
  input_channels: 3
  dropout: 0.12
evaluate:
  dataset_path: "/workspace/tao-experiments/data/test"
  checkpoint: "/workspace/tao-experiments/results/train/efficientnet-b0_100.tlt"
  top_k: 1
  batch_size: 16
  n_workers: 8
  results_dir: '/workspace/tao-experiments/results/val'
inference:
  image_dir: "/workspace/tao-experiments/data/test_images"
  checkpoint: "/workspace/tao-experiments/results/train/efficientnet-b0_100.tlt"
  results_dir: '/workspace/tao-experiments/results/inference'
  classmap: "/workspace/tao-experiments/results/train/classmap.json"
export:
  checkpoint: "/workspace/tao-experiments/results/train/efficientnet-b0_100.tlt"
  onnx_file: '/workspace/tao-experiments/results/export/efficientnet-b0.onnx'
  results_dir: '/workspace/tao-experiments/results/export'

How we converted the model from TLT to a TensorRT engine:

  1. Convert from TLT to ONNX using tao model classification_tf2 export. This step was not performed on the Jetson device; we used an NVIDIA GeForce RTX 4090 GPU for model training and export.
  2. Convert from ONNX to TensorRT on the Jetson Orin Nano device. We tried two methods: (i) deploy the exported model to DeepStream directly and let DeepStream handle the TRT conversion implicitly; and (ii) use trtexec to build the TensorRT engine. Both methods gave the same (bad) inference results. A quick cross-check of the exported ONNX against the TLT results is sketched below.
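
To isolate whether the drop already appears at the ONNX stage or only after TensorRT conversion, one option is to run the exported ONNX directly with onnxruntime on the same crops and compare the scores with the tao model classification_tf2 inference results. A minimal sketch follows; the NCHW RGB layout, the 3x128x128 input size, and the ImageNet mean/std used for 'torch'-mode normalization are assumptions taken from our training config, not verified against the exported graph.

import cv2
import numpy as np
import onnxruntime as ort

# Assumed 'torch'-mode preprocessing: scale to [0, 1], then per-channel
# ImageNet mean/std normalization. Verify against the TAO export before use.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(path, size=(128, 128)):
    img = cv2.imread(path)                                   # BGR, HWC, uint8
    img = cv2.resize(img, size)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
    img = (img - MEAN) / STD
    return img.transpose(2, 0, 1)[None]                      # 1x3xHxW, float32

sess = ort.InferenceSession("efficientnet-b0.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
scores = sess.run(None, {input_name: preprocess("crops/000123_0_person.jpg")})[0]  # placeholder crop path
print("predicted class index:", int(np.argmax(scores)), "scores:", scores.ravel())

If the ONNX scores match the TLT results on these crops, the mismatch is introduced on the TensorRT/DeepStream side rather than by the export step.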

DeepStream App Configuration Files:

[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5
#kitti-track-output-dir=tracker_output_folder

[tiled-display]
enable=1
rows=1
columns=1
width=1920
height=1080
gpu-id=0

[source0]
enable=1
type=2
num-sources=1
uri=file:///path/to/test/video/file.mp4
gpu-id=0

[streammux]
gpu-id=0
batch-size=1
batched-push-timeout=33333
width=1920
height=1080

[sink0]
enable=1
type=3
container=1
codec=1
enc-type=1
sync=0
bitrate=3000000
profile=0
output-file=/path/to/inference/overlay/video.mp4
source-id=0

[osd]
enable=1
gpu-id=0
border-width=3
text-size=15
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Arial

[primary-gie]
enable=1
plugin-type=0
gpu-id=0
batch-size=1
bbox-border-color0=1;0;0;1
bbox-border-color1=0;1;1;1
bbox-border-color2=0;0;1;1
bbox-border-color3=0;1;0;1
gie-unique-id=1
config-file=/opt/nvidia/deepstream/deepstream-6.3/samples/configs/tao_pretrained_models/nvinfer/config_infer_primary_peoplenet.txt

[secondary-gie0]
enable=1
plugin-type=0
gpu-id=0
batch-size=1
gie-unique-id=3
operate-on-gie-id=1
operate-on-class-ids=0
config-file=/path/to/config_infer_secondary_classificationtf2.txt

[tracker]
enable=1
tracker-width=480
tracker-height=288
ll-lib-file=/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
ll-config-file=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_tracker_NvDCF_perf.yml
gpu-id=0
display-tracking-id=1

[tests]
file-loop=0

config_infer_secondary_classificationtf2.txt
(we followed this guide)

[property]
gpu-id=0
# preprocess_mode: 'torch' (see the preprocessing check sketched after this config)
net-scale-factor=0.017507
offsets=123.675;116.280;103.53
model-color-format=0

# model config
onnx-file=/path/to/efficientnet-b0.onnx
model-engine-file=/path/to/efficientnet-b0.onnx_b1_gpu0_fp32.engine
labelfile-path=/path/to/labels.txt
classifier-threshold=0.5
operate-on-class-ids=0
batch-size=1

network-mode=0
network-type=1
process-mode=2

secondary-reinfer-interval=0
gie-unique-id=3
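
Since preprocess_mode: 'torch' in the TAO spec normalizes with per-channel ImageNet mean/std while nvinfer applies y = net-scale-factor * (x - offset) with a single scalar scale, we also want to double-check that the values above reproduce the training-time preprocessing. The small sketch below compares the two transforms on a few pixels; the ImageNet constants are our assumption about what 'torch' mode does, and the net-scale-factor/offsets are copied from the config above.

import numpy as np

# Assumed TAO 'torch'-mode constants (ImageNet mean/std applied to [0, 1] pixels).
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

# Values from config_infer_secondary_classificationtf2.txt above.
net_scale_factor = 0.017507
offsets = np.array([123.675, 116.280, 103.53])

x = np.random.randint(0, 256, size=(4, 3)).astype(np.float64)   # a few RGB pixels

torch_mode = (x / 255.0 - mean) / std            # per-channel scale: 1 / (255 * std)
deepstream = net_scale_factor * (x - offsets)    # one scale shared by all channels

print("per-channel scales in 'torch' mode :", 1.0 / (255.0 * std))
print("single scale used by nvinfer       :", net_scale_factor)
print("max abs difference on sample pixels:", np.abs(torch_mode - deepstream).max())

The per-channel scales come out close to, but not identical with, the single net-scale-factor, which is inherent to nvinfer's scalar scaling; whether that small mismatch (or something else, e.g. the resize or color-format path) explains the gap we see is part of what we would like to confirm.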

We would like to know: (i) what causes this degradation in model accuracy, and (ii) how we can minimize the accuracy gap between the TLT model and the TensorRT engine.
