Channel: TAO Toolkit - NVIDIA Developer Forums

Training with TAO Toolkit results in 0.0000% accuracy


Please provide the following information when requesting support.

• Hardware (NVIDIA GeForce GTX 1650)
• Network Type (Detectnet_v2)
• Training spec file (lpd_train_resnet18_kitti.txt (3.4 KB))
• Dataset used: License Plate Recognition Object Detection Dataset and Pre-Trained Model by Roboflow Universe Projects

• How to reproduce the issue ?

I wanted to train my own Licence Plate Detection System in NVIDIA TAO.

  1. I downloaded the dataset from the link above and followed the steps from the sample notebook for detectnet_v2.

  2. I created the training data sample. I had to clean it up because some images had no labels, and was then able to create the TFRecords.

  3. I installed the NGC CLI and downloaded the pretrained model.

  4. I created my own training specification file (and have already modified quite a few values).
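
For reference, the clean-up in step 2 can be sketched roughly as below. This is only a minimal sketch, assuming a KITTI-style layout with one `.txt` label file per image that shares the image's base name. The function name `find_unlabeled_images` and the directory arguments are hypothetical and do not come from the sample notebook.

```python
from pathlib import Path

# Image extensions to consider (an assumption; adjust to the dataset).
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def find_unlabeled_images(image_dir, label_dir):
    """Return image paths whose label file is missing or empty.

    Assumes KITTI-style naming: image `foo.jpg` pairs with label `foo.txt`.
    """
    unlabeled = []
    for img in sorted(Path(image_dir).iterdir()):
        if img.suffix.lower() not in IMAGE_EXTS:
            continue  # skip anything that is not an image
        label = Path(label_dir) / (img.stem + ".txt")
        # A label counts as missing if the file does not exist
        # or contains only whitespace.
        if not label.is_file() or not label.read_text().strip():
            unlabeled.append(img)
    return unlabeled
```

The returned paths can then be moved out of the training set or deleted before the TFRecords are generated.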

→ However, I always get 0.0000% accuracy after training…

Validation cost: 0.000010
Mean average_precision (in %): 0.0000

+------------+--------------------------+
| class name | average precision (in %) |
+------------+--------------------------+
|    lpd     |           0.0            |
+------------+--------------------------+

Median Inference Time: 0.025054
2024-01-29 18:32:20,893 [TAO Toolkit] [INFO] root 2102: Evaluation metrics generated.
2024-01-29 18:32:20,893 [TAO Toolkit] [INFO] root 2102: Training loop completed.
2024-01-29 18:32:20,894 [TAO Toolkit] [INFO] root 2102: Saving trained model.
2024-01-29 18:32:21,056 [TAO Toolkit] [INFO] root 2102: Model saved.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:95: The name tf.reset_default_graph is deprecated. Please use tf.compat.v1.reset_default_graph instead.

But the training loss goes down from epoch to epoch:


INFO:tensorflow:epoch = 0.00043122035360068997, learning_rate = 5.1002854e-07, loss = 0.08813875, step = 2 (329.480 sec)
2024-01-29 18:08:51,000 [TAO Toolkit] [INFO] tensorflow 260: epoch = 0.00043122035360068997, learning_rate = 5.1002854e-07, loss = 0.08813875, step = 2 (329.480 sec)

...

INFO:tensorflow:epoch = 0.9911599827511859, learning_rate = 5.7266834e-07, loss = 0.00061230396, step = 4597 (5.294 sec)
2024-01-29 18:30:12,241 [TAO Toolkit] [INFO] tensorflow 260: epoch = 0.9911599827511859, learning_rate = 5.7266834e-07, loss = 0.00061230396, step = 4597 (5.294 sec)


There is probably an issue with the configuration file, but I cannot spot it…

There was already a similar post (Mean average precision of 0.00 for detectnet_v2 using Tao Toolkit), and I tried to follow its hints, but they did not help.

Best regards

