Channel: TAO Toolkit - NVIDIA Developer Forums

Low accuracy for MS COCO dataset in tao maskrcnn model training


• Hardware: A5000
• Network Type: Mask_rcnn
• TLT Version: nvidia/tao/tao-toolkit-tf:v3.22.05-tf1.15.5-py3
• Training spec file
tao_maskrcnn_02_09_24_train_v6.txt (2.4 KB)
tao_maskrcnn_02_09_24_train_v7.txt (2.4 KB)

This is a trend I have observed while training with the MS COCO dataset. The dataset was filtered to include only the "truck", "bus" and "person" classes, and TFRecords were then generated for training and validation using the TAO command-line tool.

Since this was my first time training this model, I initially planned to run training with the default config file from the documentation (MaskRCNN - NVIDIA Docs), but this failed with an error saying the training loss had gone to NaN in the very first iteration. The attached training configs therefore use much lower learning-rate values than those in the documentation. With the config ending in v6, the training loss was still jumping around a lot, so I reduced the values again by a factor of 10; you will find this update in the config ending in v7, after which the loss stopped jumping everywhere.

In both of these trainings I have noticed that the loss does not come down and stays around 3. When I run inference on the validation dataset with the final model file, there are no detections or segmentations in the output for any of the validation images, and the computed AP values are near 0. Since the loss never decreases, I am also unable to select the right model to test, which tells me the training is not happening properly. What could be the reason for this behaviour?
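For reference, the class-filtering step described above can be sketched as a small Python helper that prunes a COCO `instances_*.json` file down to a chosen set of categories before TFRecord generation. This is a minimal sketch, not the exact script used for this training; the function name `filter_coco` and the in-memory dict interface are assumptions.

```python
# Hypothetical sketch: restrict a COCO annotations dict to a subset of
# category names before generating TFRecords. Paths/names are assumptions.
import json

KEEP = {"truck", "bus", "person"}

def filter_coco(coco):
    """Return a copy of a COCO annotations dict containing only the
    categories in KEEP, plus the annotations and images that use them."""
    cats = [c for c in coco["categories"] if c["name"] in KEEP]
    cat_ids = {c["id"] for c in cats}
    anns = [a for a in coco["annotations"] if a["category_id"] in cat_ids]
    img_ids = {a["image_id"] for a in anns}
    imgs = [i for i in coco["images"] if i["id"] in img_ids]
    return {**coco, "categories": cats, "annotations": anns, "images": imgs}

# Typical usage (paths are placeholders):
#   with open("instances_train2017.json") as f:
#       filtered = filter_coco(json.load(f))
#   with open("instances_train2017_filtered.json", "w") as f:
#       json.dump(filtered, f)
```

One detail worth checking with a filter like this: if the downstream converter remaps category IDs, the class count in the training spec (`num_classes`, including background) must match the filtered set, since a mismatch there is a common cause of NaN or non-decreasing loss.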
Thanks

6 posts - 2 participants


