Channel: TAO Toolkit - NVIDIA Developer Forums

TAO 5.3 Segformer results poor


Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc) RTX A6000
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc) Segformer
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here) 5.3
• Training spec file(If have, please share here)
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)

Hi

I have successfully run Segformer models with custom datasets using TAO 5.2.0. I exported these models and have used them in production with Triton.

I have created a new python environment and installed the TAO 5.3.0 launcher. I have tried multiple TAO 5.3 segformer configurations with the same dataset as used for 5.2 and received poor or unstable results. Specifically:

  1. I noticed I had to run the 5.3 Segformer container with root privileges.
  2. The PyTorch implementation in 5.3 requires all images to be the same size when the validation hook runs. This was not the case with 5.2 and introduces data-prep chores.
  3. I used the exact same dataset for 5.2 and 5.3; however, the 5.3 runs gave very poor (I would say random or numerically unstable) results, whereas 5.2 gave excellent results.
  4. I could not get any results other than NaN using 5.3 with FAN models. I was able to get “results” as per point 3 by using the mit_b5 backbone, but they were poor/meaningless.
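Regarding point 2, one way to work around the same-size requirement is to pad every validation image onto a fixed canvas before training. A minimal sketch, assuming zero-padding is acceptable for your data; the target size here is illustrative, not taken from any TAO spec file:

```python
import numpy as np

def pad_to_canvas(img: np.ndarray, size=(1024, 1024)) -> np.ndarray:
    """Zero-pad an H x W (or H x W x C) image onto a fixed-size canvas,
    keeping the original pixels in the top-left corner."""
    h, w = img.shape[:2]
    th, tw = size
    if h > th or w > tw:
        raise ValueError("image larger than target canvas")
    canvas = np.zeros((th, tw) + img.shape[2:], dtype=img.dtype)
    canvas[:h, :w] = img
    return canvas
```

The segmentation masks would need identical padding (filled with the background or ignore label) so that image and mask pixels stay aligned.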

As indicated above, my datasets are custom, but I get great results from 5.2 (and I have exported to TRT and am using the models successfully in production with Triton). The PyTorch implementation in 5.3 is definitely different.

Hope this provides some further insights.

cheers

6 posts - 2 participants


