Please provide the following information when requesting support.
• Hardware: RTX 4090
• Network Type: Segformer FAN
• TLT Version (output of tao info --verbose):
> Configuration of the TAO Toolkit Instance
task_group:
    model:
        dockers:
            nvidia/tao/tao-toolkit:
                5.0.0-tf2.11.0:
                    docker_registry:
                    tasks: 1. classification_tf2 2. efficientdet_tf2
                5.0.0-tf1.15.5:
                    docker_registry:
                    tasks: 1. bpnet 2. classification_tf1 3. converter 4. detectnet_v2 5. dssd 6. efficientdet_tf1 7. faster_rcnn 8. fpenet 9. lprnet 10. mask_rcnn 11. multitask_classification 12. retinanet 13. ssd 14. unet 15. yolo_v3 16. yolo_v4 17. yolo_v4_tiny
                5.2.0-pyt2.1.0:
                    docker_registry:
                    tasks: 1. action_recognition 2. centerpose 3. deformable_detr 4. dino 5. mal 6. ml_recog 7. ocdnet 8. ocrnet 9. optical_inspection 10. pointpillars 11. pose_classification 12. re_identification 13. visual_changenet
                    docker_registry:
                    tasks: 1. classification_pyt 2. segformer
    dataset:
        dockers:
            nvidia/tao/tao-toolkit:
                5.2.0-data-services:
                    docker_registry:
                    tasks: 1. augmentation 2. auto_label 3. annotations 4. analytics
    deploy:
        dockers:
            nvidia/tao/tao-toolkit:
                5.2.0-deploy:
                    docker_registry:
                    tasks: 1. visual_changenet 2. centerpose 3. classification_pyt 4. classification_tf1 5. classification_tf2 6. deformable_detr 7. detectnet_v2 8. dino 9. dssd 10. efficientdet_tf1 11. efficientdet_tf2 12. faster_rcnn 13. lprnet 14. mask_rcnn 15. ml_recog 16. multitask_classification 17. ocdnet 18. ocrnet 19. optical_inspection 20. retinanet 21. segformer 22. ssd 23. trtexec 24. unet 25. yolo_v3 26. yolo_v4 27. yolo_v4_tiny
format_version: 3.0
toolkit_version:
published_date: 01/16/2024
My training set is around 4K color images at 512x512.
If I use a batch size of 8, training uses all the memory, and the ETA after 1000 cycles is 2 days.
If I use a batch size of 1, it uses about 3 GB and the ETA is 5 hours.
This is counterintuitive. Any ideas why?
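For reference, here is a rough sketch of the arithmetic behind the two ETAs above. It assumes, without confirmation, that both runs are scheduled for the same number of iterations, so the batch-size-8 run would process 8 times as many images. The helper name `eta_hours_per_batch_item` is hypothetical and only serves this comparison.

```python
# Figures taken from the post. The fixed-iteration-count assumption is
# NOT confirmed; it is only one possible reading of the reported ETAs.
HOURS_PER_DAY = 24


def eta_hours_per_batch_item(eta_hours, batch_size):
    """ETA divided by batch size.

    If both runs execute the same number of iterations, this value is
    proportional to the time spent per image.
    """
    return eta_hours / batch_size


eta_bs8 = 2 * HOURS_PER_DAY  # 2 days at batch size 8
eta_bs1 = 5                  # 5 hours at batch size 1

raw_ratio = eta_bs8 / eta_bs1
per_item_bs8 = eta_hours_per_batch_item(eta_bs8, 8)
per_item_bs1 = eta_hours_per_batch_item(eta_bs1, 1)

print(raw_ratio)     # 9.6
print(per_item_bs8)  # 6.0
print(per_item_bs1)  # 5.0
```

Under that assumption the per-image cost of the two runs would be similar (6 versus 5 ETA-hours per batch item), and the longer ETA at batch size 8 would mostly reflect the larger number of images processed. If the schedule is instead defined in epochs, this reading does not apply.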
Many thanks!!