Please provide the following information when requesting support.
• Hardware: Jetson Nano / Jetson AGX Orin / dGPU
• Network Type: Mask_rcnn
• TAO version: 5.3
• Training spec file: default
I want to learn how to deploy a MaskRCNN UFF model on targets like the Jetson Nano and the AGX Orin. I already have it working on the Jetson Nano: I set up config_infer_primary.txt and so on, and it works because I built the TensorRT OSS plugins. What I don't understand is why tao-converter doesn't work on either the Nano or the AGX Orin.
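For context, this is roughly what my working config_infer_primary.txt on the Nano looks like. A minimal sketch only; the paths, class count, and input dims are placeholders for my real values, and the parser names come from the deepstream_tao_apps repository:

[property]
gpu-id=0
# exported UFF model; nvinfer builds the engine once and caches it at model-engine-file
uff-file=model.epoch-20.uff
model-engine-file=model.engine
uff-input-blob-name=Input
infer-dims=3;256;256
output-blob-names=generate_detections;mask_fcn_logits/BiasAdd
num-detected-classes=2
# instance segmentation with the MaskRCNN parser from deepstream_tao_apps
network-type=3
cluster-mode=4
output-instance-mask=1
parse-bbox-instance-mask-func-name=NvDsInferParseCustomMrcnnTLTV2
custom-lib-path=post_processor/libnvds_infercustomparser_tao.so

And this is the tao-converter attempt that fails on both devices: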
./tao-converter -d 3,256,256 -k key -o generate_detections,mask_fcn_logits/BiasAdd model.epoch-20.tlt
[INFO] [MemUsageChange] Init CUDA: CPU +203, GPU +0, now: CPU 285, GPU 3889 (MiB)
[ERROR] UffParser: Unsupported number of graph 0
[ERROR] Failed to parse the model, please check the encoding key to make sure it's correct
[ERROR] 4: [network.cpp::validate::2411] Error Code 4: Internal Error (Network must have at least one output)
[ERROR] Unable to create engine
zsh: segmentation fault (core dumped) ./tao-converter -d 3,256,256 -k key -o model.epoch-20.tlt
I got the key value from the TAO 5.3 source code. Again, this fails on both Jetsons.
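For comparison, this is the invocation I would expect based on the TAO docs. A sketch only, assuming the checkpoint was first exported with tao model mask_rcnn export; the .etlt filename, engine path, and fp16 choice are placeholders. Note that tao-converter takes the exported .etlt, not the raw .tlt checkpoint:

# hypothetical paths; -e names the engine to write, -t sets the precision
./tao-converter -k <encoding_key> \
                -d 3,256,256 \
                -o generate_detections,mask_fcn_logits/BiasAdd \
                -t fp16 \
                -e model.engine \
                model.epoch-20.etlt

I wonder whether feeding the raw .tlt instead of an exported .etlt could explain the UffParser error above.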
Then I saw that recent JetPack releases for the AGX Orin ship TensorRT 8.6.1 and, more recently, TensorRT 10, and with TensorRT 10 trtexec no longer supports UFF models (the documentation and forums also say the format is deprecated). Because of this, on the AGX Orin I used a container instead:
docker run --runtime=nvidia --gpus all -it --rm -v $(pwd):/workspace/deep nvcr.io/nvidia/tensorrt:24.01-py3
and inside the container:
trtexec --uff=model.epoch-20.uff --maxBatch=1 --uffInput=Input,3,256,256 --output=generate_detections,mask_fcn_logits/BiasAdd --fp16 --best --saveEngine=model.engine
The process then got stuck, with one CPU core at 100%:
[11/15/2024-12:41:09] [I] === Model Options ===
[11/15/2024-12:41:09] [I] Format: UFF
[11/15/2024-12:41:09] [I] Model: model.epoch-20.uff
[11/15/2024-12:41:09] [I] Uff Inputs Layout: NCHW
[11/15/2024-12:41:09] [I] Input: Input,3,256,256
[11/15/2024-12:41:09] [I] Output: generate_detections mask_fcn_logits/BiasAdd
[11/15/2024-12:41:09] [I] === Build Options ===
[11/15/2024-12:41:09] [I] Max batch: 1
[11/15/2024-12:41:09] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[11/15/2024-12:41:09] [I] minTiming: 1
[11/15/2024-12:41:09] [I] avgTiming: 8
[11/15/2024-12:41:09] [I] Precision: FP32+FP16+INT8
[11/15/2024-12:41:09] [I] LayerPrecisions:
[11/15/2024-12:41:09] [I] Layer Device Types:
[11/15/2024-12:41:09] [I] Calibration: Dynamic
[11/15/2024-12:41:09] [I] Refit: Disabled
[11/15/2024-12:41:09] [I] Version Compatible: Disabled
[11/15/2024-12:41:09] [I] TensorRT runtime: full
[11/15/2024-12:41:09] [I] Lean DLL Path:
[11/15/2024-12:41:09] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[11/15/2024-12:41:09] [I] Exclude Lean Runtime: Disabled
[11/15/2024-12:41:09] [I] Sparsity: Disabled
[11/15/2024-12:41:09] [I] Safe mode: Disabled
[11/15/2024-12:41:09] [I] Build DLA standalone loadable: Disabled
[11/15/2024-12:41:09] [I] Allow GPU fallback for DLA: Disabled
[11/15/2024-12:41:09] [I] DirectIO mode: Disabled
[11/15/2024-12:41:09] [I] Restricted mode: Disabled
[11/15/2024-12:41:09] [I] Skip inference: Disabled
[11/15/2024-12:41:09] [I] Save engine: model.engine
[11/15/2024-12:41:09] [I] Load engine:
[11/15/2024-12:41:09] [I] Profiling verbosity: 0
[11/15/2024-12:41:09] [I] Tactic sources: Using default tactic sources
[11/15/2024-12:41:09] [I] timingCacheMode: local
[11/15/2024-12:41:09] [I] timingCacheFile:
[11/15/2024-12:41:09] [I] Heuristic: Disabled
[11/15/2024-12:41:09] [I] Preview Features: Use default preview flags.
[11/15/2024-12:41:09] [I] MaxAuxStreams: -1
[11/15/2024-12:41:09] [I] BuilderOptimizationLevel: -1
[11/15/2024-12:41:09] [I] Input(s)s format: fp32:CHW
[11/15/2024-12:41:09] [I] Output(s)s format: fp32:CHW
[11/15/2024-12:41:09] [I] Input build shapes: model
[11/15/2024-12:41:09] [I] Input calibration shapes: model
[11/15/2024-12:41:09] [I] === System Options ===
[11/15/2024-12:41:09] [I] Device: 0
[11/15/2024-12:41:09] [I] DLACore:
[11/15/2024-12:41:09] [I] Plugins:
[11/15/2024-12:41:09] [I] setPluginsToSerialize:
[11/15/2024-12:41:09] [I] dynamicPlugins:
[11/15/2024-12:41:09] [I] ignoreParsedPluginLibs: 0
[11/15/2024-12:41:09] [I]
[11/15/2024-12:41:09] [I] === Inference Options ===
[11/15/2024-12:41:09] [I] Batch: 1
[11/15/2024-12:41:09] [I] Input inference shapes: model
[11/15/2024-12:41:09] [I] Iterations: 10
[11/15/2024-12:41:09] [I] Duration: 3s (+ 200ms warm up)
[11/15/2024-12:41:09] [I] Sleep time: 0ms
[11/15/2024-12:41:09] [I] Idle time: 0ms
[11/15/2024-12:41:09] [I] Inference Streams: 1
[11/15/2024-12:41:09] [I] ExposeDMA: Disabled
[11/15/2024-12:41:09] [I] Data transfers: Enabled
[11/15/2024-12:41:09] [I] Spin-wait: Disabled
[11/15/2024-12:41:09] [I] Multithreading: Disabled
[11/15/2024-12:41:09] [I] CUDA Graph: Disabled
[11/15/2024-12:41:09] [I] Separate profiling: Disabled
[11/15/2024-12:41:09] [I] Time Deserialize: Disabled
[11/15/2024-12:41:09] [I] Time Refit: Disabled
[11/15/2024-12:41:09] [I] NVTX verbosity: 0
[11/15/2024-12:41:09] [I] Persistent Cache Ratio: 0
[11/15/2024-12:41:09] [I] Inputs:
[11/15/2024-12:41:09] [I] === Reporting Options ===
[11/15/2024-12:41:09] [I] Verbose: Disabled
[11/15/2024-12:41:09] [I] Averages: 10 inferences
[11/15/2024-12:41:09] [I] Percentiles: 90,95,99
[11/15/2024-12:41:09] [I] Dump refittable layers:Disabled
[11/15/2024-12:41:09] [I] Dump output: Disabled
[11/15/2024-12:41:09] [I] Profile: Disabled
[11/15/2024-12:41:09] [I] Export timing to JSON file:
[11/15/2024-12:41:09] [I] Export output to JSON file:
[11/15/2024-12:41:09] [I] Export profile to JSON file:
[11/15/2024-12:41:09] [I]
[11/15/2024-12:41:09] [I] === Device Information ===
[11/15/2024-12:41:09] [I] Selected Device: Orin
[11/15/2024-12:41:09] [I] Compute Capability: 8.7
[11/15/2024-12:41:09] [I] SMs: 16
[11/15/2024-12:41:09] [I] Device Global Memory: 30696 MiB
[11/15/2024-12:41:09] [I] Shared Memory per SM: 164 KiB
[11/15/2024-12:41:09] [I] Memory Bus Width: 256 bits (ECC disabled)
[11/15/2024-12:41:09] [I] Application Compute Clock Rate: 1.3 GHz
[11/15/2024-12:41:09] [I] Application Memory Clock Rate: 1.3 GHz
[11/15/2024-12:41:09] [I]
[11/15/2024-12:41:09] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[11/15/2024-12:41:09] [I]
[11/15/2024-12:41:09] [I] TensorRT version: 8.6.1
[11/15/2024-12:41:09] [I] Loading standard plugins
[11/15/2024-12:41:09] [I] [TRT] [MemUsageChange] Init CUDA: CPU +1, GPU +0, now: CPU 18, GPU 4105 (MiB)
^C
I also don't know what the nvinfer spec for DeepStream 7.1 should look like. I tried letting DeepStream build the engine on the first run as well, but that doesn't work either, I think because of the deprecation of UFF models.
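My guess is that on DeepStream 7.1 the spec could only reference a pre-built engine, with all the uff-/tlt- properties removed. A sketch of what I mean, with placeholder paths:

[property]
# DeepStream 7.1 ships TensorRT 10, which has no UFF parser, so the
# engine would have to be built elsewhere and only loaded here
model-engine-file=model.engine
network-type=3
output-instance-mask=1
parse-bbox-instance-mask-func-name=NvDsInferParseCustomMrcnnTLTV2
custom-lib-path=post_processor/libnvds_infercustomparser_tao.so

But as far as I know, an engine serialized with TensorRT 8.6 will not deserialize under TensorRT 10, so this does not really solve the problem.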
On a PC with an RTX 4090, trtexec works.
What should I do to run this TAO mask_rcnn model on the Jetson AGX Orin, given that TAO officially exports mask_rcnn models only to the UFF format?
Best regards,
Darek