Channel: TAO Toolkit - NVIDIA Developer Forums

DW_DNN_INVALID_MODEL error for trt model (PointPillarNet | NVIDIA NGC)


Hello everyone, I’m working with NVIDIA’s PointPillarNet deployable model (available on NGC) and ran into an issue after converting the provided ONNX model to a TensorRT engine.

Issue Description:

  • I converted the provided ONNX model to a TensorRT engine with the following command:
    /usr/src/tensorrt/bin/trtexec --onnx=pointpillars.onnx --saveEngine=pointpillars.trt --fp16 --minShapes=points:1x204800x4,num_points:1 --optShapes=points:1x204800x4,num_points:1 --maxShapes=points:1x204800x4,num_points:1
  • The conversion process completed successfully (see attached logs). However, when I try to load the resulting pointpillars.trt model in DriveWorks, I get the error:
    DW_DNN_INVALID_MODEL.
  • Additionally, attempting to inspect or view the model layers for the generated .trt file results in the error:
    “Invalid file content. File contains undocumented TensorRT engine data.”
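
The trtexec command above passes the same shapes for --minShapes, --optShapes, and --maxShapes, so the optimization profile is effectively static. As a minimal sketch (not part of the original workflow), the same command line can be assembled from a single shape table in Python, which makes it harder for the three shape flags to drift apart. The helper names `shape_arg` and `build_trtexec_command` are hypothetical; the flags and paths are the ones used in the command above.

```python
import shlex

def shape_arg(shapes):
    """Format {"name": (d0, d1, ...)} as name:d0xd1x...,name2:..."""
    return ",".join(
        f"{name}:{'x'.join(str(d) for d in dims)}"
        for name, dims in shapes.items()
    )

def build_trtexec_command(onnx_path, engine_path, shapes,
                          trtexec="/usr/src/tensorrt/bin/trtexec"):
    """Return the trtexec argument list with identical min/opt/max shapes."""
    spec = shape_arg(shapes)
    return [
        trtexec,
        f"--onnx={onnx_path}",
        f"--saveEngine={engine_path}",
        "--fp16",
        f"--minShapes={spec}",
        f"--optShapes={spec}",
        f"--maxShapes={spec}",
    ]

# Input names and dimensions taken from the command in this post.
SHAPES = {"points": (1, 204800, 4), "num_points": (1,)}

command = build_trtexec_command("pointpillars.onnx", "pointpillars.trt", SHAPES)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a machine where trtexec is installed.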

Logs:

Below is an excerpt from the conversion log showing that the engine was built without errors:

&&&& RUNNING TensorRT.trtexec [TensorRT v100300] # /usr/src/tensorrt/bin/trtexec --onnx=pointpillars.onnx --saveEngine=pointpillars.trt --fp16 --minShapes=points:1x204800x4,num_points:1 --optShapes=points:1x204800x4,num_points:1 --maxShapes=points:1x204800x4,num_points:1
[02/07/2025-16:47:12] [I] === Model Options ===
[02/07/2025-16:47:12] [I] Format: ONNX
[02/07/2025-16:47:12] [I] Model: pointpillars.onnx
[02/07/2025-16:47:12] [I] Output:
[02/07/2025-16:47:12] [I] === Build Options ===
[02/07/2025-16:47:12] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default, tacticSharedMem: default
[02/07/2025-16:47:12] [I] avgTiming: 8
[02/07/2025-16:47:12] [I] Precision: FP32+FP16
[02/07/2025-16:47:12] [I] LayerPrecisions: 
[02/07/2025-16:47:12] [I] Layer Device Types: 
[02/07/2025-16:47:12] [I] Calibration: 
[02/07/2025-16:47:12] [I] Refit: Disabled
[02/07/2025-16:47:12] [I] Strip weights: Disabled
[02/07/2025-16:47:12] [I] Version Compatible: Disabled
[02/07/2025-16:47:12] [I] ONNX Plugin InstanceNorm: Disabled
[02/07/2025-16:47:12] [I] TensorRT runtime: full
[02/07/2025-16:47:12] [I] Lean DLL Path: 
[02/07/2025-16:47:12] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[02/07/2025-16:47:12] [I] Exclude Lean Runtime: Disabled
[02/07/2025-16:47:12] [I] Sparsity: Disabled
[02/07/2025-16:47:12] [I] Safe mode: Disabled
[02/07/2025-16:47:12] [I] Build DLA standalone loadable: Disabled
[02/07/2025-16:47:12] [I] Allow GPU fallback for DLA: Disabled
[02/07/2025-16:47:12] [I] DirectIO mode: Disabled
[02/07/2025-16:47:12] [I] Restricted mode: Disabled
[02/07/2025-16:47:12] [I] Skip inference: Disabled
[02/07/2025-16:47:12] [I] Save engine: pointpillars.trt
[02/07/2025-16:47:12] [I] Load engine: 
[02/07/2025-16:47:12] [I] Profiling verbosity: 0
[02/07/2025-16:47:12] [I] Tactic sources: Using default tactic sources
[02/07/2025-16:47:12] [I] timingCacheMode: local
[02/07/2025-16:47:12] [I] timingCacheFile: 
[02/07/2025-16:47:12] [I] Enable Compilation Cache: Enabled
[02/07/2025-16:47:12] [I] errorOnTimingCacheMiss: Disabled
[02/07/2025-16:47:12] [I] Preview Features: Use default preview flags.
[02/07/2025-16:47:12] [I] MaxAuxStreams: -1
[02/07/2025-16:47:12] [I] BuilderOptimizationLevel: -1
[02/07/2025-16:47:12] [I] Calibration Profile Index: 0
[02/07/2025-16:47:12] [I] Weight Streaming: Disabled
[02/07/2025-16:47:12] [I] Runtime Platform: Same As Build
[02/07/2025-16:47:12] [I] Debug Tensors: 
[02/07/2025-16:47:12] [I] Input(s)s format: fp32:CHW
[02/07/2025-16:47:12] [I] Output(s)s format: fp32:CHW
[02/07/2025-16:47:12] [I] Input build shape (profile 0): points=1x204800x4+1x204800x4+1x204800x4
[02/07/2025-16:47:12] [I] Input build shape (profile 0): num_points=1+1+1
[02/07/2025-16:47:12] [I] Input calibration shapes: model
[02/07/2025-16:47:12] [I] === System Options ===
[02/07/2025-16:47:12] [I] Device: 0
[02/07/2025-16:47:12] [I] DLACore: 
[02/07/2025-16:47:12] [I] Plugins:
[02/07/2025-16:47:12] [I] setPluginsToSerialize:
[02/07/2025-16:47:12] [I] dynamicPlugins:
[02/07/2025-16:47:12] [I] ignoreParsedPluginLibs: 0
[02/07/2025-16:47:12] [I] 
[02/07/2025-16:47:12] [I] === Inference Options ===
[02/07/2025-16:47:12] [I] Batch: Explicit
[02/07/2025-16:47:12] [I] Input inference shape : num_points=1
[02/07/2025-16:47:12] [I] Input inference shape : points=1x204800x4
[02/07/2025-16:47:12] [I] Iterations: 10
[02/07/2025-16:47:12] [I] Duration: 3s (+ 200ms warm up)
[02/07/2025-16:47:12] [I] Sleep time: 0ms
[02/07/2025-16:47:12] [I] Idle time: 0ms
[02/07/2025-16:47:12] [I] Inference Streams: 1
[02/07/2025-16:47:12] [I] ExposeDMA: Disabled
[02/07/2025-16:47:12] [I] Data transfers: Enabled
[02/07/2025-16:47:12] [I] Spin-wait: Disabled
[02/07/2025-16:47:12] [I] Multithreading: Disabled
[02/07/2025-16:47:12] [I] CUDA Graph: Disabled
[02/07/2025-16:47:12] [I] Separate profiling: Disabled
[02/07/2025-16:47:12] [I] Time Deserialize: Disabled
[02/07/2025-16:47:12] [I] Time Refit: Disabled
[02/07/2025-16:47:12] [I] NVTX verbosity: 0
[02/07/2025-16:47:12] [I] Persistent Cache Ratio: 0
[02/07/2025-16:47:12] [I] Optimization Profile Index: 0
[02/07/2025-16:47:12] [I] Weight Streaming Budget: 100.000000%
[02/07/2025-16:47:12] [I] Inputs:
[02/07/2025-16:47:12] [I] Debug Tensor Save Destinations:
[02/07/2025-16:47:12] [I] === Reporting Options ===
[02/07/2025-16:47:12] [I] Verbose: Disabled
[02/07/2025-16:47:12] [I] Averages: 10 inferences
[02/07/2025-16:47:12] [I] Percentiles: 90,95,99
[02/07/2025-16:47:12] [I] Dump refittable layers:Disabled
[02/07/2025-16:47:12] [I] Dump output: Disabled
[02/07/2025-16:47:12] [I] Profile: Disabled
[02/07/2025-16:47:12] [I] Export timing to JSON file: 
[02/07/2025-16:47:12] [I] Export output to JSON file: 
[02/07/2025-16:47:12] [I] Export profile to JSON file: 
[02/07/2025-16:47:12] [I] 
[02/07/2025-16:47:12] [I] === Device Information ===
[02/07/2025-16:47:12] [I] Available Devices: 
[02/07/2025-16:47:12] [I]   Device 0: "NVIDIA GeForce RTX 3050" UUID: GPU-56d342c8-117a-faea-1189-84c071cfdf62
[02/07/2025-16:47:12] [I] Selected Device: NVIDIA GeForce RTX 3050
[02/07/2025-16:47:12] [I] Selected Device ID: 0
[02/07/2025-16:47:12] [I] Selected Device UUID: GPU-56d342c8-117a-faea-1189-84c071cfdf62
[02/07/2025-16:47:12] [I] Compute Capability: 8.6
[02/07/2025-16:47:12] [I] SMs: 20
[02/07/2025-16:47:12] [I] Device Global Memory: 7958 MiB
[02/07/2025-16:47:12] [I] Shared Memory per SM: 100 KiB
[02/07/2025-16:47:12] [I] Memory Bus Width: 128 bits (ECC disabled)
[02/07/2025-16:47:12] [I] Application Compute Clock Rate: 1.807 GHz
[02/07/2025-16:47:12] [I] Application Memory Clock Rate: 7.001 GHz
[02/07/2025-16:47:12] [I] 
[02/07/2025-16:47:12] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[02/07/2025-16:47:12] [I] 
[02/07/2025-16:47:12] [I] TensorRT version: 10.3.0
[02/07/2025-16:47:12] [I] Loading standard plugins
[02/07/2025-16:47:12] [I] [TRT] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 20, GPU 615 (MiB)
[02/07/2025-16:47:16] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +2087, GPU +386, now: CPU 2262, GPU 1001 (MiB)
[02/07/2025-16:47:16] [I] Start parsing network model.
[02/07/2025-16:47:16] [I] [TRT] ----------------------------------------------------------------
[02/07/2025-16:47:16] [I] [TRT] Input filename:   pointpillars.onnx
[02/07/2025-16:47:16] [I] [TRT] ONNX IR version:  0.0.8
[02/07/2025-16:47:16] [I] [TRT] Opset version:    11
[02/07/2025-16:47:16] [I] [TRT] Producer name:    
[02/07/2025-16:47:16] [I] [TRT] Producer version: 
[02/07/2025-16:47:16] [I] [TRT] Domain:           
[02/07/2025-16:47:16] [I] [TRT] Model version:    0
[02/07/2025-16:47:16] [I] [TRT] Doc string:       
[02/07/2025-16:47:16] [I] [TRT] ----------------------------------------------------------------
[02/07/2025-16:47:16] [I] [TRT] No checker registered for op: VoxelGeneratorPlugin. Attempting to check as plugin.
[02/07/2025-16:47:16] [I] [TRT] No importer registered for op: VoxelGeneratorPlugin. Attempting to import as plugin.
[02/07/2025-16:47:16] [I] [TRT] Searching for plugin: VoxelGeneratorPlugin, plugin_version: 1, plugin_namespace: 
[02/07/2025-16:47:16] [I] [TRT] Successfully created plugin: VoxelGeneratorPlugin
[02/07/2025-16:47:16] [I] [TRT] No checker registered for op: PillarScatterPlugin. Attempting to check as plugin.
[02/07/2025-16:47:16] [I] [TRT] No importer registered for op: PillarScatterPlugin. Attempting to import as plugin.
[02/07/2025-16:47:16] [I] [TRT] Searching for plugin: PillarScatterPlugin, plugin_version: 1, plugin_namespace: 
[02/07/2025-16:47:16] [I] [TRT] Successfully created plugin: PillarScatterPlugin
[02/07/2025-16:47:16] [I] [TRT] No checker registered for op: DecodeBbox3DPlugin. Attempting to check as plugin.
[02/07/2025-16:47:16] [I] [TRT] No importer registered for op: DecodeBbox3DPlugin. Attempting to import as plugin.
[02/07/2025-16:47:16] [I] [TRT] Searching for plugin: DecodeBbox3DPlugin, plugin_version: 1, plugin_namespace: 
[02/07/2025-16:47:16] [I] [TRT] Successfully created plugin: DecodeBbox3DPlugin
[02/07/2025-16:47:16] [I] Finished parsing network model. Parse time: 0.0232674
[02/07/2025-16:47:16] [I] Set shape of input tensor points for optimization profile 0 to: MIN=1x204800x4 OPT=1x204800x4 MAX=1x204800x4
[02/07/2025-16:47:16] [I] Set shape of input tensor num_points for optimization profile 0 to: MIN=1 OPT=1 MAX=1
[02/07/2025-16:47:16] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[02/07/2025-16:47:44] [I] [TRT] Detected 2 inputs and 2 output network tensors.
[02/07/2025-16:47:45] [I] [TRT] Total Host Persistent Memory: 115984
[02/07/2025-16:47:45] [I] [TRT] Total Device Persistent Memory: 529408
[02/07/2025-16:47:45] [I] [TRT] Total Scratch Memory: 270532608
[02/07/2025-16:47:45] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 39 steps to complete.
[02/07/2025-16:47:45] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 0.472854ms to assign 6 blocks to 39 nodes requiring 319466496 bytes.
[02/07/2025-16:47:45] [I] [TRT] Total Activation Memory: 319465984
[02/07/2025-16:47:45] [I] [TRT] Total Weights Memory: 6627616
[02/07/2025-16:47:45] [I] [TRT] Engine generation completed in 28.9121 seconds.
[02/07/2025-16:47:45] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 247 MiB
[02/07/2025-16:47:45] [I] [TRT] [MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 3506 MiB
[02/07/2025-16:47:45] [I] Engine built in 28.9264 sec.
[02/07/2025-16:47:45] [I] Created engine with size: 7.52948 MiB
[02/07/2025-16:47:45] [I] [TRT] Loaded engine size: 7 MiB
[02/07/2025-16:47:45] [I] Engine deserialized in 0.00884943 sec.
[02/07/2025-16:47:45] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +305, now: CPU 0, GPU 311 (MiB)
[02/07/2025-16:47:45] [I] Setting persistentCacheLimit to 0 bytes.
[02/07/2025-16:47:45] [I] Created execution context with device memory size: 304.667 MiB
[02/07/2025-16:47:45] [I] Using random values for input points
[02/07/2025-16:47:45] [I] Input binding for points with dimensions 1x204800x4 is created.
[02/07/2025-16:47:45] [I] Using random values for input num_points
[02/07/2025-16:47:45] [I] Input binding for num_points with dimensions 1 is created.
[02/07/2025-16:47:45] [I] Output binding for output_boxes with dimensions 1x393216x9 is created.
[02/07/2025-16:47:45] [I] Output binding for num_boxes with dimensions 1 is created.
[02/07/2025-16:47:45] [I] Starting inference
[02/07/2025-16:47:48] [I] Warmup completed 38 queries over 200 ms
[02/07/2025-16:47:48] [I] Timing trace has 562 queries over 3.01639 s
[02/07/2025-16:47:48] [I] 
[02/07/2025-16:47:48] [I] === Trace details ===
[02/07/2025-16:47:48] [I] Trace averages of 10 runs:
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.32603 ms - Host latency: 6.67 ms (enqueue 0.121249 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.32287 ms - Host latency: 6.66584 ms (enqueue 0.123315 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.33813 ms - Host latency: 6.6947 ms (enqueue 0.37449 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.34866 ms - Host latency: 6.71088 ms (enqueue 0.503482 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.32604 ms - Host latency: 6.69517 ms (enqueue 0.603879 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.34661 ms - Host latency: 6.71105 ms (enqueue 0.550467 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.33544 ms - Host latency: 6.70376 ms (enqueue 0.604047 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.33165 ms - Host latency: 6.70045 ms (enqueue 0.603473 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.34303 ms - Host latency: 6.71248 ms (enqueue 0.603162 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.33187 ms - Host latency: 6.70071 ms (enqueue 0.60415 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.33473 ms - Host latency: 6.70226 ms (enqueue 0.563995 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.34416 ms - Host latency: 6.70048 ms (enqueue 0.466834 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.60559 ms - Host latency: 6.97324 ms (enqueue 0.521735 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.51282 ms - Host latency: 6.96803 ms (enqueue 0.500397 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.33761 ms - Host latency: 6.70518 ms (enqueue 0.61106 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.34877 ms - Host latency: 6.71788 ms (enqueue 0.605212 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.32324 ms - Host latency: 6.69222 ms (enqueue 0.624792 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.33719 ms - Host latency: 6.70643 ms (enqueue 0.603772 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.34885 ms - Host latency: 6.71973 ms (enqueue 0.619775 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.33636 ms - Host latency: 6.70391 ms (enqueue 0.605933 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.33063 ms - Host latency: 6.69882 ms (enqueue 0.605725 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.50693 ms - Host latency: 6.87932 ms (enqueue 0.619861 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.48197 ms - Host latency: 6.85001 ms (enqueue 0.558411 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.3382 ms - Host latency: 6.69918 ms (enqueue 0.452905 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.35081 ms - Host latency: 6.71865 ms (enqueue 0.612085 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.32389 ms - Host latency: 6.68502 ms (enqueue 0.575732 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.34191 ms - Host latency: 6.71628 ms (enqueue 0.639648 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.35502 ms - Host latency: 6.72699 ms (enqueue 0.560095 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.34086 ms - Host latency: 6.71104 ms (enqueue 0.636792 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.51641 ms - Host latency: 6.87135 ms (enqueue 0.531763 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.3255 ms - Host latency: 6.67443 ms (enqueue 0.48988 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.34557 ms - Host latency: 6.70552 ms (enqueue 0.547681 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.32837 ms - Host latency: 6.7009 ms (enqueue 0.6276 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.35033 ms - Host latency: 6.71418 ms (enqueue 0.522705 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.32811 ms - Host latency: 6.70371 ms (enqueue 0.668958 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.34656 ms - Host latency: 6.7238 ms (enqueue 0.652075 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.33059 ms - Host latency: 6.70889 ms (enqueue 0.655664 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.32419 ms - Host latency: 6.70139 ms (enqueue 0.655005 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.44292 ms - Host latency: 6.82136 ms (enqueue 0.658057 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.34622 ms - Host latency: 6.72185 ms (enqueue 0.677686 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.32615 ms - Host latency: 6.70027 ms (enqueue 0.676904 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.34031 ms - Host latency: 6.71704 ms (enqueue 0.662476 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.33862 ms - Host latency: 6.70647 ms (enqueue 0.527612 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.34087 ms - Host latency: 6.70713 ms (enqueue 0.525 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.3314 ms - Host latency: 6.70781 ms (enqueue 0.659277 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.32749 ms - Host latency: 6.70483 ms (enqueue 0.658228 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.33015 ms - Host latency: 6.70642 ms (enqueue 0.652002 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.32817 ms - Host latency: 6.70493 ms (enqueue 0.660205 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.34448 ms - Host latency: 6.72043 ms (enqueue 0.659692 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.39229 ms - Host latency: 6.74646 ms (enqueue 0.525439 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.32061 ms - Host latency: 6.6845 ms (enqueue 0.553271 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.34688 ms - Host latency: 6.72087 ms (enqueue 0.660205 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.33582 ms - Host latency: 6.71011 ms (enqueue 0.632886 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.34663 ms - Host latency: 6.71179 ms (enqueue 0.473486 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.3408 ms - Host latency: 6.71799 ms (enqueue 0.674658 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.33108 ms - Host latency: 6.70874 ms (enqueue 0.667993 ms)
[02/07/2025-16:47:48] [I] 
[02/07/2025-16:47:48] [I] === Performance summary ===
[02/07/2025-16:47:48] [I] Throughput: 186.316 qps
[02/07/2025-16:47:48] [I] Latency: min = 6.60522 ms, max = 8.19635 ms, mean = 6.72611 ms, median = 6.70074 ms, percentile(90%) = 6.78662 ms, percentile(95%) = 6.81827 ms, percentile(99%) = 7.7793 ms
[02/07/2025-16:47:48] [I] Enqueue Time: min = 0.119629 ms, max = 0.78894 ms, mean = 0.575947 ms, median = 0.6073 ms, percentile(90%) = 0.673828 ms, percentile(95%) = 0.686035 ms, percentile(99%) = 0.720703 ms
[02/07/2025-16:47:48] [I] H2D Latency: min = 0.256927 ms, max = 1.21075 ms, mean = 0.285035 ms, median = 0.285126 ms, percentile(90%) = 0.292969 ms, percentile(95%) = 0.293701 ms, percentile(99%) = 0.295654 ms
[02/07/2025-16:47:48] [I] GPU Compute Time: min = 5.25818 ms, max = 6.83417 ms, mean = 5.35634 ms, median = 5.33044 ms, percentile(90%) = 5.41797 ms, percentile(95%) = 5.44971 ms, percentile(99%) = 6.36108 ms
[02/07/2025-16:47:48] [I] D2H Latency: min = 1.07739 ms, max = 1.08862 ms, mean = 1.08474 ms, median = 1.08521 ms, percentile(90%) = 1.08667 ms, percentile(95%) = 1.08704 ms, percentile(99%) = 1.08765 ms
[02/07/2025-16:47:48] [I] Total Host Walltime: 3.01639 s
[02/07/2025-16:47:48] [I] Total GPU Compute Time: 3.01026 s
[02/07/2025-16:47:48] [W] * GPU compute time is unstable, with coefficient of variance = 2.56954%.
[02/07/2025-16:47:48] [W]   If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
[02/07/2025-16:47:48] [I] Explanations of the performance metrics are printed in the verbose logs.
[02/07/2025-16:47:48] [I] 
&&&& PASSED TensorRT.trtexec [TensorRT v100300] # /usr/src/tensorrt/bin/trtexec --onnx=pointpillars.onnx --saveEngine=pointpillars.trt --fp16 --minShapes=points:1x204800x4,num_points:1 --optShapes=points:1x204800x4,num_points:1 --maxShapes=points:1x204800x4,num_points:1
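
The log above records the environment the engine was built in: the TensorRT version, the GPU and its compute capability, and the plugins the parser created. TensorRT engines are generally tied to the TensorRT version and GPU they were built with, so these fields are worth comparing against the DriveWorks target. The following is a minimal sketch that extracts them from a trtexec log in Python. The function name `summarize_trtexec_log` is hypothetical, the sample lines are copied from the log above, and whether a version or device mismatch actually explains DW_DNN_INVALID_MODEL is an assumption that this post does not confirm.

```python
import re

def summarize_trtexec_log(text):
    """Pull build-environment facts out of a trtexec log (assumed format)."""
    def first(pattern):
        match = re.search(pattern, text)
        return match.group(1).strip() if match else None

    return {
        "tensorrt_version": first(r"TensorRT version: (\S+)"),
        "device": first(r"Selected Device: (.+)"),
        "compute_capability": first(r"Compute Capability: (\S+)"),
        "plugins": re.findall(r"Successfully created plugin: (\S+)", text),
    }

# A few lines copied from the log in this post.
SAMPLE_LOG = """\
[02/07/2025-16:47:12] [I] Selected Device: NVIDIA GeForce RTX 3050
[02/07/2025-16:47:12] [I] Selected Device ID: 0
[02/07/2025-16:47:12] [I] Compute Capability: 8.6
[02/07/2025-16:47:12] [I] TensorRT version: 10.3.0
[02/07/2025-16:47:16] [I] [TRT] Successfully created plugin: VoxelGeneratorPlugin
[02/07/2025-16:47:16] [I] [TRT] Successfully created plugin: PillarScatterPlugin
[02/07/2025-16:47:16] [I] [TRT] Successfully created plugin: DecodeBbox3DPlugin
"""

summary = summarize_trtexec_log(SAMPLE_LOG)
for key, value in summary.items():
    print(f"{key}: {value}")
```

To run it on the full log, read the saved trtexec output from a file and pass its contents to `summarize_trtexec_log`.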

Any insights or suggestions on how to resolve the DW_DNN_INVALID_MODEL error would be greatly appreciated.
