Description
The current release page for TAO deploy offers guidance that conflicts with the standard TensorRT compatibility rules. It states:
Due to memory issues, you should first run the gen_trt_engine subtask on the x86 platform to generate the engine; you can then use the generated engine to run inference or evaluation on the Jetson platform with the target dataset.
This is in contrast to the guidance in the TensorRT documentation:
By default, TensorRT engines are only compatible with the type of device where they were built. With build-time configuration, engines can be built that are compatible with other types of devices. Currently, hardware compatibility is supported only for Ampere and later device architectures and is not supported on NVIDIA DRIVE OS or JetPack.
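For context, hardware compatibility is an explicit, opt-in setting on the TensorRT builder configuration, not something inherited from the container the build runs in. Below is a minimal sketch of enabling it via the Python API (TensorRT 8.6 or later), assuming a hypothetical model.onnx on an x86 build host; note that, per the quote above, this mode is not supported on JetPack, so the build would have to happen off-Jetson in any case.

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)

# Parse a hypothetical ONNX model into the network definition.
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0).desc())

config = builder.create_builder_config()
# Hardware compatibility is opt-in: the default level is NONE, which
# ties the engine to the architecture it was built on. AMPERE_PLUS
# restricts tactic selection to ones portable across Ampere and newer.
config.hardware_compatibility_level = trt.HardwareCompatibilityLevel.AMPERE_PLUS

serialized = builder.build_serialized_network(network, config)
if serialized is None:
    raise RuntimeError("engine build failed")
with open("model.engine", "wb") as f:
    f.write(serialized)
```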
Question
Does the L4T TensorRT Docker container default to hardware compatibility mode?
When building in hardware compatibility mode, TensorRT excludes tactics that are not hardware compatible, such as those that use architecture-specific instructions or require more shared memory than is available on some devices. Thus, a hardware-compatible engine may have lower throughput and/or higher latency than its non-hardware-compatible counterpart. The degree of this performance impact depends on the network architecture and input sizes.
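One way to check this empirically is to inspect the default level on a freshly created builder config inside the l4t-tensorrt container. A minimal sketch, assuming TensorRT 8.6 or later (where the attribute exists); per the documentation quoted above, the expected output is NONE, i.e. hardware compatibility is opt-in rather than a container default:

```python
import tensorrt as trt

# Create a builder config without touching any compatibility settings
# and print its hardware_compatibility_level.
logger = trt.Logger(trt.Logger.WARNING)
config = trt.Builder(logger).create_builder_config()
print(config.hardware_compatibility_level)
# Expected (per the docs quoted above): HardwareCompatibilityLevel.NONE
```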