Dear @Morganh
I am trying to run inference with a mask_rcnn model inside the nvcr.io/nvidia/tao/tao-toolkit:5.5.0-deploy container, but I am getting the issue below.
root@ca83379e98f1:/home/data/Model-Training/BOX-SEGMENTATION_V1/mask_rcnn/experiment_dir_unpruned# python3 load_engine_infer_mask_rcnn.py
[01/15/2025-07:00:34] [TRT] [E] 3: getPluginCreator could not find plugin: ResizeNearest_TRT version: 1
[01/15/2025-07:00:34] [TRT] [E] 1: [pluginV2Runner.cpp::load::303] Error Code 1: Serialization (Serialization assertion creator failed.Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
Traceback (most recent call last):
File "/home/data/Model-Training/BOX-SEGMENTATION_V1/mask_rcnn/experiment_dir_unpruned/load_engine_infer_mask_rcnn.py", line 84, in <module>
inputs, outputs, bindings, stream = allocate_buffers(engine)
File "/home/data/Model-Training/BOX-SEGMENTATION_V1/mask_rcnn/experiment_dir_unpruned/load_engine_infer_mask_rcnn.py", line 20, in allocate_buffers
for binding in engine:
TypeError: 'NoneType' object is not iterable
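From the two [TRT] [E] messages, my understanding is that the ResizeNearest_TRT plugin creator is not registered in my script, so runtime.deserialize_cuda_engine() returns None and the loop over the engine then fails with the TypeError. A minimal sketch of the registration I think is missing (assuming trt.init_libnvinfer_plugins is the correct call here, and that the required plugins ship in the container's libnvinfer_plugin):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Register the standard TensorRT plugin creators (including ResizeNearest_TRT)
# before deserializing the engine; without this, deserialize_cuda_engine()
# seems to return None.
trt.init_libnvinfer_plugins(TRT_LOGGER, "")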
First, I converted the .uff file to a TensorRT engine using the steps below.
- Execute the TAO 5.5 deploy container:
docker run -d --gpus all -it --rm --shm-size=4g -v /home/smarg/Documents/TAO/:/home/data nvcr.io/nvidia/tao/tao-toolkit:5.5.0-deploy /bin/bash
- Command for engine generation:
mask_rcnn gen_trt_engine -m ./model.epoch-6.uff \
--batch_size 1 \
--data_type fp16 \
--engine_file ./model.epoch-6.uff.engine \
--results_dir ./exportT
root@ca83379e98f1:/home/data/Model-Training/BOX-SEGMENTATION_V1/mask_rcnn/experiment_dir_unpruned# mask_rcnn gen_trt_engine -m ./model.epoch-6.uff \
--batch_size 1 \
--data_type fp16 \
--engine_file ./model.epoch-6.uff.engine \
--results_dir ./exportT
Loading uff directly from the package source code
Loading uff directly from the package source code
2025-01-15 06:40:15,915 [TAO Toolkit] [INFO] root 167: Starting mask_rcnn gen_trt_engine.
[01/15/2025-06:40:15] [TRT] [I] [MemUsageChange] Init CUDA: CPU +1, GPU +0, now: CPU 35, GPU 1019 (MiB)
[01/15/2025-06:40:20] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1453, GPU +268, now: CPU 1565, GPU 1287 (MiB)
2025-01-15 06:40:20,484 [TAO Toolkit] [INFO] nvidia_tao_deploy.cv.mask_rcnn.engine_builder 96: Parsing UFF model
[01/15/2025-06:40:20] [TRT] [W] The implicit batch dimension mode has been deprecated. Please create the network with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag whenever possible.
2025-01-15 06:40:21,075 [TAO Toolkit] [INFO] nvidia_tao_deploy.engine.builder 150: TensorRT engine build configurations:
2025-01-15 06:40:21,075 [TAO Toolkit] [INFO] nvidia_tao_deploy.engine.builder 163:
2025-01-15 06:40:21,075 [TAO Toolkit] [INFO] nvidia_tao_deploy.engine.builder 165: BuilderFlag.FP16
2025-01-15 06:40:21,075 [TAO Toolkit] [INFO] nvidia_tao_deploy.engine.builder 179: BuilderFlag.TF32
2025-01-15 06:40:21,075 [TAO Toolkit] [INFO] nvidia_tao_deploy.engine.builder 195:
2025-01-15 06:40:21,075 [TAO Toolkit] [INFO] nvidia_tao_deploy.engine.builder 197: Note: max representabile value is 2,147,483,648 bytes or 2GB.
2025-01-15 06:40:21,075 [TAO Toolkit] [INFO] nvidia_tao_deploy.engine.builder 199: MemoryPoolType.WORKSPACE = 2147483648 bytes
2025-01-15 06:40:21,075 [TAO Toolkit] [INFO] nvidia_tao_deploy.engine.builder 201: MemoryPoolType.DLA_MANAGED_SRAM = 0 bytes
2025-01-15 06:40:21,075 [TAO Toolkit] [INFO] nvidia_tao_deploy.engine.builder 203: MemoryPoolType.DLA_LOCAL_DRAM = 1073741824 bytes
2025-01-15 06:40:21,075 [TAO Toolkit] [INFO] nvidia_tao_deploy.engine.builder 205: MemoryPoolType.DLA_GLOBAL_DRAM = 536870912 bytes
2025-01-15 06:40:21,075 [TAO Toolkit] [INFO] nvidia_tao_deploy.engine.builder 207:
2025-01-15 06:40:21,075 [TAO Toolkit] [INFO] nvidia_tao_deploy.engine.builder 209: PreviewFeature.FASTER_DYNAMIC_SHAPES_0805
2025-01-15 06:40:21,075 [TAO Toolkit] [INFO] nvidia_tao_deploy.engine.builder 211: PreviewFeature.DISABLE_EXTERNAL_TACTIC_SOURCES_FOR_CORE_0805
2025-01-15 06:40:21,075 [TAO Toolkit] [INFO] nvidia_tao_deploy.engine.builder 215: Tactic Sources = 31
[01/15/2025-06:40:21] [TRT] [I] Graph optimization time: 0.0165025 seconds.
[01/15/2025-06:40:21] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +10, now: CPU 1793, GPU 1297 (MiB)
[01/15/2025-06:40:21] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 1793, GPU 1297 (MiB)
[01/15/2025-06:40:21] [TRT] [W] cuDNN tactic soruce is always disabled in this TensorRT version
[01/15/2025-06:40:21] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
[01/15/2025-06:42:07] [TRT] [I] Detected 1 inputs and 2 output network tensors.
[01/15/2025-06:42:07] [TRT] [I] Total Host Persistent Memory: 245568
[01/15/2025-06:42:07] [TRT] [I] Total Device Persistent Memory: 11776
[01/15/2025-06:42:07] [TRT] [I] Total Scratch Memory: 51951616
[01/15/2025-06:42:07] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 92 MiB, GPU 350 MiB
[01/15/2025-06:42:07] [TRT] [I] [BlockAssignment] Started assigning block shifts. This will take 107 steps to complete.
[01/15/2025-06:42:07] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 9.0624ms to assign 21 blocks to 107 nodes requiring 86775296 bytes.
[01/15/2025-06:42:07] [TRT] [I] Total Activation Memory: 86773248
[01/15/2025-06:42:07] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2090, GPU 1359 (MiB)
[01/15/2025-06:42:07] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 2090, GPU 1359 (MiB)
[01/15/2025-06:42:07] [TRT] [W] cuDNN tactic soruce is always disabled in this TensorRT version
[01/15/2025-06:42:07] [TRT] [W] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[01/15/2025-06:42:07] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[01/15/2025-06:42:07] [TRT] [W] Check verbose logs for the list of affected weights.
[01/15/2025-06:42:07] [TRT] [W] - 57 weights are affected by this issue: Detected subnormal FP16 values.
[01/15/2025-06:42:07] [TRT] [W] - 13 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
[01/15/2025-06:42:07] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +51, GPU +48, now: CPU 51, GPU 48 (MiB)
Export finished successfully.
2025-01-15 06:42:07,650 [TAO Toolkit] [INFO] root 167: Gen_trt_engine finished successfully.
[2025-01-15 06:42:07,829 - TAO Toolkit - nvidia_tao_deploy.cv.common.entrypoint.entrypoint_proto - INFO] Sending telemetry data.
[2025-01-15 06:42:07,829 - TAO Toolkit - root - INFO] ================> Start Reporting Telemetry <================
[2025-01-15 06:42:07,829 - TAO Toolkit - root - INFO] Sending {'version': '5.5.0', 'action': 'gen_trt_engine', 'network': 'mask_rcnn', 'gpu': ['NVIDIA-RTX-A4000'], 'success': True, 'time_lapsed': 112.39582538604736} to https://api.tao.ngc.nvidia.com.
[2025-01-15 06:42:09,555 - TAO Toolkit - root - INFO] Telemetry sent successfully.
[2025-01-15 06:42:09,555 - TAO Toolkit - root - INFO] ================> End Reporting Telemetry <================
[2025-01-15 06:42:09,555 - TAO Toolkit - nvidia_tao_deploy.cv.common.entrypoint.entrypoint_proto - INFO] Execution status: PASS
- Execute the script below inside the same container; this produces the error shown above.
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit
import numpy as np
import cv2


def load_engine(engine_file_path):
    """Load the TensorRT engine from file."""
    TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
    with open(engine_file_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())


def allocate_buffers(engine):
    """Allocate input and output buffers for the TensorRT engine."""
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        # Append the device buffer to device bindings
        bindings.append(int(device_mem))
        # Store the buffers
        if engine.binding_is_input(binding):
            inputs.append({"host": host_mem, "device": device_mem})
        else:
            outputs.append({"host": host_mem, "device": device_mem})
    return inputs, outputs, bindings, stream


def preprocess_image(image_path, input_shape):
    """Preprocess input image to match the engine's input size."""
    image = cv2.imread(image_path)
    original_image = image.copy()
    resized_image = cv2.resize(image, (input_shape[2], input_shape[1]))
    normalized_image = resized_image.astype(np.float32) / 255.0  # Normalize to [0, 1]
    transposed_image = np.transpose(normalized_image, (2, 0, 1))  # HWC to CHW
    batch_image = np.expand_dims(transposed_image, axis=0)  # Add batch dimension
    return batch_image, original_image  # Return original for visualization


def do_inference(engine, inputs, outputs, bindings, stream, input_image):
    """Run inference on the TensorRT engine."""
    # Copy input data to the input buffer
    np.copyto(inputs[0]["host"], input_image.ravel())
    # Transfer input data to the GPU
    cuda.memcpy_htod_async(inputs[0]["device"], inputs[0]["host"], stream)
    # Run inference
    context = engine.create_execution_context()
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    # Transfer predictions back to the host
    cuda.memcpy_dtoh_async(outputs[0]["host"], outputs[0]["device"], stream)
    stream.synchronize()
    return outputs[0]["host"]


def postprocess_output(output, image, threshold=0.5):
    """Post-process the output to overlay detections on the image."""
    # Parse the output (example assumes output contains boxes, scores, and masks)
    # You must modify this part based on your model's output structure.
    boxes, scores, masks = output[0], output[1], output[2]  # Adjust as needed
    for i, score in enumerate(scores):
        if score > threshold:
            box = boxes[i]
            x1, y1, x2, y2 = map(int, box)
            cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)
            # Overlay mask on the image
            mask = masks[i]
            mask = (mask > threshold).astype(np.uint8)
            colored_mask = np.zeros_like(image, dtype=np.uint8)
            colored_mask[:, :, 1] = mask * 255
            image = cv2.addWeighted(image, 1, colored_mask, 0.5, 0)
    return image


if __name__ == "__main__":
    engine_file = "model.epoch-6.uff.engine"  # Path to your TensorRT engine
    image_path = "net-5809-_jpg.rf.4e20462228dd67b33cbbda88966dbbae.jpg"  # Path to the input image
    input_shape = (1, 3, 640, 640)  # Batch size 1, 3 channels, 640x640 resolution

    # Load TensorRT engine
    engine = load_engine(engine_file)
    inputs, outputs, bindings, stream = allocate_buffers(engine)

    # Preprocess input image
    input_image, original_image = preprocess_image(image_path, input_shape)

    # Run inference
    output = do_inference(engine, inputs, outputs, bindings, stream, input_image)

    # Post-process and visualize the result
    result_image = postprocess_output(output, original_image)
    cv2.imshow("Result", result_image)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
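Also, the gen_trt_engine log above reports 1 input and 2 output network tensors, while my postprocess_output() assumes three separate arrays (boxes, scores, masks), so that part probably needs to change as well. Once the engine deserializes, I plan to dump the bindings with a small helper like the sketch below (standard TensorRT Python API, nothing model-specific assumed) and adapt the pre/post-processing to the actual names and shapes:

def print_bindings(engine):
    """Print each binding's name, shape and dtype so pre/post-processing can be matched to the engine."""
    for i in range(engine.num_bindings):
        kind = "input" if engine.binding_is_input(i) else "output"
        name = engine.get_binding_name(i)
        shape = engine.get_binding_shape(i)
        dtype = trt.nptype(engine.get_binding_dtype(i))
        print(f"{kind}: {name} shape={tuple(shape)} dtype={dtype}")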
Please suggest where the gaps are, or share any other inference script you can provide for running inference with a mask_rcnn model.