• Hardware T4
• Network Type OCRNet
• TLT Version 8.6.1.6
• Training spec file: From sample notebook
I’m trying to use a re-trained OCRNet from a Python script with an OpenCV Mat as input. The training was done using an unchanged ocrnet-vit.ipynb from the tao_launcher_starter_kit:
tao_tutorials/notebooks/tao_launcher_starter_kit/ocrnet/ocrnet-vit.ipynb at main · NVIDIA/tao_tutorials · GitHub
I was able to run the inference.py script from the tao_deploy sample code:
tao_deploy/nvidia_tao_deploy/cv/ocrnet/scripts at main · NVIDIA/tao_deploy · GitHub
However, this sample code works on files; I need to run it with OpenCV Mats from inside my own inference script.
So I tried to distill the little I understood from the entire tao_deploy kit into a new script. The script loads the re-trained model, reads an image from disk, and tries to feed it to the model. But it fails: when running the inference, I get tons of
[06/11/2024-19:25:44] [TRT] [E] 1: [cudaResources.cpp::~ScopedCudaEvent::24] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
I wasn’t able to find any other sample code on the web, and I couldn’t find any other documentation either.
My question is: would somebody be able to help me find the problem with this small sample?
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit
import numpy as np
import cv2
# Load the TensorRT engine
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
with open("trt.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
# Context and stream creation
context = engine.create_execution_context()
stream = cuda.Stream()
# Load the image and convert it to grayscale
image_path = "/home/ubuntu/images/BEH6242.png"
image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
# Resize to the network input size
max_width = 200
max_height = 64
image_resized = cv2.resize(image, (max_width, max_height), interpolation=cv2.INTER_AREA)
# To numpy array
input_data = np.array(image_resized, dtype=np.float32)
# Extract input/output shapes
input_shape = engine.get_binding_shape(0)
output_shape = engine.get_binding_shape(1)
print("input_shape", input_shape)
print("output_shape", output_shape)
# Allocate CUDA device memory
d_input = cuda.mem_alloc(int(1 * np.prod(input_shape) * np.float32().nbytes))
d_output = cuda.mem_alloc(int(1 * np.prod(output_shape) * np.float32().nbytes))
# Copy the input from host to device
cuda.memcpy_htod_async(d_input, input_data, stream)
# Run the inference
context.execute_async_v2(bindings=[int(d_input), int(d_output)], stream_handle=stream.handle)
# Copy the output from device back to host
h_output = np.empty(output_shape, dtype=np.float32)
cuda.memcpy_dtoh_async(h_output, d_output, stream)
stream.synchronize()
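For reference, the input preprocessing I believe the (1, 1, 64, 200) binding expects can be sketched in pure NumPy. The 1/255 scaling is my assumption (I haven’t verified OCRNet’s exact normalization), but the explicit NCHW reshape and the contiguity check before the host-to-device copy are the parts I suspect my script is missing:

```python
import numpy as np

# Hypothetical preprocessing sketch for a grayscale HxW image headed
# into a (1, 1, H, W) float32 input binding.  The 1/255 scaling is an
# assumption, not confirmed against the OCRNet training spec.
def preprocess(gray_image, height=64, width=200):
    assert gray_image.shape == (height, width)
    x = gray_image.astype(np.float32) / 255.0   # uint8 HxW -> float32 in [0, 1]
    x = x.reshape(1, 1, height, width)          # add batch and channel axes (NCHW)
    # memcpy_htod_async copies raw bytes, so the array must be contiguous
    return np.ascontiguousarray(x)

fake = np.zeros((64, 200), dtype=np.uint8)
blob = preprocess(fake)
print(blob.shape, blob.dtype, blob.flags["C_CONTIGUOUS"])
# → (1, 1, 64, 200) float32 True
```

In the script above, `input_data` would then be this `blob` rather than the raw resized image.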
I have put this code into test.py and run it on an AWS T4 instance with python3 test.py.
The output is:
input_shape (1, 1, 64, 200)
output_shape (1, 26)
Traceback (most recent call last):
File "/home/ubuntu/OpenCV-dnn-samples/test.py", line 47, in <module>
cuda.memcpy_dtoh_async(h_output, d_output, stream)
pycuda._driver.LogicError: cuMemcpyDtoHAsync failed: an illegal memory access was encountered
Then a lot of messages like this follow:
[06/12/2024-05:02:16] [TRT] [E] 1: [graphContext.h::~MyelinGraphContext::55] Error Code 1: Myelin (Error 700 destroying stream '0x5b3cfb2d7010'.)
If I comment out the context.execute_async_v2 call, the rest of the code runs without problems.
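One thing I’m unsure about: my script hard-codes exactly one input and one output, but the tao_deploy OCRNet code appears to read back two outputs (named output_id and output_prob there), so the engine may report three bindings. If so, passing only two pointers to execute_async_v2 would leave one binding unbound, which could explain the illegal memory access. The real fix would loop over engine.num_bindings, query get_binding_shape(i), get_binding_dtype(i) and binding_is_input(i), and cuda.mem_alloc one buffer per binding. Here is a sketch of that per-binding size planning with a stand-in list in place of the engine (the binding names, dtypes and shapes are my guesses from the sample, not verified):

```python
import numpy as np

# Stand-in for the engine's binding table: (name, shape, dtype, is_input).
# Names mirror the tao_deploy OCRNet sample; dtypes are assumptions.
FAKE_BINDINGS = [
    ("input",       (1, 1, 64, 200), np.float32, True),
    ("output_id",   (1, 26),         np.int32,   False),
    ("output_prob", (1, 26),         np.float32, False),
]

def plan_buffers(bindings_info):
    """Compute the byte size of every binding, not just the first two.

    In the real script each entry would become a cuda.mem_alloc() call,
    and the full pointer list (one per binding, in binding order) would
    be passed to context.execute_async_v2().
    """
    plan = []
    for name, shape, dtype, is_input in bindings_info:
        nbytes = int(np.prod(shape)) * np.dtype(dtype).itemsize
        plan.append((name, nbytes, is_input))
    return plan

for name, nbytes, is_input in plan_buffers(FAKE_BINDINGS):
    print(f"{name}: {nbytes} bytes ({'input' if is_input else 'output'})")
```

If the engine really does have a third binding, allocating and binding all of them this way is what I would try first.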
I know this code might be wrong in many ways, so please forgive me if it is complete nonsense. :)