Channel: TAO Toolkit - NVIDIA Developer Forums

Issue Running Inference on NVIDIA TAO Retail Object Recognition Model

Hello,

I’m trying to evaluate the Retail Object Recognition model from NVIDIA TAO to see if it fits my needs. My goal is to run inference with the pretrained model, but I’ve run into issues along the way. My hardware is a single NVIDIA A100 GPU.

Steps Taken:

I followed the official tutorial from NVIDIA: Retail Object Recognition Notebook

However, the notebook is primarily focused on transfer learning, and I couldn’t find clear instructions on how to directly test the pretrained model.

I downloaded the model using:

!ngc registry model download-version nvidia/tao/retail_object_recognition:trainable_head_fan_base_v2.0 --dest $HOST_MODEL_DIR/

I modified the infer.yaml file as follows:

results_dir: "???"
model:
  backbone: fan_base
  input_width: 224
  input_height: 224
  feat_dim: 1024
dataset:
  workers: 8
  val_dataset:
    reference: "???"
    query: ""
inference:
  inference_input_type: classification_folder
  input_path: "???"
  batch_size: 16

I attempted to run inference with:

# run inference on known classes
! tao model ml_recog inference \
                    -e $SPECS_DIR/infer.yaml \
                    results_dir=$RESULTS_DIR \
                    inference.checkpoint=$MODEL_DIR/retail_object_recognition_vtrainable_head_fan_base_v2.0/retail_object_recognition_head_fan_base_v2.0.pth \
                    dataset.val_dataset.reference=$DATA_DIR/$DATA_FOLDER/known_classes/reference \
                    inference.input_path=$DATA_DIR/$DATA_FOLDER/known_classes/test 

Encountered Errors:

I received the following error:
KeyError: 'pytorch-lightning_version'

It seems that the checkpoint file lacks the required pytorch-lightning_version key. I attempted to manually modify the checkpoint by loading it in PyTorch and adding:

new_ckpt['pytorch-lightning_version'] = '0.0.0'
new_ckpt['global_step'] = None
new_ckpt['epoch'] = None
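
For anyone trying to reproduce this, here is a minimal sketch of that kind of patch. It is not my exact code: the helper names `wrap_as_lightning_checkpoint` and `patch_file` are made up for illustration, the zero values for `epoch` and `global_step` are an assumption, and I don't know whether TAO's loader accepts a checkpoint patched this way.

```python
from typing import Any, Dict, Mapping


def wrap_as_lightning_checkpoint(raw: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a copy of `raw` with the top-level metadata keys a Lightning-style loader reads."""
    # If the file already nests the weights under "state_dict", keep that layout;
    # otherwise assume the whole mapping is the state_dict itself.
    if "state_dict" in raw:
        ckpt = dict(raw)
    else:
        ckpt = {"state_dict": dict(raw)}
    # Placeholder metadata (assumed values); keys that already exist are left untouched.
    ckpt.setdefault("pytorch-lightning_version", "0.0.0")
    ckpt.setdefault("epoch", 0)
    ckpt.setdefault("global_step", 0)
    return ckpt


def patch_file(src: str, dst: str) -> None:
    """Load a checkpoint from `src`, add the metadata keys, and save it to `dst`."""
    import torch  # imported lazily so the helper above works without PyTorch

    raw = torch.load(src, map_location="cpu")
    print("top-level keys:", list(raw.keys())[:10])
    torch.save(wrap_as_lightning_checkpoint(raw), dst)
```

Calling `patch_file` with the downloaded `.pth` path and a new output path writes the patched copy and prints the first few top-level keys, which also shows whether the weights are nested under `state_dict` at all.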

However, this did not resolve the issue. The checkpoint’s state_dict is missing several keys that MLRecogModel expects, and adding them manually does not work. The error I receive is:

RuntimeError: Error(s) in loading state_dict for MLRecogModel:
Missing key(s) in state_dict: "model.embedder.classifier_feat.0.weight", "model.embedder.classifier_feat.0.bias", ... etc...
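
To narrow down which parts of the model the checkpoint does not cover, the two key sets can be compared directly. The sketch below is only an illustration: `diff_keys` and `count_by_prefix` are hypothetical helpers, and it assumes you can get the expected names from an instantiated model (`model.state_dict().keys()`) and the found names from the loaded checkpoint.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def diff_keys(expected: Iterable[str], found: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) parameter names, each sorted alphabetically."""
    expected_set, found_set = set(expected), set(found)
    missing = sorted(expected_set - found_set)      # model wants these, checkpoint lacks them
    unexpected = sorted(found_set - expected_set)   # checkpoint has these, model does not
    return missing, unexpected


def count_by_prefix(keys: Iterable[str], depth: int = 3) -> Counter:
    """Count keys by their first `depth` dot-separated components."""
    return Counter(".".join(key.split(".")[:depth]) for key in keys)


# Example with the two keys from the error message plus one made-up key ("head.x").
expected_keys = [
    "model.embedder.classifier_feat.0.weight",
    "model.embedder.classifier_feat.0.bias",
]
found_keys = ["head.x"]
missing, unexpected = diff_keys(expected_keys, found_keys)
print(count_by_prefix(missing))
print(unexpected)
```

If the missing and unexpected lists differ only by a prefix such as `model.`, the weights are probably there under different names. If the missing keys all sit under one submodule, the checkpoint likely does not contain that part of the network.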

Is this model intended only for fine-tuning, or should it also work for direct inference? I don’t want to train yet; I just want to check whether the pretrained model fits my needs before fine-tuning it.

Thank you in advance.
