I am trying to train a custom OCR model (OCRNet) with the following command:
!tao model ocrnet train -e /specs/experiment-vit.yaml \
train.results_dir=$RESULTS_DIR/experiment_dir_unpruned \
train.pretrained_model_path=$RESULTS_DIR/pretrained_ocrnet/ocrnet_vtrainable_v2.0/ocrnet-vit.pth \
dataset.train_dataset_dir=[$DATA_DIR/train] \
dataset.val_dataset_dir=$DATA_DIR/val \
dataset.character_list_file=$DATA_DIR/characters_list
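For reference, each `key.subkey=value` argument after the spec file is a Hydra-style override that gets merged into the nested YAML config. A minimal sketch of the idea in plain Python (Hydra/OmegaConf additionally handle typing, list values like `[$DATA_DIR/train]`, and interpolation):

```python
# Toy version of Hydra's "a.b.c=value" override merging: walk the dotted
# key path into the nested config dict and set the leaf value.
def apply_override(cfg: dict, override: str) -> None:
    key, value = override.split("=", 1)
    *path, leaf = key.split(".")
    node = cfg
    for part in path:
        node = node.setdefault(part, {})
    node[leaf] = value

cfg = {"dataset": {"val_dataset_dir": "/old"}}
apply_override(cfg, "dataset.val_dataset_dir=/data/val")
print(cfg["dataset"]["val_dataset_dir"])  # /data/val
```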
My experiment-vit.yaml file looks like this:
results_dir: /results
encryption_key: nvidia_tao
model:
  TPS: True
  backbone: FAN_tiny_2X
  sequence: BiLSTM
  hidden_size: 256
  prediction: Attn
  quantize: False
  input_width: 200
  input_height: 64
  input_channel: 1
dataset:
  train_dataset_dir: [/data/train]
  val_dataset_dir: /data/val
  character_list_file: /data/characters_list
  max_label_length: 25
  batch_size: 32
  workers: 4
  augmentation:
    keep_aspect_ratio: False
train:
  seed: 1111
  gpu_ids: [0]
  optim:
    name: "adadelta"
    lr: 0.1
  clip_grad_norm: 5.0
  num_epochs: 10
  checkpoint_interval: 2
  validation_interval: 1
evaluate:
  gpu_id: 0
  checkpoint: "??"
  test_dataset_dir: "??"
  results_dir: "${results_dir}/evaluate"
prune:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/prune"
  prune_setting:
    mode: "amount"
    amount: 0.1
    granularity: 8
    raw_prune_score: L1
inference:
  gpu_id: 0
  checkpoint: "??"
  inference_dataset_dir: "??"
  results_dir: "${results_dir}/inference"
export:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/export"
dataset_convert:
  input_img_dir: "??"
  gt_file: "??"
  results_dir: "${results_dir}/convert_dataset"
gen_trt_engine:
  onnx_file: "??"
  results_dir: "${results_dir}/convert_dataset"
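As a side note on the spec: the `"${results_dir}/..."` values are OmegaConf interpolations, and the `"??"` entries mark mandatory values that must be supplied (on the command line or in the file) before running that task. A toy re-implementation of the interpolation idea in plain Python, for illustration only (the real OmegaConf resolver also handles nested keys, escapes, and custom resolvers):

```python
# Resolve "${key}" references in a spec value against top-level spec keys,
# mimicking what OmegaConf does for entries like "${results_dir}/evaluate".
import re

spec = {
    "results_dir": "/results",
    "evaluate": {"results_dir": "${results_dir}/evaluate"},
    "prune": {"results_dir": "${results_dir}/prune"},
}

def resolve(value: str, root: dict) -> str:
    """Substitute each ${key} with the corresponding top-level spec value."""
    return re.sub(r"\$\{(\w+)\}", lambda m: str(root[m.group(1)]), value)

print(resolve(spec["evaluate"]["results_dir"], spec))  # /results/evaluate
```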
But running the command fails with the following error:
2023-12-19 20:25:29,652 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2023-12-19 20:25:29,690 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 360: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.0.0-pyt
2023-12-19 20:25:29,754 [TAO Toolkit] [WARNING] nvidia_tao_cli.components.docker_handler.docker_handler 262:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/sigmind/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2023-12-19 20:25:29,754 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 275: Printing tty value True
[2023-12-19 14:25:31,875 - TAO Toolkit - torch.distributed.nn.jit.instantiator - INFO] Created a temporary directory at /tmp/tmpjn5hsq2o
[2023-12-19 14:25:31,875 - TAO Toolkit - torch.distributed.nn.jit.instantiator - INFO] Writing /tmp/tmpjn5hsq2o/_remote_module_non_scriptable.py
<frozen importlib._bootstrap>:219: RuntimeWarning: scipy._lib.messagestream.MessageStream size changed, may indicate binary incompatibility. Expected 56 from C header, got 64 from PyObject
<frozen importlib._bootstrap>:219: RuntimeWarning: scipy._lib.messagestream.MessageStream size changed, may indicate binary incompatibility. Expected 56 from C header, got 64 from PyObject
sys:1: UserWarning:
'experiment-vit.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
<frozen core.hydra.hydra_runner>:107: UserWarning:
'experiment-vit.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
/usr/local/lib/python3.8/dist-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/next/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
ret = run_job(
<frozen core.loggers.api_logging>:245: UserWarning: Log file already exists at /results/experiment_dir_unpruned/status.json
No FeatureExtraction module specified
Error executing job with overrides: ['train.results_dir=/results/experiment_dir_unpruned', 'train.pretrained_model_path=/mnt/sdc1/OCRnet/ocrnet/pretrained_ocrnet/ocrnet_vtrainable_v2.0/ocrnet-vit.pth', 'dataset.train_dataset_dir=[/data/train]', 'dataset.val_dataset_dir=/data/val', 'dataset.character_list_file=/data/characters_list']
An error occurred during Hydra's exception formatting:
AssertionError()
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 254, in run_and_report
assert mdl is not None
AssertionError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "</usr/local/lib/python3.8/dist-packages/nvidia_tao_pytorch/cv/ocrnet/scripts/train.py>", line 3, in <module>
File "<frozen cv.ocrnet.scripts.train>", line 136, in <module>
File "<frozen core.hydra.hydra_runner>", line 107, in wrapper
File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 389, in _run_hydra
_run_app(
File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 452, in _run_app
run_and_report(
File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 296, in run_and_report
raise ex
File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 213, in run_and_report
return func()
File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 453, in <lambda>
lambda: hydra.run(
File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/hydra.py", line 132, in run
_ = ret.return_value
File "/usr/local/lib/python3.8/dist-packages/hydra/core/utils.py", line 260, in return_value
raise self._return_value
File "/usr/local/lib/python3.8/dist-packages/hydra/core/utils.py", line 186, in run_job
ret.return_value = task_function(task_cfg)
File "<frozen cv.ocrnet.scripts.train>", line 132, in main
File "<frozen cv.ocrnet.scripts.train>", line 121, in main
File "<frozen cv.ocrnet.scripts.train>", line 60, in run_experiment
File "<frozen cv.ocrnet.model.pl_ocrnet>", line 67, in __init__
File "<frozen cv.ocrnet.model.pl_ocrnet>", line 81, in _build_model
File "<frozen cv.ocrnet.model.build_nn_model>", line 147, in build_ocrnet_model
File "<frozen cv.ocrnet.model.model>", line 59, in __init__
Exception: No FeatureExtraction module specified
2023-12-19 20:25:35,782 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 337: Stopping container.
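From what I can tell, the exception is raised in the model constructor when the configured backbone cannot be mapped to a feature-extraction module, so a backbone name that this container build does not recognize falls through to the error. The sketch below is purely illustrative of that dispatch pattern; the class names and the supported-name sets are assumptions, not the actual TAO source:

```python
# Illustrative backbone-to-FeatureExtraction dispatch: an unrecognized
# backbone name leaves no module selected and raises the observed error.
KNOWN_BACKBONES = {
    "ResNet": "ResNet_FeatureExtractor",      # assumed name
    "FAN_tiny_2X": "FAN_FeatureExtractor",    # assumed name (ViT backbone)
}

def build_feature_extraction(backbone: str, available: set) -> str:
    """Return the extractor name, or fail like the TAO error message."""
    if backbone in available and backbone in KNOWN_BACKBONES:
        return KNOWN_BACKBONES[backbone]
    raise Exception("No FeatureExtraction module specified")

# A container build that predates the ViT backbone would not list it:
old_container = {"ResNet"}
try:
    build_feature_extraction("FAN_tiny_2X", old_container)
except Exception as exc:
    print(exc)  # No FeatureExtraction module specified
```

If that reading is right, the backbone in the spec and the toolkit container version have to agree.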
I don't understand which FeatureExtraction module it is looking for. My TAO version is:
Configuration of the TAO Toolkit Instance
task_group: ['model', 'dataset', 'deploy']
format_version: 3.0
toolkit_version: 5.0.0
published_date: 07/14/2023