I am trying to train a vehicle classification model with TAO Classification (PyTorch).
• Hardware (RTX 2060)
• Network Type (Classification pyt)
• TLT Version (nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt)
• Training spec file:
train:
  exp_config:
    manual_seed: 42
  train_config:
    optimizer:
      type: AdamW
      lr: 0.001  # Base learning rate
      weight_decay: 0.01
    lr_config:
      type: "CosineAnnealingLR"
      T_max: 50  # Cosine schedule period (in epochs)
      eta_min: 1e-5
      by_epoch: True
    runner:
      max_epochs: 50  # Train for 50 epochs
    checkpoint_config:
      interval: 1  # Save a checkpoint every epoch
    logging:
      interval: 100  # Log metrics every 100 iterations
    validate: True
    evaluation:
      interval: 2  # Evaluate on the validation set every 2 epochs
    custom_hooks:
      - type: "EMAHook"
        momentum: 4e-5
        priority: "ABOVE_NORMAL"
    #find_unused_parameters: True
dataset:
  data:
    samples_per_gpu: 8  # Batch size per GPU (suitable for RTX 4060 8GB)
    train:
      data_prefix: "/data/train/"
      pipeline:
        - type: RandomResizedCrop
          scale: 224
          backend: pillow
        - type: RandomFlip
          prob: 0.5
          direction: "horizontal"
        - type: ColorJitter
          brightness: 0.4
          contrast: 0.4
          saturation: 0.4
          hue: 0.1
        - type: RandomErasing
          erase_prob: 0.3
      classes: "/data/label_cda.txt"
    val:
      data_prefix: "/data/val/"
      classes: "/data/label_cda.txt"
    test:
      data_prefix: "/data/val/"
      classes: "/data/label_cda.txt"
model:
  backbone:
    type: "faster_vit_4_21k_224"
    custom_args:
      drop_path: 0.1  # Stochastic depth for regularization
  head:
    type: "TAOLinearClsHead"
    custom_args:
      head_init_scale: 1.0
    num_classes: 11  # Number of classes
    loss:
      type: "CrossEntropyLoss"
      loss_weight: 1.0
      class_weight: [0.605, 0.726, 62.6, 2.31, 1.23, 2.15, 0.403, 1.29, 0.935, 1.184, 0.927]  # Class weights
      use_soft: False
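For reference, the lr_config in the spec asks for a cosine decay from lr down to eta_min over T_max epochs. The snippet below is a minimal sketch of that curve, assuming the standard closed-form cosine annealing formula; the function name cosine_lr is made up for illustration, and this is not TAO's or MMEngine's internal code.

```python
import math


def cosine_lr(epoch, base_lr=0.001, eta_min=1e-5, t_max=50):
    """Closed-form cosine annealing: base_lr at epoch 0, eta_min at epoch t_max.

    Defaults mirror lr, eta_min and T_max from the spec file; the function
    itself is an illustrative assumption, not the toolkit's implementation.
    """
    cosine = 1.0 + math.cos(math.pi * epoch / t_max)
    return eta_min + 0.5 * (base_lr - eta_min) * cosine


if __name__ == "__main__":
    # Print the learning rate at a few points of the 50-epoch schedule.
    for epoch in (0, 10, 25, 40, 50):
        print(f"epoch {epoch:2d}: lr = {cosine_lr(epoch):.6f}")
```

With T_max equal to max_epochs, as in the spec, the schedule completes exactly one half-period and ends at eta_min.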
My dataset is class-imbalanced, so I added the class_weight parameter to the loss function.
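Weights of this kind are often derived with an inverse-frequency ("balanced") rule, weight_c = N / (K * n_c), where N is the number of training images, K the number of classes, and n_c the image count of class c. The sketch below assumes that rule; the helper name and the label counts are hypothetical, and it is not necessarily how the values in the spec were computed.

```python
from collections import Counter


def balanced_class_weights(labels, num_classes):
    """Inverse-frequency weights: total / (num_classes * count_of_class).

    Classes with no samples get weight 0.0 to avoid division by zero.
    """
    counts = Counter(labels)
    total = len(labels)
    return [
        total / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]


if __name__ == "__main__":
    # Hypothetical label list: class 0 has 8 images, class 1 has 2.
    labels = [0] * 8 + [1] * 2
    print(balanced_class_weights(labels, num_classes=2))
```

Under this rule a very rare class receives a very large weight, so the rarest class can dominate the loss on the batches where it appears.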
When running the training with the command:
!tao model classification_pyt train \
-e $SPECS_DIR/train_CDA.yaml \
results_dir=$RESULTS_DIR/classification_experiment \
train.num_gpus=$NUM_GPUS \
model.init_cfg.checkpoint=/workspace/tao-experiments/pretrained/fastervit_4_21k_224_w14.pth
I see the following error:
env: EPOCHS=50
Train Classification Model
2025-03-06 23:49:41,054 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2025-03-06 23:49:41,105 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 361: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt
2025-03-06 23:49:41,124 [TAO Toolkit] [WARNING] nvidia_tao_cli.components.docker_handler.docker_handler 293:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/sigmind/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2025-03-06 23:49:41,124 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 301: Printing tty value True
[2025-03-06 17:49:45,137 - TAO Toolkit - matplotlib.font_manager - INFO] generated new fontManager
Train results will be saved at: /results/classification_experiment/train
03/06 17:49:52 - mmengine - INFO -
------------------------------------------------------------
System environment:
sys.platform: linux
Python: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 42
GPU 0: NVIDIA GeForce RTX 2060
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.4, V12.4.131
GCC: x86_64-linux-gnu-gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 2.3.0a0+6ddf5cf85e.nv24.04
PyTorch compiling details: PyTorch built with:
- GCC 11.2
- C++ Version: 201703
- Intel(R) oneAPI Math Kernel Library Version 2021.1-Product Build 20201104 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v3.3.2 (Git Hash N/A)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 12.4
- NVCC architecture flags: -gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_72,code=sm_72;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_87,code=sm_87;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_90,code=compute_90
- CuDNN 90.1
- Magma 2.6.2
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.4, CUDNN_VERSION=9.1.0, CXX_COMPILER=/opt/rh/gcc-toolset-11/root/usr/bin/c++, CXX_FLAGS=-fno-gnu-unique -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.3.0, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=OFF, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=ON, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF,
TorchVision: 0.18.0a0
OpenCV: 4.7.0
MMEngine: 0.10.4
Runtime environment:
cudnn_benchmark: False
mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
dist_cfg: {'backend': 'nccl'}
seed: 42
deterministic: False
Distributed launcher: pytorch
Distributed training: True
GPU number: 1
------------------------------------------------------------
03/06 17:49:52 - mmengine - INFO - Config:
auto_scale_lr = dict(base_batch_size=1024)
custom_hooks = [
dict(momentum=4e-05, priority='ABOVE_NORMAL', type='EMAHook'),
]
data_preprocessor = dict(
mean=[
123.675,
116.28,
103.53,
],
num_classes=11,
std=[
58.395,
57.12,
57.375,
],
to_rgb=True)
dataset_type = 'ImageNet'
default_hooks = dict(
checkpoint=dict(interval=1, type='CheckpointHook'),
logger=dict(interval=100, type='TaoTextLoggerHook'),
param_scheduler=dict(type='ParamSchedulerHook'),
sampler_seed=dict(type='DistSamplerSeedHook'),
timer=dict(type='IterTimerHook'),
visualization=dict(enable=False, type='VisualizationHook'))
default_scope = 'mmpretrain'
env_cfg = dict(
cudnn_benchmark=False,
dist_cfg=dict(backend='nccl'),
mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0))
find_unused_parameters = False
launcher = 'pytorch'
load_from = None
log_level = 'INFO'
model = dict(
backbone=dict(
drop_path=0.1,
freeze=False,
init_cfg=dict(
checkpoint=
'/workspace/tao-experiments/pretrained/fastervit_4_21k_224_w14.pth',
prefix=None,
type='Pretrained'),
pretrained=None,
type='faster_vit_4_21k_224'),
head=dict(
binary=False,
head_init_scale=1.0,
in_channels=1568,
loss=dict(
class_weight=[
0.605,
0.726,
62.6,
2.31,
1.23,
2.15,
0.403,
1.29,
0.935,
1.184,
0.927,
],
loss_weight=1.0,
type='CrossEntropyLoss',
use_soft=False),
num_classes=11,
type='TAOLinearClsHead'),
neck=None,
train_cfg=dict(augments=None),
type='ImageClassifier')
optim_wrapper = dict(
optimizer=dict(lr=0.001, type='AdamW', weight_decay=0.01),
paramwise_cfg=None)
param_scheduler = [
dict(T_max=50, by_epoch=True, eta_min=1e-05, type='CosineAnnealingLR'),
]
randomness = dict(deterministic=False, seed=42)
resume = False
test_cfg = dict()
test_dataloader = dict(
batch_size=8,
collate_fn=dict(type='default_collate'),
dataset=dict(
ann_file=None,
classes='/data/label_cda.txt',
data_prefix='/data/val/',
pipeline=[
dict(type='LoadImageFromFile'),
dict(scale=224, type='Resize'),
dict(crop_size=224, type='CenterCrop'),
dict(type='PackInputs'),
],
type='ImageNet'),
num_workers=2,
pin_memory=True,
sampler=dict(shuffle=True, type='DefaultSampler'))
test_evaluator = dict(topk=(1, ), type='Accuracy')
train_cfg = dict(by_epoch=True, max_epochs=50, val_interval=2)
train_dataloader = dict(
batch_size=8,
collate_fn=dict(type='default_collate'),
dataset=dict(
classes='/data/label_cda.txt',
data_prefix='/data/train/',
pipeline=[
dict(type='LoadImageFromFile'),
dict(backend='pillow', scale=224, type='RandomResizedCrop'),
dict(direction='horizontal', prob=0.5, type='RandomFlip'),
dict(
brightness=0.4,
contrast=0.4,
hue=0.1,
saturation=0.4,
type='ColorJitter'),
dict(erase_prob=0.3, type='RandomErasing'),
dict(type='PackInputs'),
],
type='ImageNet'),
num_workers=2,
pin_memory=True,
sampler=dict(shuffle=True, type='DefaultSampler'))
val_cfg = dict()
val_dataloader = dict(
batch_size=8,
collate_fn=dict(type='default_collate'),
dataset=dict(
ann_file=None,
classes='/data/label_cda.txt',
data_prefix='/data/val/',
pipeline=[
dict(type='LoadImageFromFile'),
dict(scale=224, type='Resize'),
dict(crop_size=224, type='CenterCrop'),
dict(type='PackInputs'),
],
type='ImageNet'),
num_workers=2,
pin_memory=True,
sampler=dict(shuffle=True, type='DefaultSampler'))
val_evaluator = dict(topk=(1, ), type='Accuracy')
vis_backends = [
dict(type='LocalVisBackend'),
]
visualizer = dict(
type='UniversalVisualizer', vis_backends=[
dict(type='LocalVisBackend'),
])
work_dir = '/results/classification_experiment/train'
03/06 17:49:52 - mmengine - INFO - Because batch augmentations are enabled, the data preprocessor automatically enables the `to_onehot` option to generate one-hot format labels.
03/06 17:49:56 - mmengine - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH ) RuntimeInfoHook
(ABOVE_NORMAL) EMAHook
(BELOW_NORMAL) TaoTextLoggerHook
--------------------
after_load_checkpoint:
(ABOVE_NORMAL) EMAHook
--------------------
before_train:
(VERY_HIGH ) RuntimeInfoHook
(ABOVE_NORMAL) EMAHook
(NORMAL ) IterTimerHook
(VERY_LOW ) CheckpointHook
--------------------
before_train_epoch:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(NORMAL ) DistSamplerSeedHook
--------------------
before_train_iter:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
--------------------
after_train_iter:
(VERY_HIGH ) RuntimeInfoHook
(ABOVE_NORMAL) EMAHook
(NORMAL ) IterTimerHook
(BELOW_NORMAL) TaoTextLoggerHook
(LOW ) ParamSchedulerHook
(VERY_LOW ) CheckpointHook
--------------------
after_train_epoch:
(NORMAL ) IterTimerHook
(LOW ) ParamSchedulerHook
(VERY_LOW ) CheckpointHook
--------------------
before_val:
(VERY_HIGH ) RuntimeInfoHook
--------------------
before_val_epoch:
(ABOVE_NORMAL) EMAHook
(NORMAL ) IterTimerHook
--------------------
before_val_iter:
(NORMAL ) IterTimerHook
--------------------
after_val_iter:
(NORMAL ) IterTimerHook
(NORMAL ) VisualizationHook
(BELOW_NORMAL) TaoTextLoggerHook
--------------------
after_val_epoch:
(VERY_HIGH ) RuntimeInfoHook
(ABOVE_NORMAL) EMAHook
(NORMAL ) IterTimerHook
(BELOW_NORMAL) TaoTextLoggerHook
(LOW ) ParamSchedulerHook
(VERY_LOW ) CheckpointHook
--------------------
after_val:
(VERY_HIGH ) RuntimeInfoHook
--------------------
before_save_checkpoint:
(ABOVE_NORMAL) EMAHook
--------------------
after_train:
(VERY_HIGH ) RuntimeInfoHook
(VERY_LOW ) CheckpointHook
--------------------
before_test:
(VERY_HIGH ) RuntimeInfoHook
--------------------
before_test_epoch:
(ABOVE_NORMAL) EMAHook
(NORMAL ) IterTimerHook
--------------------
before_test_iter:
(NORMAL ) IterTimerHook
--------------------
after_test_iter:
(NORMAL ) IterTimerHook
(NORMAL ) VisualizationHook
(BELOW_NORMAL) TaoTextLoggerHook
--------------------
after_test_epoch:
(VERY_HIGH ) RuntimeInfoHook
(ABOVE_NORMAL) EMAHook
(NORMAL ) IterTimerHook
(BELOW_NORMAL) TaoTextLoggerHook
--------------------
after_test:
(VERY_HIGH ) RuntimeInfoHook
--------------------
after_run:
(BELOW_NORMAL) TaoTextLoggerHook
--------------------
03/06 17:49:59 - mmengine - INFO - load model from: /workspace/tao-experiments/pretrained/fastervit_4_21k_224_w14.pth
03/06 17:49:59 - mmengine - INFO - Loads checkpoint by local backend from path: /workspace/tao-experiments/pretrained/fastervit_4_21k_224_w14.pth
03/06 17:49:59 - mmengine - WARNING - The model and loaded state dict do not match exactly
unexpected key in source state_dict: backbone.patch_embed.conv_down.0.weight, backbone.patch_embed.conv_down.1.weight, backbone.patch_embed.conv_down.1.bias, backbone.patch_embed.conv_down.1.running_mean, backbone.patch_embed.conv_down.1.running_var, backbone.patch_embed.conv_down.1.num_batches_tracked, backbone.patch_embed.conv_down.3.weight, backbone.patch_embed.conv_down.4.weight, backbone.patch_embed.conv_down.4.bias, backbone.patch_embed.conv_down.4.running_mean, backbone.patch_embed.conv_down.4.running_var, backbone.patch_embed.conv_down.4.num_batches_tracked, backbone.levels.0.blocks.0.conv1.weight, backbone.levels.0.blocks.0.conv1.bias, backbone.levels.0.blocks.0.norm1.weight, backbone.levels.0.blocks.0.norm1.bias, backbone.levels.0.blocks.0.norm1.running_mean, backbone.levels.0.blocks.0.norm1.running_var, backbone.levels.0.blocks.0.norm1.num_batches_tracked, backbone.levels.0.blocks.0.conv2.weight, backbone.levels.0.blocks.0.conv2.bias, backbone.levels.0.blocks.0.norm2.weight, backbone.levels.0.blocks.0.norm2.bias, backbone.levels.0.blocks.0.norm2.running_mean, backbone.levels.0.blocks.0.norm2.running_var, backbone.levels.0.blocks.0.norm2.num_batches_tracked, backbone.levels.0.blocks.1.conv1.weight, backbone.levels.0.blocks.1.conv1.bias, backbone.levels.0.blocks.1.norm1.weight, backbone.levels.0.blocks.1.norm1.bias, backbone.levels.0.blocks.1.norm1.running_mean, backbone.levels.0.blocks.1.norm1.running_var, backbone.levels.0.blocks.1.norm1.num_batches_tracked, backbone.levels.0.blocks.1.conv2.weight, backbone.levels.0.blocks.1.conv2.bias, backbone.levels.0.blocks.1.norm2.weight, backbone.levels.0.blocks.1.norm2.bias, backbone.levels.0.blocks.1.norm2.running_mean, backbone.levels.0.blocks.1.norm2.running_var, backbone.levels.0.blocks.1.norm2.num_batches_tracked, backbone.levels.0.blocks.2.conv1.weight, backbone.levels.0.blocks.2.conv1.bias, backbone.levels.0.blocks.2.norm1.weight, backbone.levels.0.blocks.2.norm1.bias, 
backbone.levels.0.blocks.2.norm1.running_mean, backbone.levels.0.blocks.2.norm1.running_var, backbone.levels.0.blocks.2.norm1.num_batches_tracked, backbone.levels.0.blocks.2.conv2.weight, backbone.levels.0.blocks.2.conv2.bias, backbone.levels.0.blocks.2.norm2.weight, backbone.levels.0.blocks.2.norm2.bias, backbone.levels.0.blocks.2.norm2.running_mean, backbone.levels.0.blocks.2.norm2.running_var, backbone.levels.0.blocks.2.norm2.num_batches_tracked, backbone.levels.0.downsample.norm.weight, backbone.levels.0.downsample.norm.bias, backbone.levels.0.downsample.reduction.0.weight, backbone.levels.1.blocks.0.conv1.weight, backbone.levels.1.blocks.0.conv1.bias, backbone.levels.1.blocks.0.norm1.weight, backbone.levels.1.blocks.0.norm1.bias, backbone.levels.1.blocks.0.norm1.running_mean, backbone.levels.1.blocks.0.norm1.running_var, backbone.levels.1.blocks.0.norm1.num_batches_tracked, backbone.levels.1.blocks.0.conv2.weight, backbone.levels.1.blocks.0.conv2.bias, backbone.levels.1.blocks.0.norm2.weight, backbone.levels.1.blocks.0.norm2.bias, backbone.levels.1.blocks.0.norm2.running_mean, backbone.levels.1.blocks.0.norm2.running_var, backbone.levels.1.blocks.0.norm2.num_batches_tracked, backbone.levels.1.blocks.1.conv1.weight, backbone.levels.1.blocks.1.conv1.bias, backbone.levels.1.blocks.1.norm1.weight, backbone.levels.1.blocks.1.norm1.bias, backbone.levels.1.blocks.1.norm1.running_mean, backbone.levels.1.blocks.1.norm1.running_var, backbone.levels.1.blocks.1.norm1.num_batches_tracked, backbone.levels.1.blocks.1.conv2.weight, backbone.levels.1.blocks.1.conv2.bias, backbone.levels.1.blocks.1.norm2.weight, backbone.levels.1.blocks.1.norm2.bias, backbone.levels.1.blocks.1.norm2.running_mean, backbone.levels.1.blocks.1.norm2.running_var, backbone.levels.1.blocks.1.norm2.num_batches_tracked, backbone.levels.1.blocks.2.conv1.weight, backbone.levels.1.blocks.2.conv1.bias, backbone.levels.1.blocks.2.norm1.weight, backbone.levels.1.blocks.2.norm1.bias, 
backbone.levels.1.blocks.2.norm1.running_mean, backbone.levels.1.blocks.2.norm1.running_var, backbone.levels.1.blocks.2.norm1.num_batches_tracked, backbone.levels.1.blocks.2.conv2.weight, backbone.levels.1.blocks.2.conv2.bias, backbone.levels.1.blocks.2.norm2.weight, backbone.levels.1.blocks.2.norm2.bias, backbone.levels.1.blocks.2.norm2.running_mean, backbone.levels.1.blocks.2.norm2.running_var, backbone.levels.1.blocks.2.norm2.num_batches_tracked, backbone.levels.1.downsample.norm.weight, backbone.levels.1.downsample.norm.bias, backbone.levels.1.downsample.reduction.0.weight, backbone.levels.2.blocks.0.gamma3, backbone.levels.2.blocks.0.gamma4, backbone.levels.2.blocks.0.pos_embed.relative_bias, backbone.levels.2.blocks.0.pos_embed.cpb_mlp.0.weight, backbone.levels.2.blocks.0.pos_embed.cpb_mlp.0.bias, backbone.levels.2.blocks.0.pos_embed.cpb_mlp.2.weight, backbone.levels.2.blocks.0.norm1.weight, backbone.levels.2.blocks.0.norm1.bias, backbone.levels.2.blocks.0.attn.qkv.weight, backbone.levels.2.blocks.0.attn.qkv.bias, backbone.levels.2.blocks.0.attn.proj.weight, backbone.levels.2.blocks.0.attn.proj.bias, backbone.levels.2.blocks.0.attn.pos_emb_funct.relative_coords_table, backbone.levels.2.blocks.0.attn.pos_emb_funct.relative_position_index, backbone.levels.2.blocks.0.attn.pos_emb_funct.relative_bias, backbone.levels.2.blocks.0.attn.pos_emb_funct.cpb_mlp.0.weight, backbone.levels.2.blocks.0.attn.pos_emb_funct.cpb_mlp.0.bias, backbone.levels.2.blocks.0.attn.pos_emb_funct.cpb_mlp.2.weight, backbone.levels.2.blocks.0.norm2.weight, backbone.levels.2.blocks.0.norm2.bias, backbone.levels.2.blocks.0.mlp.fc1.weight, backbone.levels.2.blocks.0.mlp.fc1.bias, backbone.levels.2.blocks.0.mlp.fc2.weight, backbone.levels.2.blocks.0.mlp.fc2.bias, backbone.levels.2.blocks.1.gamma3, backbone.levels.2.blocks.1.gamma4, backbone.levels.2.blocks.1.pos_embed.relative_bias, backbone.levels.2.blocks.1.pos_embed.cpb_mlp.0.weight, backbone.levels.2.blocks.1.pos_embed.cpb_mlp.0.bias, 
backbone.levels.2.blocks.1.pos_embed.cpb_mlp.2.weight, backbone.levels.2.blocks.1.norm1.weight, backbone.levels.2.blocks.1.norm1.bias, backbone.levels.2.blocks.1.attn.qkv.weight, backbone.levels.2.blocks.1.attn.qkv.bias, backbone.levels.2.blocks.1.attn.proj.weight, backbone.levels.2.blocks.1.attn.proj.bias, backbone.levels.2.blocks.1.attn.pos_emb_funct.relative_coords_table, backbone.levels.2.blocks.1.attn.pos_emb_funct.relative_position_index, backbone.levels.2.blocks.1.attn.pos_emb_funct.relative_bias, backbone.levels.2.blocks.1.attn.pos_emb_funct.cpb_mlp.0.weight, backbone.levels.2.blocks.1.attn.pos_emb_funct.cpb_mlp.0.bias, backbone.levels.2.blocks.1.attn.pos_emb_funct.cpb_mlp.2.weight, backbone.levels.2.blocks.1.norm2.weight, backbone.levels.2.blocks.1.norm2.bias, backbone.levels.2.blocks.1.mlp.fc1.weight, backbone.levels.2.blocks.1.mlp.fc1.bias, backbone.levels.2.blocks.1.mlp.fc2.weight, backbone.levels.2.blocks.1.mlp.fc2.bias, backbone.levels.2.blocks.2.gamma3, backbone.levels.2.blocks.2.gamma4, backbone.levels.2.blocks.2.pos_embed.relative_bias, backbone.levels.2.blocks.2.pos_embed.cpb_mlp.0.weight, backbone.levels.2.blocks.2.pos_embed.cpb_mlp.0.bias, backbone.levels.2.blocks.2.pos_embed.cpb_mlp.2.weight, backbone.levels.2.blocks.2.norm1.weight, backbone.levels.2.blocks.2.norm1.bias, backbone.levels.2.blocks.2.attn.qkv.weight, backbone.levels.2.blocks.2.attn.qkv.bias, backbone.levels.2.blocks.2.attn.proj.weight, backbone.levels.2.blocks.2.attn.proj.bias, backbone.levels.2.blocks.2.attn.pos_emb_funct.relative_coords_table, backbone.levels.2.blocks.2.attn.pos_emb_funct.relative_position_index, backbone.levels.2.blocks.2.attn.pos_emb_funct.relative_bias, backbone.levels.2.blocks.2.attn.pos_emb_funct.cpb_mlp.0.weight, backbone.levels.2.blocks.2.attn.pos_emb_funct.cpb_mlp.0.bias, backbone.levels.2.blocks.2.attn.pos_emb_funct.cpb_mlp.2.weight, backbone.levels.2.blocks.2.norm2.weight, backbone.levels.2.blocks.2.norm2.bias, 
backbone.levels.2.blocks.2.mlp.fc1.weight, backbone.levels.2.blocks.2.mlp.fc1.bias, backbone.levels.2.blocks.2.mlp.fc2.weight, backbone.levels.2.blocks.2.mlp.fc2.bias, backbone.levels.2.blocks.3.gamma3, backbone.levels.2.blocks.3.gamma4, backbone.levels.2.blocks.3.pos_embed.relative_bias, backbone.levels.2.blocks.3.pos_embed.cpb_mlp.0.weight, backbone.levels.2.blocks.3.pos_embed.cpb_mlp.0.bias, backbone.levels.2.blocks.3.pos_embed.cpb_mlp.2.weight, backbone.levels.2.blocks.3.norm1.weight, backbone.levels.2.blocks.3.norm1.bias, backbone.levels.2.blocks.3.attn.qkv.weight, backbone.levels.2.blocks.3.attn.qkv.bias, backbone.levels.2.blocks.3.attn.proj.weight, backbone.levels.2.blocks.3.attn.proj.bias, backbone.levels.2.blocks.3.attn.pos_emb_funct.relative_coords_table, backbone.levels.2.blocks.3.attn.pos_emb_funct.relative_position_index, backbone.levels.2.blocks.3.attn.pos_emb_funct.relative_bias, backbone.levels.2.blocks.3.attn.pos_emb_funct.cpb_mlp.0.weight, backbone.levels.2.blocks.3.attn.pos_emb_funct.cpb_mlp.0.bias, backbone.levels.2.blocks.3.attn.pos_emb_funct.cpb_mlp.2.weight, backbone.levels.2.blocks.3.norm2.weight, backbone.levels.2.blocks.3.norm2.bias, backbone.levels.2.blocks.3.mlp.fc1.weight, backbone.levels.2.blocks.3.mlp.fc1.bias, backbone.levels.2.blocks.3.mlp.fc2.weight, backbone.levels.2.blocks.3.mlp.fc2.bias, backbone.levels.2.blocks.4.gamma3, backbone.levels.2.blocks.4.gamma4, backbone.levels.2.blocks.4.pos_embed.relative_bias, backbone.levels.2.blocks.4.pos_embed.cpb_mlp.0.weight, backbone.levels.2.blocks.4.pos_embed.cpb_mlp.0.bias, backbone.levels.2.blocks.4.pos_embed.cpb_mlp.2.weight, backbone.levels.2.blocks.4.norm1.weight, backbone.levels.2.blocks.4.norm1.bias, backbone.levels.2.blocks.4.attn.qkv.weight, backbone.levels.2.blocks.4.attn.qkv.bias, backbone.levels.2.blocks.4.attn.proj.weight, backbone.levels.2.blocks.4.attn.proj.bias, backbone.levels.2.blocks.4.attn.pos_emb_funct.relative_coords_table, 
backbone.levels.2.blocks.4.attn.pos_emb_funct.relative_position_index, backbone.levels.2.blocks.4.attn.pos_emb_funct.relative_bias, backbone.levels.2.blocks.4.attn.pos_emb_funct.cpb_mlp.0.weight, backbone.levels.2.blocks.4.attn.pos_emb_funct.cpb_mlp.0.bias, backbone.levels.2.blocks.4.attn.pos_emb_funct.cpb_mlp.2.weight, backbone.levels.2.blocks.4.norm2.weight, backbone.levels.2.blocks.4.norm2.bias, backbone.levels.2.blocks.4.mlp.fc1.weight, backbone.levels.2.blocks.4.mlp.fc1.bias, backbone.levels.2.blocks.4.mlp.fc2.weight, backbone.levels.2.blocks.4.mlp.fc2.bias, backbone.levels.2.blocks.5.gamma3, backbone.levels.2.blocks.5.gamma4, backbone.levels.2.blocks.5.pos_embed.relative_bias, backbone.levels.2.blocks.5.pos_embed.cpb_mlp.0.weight, backbone.levels.2.blocks.5.pos_embed.cpb_mlp.0.bias, backbone.levels.2.blocks.5.pos_embed.cpb_mlp.2.weight, backbone.levels.2.blocks.5.norm1.weight, backbone.levels.2.blocks.5.norm1.bias, backbone.levels.2.blocks.5.attn.qkv.weight, backbone.levels.2.blocks.5.attn.qkv.bias, backbone.levels.2.blocks.5.attn.proj.weight, backbone.levels.2.blocks.5.attn.proj.bias, backbone.levels.2.blocks.5.attn.pos_emb_funct.relative_coords_table, backbone.levels.2.blocks.5.attn.pos_emb_funct.relative_position_index, backbone.levels.2.blocks.5.attn.pos_emb_funct.relative_bias, backbone.levels.2.blocks.5.attn.pos_emb_funct.cpb_mlp.0.weight, backbone.levels.2.blocks.5.attn.pos_emb_funct.cpb_mlp.0.bias, backbone.levels.2.blocks.5.attn.pos_emb_funct.cpb_mlp.2.weight, backbone.levels.2.blocks.5.norm2.weight, backbone.levels.2.blocks.5.norm2.bias, backbone.levels.2.blocks.5.mlp.fc1.weight, backbone.levels.2.blocks.5.mlp.fc1.bias, backbone.levels.2.blocks.5.mlp.fc2.weight, backbone.levels.2.blocks.5.mlp.fc2.bias, backbone.levels.2.blocks.6.gamma3, backbone.levels.2.blocks.6.gamma4, backbone.levels.2.blocks.6.pos_embed.relative_bias, backbone.levels.2.blocks.6.pos_embed.cpb_mlp.0.weight, backbone.levels.2.blocks.6.pos_embed.cpb_mlp.0.bias, 
backbone.levels.2.blocks.6.pos_embed.cpb_mlp.2.weight, backbone.levels.2.blocks.6.norm1.weight, backbone.levels.2.blocks.6.norm1.bias, backbone.levels.2.blocks.6.attn.qkv.weight, backbone.levels.2.blocks.6.attn.qkv.bias, backbone.levels.2.blocks.6.attn.proj.weight, backbone.levels.2.blocks.6.attn.proj.bias, backbone.levels.2.blocks.6.attn.pos_emb_funct.relative_coords_table, backbone.levels.2.blocks.6.attn.pos_emb_funct.relative_position_index, backbone.levels.2.blocks.6.attn.pos_emb_funct.relative_bias, backbone.levels.2.blocks.6.attn.pos_emb_funct.cpb_mlp.0.weight, backbone.levels.2.blocks.6.attn.pos_emb_funct.cpb_mlp.0.bias, backbone.levels.2.blocks.6.attn.pos_emb_funct.cpb_mlp.2.weight, backbone.levels.2.blocks.6.norm2.weight, backbone.levels.2.blocks.6.norm2.bias, backbone.levels.2.blocks.6.mlp.fc1.weight, backbone.levels.2.blocks.6.mlp.fc1.bias, backbone.levels.2.blocks.6.mlp.fc2.weight, backbone.levels.2.blocks.6.mlp.fc2.bias, backbone.levels.2.blocks.7.gamma3, backbone.levels.2.blocks.7.gamma4, backbone.levels.2.blocks.7.pos_embed.relative_bias, backbone.levels.2.blocks.7.pos_embed.cpb_mlp.0.weight, backbone.levels.2.blocks.7.pos_embed.cpb_mlp.0.bias, backbone.levels.2.blocks.7.pos_embed.cpb_mlp.2.weight, backbone.levels.2.blocks.7.norm1.weight, backbone.levels.2.blocks.7.norm1.bias, backbone.levels.2.blocks.7.attn.qkv.weight, backbone.levels.2.blocks.7.attn.qkv.bias, backbone.levels.2.blocks.7.attn.proj.weight, backbone.levels.2.blocks.7.attn.proj.bias, backbone.levels.2.blocks.7.attn.pos_emb_funct.relative_coords_table, backbone.levels.2.blocks.7.attn.pos_emb_funct.relative_position_index, backbone.levels.2.blocks.7.attn.pos_emb_funct.relative_bias, backbone.levels.2.blocks.7.attn.pos_emb_funct.cpb_mlp.0.weight, backbone.levels.2.blocks.7.attn.pos_emb_funct.cpb_mlp.0.bias, backbone.levels.2.blocks.7.attn.pos_emb_funct.cpb_mlp.2.weight, backbone.levels.2.blocks.7.norm2.weight, backbone.levels.2.blocks.7.norm2.bias, 
backbone.levels.2.blocks.7.mlp.fc1.weight, backbone.levels.2.blocks.7.mlp.fc1.bias, backbone.levels.2.blocks.7.mlp.fc2.weight, backbone.levels.2.blocks.7.mlp.fc2.bias, backbone.levels.2.blocks.8.gamma3, backbone.levels.2.blocks.8.gamma4, backbone.levels.2.blocks.8.pos_embed.relative_bias, backbone.levels.2.blocks.8.pos_embed.cpb_mlp.0.weight, backbone.levels.2.blocks.8.pos_embed.cpb_mlp.0.bias, backbone.levels.2.blocks.8.pos_embed.cpb_mlp.2.weight, backbone.levels.2.blocks.8.norm1.weight, backbone.levels.2.blocks.8.norm1.bias, backbone.levels.2.blocks.8.attn.qkv.weight, backbone.levels.2.blocks.8.attn.qkv.bias, backbone.levels.2.blocks.8.attn.proj.weight, backbone.levels.2.blocks.8.attn.proj.bias, backbone.levels.2.blocks.8.attn.pos_emb_funct.relative_coords_table, backbone.levels.2.blocks.8.attn.pos_emb_funct.relative_position_index, backbone.levels.2.blocks.8.attn.pos_emb_funct.relative_bias, backbone.levels.2.blocks.8.attn.pos_emb_funct.cpb_mlp.0.weight, backbone.levels.2.blocks.8.attn.pos_emb_funct.cpb_mlp.0.bias, backbone.levels.2.blocks.8.attn.pos_emb_funct.cpb_mlp.2.weight, backbone.levels.2.blocks.8.norm2.weight, backbone.levels.2.blocks.8.norm2.bias, backbone.levels.2.blocks.8.mlp.fc1.weight, backbone.levels.2.blocks.8.mlp.fc1.bias, backbone.levels.2.blocks.8.mlp.fc2.weight, backbone.levels.2.blocks.8.mlp.fc2.bias, backbone.levels.2.blocks.9.gamma3, backbone.levels.2.blocks.9.gamma4, backbone.levels.2.blocks.9.pos_embed.relative_bias, backbone.levels.2.blocks.9.pos_embed.cpb_mlp.0.weight, backbone.levels.2.blocks.9.pos_embed.cpb_mlp.0.bias, backbone.levels.2.blocks.9.pos_embed.cpb_mlp.2.weight, backbone.levels.2.blocks.9.norm1.weight, backbone.levels.2.blocks.9.norm1.bias, backbone.levels.2.blocks.9.attn.qkv.weight, backbone.levels.2.blocks.9.attn.qkv.bias, backbone.levels.2.blocks.9.attn.proj.weight, backbone.levels.2.blocks.9.attn.proj.bias, backbone.levels.2.blocks.9.attn.pos_emb_funct.relative_coords_table, 
backbone.levels.2.blocks.9.attn.pos_emb_funct.relative_position_index, backbone.levels.2.blocks.9.attn.pos_emb_funct.relative_bias, backbone.levels.2.blocks.9.attn.pos_emb_funct.cpb_mlp.0.weight, backbone.levels.2.blocks.9.attn.pos_emb_funct.cpb_mlp.0.bias, backbone.levels.2.blocks.9.attn.pos_emb_funct.cpb_mlp.2.weight, backbone.levels.2.blocks.9.norm2.weight, backbone.levels.2.blocks.9.norm2.bias, backbone.levels.2.blocks.9.mlp.fc1.weight, backbone.levels.2.blocks.9.mlp.fc1.bias, backbone.levels.2.blocks.9.mlp.fc2.weight, backbone.levels.2.blocks.9.mlp.fc2.bias, backbone.levels.2.blocks.10.gamma3, backbone.levels.2.blocks.10.gamma4, backbone.levels.2.blocks.10.pos_embed.relative_bias, backbone.levels.2.blocks.10.pos_embed.cpb_mlp.0.weight, backbone.levels.2.blocks.10.pos_embed.cpb_mlp.0.bias, backbone.levels.2.blocks.10.pos_embed.cpb_mlp.2.weight, backbone.levels.2.blocks.10.norm1.weight, backbone.levels.2.blocks.10.norm1.bias, backbone.levels.2.blocks.10.attn.qkv.weight, backbone.levels.2.blocks.10.attn.qkv.bias, backbone.levels.2.blocks.10.attn.proj.weight, backbone.levels.2.blocks.10.attn.proj.bias, backbone.levels.2.blocks.10.attn.pos_emb_funct.relative_coords_table, backbone.levels.2.blocks.10.attn.pos_emb_funct.relative_position_index, backbone.levels.2.blocks.10.attn.pos_emb_funct.relative_bias, backbone.levels.2.blocks.10.attn.pos_emb_funct.cpb_mlp.0.weight, backbone.levels.2.blocks.10.attn.pos_emb_funct.cpb_mlp.0.bias, backbone.levels.2.blocks.10.attn.pos_emb_funct.cpb_mlp.2.weight, backbone.levels.2.blocks.10.norm2.weight, backbone.levels.2.blocks.10.norm2.bias, backbone.levels.2.blocks.10.mlp.fc1.weight, backbone.levels.2.blocks.10.mlp.fc1.bias, backbone.levels.2.blocks.10.mlp.fc2.weight, backbone.levels.2.blocks.10.mlp.fc2.bias, backbone.levels.2.blocks.11.gamma3, backbone.levels.2.blocks.11.gamma4, backbone.levels.2.blocks.11.pos_embed.relative_bias, backbone.levels.2.blocks.11.pos_embed.cpb_mlp.0.weight, 
backbone.levels.2.blocks.11.pos_embed.cpb_mlp.0.bias, backbone.levels.2.blocks.11.pos_embed.cpb_mlp.2.weight, backbone.levels.2.blocks.11.norm1.weight, backbone.levels.2.blocks.11.norm1.bias, backbone.levels.2.blocks.11.attn.qkv.weight, backbone.levels.2.blocks.11.attn.qkv.bias, backbone.levels.2.blocks.11.attn.proj.weight, backbone.levels.2.blocks.11.attn.proj.bias, backbone.levels.2.blocks.11.attn.pos_emb_funct.relative_coords_table, backbone.levels.2.blocks.11.attn.pos_emb_funct.relative_position_index, backbone.levels.2.blocks.11.attn.pos_emb_funct.relative_bias, backbone.levels.2.blocks.11.attn.pos_emb_funct.cpb_mlp.0.weight, backbone.levels.2.blocks.11.attn.pos_emb_funct.cpb_mlp.0.bias, backbone.levels.2.blocks.11.attn.pos_emb_funct.cpb_mlp.2.weight, backbone.levels.2.blocks.11.norm2.weight, backbone.levels.2.blocks.11.norm2.bias, backbone.levels.2.blocks.11.mlp.fc1.weight, backbone.levels.2.blocks.11.mlp.fc1.bias, backbone.levels.2.blocks.11.mlp.fc2.weight, backbone.levels.2.blocks.11.mlp.fc2.bias, backbone.levels.2.downsample.norm.weight, backbone.levels.2.downsample.norm.bias, backbone.levels.2.downsample.reduction.0.weight, backbone.levels.3.blocks.0.gamma3, backbone.levels.3.blocks.0.gamma4, backbone.levels.3.blocks.0.pos_embed.relative_bias, backbone.levels.3.blocks.0.pos_embed.cpb_mlp.0.weight, backbone.levels.3.blocks.0.pos_embed.cpb_mlp.0.bias, backbone.levels.3.blocks.0.pos_embed.cpb_mlp.2.weight, backbone.levels.3.blocks.0.norm1.weight, backbone.levels.3.blocks.0.norm1.bias, backbone.levels.3.blocks.0.attn.qkv.weight, backbone.levels.3.blocks.0.attn.qkv.bias, backbone.levels.3.blocks.0.attn.proj.weight, backbone.levels.3.blocks.0.attn.proj.bias, backbone.levels.3.blocks.0.attn.pos_emb_funct.relative_coords_table, backbone.levels.3.blocks.0.attn.pos_emb_funct.relative_position_index, backbone.levels.3.blocks.0.attn.pos_emb_funct.relative_bias, backbone.levels.3.blocks.0.attn.pos_emb_funct.cpb_mlp.0.weight, 
backbone.levels.3.blocks.0.attn.pos_emb_funct.cpb_mlp.0.bias, backbone.levels.3.blocks.0.attn.pos_emb_funct.cpb_mlp.2.weight, backbone.levels.3.blocks.0.norm2.weight, backbone.levels.3.blocks.0.norm2.bias, backbone.levels.3.blocks.0.mlp.fc1.weight, backbone.levels.3.blocks.0.mlp.fc1.bias, backbone.levels.3.blocks.0.mlp.fc2.weight, backbone.levels.3.blocks.0.mlp.fc2.bias, backbone.levels.3.blocks.1.gamma3, backbone.levels.3.blocks.1.gamma4, backbone.levels.3.blocks.1.pos_embed.relative_bias, backbone.levels.3.blocks.1.pos_embed.cpb_mlp.0.weight, backbone.levels.3.blocks.1.pos_embed.cpb_mlp.0.bias, backbone.levels.3.blocks.1.pos_embed.cpb_mlp.2.weight, backbone.levels.3.blocks.1.norm1.weight, backbone.levels.3.blocks.1.norm1.bias, backbone.levels.3.blocks.1.attn.qkv.weight, backbone.levels.3.blocks.1.attn.qkv.bias, backbone.levels.3.blocks.1.attn.proj.weight, backbone.levels.3.blocks.1.attn.proj.bias, backbone.levels.3.blocks.1.attn.pos_emb_funct.relative_coords_table, backbone.levels.3.blocks.1.attn.pos_emb_funct.relative_position_index, backbone.levels.3.blocks.1.attn.pos_emb_funct.relative_bias, backbone.levels.3.blocks.1.attn.pos_emb_funct.cpb_mlp.0.weight, backbone.levels.3.blocks.1.attn.pos_emb_funct.cpb_mlp.0.bias, backbone.levels.3.blocks.1.attn.pos_emb_funct.cpb_mlp.2.weight, backbone.levels.3.blocks.1.norm2.weight, backbone.levels.3.blocks.1.norm2.bias, backbone.levels.3.blocks.1.mlp.fc1.weight, backbone.levels.3.blocks.1.mlp.fc1.bias, backbone.levels.3.blocks.1.mlp.fc2.weight, backbone.levels.3.blocks.1.mlp.fc2.bias, backbone.levels.3.blocks.2.gamma3, backbone.levels.3.blocks.2.gamma4, backbone.levels.3.blocks.2.pos_embed.relative_bias, backbone.levels.3.blocks.2.pos_embed.cpb_mlp.0.weight, backbone.levels.3.blocks.2.pos_embed.cpb_mlp.0.bias, backbone.levels.3.blocks.2.pos_embed.cpb_mlp.2.weight, backbone.levels.3.blocks.2.norm1.weight, backbone.levels.3.blocks.2.norm1.bias, backbone.levels.3.blocks.2.attn.qkv.weight, 
backbone.levels.3.blocks.2.attn.qkv.bias, backbone.levels.3.blocks.2.attn.proj.weight, backbone.levels.3.blocks.2.attn.proj.bias, backbone.levels.3.blocks.2.attn.pos_emb_funct.relative_coords_table, backbone.levels.3.blocks.2.attn.pos_emb_funct.relative_position_index, backbone.levels.3.blocks.2.attn.pos_emb_funct.relative_bias, backbone.levels.3.blocks.2.attn.pos_emb_funct.cpb_mlp.0.weight, backbone.levels.3.blocks.2.attn.pos_emb_funct.cpb_mlp.0.bias, backbone.levels.3.blocks.2.attn.pos_emb_funct.cpb_mlp.2.weight, backbone.levels.3.blocks.2.norm2.weight, backbone.levels.3.blocks.2.norm2.bias, backbone.levels.3.blocks.2.mlp.fc1.weight, backbone.levels.3.blocks.2.mlp.fc1.bias, backbone.levels.3.blocks.2.mlp.fc2.weight, backbone.levels.3.blocks.2.mlp.fc2.bias, backbone.levels.3.blocks.3.gamma3, backbone.levels.3.blocks.3.gamma4, backbone.levels.3.blocks.3.pos_embed.relative_bias, backbone.levels.3.blocks.3.pos_embed.cpb_mlp.0.weight, backbone.levels.3.blocks.3.pos_embed.cpb_mlp.0.bias, backbone.levels.3.blocks.3.pos_embed.cpb_mlp.2.weight, backbone.levels.3.blocks.3.norm1.weight, backbone.levels.3.blocks.3.norm1.bias, backbone.levels.3.blocks.3.attn.qkv.weight, backbone.levels.3.blocks.3.attn.qkv.bias, backbone.levels.3.blocks.3.attn.proj.weight, backbone.levels.3.blocks.3.attn.proj.bias, backbone.levels.3.blocks.3.attn.pos_emb_funct.relative_coords_table, backbone.levels.3.blocks.3.attn.pos_emb_funct.relative_position_index, backbone.levels.3.blocks.3.attn.pos_emb_funct.relative_bias, backbone.levels.3.blocks.3.attn.pos_emb_funct.cpb_mlp.0.weight, backbone.levels.3.blocks.3.attn.pos_emb_funct.cpb_mlp.0.bias, backbone.levels.3.blocks.3.attn.pos_emb_funct.cpb_mlp.2.weight, backbone.levels.3.blocks.3.norm2.weight, backbone.levels.3.blocks.3.norm2.bias, backbone.levels.3.blocks.3.mlp.fc1.weight, backbone.levels.3.blocks.3.mlp.fc1.bias, backbone.levels.3.blocks.3.mlp.fc2.weight, backbone.levels.3.blocks.3.mlp.fc2.bias, backbone.levels.3.blocks.4.gamma3, 
backbone.levels.3.blocks.4.gamma4, backbone.levels.3.blocks.4.pos_embed.relative_bias, backbone.levels.3.blocks.4.pos_embed.cpb_mlp.0.weight, backbone.levels.3.blocks.4.pos_embed.cpb_mlp.0.bias, backbone.levels.3.blocks.4.pos_embed.cpb_mlp.2.weight, backbone.levels.3.blocks.4.norm1.weight, backbone.levels.3.blocks.4.norm1.bias, backbone.levels.3.blocks.4.attn.qkv.weight, backbone.levels.3.blocks.4.attn.qkv.bias, backbone.levels.3.blocks.4.attn.proj.weight, backbone.levels.3.blocks.4.attn.proj.bias, backbone.levels.3.blocks.4.attn.pos_emb_funct.relative_coords_table, backbone.levels.3.blocks.4.attn.pos_emb_funct.relative_position_index, backbone.levels.3.blocks.4.attn.pos_emb_funct.relative_bias, backbone.levels.3.blocks.4.attn.pos_emb_funct.cpb_mlp.0.weight, backbone.levels.3.blocks.4.attn.pos_emb_funct.cpb_mlp.0.bias, backbone.levels.3.blocks.4.attn.pos_emb_funct.cpb_mlp.2.weight, backbone.levels.3.blocks.4.norm2.weight, backbone.levels.3.blocks.4.norm2.bias, backbone.levels.3.blocks.4.mlp.fc1.weight, backbone.levels.3.blocks.4.mlp.fc1.bias, backbone.levels.3.blocks.4.mlp.fc2.weight, backbone.levels.3.blocks.4.mlp.fc2.bias, backbone.norm.weight, backbone.norm.bias, backbone.norm.running_mean, backbone.norm.running_var, backbone.norm.num_batches_tracked, head.fc.weight, head.fc.bias
missing keys in source state_dict: patch_embed.conv_down.0.weight, patch_embed.conv_down.1.weight, patch_embed.conv_down.1.bias, patch_embed.conv_down.1.running_mean, patch_embed.conv_down.1.running_var, patch_embed.conv_down.3.weight, patch_embed.conv_down.4.weight, patch_embed.conv_down.4.bias, patch_embed.conv_down.4.running_mean, patch_embed.conv_down.4.running_var, levels.0.blocks.0.conv1.weight, levels.0.blocks.0.conv1.bias, levels.0.blocks.0.norm1.weight, levels.0.blocks.0.norm1.bias, levels.0.blocks.0.norm1.running_mean, levels.0.blocks.0.norm1.running_var, levels.0.blocks.0.conv2.weight, levels.0.blocks.0.conv2.bias, levels.0.blocks.0.norm2.weight, levels.0.blocks.0.norm2.bias, levels.0.blocks.0.norm2.running_mean, levels.0.blocks.0.norm2.running_var, levels.0.blocks.1.conv1.weight, levels.0.blocks.1.conv1.bias, levels.0.blocks.1.norm1.weight, levels.0.blocks.1.norm1.bias, levels.0.blocks.1.norm1.running_mean, levels.0.blocks.1.norm1.running_var, levels.0.blocks.1.conv2.weight, levels.0.blocks.1.conv2.bias, levels.0.blocks.1.norm2.weight, levels.0.blocks.1.norm2.bias, levels.0.blocks.1.norm2.running_mean, levels.0.blocks.1.norm2.running_var, levels.0.blocks.2.conv1.weight, levels.0.blocks.2.conv1.bias, levels.0.blocks.2.norm1.weight, levels.0.blocks.2.norm1.bias, levels.0.blocks.2.norm1.running_mean, levels.0.blocks.2.norm1.running_var, levels.0.blocks.2.conv2.weight, levels.0.blocks.2.conv2.bias, levels.0.blocks.2.norm2.weight, levels.0.blocks.2.norm2.bias, levels.0.blocks.2.norm2.running_mean, levels.0.blocks.2.norm2.running_var, levels.0.downsample.norm.weight, levels.0.downsample.norm.bias, levels.0.downsample.reduction.0.weight, levels.1.blocks.0.conv1.weight, levels.1.blocks.0.conv1.bias, levels.1.blocks.0.norm1.weight, levels.1.blocks.0.norm1.bias, levels.1.blocks.0.norm1.running_mean, levels.1.blocks.0.norm1.running_var, levels.1.blocks.0.conv2.weight, levels.1.blocks.0.conv2.bias, levels.1.blocks.0.norm2.weight, levels.1.blocks.0.norm2.bias, 
levels.1.blocks.0.norm2.running_mean, levels.1.blocks.0.norm2.running_var, levels.1.blocks.1.conv1.weight, levels.1.blocks.1.conv1.bias, levels.1.blocks.1.norm1.weight, levels.1.blocks.1.norm1.bias, levels.1.blocks.1.norm1.running_mean, levels.1.blocks.1.norm1.running_var, levels.1.blocks.1.conv2.weight, levels.1.blocks.1.conv2.bias, levels.1.blocks.1.norm2.weight, levels.1.blocks.1.norm2.bias, levels.1.blocks.1.norm2.running_mean, levels.1.blocks.1.norm2.running_var, levels.1.blocks.2.conv1.weight, levels.1.blocks.2.conv1.bias, levels.1.blocks.2.norm1.weight, levels.1.blocks.2.norm1.bias, levels.1.blocks.2.norm1.running_mean, levels.1.blocks.2.norm1.running_var, levels.1.blocks.2.conv2.weight, levels.1.blocks.2.conv2.bias, levels.1.blocks.2.norm2.weight, levels.1.blocks.2.norm2.bias, levels.1.blocks.2.norm2.running_mean, levels.1.blocks.2.norm2.running_var, levels.1.downsample.norm.weight, levels.1.downsample.norm.bias, levels.1.downsample.reduction.0.weight, levels.2.blocks.0.gamma3, levels.2.blocks.0.gamma4, levels.2.blocks.0.pos_embed.relative_bias, levels.2.blocks.0.pos_embed.cpb_mlp.0.weight, levels.2.blocks.0.pos_embed.cpb_mlp.0.bias, levels.2.blocks.0.pos_embed.cpb_mlp.2.weight, levels.2.blocks.0.norm1.weight, levels.2.blocks.0.norm1.bias, levels.2.blocks.0.attn.qkv.weight, levels.2.blocks.0.attn.qkv.bias, levels.2.blocks.0.attn.proj.weight, levels.2.blocks.0.attn.proj.bias, levels.2.blocks.0.attn.pos_emb_funct.relative_coords_table, levels.2.blocks.0.attn.pos_emb_funct.relative_position_index, levels.2.blocks.0.attn.pos_emb_funct.relative_bias, levels.2.blocks.0.attn.pos_emb_funct.cpb_mlp.0.weight, levels.2.blocks.0.attn.pos_emb_funct.cpb_mlp.0.bias, levels.2.blocks.0.attn.pos_emb_funct.cpb_mlp.2.weight, levels.2.blocks.0.norm2.weight, levels.2.blocks.0.norm2.bias, levels.2.blocks.0.mlp.fc1.weight, levels.2.blocks.0.mlp.fc1.bias, levels.2.blocks.0.mlp.fc2.weight, levels.2.blocks.0.mlp.fc2.bias, levels.2.blocks.1.gamma3, levels.2.blocks.1.gamma4, 
levels.2.blocks.1.pos_embed.relative_bias, levels.2.blocks.1.pos_embed.cpb_mlp.0.weight, levels.2.blocks.1.pos_embed.cpb_mlp.0.bias, levels.2.blocks.1.pos_embed.cpb_mlp.2.weight, levels.2.blocks.1.norm1.weight, levels.2.blocks.1.norm1.bias, levels.2.blocks.1.attn.qkv.weight, levels.2.blocks.1.attn.qkv.bias, levels.2.blocks.1.attn.proj.weight, levels.2.blocks.1.attn.proj.bias, levels.2.blocks.1.attn.pos_emb_funct.relative_coords_table, levels.2.blocks.1.attn.pos_emb_funct.relative_position_index, levels.2.blocks.1.attn.pos_emb_funct.relative_bias, levels.2.blocks.1.attn.pos_emb_funct.cpb_mlp.0.weight, levels.2.blocks.1.attn.pos_emb_funct.cpb_mlp.0.bias, levels.2.blocks.1.attn.pos_emb_funct.cpb_mlp.2.weight, levels.2.blocks.1.norm2.weight, levels.2.blocks.1.norm2.bias, levels.2.blocks.1.mlp.fc1.weight, levels.2.blocks.1.mlp.fc1.bias, levels.2.blocks.1.mlp.fc2.weight, levels.2.blocks.1.mlp.fc2.bias, levels.2.blocks.2.gamma3, levels.2.blocks.2.gamma4, levels.2.blocks.2.pos_embed.relative_bias, levels.2.blocks.2.pos_embed.cpb_mlp.0.weight, levels.2.blocks.2.pos_embed.cpb_mlp.0.bias, levels.2.blocks.2.pos_embed.cpb_mlp.2.weight, levels.2.blocks.2.norm1.weight, levels.2.blocks.2.norm1.bias, levels.2.blocks.2.attn.qkv.weight, levels.2.blocks.2.attn.qkv.bias, levels.2.blocks.2.attn.proj.weight, levels.2.blocks.2.attn.proj.bias, levels.2.blocks.2.attn.pos_emb_funct.relative_coords_table, levels.2.blocks.2.attn.pos_emb_funct.relative_position_index, levels.2.blocks.2.attn.pos_emb_funct.relative_bias, levels.2.blocks.2.attn.pos_emb_funct.cpb_mlp.0.weight, levels.2.blocks.2.attn.pos_emb_funct.cpb_mlp.0.bias, levels.2.blocks.2.attn.pos_emb_funct.cpb_mlp.2.weight, levels.2.blocks.2.norm2.weight, levels.2.blocks.2.norm2.bias, levels.2.blocks.2.mlp.fc1.weight, levels.2.blocks.2.mlp.fc1.bias, levels.2.blocks.2.mlp.fc2.weight, levels.2.blocks.2.mlp.fc2.bias, levels.2.blocks.3.gamma3, levels.2.blocks.3.gamma4, levels.2.blocks.3.pos_embed.relative_bias, 
levels.2.blocks.3.pos_embed.cpb_mlp.0.weight, levels.2.blocks.3.pos_embed.cpb_mlp.0.bias, levels.2.blocks.3.pos_embed.cpb_mlp.2.weight, levels.2.blocks.3.norm1.weight, levels.2.blocks.3.norm1.bias, levels.2.blocks.3.attn.qkv.weight, levels.2.blocks.3.attn.qkv.bias, levels.2.blocks.3.attn.proj.weight, levels.2.blocks.3.attn.proj.bias, levels.2.blocks.3.attn.pos_emb_funct.relative_coords_table, levels.2.blocks.3.attn.pos_emb_funct.relative_position_index, levels.2.blocks.3.attn.pos_emb_funct.relative_bias, levels.2.blocks.3.attn.pos_emb_funct.cpb_mlp.0.weight, levels.2.blocks.3.attn.pos_emb_funct.cpb_mlp.0.bias, levels.2.blocks.3.attn.pos_emb_funct.cpb_mlp.2.weight, levels.2.blocks.3.norm2.weight, levels.2.blocks.3.norm2.bias, levels.2.blocks.3.mlp.fc1.weight, levels.2.blocks.3.mlp.fc1.bias, levels.2.blocks.3.mlp.fc2.weight, levels.2.blocks.3.mlp.fc2.bias, levels.2.blocks.4.gamma3, levels.2.blocks.4.gamma4, levels.2.blocks.4.pos_embed.relative_bias, levels.2.blocks.4.pos_embed.cpb_mlp.0.weight, levels.2.blocks.4.pos_embed.cpb_mlp.0.bias, levels.2.blocks.4.pos_embed.cpb_mlp.2.weight, levels.2.blocks.4.norm1.weight, levels.2.blocks.4.norm1.bias, levels.2.blocks.4.attn.qkv.weight, levels.2.blocks.4.attn.qkv.bias, levels.2.blocks.4.attn.proj.weight, levels.2.blocks.4.attn.proj.bias, levels.2.blocks.4.attn.pos_emb_funct.relative_coords_table, levels.2.blocks.4.attn.pos_emb_funct.relative_position_index, levels.2.blocks.4.attn.pos_emb_funct.relative_bias, levels.2.blocks.4.attn.pos_emb_funct.cpb_mlp.0.weight, levels.2.blocks.4.attn.pos_emb_funct.cpb_mlp.0.bias, levels.2.blocks.4.attn.pos_emb_funct.cpb_mlp.2.weight, levels.2.blocks.4.norm2.weight, levels.2.blocks.4.norm2.bias, levels.2.blocks.4.mlp.fc1.weight, levels.2.blocks.4.mlp.fc1.bias, levels.2.blocks.4.mlp.fc2.weight, levels.2.blocks.4.mlp.fc2.bias, levels.2.blocks.5.gamma3, levels.2.blocks.5.gamma4, levels.2.blocks.5.pos_embed.relative_bias, levels.2.blocks.5.pos_embed.cpb_mlp.0.weight, 
levels.2.blocks.5.pos_embed.cpb_mlp.0.bias, levels.2.blocks.5.pos_embed.cpb_mlp.2.weight, levels.2.blocks.5.norm1.weight, levels.2.blocks.5.norm1.bias, levels.2.blocks.5.attn.qkv.weight, levels.2.blocks.5.attn.qkv.bias, levels.2.blocks.5.attn.proj.weight, levels.2.blocks.5.attn.proj.bias, levels.2.blocks.5.attn.pos_emb_funct.relative_coords_table, levels.2.blocks.5.attn.pos_emb_funct.relative_position_index, levels.2.blocks.5.attn.pos_emb_funct.relative_bias, levels.2.blocks.5.attn.pos_emb_funct.cpb_mlp.0.weight, levels.2.blocks.5.attn.pos_emb_funct.cpb_mlp.0.bias, levels.2.blocks.5.attn.pos_emb_funct.cpb_mlp.2.weight, levels.2.blocks.5.norm2.weight, levels.2.blocks.5.norm2.bias, levels.2.blocks.5.mlp.fc1.weight, levels.2.blocks.5.mlp.fc1.bias, levels.2.blocks.5.mlp.fc2.weight, levels.2.blocks.5.mlp.fc2.bias, levels.2.blocks.6.gamma3, levels.2.blocks.6.gamma4, levels.2.blocks.6.pos_embed.relative_bias, levels.2.blocks.6.pos_embed.cpb_mlp.0.weight, levels.2.blocks.6.pos_embed.cpb_mlp.0.bias, levels.2.blocks.6.pos_embed.cpb_mlp.2.weight, levels.2.blocks.6.norm1.weight, levels.2.blocks.6.norm1.bias, levels.2.blocks.6.attn.qkv.weight, levels.2.blocks.6.attn.qkv.bias, levels.2.blocks.6.attn.proj.weight, levels.2.blocks.6.attn.proj.bias, levels.2.blocks.6.attn.pos_emb_funct.relative_coords_table, levels.2.blocks.6.attn.pos_emb_funct.relative_position_index, levels.2.blocks.6.attn.pos_emb_funct.relative_bias, levels.2.blocks.6.attn.pos_emb_funct.cpb_mlp.0.weight, levels.2.blocks.6.attn.pos_emb_funct.cpb_mlp.0.bias, levels.2.blocks.6.attn.pos_emb_funct.cpb_mlp.2.weight, levels.2.blocks.6.norm2.weight, levels.2.blocks.6.norm2.bias, levels.2.blocks.6.mlp.fc1.weight, levels.2.blocks.6.mlp.fc1.bias, levels.2.blocks.6.mlp.fc2.weight, levels.2.blocks.6.mlp.fc2.bias, levels.2.blocks.7.gamma3, levels.2.blocks.7.gamma4, levels.2.blocks.7.pos_embed.relative_bias, levels.2.blocks.7.pos_embed.cpb_mlp.0.weight, levels.2.blocks.7.pos_embed.cpb_mlp.0.bias, 
levels.2.blocks.7.pos_embed.cpb_mlp.2.weight, levels.2.blocks.7.norm1.weight, levels.2.blocks.7.norm1.bias, levels.2.blocks.7.attn.qkv.weight, levels.2.blocks.7.attn.qkv.bias, levels.2.blocks.7.attn.proj.weight, levels.2.blocks.7.attn.proj.bias, levels.2.blocks.7.attn.pos_emb_funct.relative_coords_table, levels.2.blocks.7.attn.pos_emb_funct.relative_position_index, levels.2.blocks.7.attn.pos_emb_funct.relative_bias, levels.2.blocks.7.attn.pos_emb_funct.cpb_mlp.0.weight, levels.2.blocks.7.attn.pos_emb_funct.cpb_mlp.0.bias, levels.2.blocks.7.attn.pos_emb_funct.cpb_mlp.2.weight, levels.2.blocks.7.norm2.weight, levels.2.blocks.7.norm2.bias, levels.2.blocks.7.mlp.fc1.weight, levels.2.blocks.7.mlp.fc1.bias, levels.2.blocks.7.mlp.fc2.weight, levels.2.blocks.7.mlp.fc2.bias, levels.2.blocks.8.gamma3, levels.2.blocks.8.gamma4, levels.2.blocks.8.pos_embed.relative_bias, levels.2.blocks.8.pos_embed.cpb_mlp.0.weight, levels.2.blocks.8.pos_embed.cpb_mlp.0.bias, levels.2.blocks.8.pos_embed.cpb_mlp.2.weight, levels.2.blocks.8.norm1.weight, levels.2.blocks.8.norm1.bias, levels.2.blocks.8.attn.qkv.weight, levels.2.blocks.8.attn.qkv.bias, levels.2.blocks.8.attn.proj.weight, levels.2.blocks.8.attn.proj.bias, levels.2.blocks.8.attn.pos_emb_funct.relative_coords_table, levels.2.blocks.8.attn.pos_emb_funct.relative_position_index, levels.2.blocks.8.attn.pos_emb_funct.relative_bias, levels.2.blocks.8.attn.pos_emb_funct.cpb_mlp.0.weight, levels.2.blocks.8.attn.pos_emb_funct.cpb_mlp.0.bias, levels.2.blocks.8.attn.pos_emb_funct.cpb_mlp.2.weight, levels.2.blocks.8.norm2.weight, levels.2.blocks.8.norm2.bias, levels.2.blocks.8.mlp.fc1.weight, levels.2.blocks.8.mlp.fc1.bias, levels.2.blocks.8.mlp.fc2.weight, levels.2.blocks.8.mlp.fc2.bias, levels.2.blocks.9.gamma3, levels.2.blocks.9.gamma4, levels.2.blocks.9.pos_embed.relative_bias, levels.2.blocks.9.pos_embed.cpb_mlp.0.weight, levels.2.blocks.9.pos_embed.cpb_mlp.0.bias, levels.2.blocks.9.pos_embed.cpb_mlp.2.weight, 
levels.2.blocks.9.norm1.weight, levels.2.blocks.9.norm1.bias, levels.2.blocks.9.attn.qkv.weight, levels.2.blocks.9.attn.qkv.bias, levels.2.blocks.9.attn.proj.weight, levels.2.blocks.9.attn.proj.bias, levels.2.blocks.9.attn.pos_emb_funct.relative_coords_table, levels.2.blocks.9.attn.pos_emb_funct.relative_position_index, levels.2.blocks.9.attn.pos_emb_funct.relative_bias, levels.2.blocks.9.attn.pos_emb_funct.cpb_mlp.0.weight, levels.2.blocks.9.attn.pos_emb_funct.cpb_mlp.0.bias, levels.2.blocks.9.attn.pos_emb_funct.cpb_mlp.2.weight, levels.2.blocks.9.norm2.weight, levels.2.blocks.9.norm2.bias, levels.2.blocks.9.mlp.fc1.weight, levels.2.blocks.9.mlp.fc1.bias, levels.2.blocks.9.mlp.fc2.weight, levels.2.blocks.9.mlp.fc2.bias, levels.2.blocks.10.gamma3, levels.2.blocks.10.gamma4, levels.2.blocks.10.pos_embed.relative_bias, levels.2.blocks.10.pos_embed.cpb_mlp.0.weight, levels.2.blocks.10.pos_embed.cpb_mlp.0.bias, levels.2.blocks.10.pos_embed.cpb_mlp.2.weight, levels.2.blocks.10.norm1.weight, levels.2.blocks.10.norm1.bias, levels.2.blocks.10.attn.qkv.weight, levels.2.blocks.10.attn.qkv.bias, levels.2.blocks.10.attn.proj.weight, levels.2.blocks.10.attn.proj.bias, levels.2.blocks.10.attn.pos_emb_funct.relative_coords_table, levels.2.blocks.10.attn.pos_emb_funct.relative_position_index, levels.2.blocks.10.attn.pos_emb_funct.relative_bias, levels.2.blocks.10.attn.pos_emb_funct.cpb_mlp.0.weight, levels.2.blocks.10.attn.pos_emb_funct.cpb_mlp.0.bias, levels.2.blocks.10.attn.pos_emb_funct.cpb_mlp.2.weight, levels.2.blocks.10.norm2.weight, levels.2.blocks.10.norm2.bias, levels.2.blocks.10.mlp.fc1.weight, levels.2.blocks.10.mlp.fc1.bias, levels.2.blocks.10.mlp.fc2.weight, levels.2.blocks.10.mlp.fc2.bias, levels.2.blocks.11.gamma3, levels.2.blocks.11.gamma4, levels.2.blocks.11.pos_embed.relative_bias, levels.2.blocks.11.pos_embed.cpb_mlp.0.weight, levels.2.blocks.11.pos_embed.cpb_mlp.0.bias, levels.2.blocks.11.pos_embed.cpb_mlp.2.weight, levels.2.blocks.11.norm1.weight, 
levels.2.blocks.11.norm1.bias, levels.2.blocks.11.attn.qkv.weight, levels.2.blocks.11.attn.qkv.bias, levels.2.blocks.11.attn.proj.weight, levels.2.blocks.11.attn.proj.bias, levels.2.blocks.11.attn.pos_emb_funct.relative_coords_table, levels.2.blocks.11.attn.pos_emb_funct.relative_position_index, levels.2.blocks.11.attn.pos_emb_funct.relative_bias, levels.2.blocks.11.attn.pos_emb_funct.cpb_mlp.0.weight, levels.2.blocks.11.attn.pos_emb_funct.cpb_mlp.0.bias, levels.2.blocks.11.attn.pos_emb_funct.cpb_mlp.2.weight, levels.2.blocks.11.norm2.weight, levels.2.blocks.11.norm2.bias, levels.2.blocks.11.mlp.fc1.weight, levels.2.blocks.11.mlp.fc1.bias, levels.2.blocks.11.mlp.fc2.weight, levels.2.blocks.11.mlp.fc2.bias, levels.2.downsample.norm.weight, levels.2.downsample.norm.bias, levels.2.downsample.reduction.0.weight, levels.3.blocks.0.gamma3, levels.3.blocks.0.gamma4, levels.3.blocks.0.pos_embed.relative_bias, levels.3.blocks.0.pos_embed.cpb_mlp.0.weight, levels.3.blocks.0.pos_embed.cpb_mlp.0.bias, levels.3.blocks.0.pos_embed.cpb_mlp.2.weight, levels.3.blocks.0.norm1.weight, levels.3.blocks.0.norm1.bias, levels.3.blocks.0.attn.qkv.weight, levels.3.blocks.0.attn.qkv.bias, levels.3.blocks.0.attn.proj.weight, levels.3.blocks.0.attn.proj.bias, levels.3.blocks.0.attn.pos_emb_funct.relative_coords_table, levels.3.blocks.0.attn.pos_emb_funct.relative_position_index, levels.3.blocks.0.attn.pos_emb_funct.relative_bias, levels.3.blocks.0.attn.pos_emb_funct.cpb_mlp.0.weight, levels.3.blocks.0.attn.pos_emb_funct.cpb_mlp.0.bias, levels.3.blocks.0.attn.pos_emb_funct.cpb_mlp.2.weight, levels.3.blocks.0.norm2.weight, levels.3.blocks.0.norm2.bias, levels.3.blocks.0.mlp.fc1.weight, levels.3.blocks.0.mlp.fc1.bias, levels.3.blocks.0.mlp.fc2.weight, levels.3.blocks.0.mlp.fc2.bias, levels.3.blocks.1.gamma3, levels.3.blocks.1.gamma4, levels.3.blocks.1.pos_embed.relative_bias, levels.3.blocks.1.pos_embed.cpb_mlp.0.weight, levels.3.blocks.1.pos_embed.cpb_mlp.0.bias, 
levels.3.blocks.1.pos_embed.cpb_mlp.2.weight, levels.3.blocks.1.norm1.weight, levels.3.blocks.1.norm1.bias, levels.3.blocks.1.attn.qkv.weight, levels.3.blocks.1.attn.qkv.bias, levels.3.blocks.1.attn.proj.weight, levels.3.blocks.1.attn.proj.bias, levels.3.blocks.1.attn.pos_emb_funct.relative_coords_table, levels.3.blocks.1.attn.pos_emb_funct.relative_position_index, levels.3.blocks.1.attn.pos_emb_funct.relative_bias, levels.3.blocks.1.attn.pos_emb_funct.cpb_mlp.0.weight, levels.3.blocks.1.attn.pos_emb_funct.cpb_mlp.0.bias, levels.3.blocks.1.attn.pos_emb_funct.cpb_mlp.2.weight, levels.3.blocks.1.norm2.weight, levels.3.blocks.1.norm2.bias, levels.3.blocks.1.mlp.fc1.weight, levels.3.blocks.1.mlp.fc1.bias, levels.3.blocks.1.mlp.fc2.weight, levels.3.blocks.1.mlp.fc2.bias, levels.3.blocks.2.gamma3, levels.3.blocks.2.gamma4, levels.3.blocks.2.pos_embed.relative_bias, levels.3.blocks.2.pos_embed.cpb_mlp.0.weight, levels.3.blocks.2.pos_embed.cpb_mlp.0.bias, levels.3.blocks.2.pos_embed.cpb_mlp.2.weight, levels.3.blocks.2.norm1.weight, levels.3.blocks.2.norm1.bias, levels.3.blocks.2.attn.qkv.weight, levels.3.blocks.2.attn.qkv.bias, levels.3.blocks.2.attn.proj.weight, levels.3.blocks.2.attn.proj.bias, levels.3.blocks.2.attn.pos_emb_funct.relative_coords_table, levels.3.blocks.2.attn.pos_emb_funct.relative_position_index, levels.3.blocks.2.attn.pos_emb_funct.relative_bias, levels.3.blocks.2.attn.pos_emb_funct.cpb_mlp.0.weight, levels.3.blocks.2.attn.pos_emb_funct.cpb_mlp.0.bias, levels.3.blocks.2.attn.pos_emb_funct.cpb_mlp.2.weight, levels.3.blocks.2.norm2.weight, levels.3.blocks.2.norm2.bias, levels.3.blocks.2.mlp.fc1.weight, levels.3.blocks.2.mlp.fc1.bias, levels.3.blocks.2.mlp.fc2.weight, levels.3.blocks.2.mlp.fc2.bias, levels.3.blocks.3.gamma3, levels.3.blocks.3.gamma4, levels.3.blocks.3.pos_embed.relative_bias, levels.3.blocks.3.pos_embed.cpb_mlp.0.weight, levels.3.blocks.3.pos_embed.cpb_mlp.0.bias, levels.3.blocks.3.pos_embed.cpb_mlp.2.weight, 
levels.3.blocks.3.norm1.weight, levels.3.blocks.3.norm1.bias, levels.3.blocks.3.attn.qkv.weight, levels.3.blocks.3.attn.qkv.bias, levels.3.blocks.3.attn.proj.weight, levels.3.blocks.3.attn.proj.bias, levels.3.blocks.3.attn.pos_emb_funct.relative_coords_table, levels.3.blocks.3.attn.pos_emb_funct.relative_position_index, levels.3.blocks.3.attn.pos_emb_funct.relative_bias, levels.3.blocks.3.attn.pos_emb_funct.cpb_mlp.0.weight, levels.3.blocks.3.attn.pos_emb_funct.cpb_mlp.0.bias, levels.3.blocks.3.attn.pos_emb_funct.cpb_mlp.2.weight, levels.3.blocks.3.norm2.weight, levels.3.blocks.3.norm2.bias, levels.3.blocks.3.mlp.fc1.weight, levels.3.blocks.3.mlp.fc1.bias, levels.3.blocks.3.mlp.fc2.weight, levels.3.blocks.3.mlp.fc2.bias, levels.3.blocks.4.gamma3, levels.3.blocks.4.gamma4, levels.3.blocks.4.pos_embed.relative_bias, levels.3.blocks.4.pos_embed.cpb_mlp.0.weight, levels.3.blocks.4.pos_embed.cpb_mlp.0.bias, levels.3.blocks.4.pos_embed.cpb_mlp.2.weight, levels.3.blocks.4.norm1.weight, levels.3.blocks.4.norm1.bias, levels.3.blocks.4.attn.qkv.weight, levels.3.blocks.4.attn.qkv.bias, levels.3.blocks.4.attn.proj.weight, levels.3.blocks.4.attn.proj.bias, levels.3.blocks.4.attn.pos_emb_funct.relative_coords_table, levels.3.blocks.4.attn.pos_emb_funct.relative_position_index, levels.3.blocks.4.attn.pos_emb_funct.relative_bias, levels.3.blocks.4.attn.pos_emb_funct.cpb_mlp.0.weight, levels.3.blocks.4.attn.pos_emb_funct.cpb_mlp.0.bias, levels.3.blocks.4.attn.pos_emb_funct.cpb_mlp.2.weight, levels.3.blocks.4.norm2.weight, levels.3.blocks.4.norm2.bias, levels.3.blocks.4.mlp.fc1.weight, levels.3.blocks.4.mlp.fc1.bias, levels.3.blocks.4.mlp.fc2.weight, levels.3.blocks.4.mlp.fc2.bias, norm.weight, norm.bias, norm.running_mean, norm.running_var, head.weight, head.bias
03/06 17:49:59 - mmengine - WARNING - "FileClient" will be deprecated in future. Please use io functions in https://mmengine.readthedocs.io/en/latest/api/fileio.html#file-io
03/06 17:49:59 - mmengine - WARNING - "HardDiskBackend" is the alias of "LocalBackend" and the former will be deprecated in future.
03/06 17:49:59 - mmengine - INFO - Checkpoints will be saved to /results/classification_experiment/train.
Error executing job with overrides: ['results_dir=/results/classification_experiment', 'train.num_gpus=1', 'model.init_cfg.checkpoint=/workspace/tao-experiments/pretrained/fastervit_4_21k_224_w14.pth']
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/core/decorators/workflow.py", line 69, in _func
raise e
File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/core/decorators/workflow.py", line 48, in _func
runner(cfg, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/cv/classification/scripts/train.py", line 88, in main
run_experiment(cfg)
File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/cv/classification/scripts/train.py", line 74, in run_experiment
runner.train()
File "/usr/local/lib/python3.10/dist-packages/mmengine/runner/runner.py", line 1777, in train
model = self.train_loop.run() # type: ignore
File "/usr/local/lib/python3.10/dist-packages/mmengine/runner/loops.py", line 96, in run
self.run_epoch()
File "/usr/local/lib/python3.10/dist-packages/mmengine/runner/loops.py", line 113, in run_epoch
self.run_iter(idx, data_batch)
File "/usr/local/lib/python3.10/dist-packages/mmengine/runner/loops.py", line 129, in run_iter
outputs = self.runner.model.train_step(
File "/usr/local/lib/python3.10/dist-packages/mmengine/model/wrappers/distributed.py", line 121, in train_step
losses = self._run_forward(data, mode='loss')
File "/usr/local/lib/python3.10/dist-packages/mmengine/model/wrappers/distributed.py", line 161, in _run_forward
results = self(**data, mode=mode)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1536, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/parallel/distributed.py", line 1589, in forward
inputs, kwargs = self._pre_forward(*inputs, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/parallel/distributed.py", line 1480, in _pre_forward
if torch.is_grad_enabled() and self.reducer._rebuild_buckets():
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`, and by
making sure all `forward` function outputs participate in calculating loss.
If you already have done the above, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).
Parameter indices which did not receive grad for rank 0: 405 406
In addition, you can set the environment variable TORCH_DISTRIBUTED_DEBUG to either INFO or DETAIL to print out information about which particular parameters did not receive gradient on this rank as part of this error
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
E0306 17:50:07.071000 139710467491648 torch/distributed/elastic/multiprocessing/api.py:881] failed (exitcode: 1) local_rank: 0 (pid: 392) of binary: /usr/bin/python
Traceback (most recent call last):
File "/usr/local/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
return f(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 879, in main
run(args)
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 870, in run
elastic_launch(
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 132, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 263, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/cv/classification/scripts/train.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]: time : 2025-03-06_17:50:07
host : 1acf5ca9b93d
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 392)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
[2025-03-06 17:50:07,288 - TAO Toolkit - root - INFO] Sending telemetry data.
[2025-03-06 17:50:07,288 - TAO Toolkit - root - INFO] ================> Start Reporting Telemetry <================
[2025-03-06 17:50:07,288 - TAO Toolkit - root - INFO] Sending {'version': '5.5.0', 'action': 'train', 'network': 'classification_pyt', 'gpu': ['NVIDIA-GeForce-RTX-2060'], 'success': False, 'time_lapsed': 21} to https://api.tao.ngc.nvidia.com.
[2025-03-06 17:50:09,015 - TAO Toolkit - root - INFO] Telemetry sent successfully.
[2025-03-06 17:50:09,015 - TAO Toolkit - root - INFO] ================> End Reporting Telemetry <================
[2025-03-06 17:50:09,015 - TAO Toolkit - root - WARNING] Execution status: FAIL
2025-03-06 23:50:09,976 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 363: Stopping container.
I don't understand why TAO wraps the model in DDP even though I am training with train.num_gpus=1, which leads to this error:
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`, and by
making sure all `forward` function outputs participate in calculating loss.
When I set find_unused_parameters: True in train_config, the training runs fine. However, I suspect that enabling this option adds an extra traversal of the autograd graph in each iteration, which increases computational overhead and slows training. This overhead should not affect the model’s accuracy, only the efficiency of the training process.
What could have gone wrong? Is my configuration okay?
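To narrow this down, it may help to find out which parameters the indices 405 and 406 in the error message refer to. The following is only a minimal sketch: `unused_param_indices` and `names_for_indices` are hypothetical helper names, the name list is a placeholder, and it assumes that the DDP indices follow the order of the trainable parameters returned by `model.named_parameters()`, which I have not verified for TAO.

```python
import re


def unused_param_indices(error_text):
    """Return the parameter indices listed in a DDP 'did not receive grad' message."""
    match = re.search(r"did not receive grad for rank \d+:((?:\s+\d+)+)", error_text)
    return [int(tok) for tok in match.group(1).split()] if match else []


def names_for_indices(param_names, indices):
    """Map each index to its parameter name, skipping indices outside the list."""
    return {i: param_names[i] for i in indices if 0 <= i < len(param_names)}


# Line copied from the training log above.
log_line = "Parameter indices which did not receive grad for rank 0: 405 406"
indices = unused_param_indices(log_line)

# Placeholder names only (assumption). In a real session the list would be built
# from the model, e.g.:
#   [name for name, p in model.named_parameters() if p.requires_grad]
placeholder_names = [f"param_{i}" for i in range(410)]

print(names_for_indices(placeholder_names, indices))
```

With the real parameter names in place of the placeholders, the result would show which two parameters never take part in the loss.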