Dear @Morganh or Team,
I am using TAO 5 and trained a model using the re-identification net sample.
I used the Market-1501 dataset and also included a different custom data source.
Training completed after about 2 weeks, but at the end I am unable to find the .tlt file in the results directory. Can you please suggest what went wrong, and at which point in training the model file should have been written?
Below is my training configuration:
results_dir: "/results"
encryption_key: nvidia_tao
model:
  backbone: resnet_50
  last_stride: 1
  pretrain_choice: imagenet
  pretrained_model_path: "/model/resnet50_pretrained.pth"
  input_channels: 3
  input_width: 128
  input_height: 256
  neck: bnneck
  feat_dim: 256
  neck_feat: after
  metric_loss_type: triplet
  with_center_loss: False
  with_flip_feature: False
  label_smooth: True
dataset:
  train_dataset_dir: "/data/sample_train"
  test_dataset_dir: "/data/sample_test"
  query_dataset_dir: "/data/sample_query"
  num_classes: 100
  batch_size: 64
  val_batch_size: 128
  num_workers: 1
  pixel_mean: [0.485, 0.456, 0.406]
  pixel_std: [0.226, 0.226, 0.226]
  padding: 10
  prob: 0.5
  re_prob: 0.5
  sampler: softmax_triplet
  num_instances: 4
re_ranking:
  re_ranking: True
  k1: 20
  k2: 6
  lambda_value: 0.3
train:
  optim:
    name: Adam
    steps: [40, 70]
    gamma: 0.1
    bias_lr_factor: 1
    weight_decay: 0.0005
    weight_decay_bias: 0.0005
    warmup_factor: 0.01
    warmup_iters: 10
    warmup_method: linear
    base_lr: 0.00035
    momentum: 0.9
    center_loss_weight: 0.0005
    center_lr: 0.5
    triplet_loss_margin: 0.3
  num_epochs: 120
  checkpoint_interval: 10
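Since checkpoint_interval is 10, I expected a checkpoint file roughly every 10 epochs. For reference, this is the check I ran inside the container for any checkpoint-like files under the results directory (the path is taken from the "Train results will be saved at:" line in the log below; file extensions are my guesses at what TAO might write):

```shell
# Results path taken from "Train results will be saved at:" in the training log.
RESULTS_DIR=${RESULTS_DIR:-/results/market1501/train}

# List any checkpoint-like files, oldest first; empty output would mean
# no checkpoint was ever saved, not just a missing final export.
find "$RESULTS_DIR" -type f \
  \( -name '*.tlt' -o -name '*.pth' -o -name '*.ckpt' \) \
  -printf '%T@ %p\n' 2>/dev/null | sort -n
```

In my case this is how I confirmed whether anything at all was written during the run.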
Below are the training logs:
Train model
2024-04-03 20:11:37,331 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2024-04-03 20:11:37,383 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 361: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.0.0-pyt
2024-04-03 20:11:37,436 [TAO Toolkit] [WARNING] nvidia_tao_cli.components.docker_handler.docker_handler 267:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/smarg/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2024-04-03 20:11:37,436 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 275: Printing tty value True
sys:1: UserWarning:
'experiment_market1501.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
<frozen core.hydra.hydra_runner>:107: UserWarning:
'experiment_market1501.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
/usr/local/lib/python3.8/dist-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/next/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
ret = run_job(
Train results will be saved at: /results/market1501/train
Loading pretrained ImageNet model......
╒══════════╤═════════╤════════════╤═════════════╕
│ Subset │ # IDs │ # Images │ # Cameras │
╞══════════╪═════════╪════════════╪═════════════╡
│ Train │ 13526 │ 1820258 │ 10 │
├──────────┼─────────┼────────────┼─────────────┤
│ Query │ 793 │ 2347 │ 10 │
├──────────┼─────────┼────────────┼─────────────┤
│ Gallery │ 100 │ 1779 │ 6 │
╘══════════╧═════════╧════════════╧═════════════╛
<frozen core.loggers.api_logging>:245: UserWarning: Log file already exists at /results/market1501/train/status.json
/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:441: LightningDeprecationWarning: Setting `Trainer(gpus=[0])` is deprecated in v1.7 and will be removed in v2.0. Please use `Trainer(accelerator='gpu', devices=[0])` instead.
rank_zero_deprecation(
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/usr/local/lib/python3.8/dist-packages/pytorch_lightning/callbacks/model_checkpoint.py:604: UserWarning: Checkpoint directory /results/market1501/train exists and is not empty.
rank_zero_warn(f"Checkpoint directory {dirpath} exists and is not empty.")
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
/usr/local/lib/python3.8/dist-packages/torch/optim/lr_scheduler.py:138: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "
| Name | Type | Params
--------------------------------------------
0 | model | Baseline | 27.5 M
1 | train_accuracy | Accuracy | 0
2 | val_accuracy | Accuracy | 0
--------------------------------------------
27.5 M Trainable params
256 Non-trainable params
27.5 M Total params
109.983 Total estimated model params size (MB)
/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/connectors/data_connector.py:224: PossibleUserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 12 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
rank_zero_warn(
Training: 0it [00:00, ?it/s]Starting Training Loop.
Epoch 0: 0%| | 0/27176 [00:00<?, ?it/s]
Epoch 0: 100%|█████▉| 27154/27176 [2:15:05<00:06, 3.35it/s, loss=1.34, v_num=1]Train and Val metrics generated.
Epoch 0: 100%|▉| 27154/27176 [2:15:05<00:06, 3.35it/s, loss=1.34, v_num=1, traiTraining loop in progress
Epoch 1: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=5.430,
Epoch 1: 100%|▉| 27174/27176 [1:41:20<00:00, 4.47it/s, loss=1.34, v_num=1, traiTrain and Val metrics generated.
Epoch 1: 100%|▉| 27174/27176 [1:41:21<00:00, 4.47it/s, loss=1.34, v_num=1, traiTraining loop in progress
Epoch 2: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.260,
Epoch 2: 100%|▉| 27167/27176 [1:42:21<00:02, 4.42it/s, loss=1.32, v_num=1, traiTrain and Val metrics generated.
Epoch 2: 100%|▉| 27167/27176 [1:42:22<00:02, 4.42it/s, loss=1.32, v_num=1, traiTraining loop in progress
Epoch 3: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=2.870,
Epoch 3: 100%|▉| 27141/27176 [1:43:35<00:08, 4.37it/s, loss=1.32, v_num=1, traiTrain and Val metrics generated.
Epoch 3: 100%|▉| 27141/27176 [1:43:36<00:08, 4.37it/s, loss=1.32, v_num=1, traiTraining loop in progress
Epoch 4: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=2.730,
Epoch 4: 100%|▉| 27126/27176 [1:44:43<00:11, 4.32it/s, loss=1.35, v_num=1, traiTrain and Val metrics generated.
Epoch 4: 100%|▉| 27126/27176 [1:44:44<00:11, 4.32it/s, loss=1.35, v_num=1, traiTraining loop in progress
Epoch 5: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.35, v_num=1, train_loss=2.670,
Epoch 5: 100%|▉| 27135/27176 [1:43:30<00:09, 4.37it/s, loss=1.31, v_num=1, traiTrain and Val metrics generated.
Epoch 5: 100%|▉| 27135/27176 [1:43:30<00:09, 4.37it/s, loss=1.31, v_num=1, traiTraining loop in progress
Epoch 6: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.31, v_num=1, train_loss=2.630,
Epoch 6: 100%|▉| 27161/27176 [1:43:14<00:03, 4.38it/s, loss=1.33, v_num=1, traiTrain and Val metrics generated.
Epoch 6: 100%|▉| 27161/27176 [1:43:14<00:03, 4.38it/s, loss=1.33, v_num=1, traiTraining loop in progress
Epoch 7: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=2.610,
Epoch 7: 100%|█| 27176/27176 [1:44:59<00:00, 4.31it/s, loss=1.32, v_num=1, traiTrain and Val metrics generated.
Epoch 7: 100%|█| 27176/27176 [1:45:00<00:00, 4.31it/s, loss=1.32, v_num=1, traiTraining loop in progress
Epoch 8: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=2.600,
Epoch 8: 100%|▉| 27135/27176 [1:48:30<00:09, 4.17it/s, loss=1.34, v_num=1, traiTrain and Val metrics generated.
Epoch 8: 100%|▉| 27135/27176 [1:48:30<00:09, 4.17it/s, loss=1.34, v_num=1, traiTraining loop in progress
Epoch 9: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=2.590,
Epoch 9: 100%|▉| 27152/27176 [1:50:43<00:05, 4.09it/s, loss=1.33, v_num=1, traiTrain and Val metrics generated.
Epoch 9: 100%|▉| 27152/27176 [1:50:44<00:05, 4.09it/s, loss=1.33, v_num=1, traiTraining loop in progress
Epoch 10: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=2.590,
Epoch 10: 100%|▉| 27158/27176 [1:51:50<00:04, 4.05it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 10: 100%|▉| 27158/27176 [1:51:50<00:04, 4.05it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 11: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=2.600,
Epoch 11: 100%|▉| 27131/27176 [1:46:14<00:10, 4.26it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 11: 100%|▉| 27131/27176 [1:46:15<00:10, 4.26it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 12: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=2.590,
Epoch 12: 100%|█| 27176/27176 [1:48:30<00:00, 4.17it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 12: 100%|█| 27176/27176 [1:48:31<00:00, 4.17it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 13: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=2.580,
Epoch 13: 100%|█| 27176/27176 [1:41:17<00:00, 4.47it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 13: 100%|█| 27176/27176 [1:41:18<00:00, 4.47it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 14: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=2.570,
Epoch 14: 100%|▉| 27148/27176 [1:39:18<00:06, 4.56it/s, loss=1.36, v_num=1, traTrain and Val metrics generated.
Epoch 14: 100%|▉| 27148/27176 [1:39:18<00:06, 4.56it/s, loss=1.36, v_num=1, traTraining loop in progress
Epoch 15: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.36, v_num=1, train_loss=2.570,
Epoch 15: 100%|▉| 27162/27176 [1:38:36<00:03, 4.59it/s, loss=1.37, v_num=1, traTrain and Val metrics generated.
Epoch 15: 100%|▉| 27162/27176 [1:38:36<00:03, 4.59it/s, loss=1.37, v_num=1, traTraining loop in progress
Epoch 16: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.37, v_num=1, train_loss=2.570,
Epoch 16: 100%|▉| 27121/27176 [1:38:32<00:11, 4.59it/s, loss=1.35, v_num=1, traTrain and Val metrics generated.
Epoch 16: 100%|▉| 27121/27176 [1:38:32<00:11, 4.59it/s, loss=1.35, v_num=1, traTraining loop in progress
Epoch 17: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.35, v_num=1, train_loss=2.570,
Epoch 17: 100%|▉| 27129/27176 [1:38:34<00:10, 4.59it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 17: 100%|▉| 27129/27176 [1:38:34<00:10, 4.59it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 18: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=2.560,
Epoch 18: 100%|▉| 27157/27176 [1:38:38<00:04, 4.59it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 18: 100%|▉| 27157/27176 [1:38:38<00:04, 4.59it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 19: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=2.570,
Epoch 19: 100%|▉| 27150/27176 [1:38:38<00:05, 4.59it/s, loss=1.37, v_num=1, traTrain and Val metrics generated.
Epoch 19: 100%|▉| 27150/27176 [1:38:38<00:05, 4.59it/s, loss=1.37, v_num=1, traTraining loop in progress
Epoch 20: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.37, v_num=1, train_loss=2.570,
Epoch 20: 100%|▉| 27147/27176 [1:38:46<00:06, 4.58it/s, loss=1.36, v_num=1, traTrain and Val metrics generated.
Epoch 20: 100%|▉| 27147/27176 [1:38:47<00:06, 4.58it/s, loss=1.36, v_num=1, traTraining loop in progress
Epoch 21: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.36, v_num=1, train_loss=2.560,
Epoch 21: 100%|▉| 27139/27176 [1:38:55<00:08, 4.57it/s, loss=1.35, v_num=1, traTrain and Val metrics generated.
Epoch 21: 100%|▉| 27139/27176 [1:38:56<00:08, 4.57it/s, loss=1.35, v_num=1, traTraining loop in progress
Epoch 22: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.35, v_num=1, train_loss=2.570,
Epoch 22: 100%|▉| 27128/27176 [1:48:27<00:11, 4.17it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 22: 100%|▉| 27128/27176 [1:48:28<00:11, 4.17it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 23: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=2.560,
Epoch 23: 100%|▉| 27161/27176 [1:45:07<00:03, 4.31it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 23: 100%|▉| 27161/27176 [1:45:07<00:03, 4.31it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 24: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=2.560,
Epoch 24: 100%|▉| 27162/27176 [1:48:23<00:03, 4.18it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 24: 100%|▉| 27162/27176 [1:48:24<00:03, 4.18it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 25: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=2.560,
Epoch 25: 100%|▉| 27148/27176 [1:51:51<00:06, 4.04it/s, loss=1.35, v_num=1, traTrain and Val metrics generated.
Epoch 25: 100%|▉| 27148/27176 [1:51:52<00:06, 4.04it/s, loss=1.35, v_num=1, traTraining loop in progress
Epoch 26: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.35, v_num=1, train_loss=2.560,
Epoch 26: 100%|▉| 27167/27176 [1:55:14<00:02, 3.93it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 26: 100%|▉| 27167/27176 [1:55:14<00:02, 3.93it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 27: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=2.560,
Epoch 27: 100%|▉| 27137/27176 [1:39:42<00:08, 4.54it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 27: 100%|▉| 27137/27176 [1:39:43<00:08, 4.54it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 28: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=2.560,
Epoch 28: 100%|▉| 27166/27176 [1:39:36<00:02, 4.55it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 28: 100%|▉| 27166/27176 [1:39:36<00:02, 4.55it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 29: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=2.550,
Epoch 29: 100%|▉| 27123/27176 [1:39:59<00:11, 4.52it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 29: 100%|▉| 27123/27176 [1:40:00<00:11, 4.52it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 30: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=2.560,
Epoch 30: 100%|▉| 27165/27176 [1:39:39<00:02, 4.54it/s, loss=1.35, v_num=1, traTrain and Val metrics generated.
Epoch 30: 100%|▉| 27165/27176 [1:39:40<00:02, 4.54it/s, loss=1.35, v_num=1, traTraining loop in progress
Epoch 31: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.35, v_num=1, train_loss=2.560,
Epoch 31: 100%|▉| 27153/27176 [1:39:17<00:05, 4.56it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 31: 100%|▉| 27153/27176 [1:39:18<00:05, 4.56it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 32: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=2.560,
Epoch 32: 100%|▉| 27153/27176 [1:39:11<00:05, 4.56it/s, loss=1.32, v_num=1, traTrain and Val metrics generated.
Epoch 32: 100%|▉| 27153/27176 [1:39:12<00:05, 4.56it/s, loss=1.32, v_num=1, traTraining loop in progress
Epoch 33: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=2.550,
Epoch 33: 100%|█| 27176/27176 [1:39:22<00:00, 4.56it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 33: 100%|█| 27176/27176 [1:39:23<00:00, 4.56it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 34: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=2.560,
Epoch 34: 100%|▉| 27156/27176 [1:39:18<00:04, 4.56it/s, loss=1.36, v_num=1, traTrain and Val metrics generated.
Epoch 34: 100%|▉| 27156/27176 [1:39:19<00:04, 4.56it/s, loss=1.36, v_num=1, traTraining loop in progress
Epoch 35: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.36, v_num=1, train_loss=2.550,
Epoch 35: 100%|▉| 27167/27176 [1:39:29<00:01, 4.55it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 35: 100%|▉| 27167/27176 [1:39:29<00:01, 4.55it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 36: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=2.550,
Epoch 36: 100%|▉| 27170/27176 [1:40:41<00:01, 4.50it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 36: 100%|▉| 27170/27176 [1:40:41<00:01, 4.50it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 37: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=2.550,
Epoch 37: 100%|▉| 27160/27176 [1:39:05<00:03, 4.57it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 37: 100%|▉| 27160/27176 [1:39:06<00:03, 4.57it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 38: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=2.550,
Epoch 38: 100%|▉| 27131/27176 [1:48:40<00:10, 4.16it/s, loss=1.35, v_num=1, traTrain and Val metrics generated.
Epoch 38: 100%|▉| 27131/27176 [1:48:41<00:10, 4.16it/s, loss=1.35, v_num=1, traTraining loop in progress
Epoch 39: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.35, v_num=1, train_loss=2.550,
Epoch 39: 100%|▉| 27155/27176 [1:54:27<00:05, 3.95it/s, loss=1.38, v_num=1, traTrain and Val metrics generated.
Epoch 39: 100%|▉| 27155/27176 [1:54:28<00:05, 3.95it/s, loss=1.38, v_num=1, traTraining loop in progress
Epoch 40: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.38, v_num=1, train_loss=5.030,
Epoch 40: 100%|▉| 27146/27176 [1:52:45<00:07, 4.01it/s, loss=1.36, v_num=1, traTrain and Val metrics generated.
Epoch 40: 100%|▉| 27146/27176 [1:52:46<00:07, 4.01it/s, loss=1.36, v_num=1, traTraining loop in progress
Epoch 41: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.36, v_num=1, train_loss=3.870,
Epoch 41: 100%|▉| 27155/27176 [1:39:30<00:04, 4.55it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 41: 100%|▉| 27155/27176 [1:39:30<00:04, 4.55it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 42: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.700,
Epoch 42: 100%|█| 27176/27176 [1:38:36<00:00, 4.59it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 42: 100%|█| 27176/27176 [1:38:36<00:00, 4.59it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 43: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.580,
Epoch 43: 100%|█| 27176/27176 [1:38:30<00:00, 4.60it/s, loss=1.35, v_num=1, traTrain and Val metrics generated.
Epoch 43: 100%|█| 27176/27176 [1:38:30<00:00, 4.60it/s, loss=1.35, v_num=1, traTraining loop in progress
Epoch 44: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.35, v_num=1, train_loss=3.510,
Epoch 44: 100%|▉| 27163/27176 [1:38:39<00:02, 4.59it/s, loss=1.32, v_num=1, traTrain and Val metrics generated.
Epoch 44: 100%|▉| 27163/27176 [1:38:39<00:02, 4.59it/s, loss=1.32, v_num=1, traTraining loop in progress
Epoch 45: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=3.450,
Epoch 45: 100%|▉| 27169/27176 [1:38:42<00:01, 4.59it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 45: 100%|▉| 27169/27176 [1:38:42<00:01, 4.59it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 46: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.410,
Epoch 46: 100%|▉| 27159/27176 [1:38:38<00:03, 4.59it/s, loss=1.32, v_num=1, traTrain and Val metrics generated.
Epoch 46: 100%|▉| 27159/27176 [1:38:38<00:03, 4.59it/s, loss=1.32, v_num=1, traTraining loop in progress
Epoch 47: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=3.380,
Epoch 47: 100%|▉| 27163/27176 [1:38:35<00:02, 4.59it/s, loss=1.35, v_num=1, traTrain and Val metrics generated.
Epoch 47: 100%|▉| 27163/27176 [1:38:35<00:02, 4.59it/s, loss=1.35, v_num=1, traTraining loop in progress
Epoch 48: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.35, v_num=1, train_loss=3.360,
Epoch 48: 100%|▉| 27140/27176 [1:38:35<00:07, 4.59it/s, loss=1.35, v_num=1, traTrain and Val metrics generated.
Epoch 48: 100%|▉| 27140/27176 [1:38:36<00:07, 4.59it/s, loss=1.35, v_num=1, traTraining loop in progress
Epoch 49: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.35, v_num=1, train_loss=3.350,
Epoch 49: 100%|▉| 27145/27176 [1:38:43<00:06, 4.58it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 49: 100%|▉| 27145/27176 [1:38:43<00:06, 4.58it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 50: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.330,
Epoch 50: 100%|▉| 27169/27176 [1:39:07<00:01, 4.57it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 50: 100%|▉| 27169/27176 [1:39:08<00:01, 4.57it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 51: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.320,
Epoch 51: 100%|█| 27176/27176 [1:39:11<00:00, 4.57it/s, loss=1.32, v_num=1, traTrain and Val metrics generated.
Epoch 51: 100%|█| 27176/27176 [1:39:11<00:00, 4.57it/s, loss=1.32, v_num=1, traTraining loop in progress
Epoch 52: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=3.310,
Epoch 52: 100%|▉| 27162/27176 [1:39:16<00:03, 4.56it/s, loss=1.32, v_num=1, traTrain and Val metrics generated.
Epoch 52: 100%|▉| 27162/27176 [1:39:17<00:03, 4.56it/s, loss=1.32, v_num=1, traTraining loop in progress
Epoch 53: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=3.320,
Epoch 53: 100%|▉| 27175/27176 [1:39:33<00:00, 4.55it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 53: 100%|▉| 27175/27176 [1:39:34<00:00, 4.55it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 54: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.300,
Epoch 54: 100%|█| 27176/27176 [1:39:28<00:00, 4.55it/s, loss=1.32, v_num=1, traTrain and Val metrics generated.
Epoch 54: 100%|█| 27176/27176 [1:39:28<00:00, 4.55it/s, loss=1.32, v_num=1, traTraining loop in progress
Epoch 55: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=3.300,
Epoch 55: 100%|▉| 27148/27176 [1:39:20<00:06, 4.55it/s, loss=1.32, v_num=1, traTrain and Val metrics generated.
Epoch 55: 100%|▉| 27148/27176 [1:39:20<00:06, 4.55it/s, loss=1.32, v_num=1, traTraining loop in progress
Epoch 56: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=3.300,
Epoch 56: 100%|▉| 27166/27176 [1:39:18<00:02, 4.56it/s, loss=1.35, v_num=1, traTrain and Val metrics generated.
Epoch 56: 100%|▉| 27166/27176 [1:39:19<00:02, 4.56it/s, loss=1.35, v_num=1, traTraining loop in progress
Epoch 57: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.35, v_num=1, train_loss=3.290,
Epoch 57: 100%|▉| 27164/27176 [1:39:19<00:02, 4.56it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 57: 100%|▉| 27164/27176 [1:39:19<00:02, 4.56it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 58: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.290,
Epoch 58: 100%|▉| 27137/27176 [1:39:11<00:08, 4.56it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 58: 100%|▉| 27137/27176 [1:39:11<00:08, 4.56it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 59: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.290,
Epoch 59: 100%|▉| 27145/27176 [1:39:07<00:06, 4.56it/s, loss=1.32, v_num=1, traTrain and Val metrics generated.
Epoch 59: 100%|▉| 27145/27176 [1:39:08<00:06, 4.56it/s, loss=1.32, v_num=1, traTraining loop in progress
Epoch 60: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=3.280,
Epoch 60: 100%|▉| 27173/27176 [1:39:11<00:00, 4.57it/s, loss=1.32, v_num=1, traTrain and Val metrics generated.
Epoch 60: 100%|▉| 27173/27176 [1:39:12<00:00, 4.57it/s, loss=1.32, v_num=1, traTraining loop in progress
Epoch 61: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=3.280,
Epoch 61: 100%|▉| 27138/27176 [1:38:57<00:08, 4.57it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 61: 100%|▉| 27138/27176 [1:38:57<00:08, 4.57it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 62: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.280,
Epoch 62: 100%|▉| 27137/27176 [1:38:49<00:08, 4.58it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 62: 100%|▉| 27137/27176 [1:38:50<00:08, 4.58it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 63: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.280,
Epoch 63: 100%|▉| 27161/27176 [1:38:55<00:03, 4.58it/s, loss=1.31, v_num=1, traTrain and Val metrics generated.
Epoch 63: 100%|▉| 27161/27176 [1:38:55<00:03, 4.58it/s, loss=1.31, v_num=1, traTraining loop in progress
Epoch 64: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.31, v_num=1, train_loss=3.280,
Epoch 64: 100%|▉| 27152/27176 [1:41:18<00:05, 4.47it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 64: 100%|▉| 27152/27176 [1:41:19<00:05, 4.47it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 65: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.280,
Epoch 65: 100%|▉| 27148/27176 [1:45:04<00:06, 4.31it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 65: 100%|▉| 27148/27176 [1:45:05<00:06, 4.31it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 66: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.270,
Epoch 66: 100%|▉| 27124/27176 [1:59:27<00:13, 3.78it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 66: 100%|▉| 27124/27176 [1:59:27<00:13, 3.78it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 67: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.270,
Epoch 67: 100%|▉| 27139/27176 [1:39:47<00:08, 4.53it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 67: 100%|▉| 27139/27176 [1:39:47<00:08, 4.53it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 68: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.260,
Epoch 68: 100%|▉| 27158/27176 [1:39:53<00:03, 4.53it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 68: 100%|▉| 27158/27176 [1:39:53<00:03, 4.53it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 69: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.260,
Epoch 69: 100%|▉| 27159/27176 [2:19:26<00:05, 3.25it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 69: 100%|▉| 27159/27176 [2:19:26<00:05, 3.25it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 70: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=4.890,
Epoch 70: 100%|▉| 27139/27176 [1:37:45<00:07, 4.63it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 70: 100%|▉| 27139/27176 [1:37:45<00:07, 4.63it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 71: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=4.490,
Epoch 71: 100%|▉| 27153/27176 [1:36:59<00:04, 4.67it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 71: 100%|▉| 27153/27176 [1:37:00<00:04, 4.67it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 72: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=4.240,
Epoch 72: 100%|▉| 27160/27176 [1:37:01<00:03, 4.67it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 72: 100%|▉| 27160/27176 [1:37:02<00:03, 4.66it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 73: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=4.070,
Epoch 73: 100%|▉| 27147/27176 [1:36:54<00:06, 4.67it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 73: 100%|▉| 27147/27176 [1:36:55<00:06, 4.67it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 74: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.950,
Epoch 74: 100%|█| 27176/27176 [1:36:52<00:00, 4.68it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 74: 100%|█| 27176/27176 [1:36:53<00:00, 4.67it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 75: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.870,
Epoch 75: 100%|▉| 27115/27176 [1:36:43<00:13, 4.67it/s, loss=1.35, v_num=1, traTrain and Val metrics generated.
Epoch 75: 100%|▉| 27115/27176 [1:36:43<00:13, 4.67it/s, loss=1.35, v_num=1, traTraining loop in progress
Epoch 76: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.35, v_num=1, train_loss=3.810,
Epoch 76: 100%|▉| 27152/27176 [1:36:49<00:05, 4.67it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 76: 100%|▉| 27152/27176 [1:36:49<00:05, 4.67it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 77: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.770,
Epoch 77: 100%|▉| 27142/27176 [1:36:48<00:07, 4.67it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 77: 100%|▉| 27142/27176 [1:36:49<00:07, 4.67it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 78: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.740,
Epoch 78: 100%|▉| 27136/27176 [1:36:34<00:08, 4.68it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 78: 100%|▉| 27136/27176 [1:36:34<00:08, 4.68it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 79: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.710,
Epoch 79: 100%|▉| 27152/27176 [1:38:40<00:05, 4.59it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 79: 100%|▉| 27152/27176 [1:38:40<00:05, 4.59it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 80: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.700,
Epoch 80: 100%|▉| 27137/27176 [1:39:50<00:08, 4.53it/s, loss=1.35, v_num=1, traTrain and Val metrics generated.
Epoch 80: 100%|▉| 27137/27176 [1:39:50<00:08, 4.53it/s, loss=1.35, v_num=1, traTraining loop in progress
Epoch 81: 0%| | 0/27176 [00:00<?, ?it/s, loss=1.35, v_num=1, train_loss=3.680,
Epoch 81: 100%|▉| 27167/27176 [1:40:26<00:01, 4.51it/s, loss=1.35, v_num=1, traTrain and Val metrics generated.
Epoch 81: 100%|▉| 27167/27176 [1:40:26<00:01, 4.51it/s, loss=1.35, v_num=1] Train and Val metrics generated.
Epoch 82: 100%|█| 27176/27176 [1:39:37<00:00, 4.55it/s, loss=1.34, v_num=1] Train and Val metrics generated.
Epoch 83: 100%|▉| 27175/27176 [1:38:16<00:00, 4.61it/s, loss=1.33, v_num=1] Train and Val metrics generated.
Epoch 84: 100%|█| 27176/27176 [1:37:47<00:00, 4.63it/s, loss=1.32, v_num=1] Train and Val metrics generated.
Epoch 85: 100%|▉| 27131/27176 [1:37:48<00:09, 4.62it/s, loss=1.33, v_num=1] Train and Val metrics generated.
Epoch 86: 100%|▉| 27165/27176 [1:38:11<00:02, 4.61it/s, loss=1.34, v_num=1] Train and Val metrics generated.
Epoch 87: 100%|▉| 27150/27176 [1:37:37<00:05, 4.64it/s, loss=1.32, v_num=1] Train and Val metrics generated.
Epoch 88: 100%|▉| 27138/27176 [1:37:31<00:08, 4.64it/s, loss=1.35, v_num=1] Train and Val metrics generated.
Epoch 89: 100%|▉| 27164/27176 [1:37:39<00:02, 4.64it/s, loss=1.34, v_num=1] Train and Val metrics generated.
Epoch 90: 100%|▉| 27152/27176 [1:37:17<00:05, 4.65it/s, loss=1.34, v_num=1] Train and Val metrics generated.
Epoch 91: 100%|▉| 27172/27176 [1:37:39<00:00, 4.64it/s, loss=1.34, v_num=1] Train and Val metrics generated.
Epoch 92: 100%|▉| 27174/27176 [1:37:23<00:00, 4.65it/s, loss=1.34, v_num=1] Train and Val metrics generated.
Epoch 93: 100%|█| 27176/27176 [1:40:02<00:00, 4.53it/s, loss=1.33, v_num=1] Train and Val metrics generated.
Epoch 94: 100%|▉| 27138/27176 [1:48:08<00:09, 4.18it/s, loss=1.32, v_num=1] Train and Val metrics generated.
Epoch 95: 100%|█| 27176/27176 [1:51:43<00:00, 4.05it/s, loss=1.33, v_num=1] Train and Val metrics generated.
Epoch 96: 100%|▉| 27156/27176 [1:41:28<00:04, 4.46it/s, loss=1.34, v_num=1] Train and Val metrics generated.
Epoch 97: 100%|█| 27176/27176 [1:42:49<00:00, 4.40it/s, loss=1.34, v_num=1] Train and Val metrics generated.
Epoch 98: 100%|▉| 27161/27176 [1:38:29<00:03, 4.60it/s, loss=1.32, v_num=1] Train and Val metrics generated.
Epoch 99: 100%|▉| 27175/27176 [1:36:44<00:00, 4.68it/s, loss=1.32, v_num=1] Train and Val metrics generated.
Epoch 100: 100%|▉| 27145/27176 [1:36:35<00:06, 4.68it/s, loss=1.33, v_num=1] Train and Val metrics generated.
Epoch 101: 100%|▉| 27142/27176 [1:36:39<00:07, 4.68it/s, loss=1.34, v_num=1] Train and Val metrics generated.
Epoch 102: 100%|▉| 27150/27176 [1:36:39<00:05, 4.68it/s, loss=1.34, v_num=1] Train and Val metrics generated.
Epoch 103: 100%|▉| 27169/27176 [1:36:45<00:01, 4.68it/s, loss=1.34, v_num=1] Train and Val metrics generated.
Epoch 104: 100%|▉| 27170/27176 [1:36:49<00:01, 4.68it/s, loss=1.33, v_num=1] Train and Val metrics generated.
Epoch 105: 100%|▉| 27152/27176 [1:36:45<00:05, 4.68it/s, loss=1.34, v_num=1] Train and Val metrics generated.
Epoch 106: 100%|▉| 27152/27176 [1:36:44<00:05, 4.68it/s, loss=1.33, v_num=1] Train and Val metrics generated.
Epoch 107: 100%|▉| 27125/27176 [1:38:34<00:11, 4.59it/s, loss=1.33, v_num=1] Train and Val metrics generated.
Epoch 108: 100%|▉| 27172/27176 [1:37:51<00:00, 4.63it/s, loss=1.32, v_num=1] Train and Val metrics generated.
Epoch 109: 100%|▉| 27162/27176 [1:37:38<00:03, 4.64it/s, loss=1.34, v_num=1] Train and Val metrics generated.
Epoch 110: 100%|▉| 27175/27176 [1:40:48<00:00, 4.49it/s, loss=1.32, v_num=1] Train and Val metrics generated.
Epoch 111: 100%|▉| 27148/27176 [1:44:13<00:06, 4.34it/s, loss=1.34, v_num=1] Train and Val metrics generated.
Epoch 112: 100%|▉| 27151/27176 [1:43:13<00:05, 4.38it/s, loss=1.33, v_num=1] Train and Val metrics generated.
Epoch 113: 100%|█| 27176/27176 [1:36:15<00:00, 4.71it/s, loss=1.35, v_num=1] Train and Val metrics generated.
Epoch 114: 100%|▉| 27167/27176 [1:36:24<00:01, 4.70it/s, loss=1.33, v_num=1] Train and Val metrics generated.
Epoch 115: 100%|▉| 27173/27176 [1:36:24<00:00, 4.70it/s, loss=1.34, v_num=1] Train and Val metrics generated.
Epoch 116: 100%|▉| 27146/27176 [1:36:21<00:06, 4.70it/s, loss=1.33, v_num=1] Train and Val metrics generated.
Epoch 117: 100%|█| 27176/27176 [1:36:37<00:00, 4.69it/s, loss=1.31, v_num=1] Train and Val metrics generated.
Epoch 118: 100%|▉| 27123/27176 [1:36:27<00:11, 4.69it/s, loss=1.32, v_num=1] Train and Val metrics generated.
Epoch 119: 100%|▉| 27138/27176 [1:36:29<00:08, 4.69it/s, loss=1.32, v_num=1] Train and Val metrics generated.
`Trainer.fit` stopped: `max_epochs=120` reached.
Epoch 119: 100%|▉| 27138/27176 [1:36:29<00:08, 4.69it/s, loss=1.32, v_num=1, tr
Training loop complete.
Training finished successfully
Telemetry data couldn't be sent, but the command ran successfully.
[WARNING]: Unknown Error
Execution status: PASS
2024-04-12 06:34:08,324 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 337: Stopping container.
I am unable to find the .tlt file anywhere in the results directory.
Also, when I restarted the training after the above issue, it failed after 1 epoch with: {"date": "4/12/2024", "time": "8:51:0", "status": "FAILURE", "verbosity": "INFO", "message": "Error: all query identities do not appear in gallery."}
Please help me understand where the problem is.
Thanks.
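To check whether the trainer wrote any checkpoints at all under a different extension, I scanned the results directory with this small Python sketch. The suffix list is my assumption (I understand some TAO PyTorch networks write .pth/.ckpt checkpoints rather than .tlt, but I am not certain which applies here):

```python
from pathlib import Path

# Possible checkpoint suffixes -- an assumption, not confirmed by the docs:
# TAO's PyTorch networks may save .pth/.ckpt instead of the legacy .tlt name.
CHECKPOINT_SUFFIXES = {".tlt", ".pth", ".ckpt"}

def find_checkpoints(results_dir):
    """Return every checkpoint-like file under results_dir, recursively."""
    root = Path(results_dir)
    return sorted(p for p in root.rglob("*") if p.suffix in CHECKPOINT_SUFFIXES)

# Usage (path matches my results_dir mount):
#   for ckpt in find_checkpoints("/results"):
#       print(ckpt)
```

With checkpoint_interval: 10 and num_epochs: 120, I would expect checkpoints at least every 10 epochs if saving worked.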
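Before restarting again, I sanity-checked my query/gallery split with the sketch below, since the error suggests some query identities are missing from the test (gallery) set. It assumes Market-1501-style filenames (e.g. 0001_c1s1_000151_00.jpg), where the person ID is the token before the first underscore; my custom data may not follow this exactly:

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}

def person_ids(dataset_dir):
    """Collect person IDs from Market-1501-style filenames, where the ID
    is the token before the first underscore (e.g. '0001' in
    0001_c1s1_000151_00.jpg)."""
    return {
        img.name.split("_")[0]
        for img in Path(dataset_dir).iterdir()
        if img.suffix.lower() in IMAGE_SUFFIXES
    }

def missing_query_ids(query_dir, gallery_dir):
    """IDs present in the query set but absent from the gallery (test) set."""
    return person_ids(query_dir) - person_ids(gallery_dir)

# Usage (paths match my dataset config):
#   missing = missing_query_ids("/data/sample_query", "/data/sample_test")
#   print(f"{len(missing)} query IDs missing from gallery: {sorted(missing)[:10]}")
```

If this returns a non-empty set, that would explain the "all query identities do not appear in gallery" failure for those IDs.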