
Not generating PersonREID output tlt file even after training is finished


Dear @Morganh or Team,

I am using TAO 5 and trained a model using the re-identification net sample.

I used the Market-1501 dataset and also included different custom data sources.

My training completed after 2 weeks, but at the end I am unable to see the .tlt file. Can you please suggest where the gap is in the training?

Below is the training configuration:

results_dir: "/results"
encryption_key: nvidia_tao
model:
  backbone: resnet_50
  last_stride: 1
  pretrain_choice: imagenet
  pretrained_model_path: "/model/resnet50_pretrained.pth"
  input_channels: 3
  input_width: 128
  input_height: 256
  neck: bnneck
  feat_dim: 256
  neck_feat: after
  metric_loss_type: triplet
  with_center_loss: False
  with_flip_feature: False
  label_smooth: True
dataset:
  train_dataset_dir: "/data/sample_train"
  test_dataset_dir: "/data/sample_test"
  query_dataset_dir: "/data/sample_query"
  num_classes: 100
  batch_size: 64
  val_batch_size: 128
  num_workers: 1
  pixel_mean: [0.485, 0.456, 0.406]
  pixel_std: [0.226, 0.226, 0.226]
  padding: 10
  prob: 0.5
  re_prob: 0.5
  sampler: softmax_triplet
  num_instances: 4
re_ranking:
  re_ranking: True
  k1: 20
  k2: 6
  lambda_value: 0.3
train:
  optim:
    name: Adam
    steps: [40, 70]
    gamma: 0.1
    bias_lr_factor: 1
    weight_decay: 0.0005
    weight_decay_bias: 0.0005
    warmup_factor: 0.01
    warmup_iters: 10
    warmup_method: linear
    base_lr: 0.00035
    momentum: 0.9
    center_loss_weight: 0.0005
    center_lr: 0.5
    triplet_loss_margin: 0.3
  num_epochs: 120
  checkpoint_interval: 10
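
For reference, with checkpoint_interval: 10 I would expect a checkpoint file roughly every 10 epochs under the train results directory. Below is a minimal sketch to list whatever checkpoint files were actually written; the /results/market1501/train path is taken from the training log below, and the .tlt/.pth/.ckpt extensions are assumptions on my side, not confirmed TAO behavior.

# Minimal sketch: list candidate checkpoint files in the TAO train results directory.
# The path comes from the training log; the checked extensions are assumptions.
from pathlib import Path

results_dir = Path("/results/market1501/train")
exts = {".tlt", ".pth", ".ckpt"}
checkpoints = sorted(p for p in results_dir.rglob("*") if p.suffix in exts)
for p in checkpoints:
    print(p, f"{p.stat().st_size / 1e6:.1f} MB")
if not checkpoints:
    print("No checkpoint files found under", results_dir)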

Below are the training logs:

Train model
2024-04-03 20:11:37,331 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2024-04-03 20:11:37,383 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 361: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.0.0-pyt
2024-04-03 20:11:37,436 [TAO Toolkit] [WARNING] nvidia_tao_cli.components.docker_handler.docker_handler 267: 
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/smarg/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2024-04-03 20:11:37,436 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 275: Printing tty value True
sys:1: UserWarning: 
'experiment_market1501.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
<frozen core.hydra.hydra_runner>:107: UserWarning: 
'experiment_market1501.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
/usr/local/lib/python3.8/dist-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/next/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  ret = run_job(
Train results will be saved at: /results/market1501/train
Loading pretrained ImageNet model......
╒══════════╤═════════╤════════════╤═════════════╕
│ Subset   │   # IDs │   # Images │   # Cameras │
╞══════════╪═════════╪════════════╪═════════════╡
│ Train    │   13526 │    1820258 │          10 │
├──────────┼─────────┼────────────┼─────────────┤
│ Query    │     793 │       2347 │          10 │
├──────────┼─────────┼────────────┼─────────────┤
│ Gallery  │     100 │       1779 │           6 │
╘══════════╧═════════╧════════════╧═════════════╛
<frozen core.loggers.api_logging>:245: UserWarning: Log file already exists at /results/market1501/train/status.json
/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:441: LightningDeprecationWarning: Setting `Trainer(gpus=[0])` is deprecated in v1.7 and will be removed in v2.0. Please use `Trainer(accelerator='gpu', devices=[0])` instead.
  rank_zero_deprecation(
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/usr/local/lib/python3.8/dist-packages/pytorch_lightning/callbacks/model_checkpoint.py:604: UserWarning: Checkpoint directory /results/market1501/train exists and is not empty.
  rank_zero_warn(f"Checkpoint directory {dirpath} exists and is not empty.")
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
/usr/local/lib/python3.8/dist-packages/torch/optim/lr_scheduler.py:138: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "

  | Name           | Type     | Params
--------------------------------------------
0 | model          | Baseline | 27.5 M
1 | train_accuracy | Accuracy | 0     
2 | val_accuracy   | Accuracy | 0     
--------------------------------------------
27.5 M    Trainable params
256       Non-trainable params
27.5 M    Total params
109.983   Total estimated model params size (MB)
/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/connectors/data_connector.py:224: PossibleUserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 12 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
Training: 0it [00:00, ?it/s]Starting Training Loop.
Epoch 0:   0%|                                        | 0/27176 [00:00<?, ?it/s]

Epoch 0: 100%|█████▉| 27154/27176 [2:15:05<00:06,  3.35it/s, loss=1.34, v_num=1]Train and Val metrics generated.
Epoch 0: 100%|▉| 27154/27176 [2:15:05<00:06,  3.35it/s, loss=1.34, v_num=1, traiTraining loop in progress
Epoch 1:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=5.430, 

Epoch 1: 100%|▉| 27174/27176 [1:41:20<00:00,  4.47it/s, loss=1.34, v_num=1, traiTrain and Val metrics generated.
Epoch 1: 100%|▉| 27174/27176 [1:41:21<00:00,  4.47it/s, loss=1.34, v_num=1, traiTraining loop in progress
Epoch 2:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.260, 

Epoch 2: 100%|▉| 27167/27176 [1:42:21<00:02,  4.42it/s, loss=1.32, v_num=1, traiTrain and Val metrics generated.
Epoch 2: 100%|▉| 27167/27176 [1:42:22<00:02,  4.42it/s, loss=1.32, v_num=1, traiTraining loop in progress
Epoch 3:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=2.870, 

Epoch 3: 100%|▉| 27141/27176 [1:43:35<00:08,  4.37it/s, loss=1.32, v_num=1, traiTrain and Val metrics generated.
Epoch 3: 100%|▉| 27141/27176 [1:43:36<00:08,  4.37it/s, loss=1.32, v_num=1, traiTraining loop in progress
Epoch 4:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=2.730, 

Epoch 4: 100%|▉| 27126/27176 [1:44:43<00:11,  4.32it/s, loss=1.35, v_num=1, traiTrain and Val metrics generated.
Epoch 4: 100%|▉| 27126/27176 [1:44:44<00:11,  4.32it/s, loss=1.35, v_num=1, traiTraining loop in progress
Epoch 5:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.35, v_num=1, train_loss=2.670, 

Epoch 5: 100%|▉| 27135/27176 [1:43:30<00:09,  4.37it/s, loss=1.31, v_num=1, traiTrain and Val metrics generated.
Epoch 5: 100%|▉| 27135/27176 [1:43:30<00:09,  4.37it/s, loss=1.31, v_num=1, traiTraining loop in progress
Epoch 6:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.31, v_num=1, train_loss=2.630, 

Epoch 6: 100%|▉| 27161/27176 [1:43:14<00:03,  4.38it/s, loss=1.33, v_num=1, traiTrain and Val metrics generated.
Epoch 6: 100%|▉| 27161/27176 [1:43:14<00:03,  4.38it/s, loss=1.33, v_num=1, traiTraining loop in progress
Epoch 7:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=2.610, 

Epoch 7: 100%|█| 27176/27176 [1:44:59<00:00,  4.31it/s, loss=1.32, v_num=1, traiTrain and Val metrics generated.
Epoch 7: 100%|█| 27176/27176 [1:45:00<00:00,  4.31it/s, loss=1.32, v_num=1, traiTraining loop in progress
Epoch 8:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=2.600, 

Epoch 8: 100%|▉| 27135/27176 [1:48:30<00:09,  4.17it/s, loss=1.34, v_num=1, traiTrain and Val metrics generated.
Epoch 8: 100%|▉| 27135/27176 [1:48:30<00:09,  4.17it/s, loss=1.34, v_num=1, traiTraining loop in progress
Epoch 9:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=2.590, 

Epoch 9: 100%|▉| 27152/27176 [1:50:43<00:05,  4.09it/s, loss=1.33, v_num=1, traiTrain and Val metrics generated.
Epoch 9: 100%|▉| 27152/27176 [1:50:44<00:05,  4.09it/s, loss=1.33, v_num=1, traiTraining loop in progress
Epoch 10:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=2.590,

Epoch 10: 100%|▉| 27158/27176 [1:51:50<00:04,  4.05it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 10: 100%|▉| 27158/27176 [1:51:50<00:04,  4.05it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 11:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=2.600,

Epoch 11: 100%|▉| 27131/27176 [1:46:14<00:10,  4.26it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 11: 100%|▉| 27131/27176 [1:46:15<00:10,  4.26it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 12:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=2.590,

Epoch 12: 100%|█| 27176/27176 [1:48:30<00:00,  4.17it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 12: 100%|█| 27176/27176 [1:48:31<00:00,  4.17it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 13:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=2.580,

Epoch 13: 100%|█| 27176/27176 [1:41:17<00:00,  4.47it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 13: 100%|█| 27176/27176 [1:41:18<00:00,  4.47it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 14:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=2.570,

Epoch 14: 100%|▉| 27148/27176 [1:39:18<00:06,  4.56it/s, loss=1.36, v_num=1, traTrain and Val metrics generated.
Epoch 14: 100%|▉| 27148/27176 [1:39:18<00:06,  4.56it/s, loss=1.36, v_num=1, traTraining loop in progress
Epoch 15:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.36, v_num=1, train_loss=2.570,

Epoch 15: 100%|▉| 27162/27176 [1:38:36<00:03,  4.59it/s, loss=1.37, v_num=1, traTrain and Val metrics generated.
Epoch 15: 100%|▉| 27162/27176 [1:38:36<00:03,  4.59it/s, loss=1.37, v_num=1, traTraining loop in progress
Epoch 16:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.37, v_num=1, train_loss=2.570,

Epoch 16: 100%|▉| 27121/27176 [1:38:32<00:11,  4.59it/s, loss=1.35, v_num=1, traTrain and Val metrics generated.
Epoch 16: 100%|▉| 27121/27176 [1:38:32<00:11,  4.59it/s, loss=1.35, v_num=1, traTraining loop in progress
Epoch 17:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.35, v_num=1, train_loss=2.570,

Epoch 17: 100%|▉| 27129/27176 [1:38:34<00:10,  4.59it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 17: 100%|▉| 27129/27176 [1:38:34<00:10,  4.59it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 18:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=2.560,

Epoch 18: 100%|▉| 27157/27176 [1:38:38<00:04,  4.59it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 18: 100%|▉| 27157/27176 [1:38:38<00:04,  4.59it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 19:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=2.570,

Epoch 19: 100%|▉| 27150/27176 [1:38:38<00:05,  4.59it/s, loss=1.37, v_num=1, traTrain and Val metrics generated.
Epoch 19: 100%|▉| 27150/27176 [1:38:38<00:05,  4.59it/s, loss=1.37, v_num=1, traTraining loop in progress
Epoch 20:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.37, v_num=1, train_loss=2.570,

Epoch 20: 100%|▉| 27147/27176 [1:38:46<00:06,  4.58it/s, loss=1.36, v_num=1, traTrain and Val metrics generated.
Epoch 20: 100%|▉| 27147/27176 [1:38:47<00:06,  4.58it/s, loss=1.36, v_num=1, traTraining loop in progress
Epoch 21:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.36, v_num=1, train_loss=2.560,

Epoch 21: 100%|▉| 27139/27176 [1:38:55<00:08,  4.57it/s, loss=1.35, v_num=1, traTrain and Val metrics generated.
Epoch 21: 100%|▉| 27139/27176 [1:38:56<00:08,  4.57it/s, loss=1.35, v_num=1, traTraining loop in progress
Epoch 22:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.35, v_num=1, train_loss=2.570,

Epoch 22: 100%|▉| 27128/27176 [1:48:27<00:11,  4.17it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 22: 100%|▉| 27128/27176 [1:48:28<00:11,  4.17it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 23:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=2.560,

Epoch 23: 100%|▉| 27161/27176 [1:45:07<00:03,  4.31it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 23: 100%|▉| 27161/27176 [1:45:07<00:03,  4.31it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 24:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=2.560,

Epoch 24: 100%|▉| 27162/27176 [1:48:23<00:03,  4.18it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 24: 100%|▉| 27162/27176 [1:48:24<00:03,  4.18it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 25:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=2.560,

Epoch 25: 100%|▉| 27148/27176 [1:51:51<00:06,  4.04it/s, loss=1.35, v_num=1, traTrain and Val metrics generated.
Epoch 25: 100%|▉| 27148/27176 [1:51:52<00:06,  4.04it/s, loss=1.35, v_num=1, traTraining loop in progress
Epoch 26:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.35, v_num=1, train_loss=2.560,

Epoch 26: 100%|▉| 27167/27176 [1:55:14<00:02,  3.93it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 26: 100%|▉| 27167/27176 [1:55:14<00:02,  3.93it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 27:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=2.560,

Epoch 27: 100%|▉| 27137/27176 [1:39:42<00:08,  4.54it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 27: 100%|▉| 27137/27176 [1:39:43<00:08,  4.54it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 28:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=2.560,

Epoch 28: 100%|▉| 27166/27176 [1:39:36<00:02,  4.55it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 28: 100%|▉| 27166/27176 [1:39:36<00:02,  4.55it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 29:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=2.550,

Epoch 29: 100%|▉| 27123/27176 [1:39:59<00:11,  4.52it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 29: 100%|▉| 27123/27176 [1:40:00<00:11,  4.52it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 30:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=2.560,

Epoch 30: 100%|▉| 27165/27176 [1:39:39<00:02,  4.54it/s, loss=1.35, v_num=1, traTrain and Val metrics generated.
Epoch 30: 100%|▉| 27165/27176 [1:39:40<00:02,  4.54it/s, loss=1.35, v_num=1, traTraining loop in progress
Epoch 31:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.35, v_num=1, train_loss=2.560,

Epoch 31: 100%|▉| 27153/27176 [1:39:17<00:05,  4.56it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 31: 100%|▉| 27153/27176 [1:39:18<00:05,  4.56it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 32:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=2.560,

Epoch 32: 100%|▉| 27153/27176 [1:39:11<00:05,  4.56it/s, loss=1.32, v_num=1, traTrain and Val metrics generated.
Epoch 32: 100%|▉| 27153/27176 [1:39:12<00:05,  4.56it/s, loss=1.32, v_num=1, traTraining loop in progress
Epoch 33:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=2.550,

Epoch 33: 100%|█| 27176/27176 [1:39:22<00:00,  4.56it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 33: 100%|█| 27176/27176 [1:39:23<00:00,  4.56it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 34:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=2.560,

Epoch 34: 100%|▉| 27156/27176 [1:39:18<00:04,  4.56it/s, loss=1.36, v_num=1, traTrain and Val metrics generated.
Epoch 34: 100%|▉| 27156/27176 [1:39:19<00:04,  4.56it/s, loss=1.36, v_num=1, traTraining loop in progress
Epoch 35:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.36, v_num=1, train_loss=2.550,

Epoch 35: 100%|▉| 27167/27176 [1:39:29<00:01,  4.55it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 35: 100%|▉| 27167/27176 [1:39:29<00:01,  4.55it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 36:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=2.550,

Epoch 36: 100%|▉| 27170/27176 [1:40:41<00:01,  4.50it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 36: 100%|▉| 27170/27176 [1:40:41<00:01,  4.50it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 37:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=2.550,

Epoch 37: 100%|▉| 27160/27176 [1:39:05<00:03,  4.57it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 37: 100%|▉| 27160/27176 [1:39:06<00:03,  4.57it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 38:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=2.550,

Epoch 38: 100%|▉| 27131/27176 [1:48:40<00:10,  4.16it/s, loss=1.35, v_num=1, traTrain and Val metrics generated.
Epoch 38: 100%|▉| 27131/27176 [1:48:41<00:10,  4.16it/s, loss=1.35, v_num=1, traTraining loop in progress
Epoch 39:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.35, v_num=1, train_loss=2.550,

Epoch 39: 100%|▉| 27155/27176 [1:54:27<00:05,  3.95it/s, loss=1.38, v_num=1, traTrain and Val metrics generated.
Epoch 39: 100%|▉| 27155/27176 [1:54:28<00:05,  3.95it/s, loss=1.38, v_num=1, traTraining loop in progress
Epoch 40:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.38, v_num=1, train_loss=5.030,

Epoch 40: 100%|▉| 27146/27176 [1:52:45<00:07,  4.01it/s, loss=1.36, v_num=1, traTrain and Val metrics generated.
Epoch 40: 100%|▉| 27146/27176 [1:52:46<00:07,  4.01it/s, loss=1.36, v_num=1, traTraining loop in progress
Epoch 41:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.36, v_num=1, train_loss=3.870,

Epoch 41: 100%|▉| 27155/27176 [1:39:30<00:04,  4.55it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 41: 100%|▉| 27155/27176 [1:39:30<00:04,  4.55it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 42:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.700,

Epoch 42: 100%|█| 27176/27176 [1:38:36<00:00,  4.59it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 42: 100%|█| 27176/27176 [1:38:36<00:00,  4.59it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 43:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.580,

Epoch 43: 100%|█| 27176/27176 [1:38:30<00:00,  4.60it/s, loss=1.35, v_num=1, traTrain and Val metrics generated.
Epoch 43: 100%|█| 27176/27176 [1:38:30<00:00,  4.60it/s, loss=1.35, v_num=1, traTraining loop in progress
Epoch 44:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.35, v_num=1, train_loss=3.510,

Epoch 44: 100%|▉| 27163/27176 [1:38:39<00:02,  4.59it/s, loss=1.32, v_num=1, traTrain and Val metrics generated.
Epoch 44: 100%|▉| 27163/27176 [1:38:39<00:02,  4.59it/s, loss=1.32, v_num=1, traTraining loop in progress
Epoch 45:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=3.450,

Epoch 45: 100%|▉| 27169/27176 [1:38:42<00:01,  4.59it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 45: 100%|▉| 27169/27176 [1:38:42<00:01,  4.59it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 46:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.410,

Epoch 46: 100%|▉| 27159/27176 [1:38:38<00:03,  4.59it/s, loss=1.32, v_num=1, traTrain and Val metrics generated.
Epoch 46: 100%|▉| 27159/27176 [1:38:38<00:03,  4.59it/s, loss=1.32, v_num=1, traTraining loop in progress
Epoch 47:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=3.380,

Epoch 47: 100%|▉| 27163/27176 [1:38:35<00:02,  4.59it/s, loss=1.35, v_num=1, traTrain and Val metrics generated.
Epoch 47: 100%|▉| 27163/27176 [1:38:35<00:02,  4.59it/s, loss=1.35, v_num=1, traTraining loop in progress
Epoch 48:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.35, v_num=1, train_loss=3.360,

Epoch 48: 100%|▉| 27140/27176 [1:38:35<00:07,  4.59it/s, loss=1.35, v_num=1, traTrain and Val metrics generated.
Epoch 48: 100%|▉| 27140/27176 [1:38:36<00:07,  4.59it/s, loss=1.35, v_num=1, traTraining loop in progress
Epoch 49:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.35, v_num=1, train_loss=3.350,

Epoch 49: 100%|▉| 27145/27176 [1:38:43<00:06,  4.58it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 49: 100%|▉| 27145/27176 [1:38:43<00:06,  4.58it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 50:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.330,

Epoch 50: 100%|▉| 27169/27176 [1:39:07<00:01,  4.57it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 50: 100%|▉| 27169/27176 [1:39:08<00:01,  4.57it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 51:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.320,

Epoch 51: 100%|█| 27176/27176 [1:39:11<00:00,  4.57it/s, loss=1.32, v_num=1, traTrain and Val metrics generated.
Epoch 51: 100%|█| 27176/27176 [1:39:11<00:00,  4.57it/s, loss=1.32, v_num=1, traTraining loop in progress
Epoch 52:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=3.310,

Epoch 52: 100%|▉| 27162/27176 [1:39:16<00:03,  4.56it/s, loss=1.32, v_num=1, traTrain and Val metrics generated.
Epoch 52: 100%|▉| 27162/27176 [1:39:17<00:03,  4.56it/s, loss=1.32, v_num=1, traTraining loop in progress
Epoch 53:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=3.320,

Epoch 53: 100%|▉| 27175/27176 [1:39:33<00:00,  4.55it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 53: 100%|▉| 27175/27176 [1:39:34<00:00,  4.55it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 54:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.300,

Epoch 54: 100%|█| 27176/27176 [1:39:28<00:00,  4.55it/s, loss=1.32, v_num=1, traTrain and Val metrics generated.
Epoch 54: 100%|█| 27176/27176 [1:39:28<00:00,  4.55it/s, loss=1.32, v_num=1, traTraining loop in progress
Epoch 55:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=3.300,

Epoch 55: 100%|▉| 27148/27176 [1:39:20<00:06,  4.55it/s, loss=1.32, v_num=1, traTrain and Val metrics generated.
Epoch 55: 100%|▉| 27148/27176 [1:39:20<00:06,  4.55it/s, loss=1.32, v_num=1, traTraining loop in progress
Epoch 56:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=3.300,

Epoch 56: 100%|▉| 27166/27176 [1:39:18<00:02,  4.56it/s, loss=1.35, v_num=1, traTrain and Val metrics generated.
Epoch 56: 100%|▉| 27166/27176 [1:39:19<00:02,  4.56it/s, loss=1.35, v_num=1, traTraining loop in progress
Epoch 57:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.35, v_num=1, train_loss=3.290,

Epoch 57: 100%|▉| 27164/27176 [1:39:19<00:02,  4.56it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 57: 100%|▉| 27164/27176 [1:39:19<00:02,  4.56it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 58:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.290,

Epoch 58: 100%|▉| 27137/27176 [1:39:11<00:08,  4.56it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 58: 100%|▉| 27137/27176 [1:39:11<00:08,  4.56it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 59:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.290,

Epoch 59: 100%|▉| 27145/27176 [1:39:07<00:06,  4.56it/s, loss=1.32, v_num=1, traTrain and Val metrics generated.
Epoch 59: 100%|▉| 27145/27176 [1:39:08<00:06,  4.56it/s, loss=1.32, v_num=1, traTraining loop in progress
Epoch 60:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=3.280,

Epoch 60: 100%|▉| 27173/27176 [1:39:11<00:00,  4.57it/s, loss=1.32, v_num=1, traTrain and Val metrics generated.
Epoch 60: 100%|▉| 27173/27176 [1:39:12<00:00,  4.57it/s, loss=1.32, v_num=1, traTraining loop in progress
Epoch 61:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=3.280,

Epoch 61: 100%|▉| 27138/27176 [1:38:57<00:08,  4.57it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 61: 100%|▉| 27138/27176 [1:38:57<00:08,  4.57it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 62:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.280,

Epoch 62: 100%|▉| 27137/27176 [1:38:49<00:08,  4.58it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 62: 100%|▉| 27137/27176 [1:38:50<00:08,  4.58it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 63:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.280,

Epoch 63: 100%|▉| 27161/27176 [1:38:55<00:03,  4.58it/s, loss=1.31, v_num=1, traTrain and Val metrics generated.
Epoch 63: 100%|▉| 27161/27176 [1:38:55<00:03,  4.58it/s, loss=1.31, v_num=1, traTraining loop in progress
Epoch 64:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.31, v_num=1, train_loss=3.280,

Epoch 64: 100%|▉| 27152/27176 [1:41:18<00:05,  4.47it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 64: 100%|▉| 27152/27176 [1:41:19<00:05,  4.47it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 65:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.280,

Epoch 65: 100%|▉| 27148/27176 [1:45:04<00:06,  4.31it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 65: 100%|▉| 27148/27176 [1:45:05<00:06,  4.31it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 66:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.270,

Epoch 66: 100%|▉| 27124/27176 [1:59:27<00:13,  3.78it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 66: 100%|▉| 27124/27176 [1:59:27<00:13,  3.78it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 67:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.270,

Epoch 67: 100%|▉| 27139/27176 [1:39:47<00:08,  4.53it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 67: 100%|▉| 27139/27176 [1:39:47<00:08,  4.53it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 68:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.260,

Epoch 68: 100%|▉| 27158/27176 [1:39:53<00:03,  4.53it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 68: 100%|▉| 27158/27176 [1:39:53<00:03,  4.53it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 69:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.260,

Epoch 69: 100%|▉| 27159/27176 [2:19:26<00:05,  3.25it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 69: 100%|▉| 27159/27176 [2:19:26<00:05,  3.25it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 70:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=4.890,

Epoch 70: 100%|▉| 27139/27176 [1:37:45<00:07,  4.63it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 70: 100%|▉| 27139/27176 [1:37:45<00:07,  4.63it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 71:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=4.490,

Epoch 71: 100%|▉| 27153/27176 [1:36:59<00:04,  4.67it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 71: 100%|▉| 27153/27176 [1:37:00<00:04,  4.67it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 72:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=4.240,

Epoch 72: 100%|▉| 27160/27176 [1:37:01<00:03,  4.67it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 72: 100%|▉| 27160/27176 [1:37:02<00:03,  4.66it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 73:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=4.070,

Epoch 73: 100%|▉| 27147/27176 [1:36:54<00:06,  4.67it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 73: 100%|▉| 27147/27176 [1:36:55<00:06,  4.67it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 74:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.950,

Epoch 74: 100%|█| 27176/27176 [1:36:52<00:00,  4.68it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 74: 100%|█| 27176/27176 [1:36:53<00:00,  4.67it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 75:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.870,

Epoch 75: 100%|▉| 27115/27176 [1:36:43<00:13,  4.67it/s, loss=1.35, v_num=1, traTrain and Val metrics generated.
Epoch 75: 100%|▉| 27115/27176 [1:36:43<00:13,  4.67it/s, loss=1.35, v_num=1, traTraining loop in progress
Epoch 76:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.35, v_num=1, train_loss=3.810,

Epoch 76: 100%|▉| 27152/27176 [1:36:49<00:05,  4.67it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 76: 100%|▉| 27152/27176 [1:36:49<00:05,  4.67it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 77:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.770,

Epoch 77: 100%|▉| 27142/27176 [1:36:48<00:07,  4.67it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 77: 100%|▉| 27142/27176 [1:36:49<00:07,  4.67it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 78:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.740,

Epoch 78: 100%|▉| 27136/27176 [1:36:34<00:08,  4.68it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 78: 100%|▉| 27136/27176 [1:36:34<00:08,  4.68it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 79:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.710,

Epoch 79: 100%|▉| 27152/27176 [1:38:40<00:05,  4.59it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 79: 100%|▉| 27152/27176 [1:38:40<00:05,  4.59it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 80:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.700,

Epoch 80: 100%|▉| 27137/27176 [1:39:50<00:08,  4.53it/s, loss=1.35, v_num=1, traTrain and Val metrics generated.
Epoch 80: 100%|▉| 27137/27176 [1:39:50<00:08,  4.53it/s, loss=1.35, v_num=1, traTraining loop in progress
Epoch 81:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.35, v_num=1, train_loss=3.680,

Epoch 81: 100%|▉| 27167/27176 [1:40:26<00:01,  4.51it/s, loss=1.35, v_num=1, traTrain and Val metrics generated.
Epoch 81: 100%|▉| 27167/27176 [1:40:26<00:01,  4.51it/s, loss=1.35, v_num=1, traTraining loop in progress
Epoch 82:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.35, v_num=1, train_loss=3.670,

Epoch 82: 100%|█| 27176/27176 [1:39:37<00:00,  4.55it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 82: 100%|█| 27176/27176 [1:39:37<00:00,  4.55it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 83:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.660,

Epoch 83: 100%|▉| 27175/27176 [1:38:16<00:00,  4.61it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 83: 100%|▉| 27175/27176 [1:38:16<00:00,  4.61it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 84:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.650,

Epoch 84: 100%|█| 27176/27176 [1:37:47<00:00,  4.63it/s, loss=1.32, v_num=1, traTrain and Val metrics generated.
Epoch 84: 100%|█| 27176/27176 [1:37:48<00:00,  4.63it/s, loss=1.32, v_num=1, traTraining loop in progress
Epoch 85:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=3.640,

Epoch 85: 100%|▉| 27131/27176 [1:37:48<00:09,  4.62it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 85: 100%|▉| 27131/27176 [1:37:48<00:09,  4.62it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 86:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.640,

Epoch 86: 100%|▉| 27165/27176 [1:38:11<00:02,  4.61it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 86: 100%|▉| 27165/27176 [1:38:12<00:02,  4.61it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 87:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.630,

Epoch 87: 100%|▉| 27150/27176 [1:37:37<00:05,  4.64it/s, loss=1.32, v_num=1, traTrain and Val metrics generated.
Epoch 87: 100%|▉| 27150/27176 [1:37:37<00:05,  4.64it/s, loss=1.32, v_num=1, traTraining loop in progress
Epoch 88:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=3.620,

Epoch 88: 100%|▉| 27138/27176 [1:37:31<00:08,  4.64it/s, loss=1.35, v_num=1, traTrain and Val metrics generated.
Epoch 88: 100%|▉| 27138/27176 [1:37:31<00:08,  4.64it/s, loss=1.35, v_num=1, traTraining loop in progress
Epoch 89:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.35, v_num=1, train_loss=3.620,

Epoch 89: 100%|▉| 27164/27176 [1:37:39<00:02,  4.64it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 89: 100%|▉| 27164/27176 [1:37:40<00:02,  4.64it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 90:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.610,

Epoch 90: 100%|▉| 27152/27176 [1:37:17<00:05,  4.65it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 90: 100%|▉| 27152/27176 [1:37:17<00:05,  4.65it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 91:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.610,

Epoch 91: 100%|▉| 27172/27176 [1:37:39<00:00,  4.64it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 91: 100%|▉| 27172/27176 [1:37:40<00:00,  4.64it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 92:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.610,

Epoch 92: 100%|▉| 27174/27176 [1:37:23<00:00,  4.65it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 92: 100%|▉| 27174/27176 [1:37:23<00:00,  4.65it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 93:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.600,

Epoch 93: 100%|█| 27176/27176 [1:40:02<00:00,  4.53it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 93: 100%|█| 27176/27176 [1:40:03<00:00,  4.53it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 94:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.600,

Epoch 94: 100%|▉| 27138/27176 [1:48:08<00:09,  4.18it/s, loss=1.32, v_num=1, traTrain and Val metrics generated.
Epoch 94: 100%|▉| 27138/27176 [1:48:08<00:09,  4.18it/s, loss=1.32, v_num=1, traTraining loop in progress
Epoch 95:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=3.600,

Epoch 95: 100%|█| 27176/27176 [1:51:43<00:00,  4.05it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 95: 100%|█| 27176/27176 [1:51:43<00:00,  4.05it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 96:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.590,

Epoch 96: 100%|▉| 27156/27176 [1:41:28<00:04,  4.46it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 96: 100%|▉| 27156/27176 [1:41:29<00:04,  4.46it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 97:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.590,

Epoch 97: 100%|█| 27176/27176 [1:42:49<00:00,  4.40it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 97: 100%|█| 27176/27176 [1:42:50<00:00,  4.40it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 98:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.580,

Epoch 98: 100%|▉| 27161/27176 [1:38:29<00:03,  4.60it/s, loss=1.32, v_num=1, traTrain and Val metrics generated.
Epoch 98: 100%|▉| 27161/27176 [1:38:30<00:03,  4.60it/s, loss=1.32, v_num=1, traTraining loop in progress
Epoch 99:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=3.580,

Epoch 99: 100%|▉| 27175/27176 [1:36:44<00:00,  4.68it/s, loss=1.32, v_num=1, traTrain and Val metrics generated.
Epoch 99: 100%|▉| 27175/27176 [1:36:44<00:00,  4.68it/s, loss=1.32, v_num=1, traTraining loop in progress
Epoch 100:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=3.580

Epoch 100: 100%|▉| 27145/27176 [1:36:35<00:06,  4.68it/s, loss=1.33, v_num=1, trTrain and Val metrics generated.
Epoch 100: 100%|▉| 27145/27176 [1:36:35<00:06,  4.68it/s, loss=1.33, v_num=1, trTraining loop in progress
Epoch 101:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.580

Epoch 101: 100%|▉| 27142/27176 [1:36:39<00:07,  4.68it/s, loss=1.34, v_num=1, trTrain and Val metrics generated.
Epoch 101: 100%|▉| 27142/27176 [1:36:39<00:07,  4.68it/s, loss=1.34, v_num=1, trTraining loop in progress
Epoch 102:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.570

Epoch 102: 100%|▉| 27150/27176 [1:36:39<00:05,  4.68it/s, loss=1.34, v_num=1, trTrain and Val metrics generated.
Epoch 102: 100%|▉| 27150/27176 [1:36:39<00:05,  4.68it/s, loss=1.34, v_num=1, trTraining loop in progress
Epoch 103:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.570

Epoch 103: 100%|▉| 27169/27176 [1:36:45<00:01,  4.68it/s, loss=1.34, v_num=1, trTrain and Val metrics generated.
Epoch 103: 100%|▉| 27169/27176 [1:36:46<00:01,  4.68it/s, loss=1.34, v_num=1, trTraining loop in progress
Epoch 104:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.570

Epoch 104: 100%|▉| 27170/27176 [1:36:49<00:01,  4.68it/s, loss=1.33, v_num=1, trTrain and Val metrics generated.
Epoch 104: 100%|▉| 27170/27176 [1:36:49<00:01,  4.68it/s, loss=1.33, v_num=1, trTraining loop in progress
Epoch 105:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.560

Epoch 105: 100%|▉| 27152/27176 [1:36:45<00:05,  4.68it/s, loss=1.34, v_num=1, trTrain and Val metrics generated.
Epoch 105: 100%|▉| 27152/27176 [1:36:46<00:05,  4.68it/s, loss=1.34, v_num=1, trTraining loop in progress
Epoch 106:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.570

Epoch 106: 100%|▉| 27152/27176 [1:36:44<00:05,  4.68it/s, loss=1.33, v_num=1, trTrain and Val metrics generated.
Epoch 106: 100%|▉| 27152/27176 [1:36:44<00:05,  4.68it/s, loss=1.33, v_num=1, trTraining loop in progress
Epoch 107:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.560

Epoch 107: 100%|▉| 27125/27176 [1:38:34<00:11,  4.59it/s, loss=1.33, v_num=1, trTrain and Val metrics generated.
Epoch 107: 100%|▉| 27125/27176 [1:38:35<00:11,  4.59it/s, loss=1.33, v_num=1, trTraining loop in progress
Epoch 108:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.560

Epoch 108: 100%|▉| 27172/27176 [1:37:51<00:00,  4.63it/s, loss=1.32, v_num=1, trTrain and Val metrics generated.
Epoch 108: 100%|▉| 27172/27176 [1:37:51<00:00,  4.63it/s, loss=1.32, v_num=1, trTraining loop in progress
Epoch 109:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=3.550

Epoch 109: 100%|▉| 27162/27176 [1:37:38<00:03,  4.64it/s, loss=1.34, v_num=1, trTrain and Val metrics generated.
Epoch 109: 100%|▉| 27162/27176 [1:37:39<00:03,  4.64it/s, loss=1.34, v_num=1, trTraining loop in progress
Epoch 110:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.550

Epoch 110: 100%|▉| 27175/27176 [1:40:48<00:00,  4.49it/s, loss=1.32, v_num=1, trTrain and Val metrics generated.
Epoch 110: 100%|▉| 27175/27176 [1:40:48<00:00,  4.49it/s, loss=1.32, v_num=1, trTraining loop in progress
Epoch 111:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=3.540

Epoch 111: 100%|▉| 27148/27176 [1:44:13<00:06,  4.34it/s, loss=1.34, v_num=1, trTrain and Val metrics generated.
Epoch 111: 100%|▉| 27148/27176 [1:44:14<00:06,  4.34it/s, loss=1.34, v_num=1, trTraining loop in progress
Epoch 112:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.550

Epoch 112: 100%|▉| 27151/27176 [1:43:13<00:05,  4.38it/s, loss=1.33, v_num=1, trTrain and Val metrics generated.
Epoch 112: 100%|▉| 27151/27176 [1:43:14<00:05,  4.38it/s, loss=1.33, v_num=1, trTraining loop in progress
Epoch 113:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.540

Epoch 113: 100%|█| 27176/27176 [1:36:15<00:00,  4.71it/s, loss=1.35, v_num=1, trTrain and Val metrics generated.
Epoch 113: 100%|█| 27176/27176 [1:36:15<00:00,  4.71it/s, loss=1.35, v_num=1, trTraining loop in progress
Epoch 114:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.35, v_num=1, train_loss=3.540

Epoch 114: 100%|▉| 27167/27176 [1:36:24<00:01,  4.70it/s, loss=1.33, v_num=1, trTrain and Val metrics generated.
Epoch 114: 100%|▉| 27167/27176 [1:36:24<00:01,  4.70it/s, loss=1.33, v_num=1, trTraining loop in progress
Epoch 115:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.540

Epoch 115: 100%|▉| 27173/27176 [1:36:24<00:00,  4.70it/s, loss=1.34, v_num=1, trTrain and Val metrics generated.
Epoch 115: 100%|▉| 27173/27176 [1:36:24<00:00,  4.70it/s, loss=1.34, v_num=1, trTraining loop in progress
Epoch 116:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.540

Epoch 116: 100%|▉| 27146/27176 [1:36:21<00:06,  4.70it/s, loss=1.33, v_num=1, trTrain and Val metrics generated.
Epoch 116: 100%|▉| 27146/27176 [1:36:21<00:06,  4.69it/s, loss=1.33, v_num=1, trTraining loop in progress
Epoch 117:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.540

Epoch 117: 100%|█| 27176/27176 [1:36:37<00:00,  4.69it/s, loss=1.31, v_num=1, trTrain and Val metrics generated.
Epoch 117: 100%|█| 27176/27176 [1:36:38<00:00,  4.69it/s, loss=1.31, v_num=1, trTraining loop in progress
Epoch 118:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.31, v_num=1, train_loss=3.530

Epoch 118: 100%|▉| 27123/27176 [1:36:27<00:11,  4.69it/s, loss=1.32, v_num=1, trTrain and Val metrics generated.
Epoch 118: 100%|▉| 27123/27176 [1:36:28<00:11,  4.69it/s, loss=1.32, v_num=1, trTraining loop in progress
Epoch 119:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=3.530

Epoch 119: 100%|▉| 27138/27176 [1:36:29<00:08,  4.69it/s, loss=1.32, v_num=1, trTrain and Val metrics generated.
Epoch 119: 100%|▉| 27138/27176 [1:36:29<00:08,  4.69it/s, loss=1.32, v_num=1, trTraining loop in progress
`Trainer.fit` stopped: `max_epochs=120` reached.
Epoch 119: 100%|▉| 27138/27176 [1:36:29<00:08,  4.69it/s, loss=1.32, v_num=1, tr
Training loop complete.
Training finished successfully
Telemetry data couldn't be sent, but the command ran successfully.
[WARNING]: Unknown Error
Execution status: PASS
2024-04-12 06:34:08,324 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 337: Stopping container.

I am unable to see the .tlt file in the results directory.

Also, when I restarted the training after the above issue, I got the following error after 1 epoch: {"date": "4/12/2024", "time": "8:51:0", "status": "FAILURE", "verbosity": "INFO", "message": "Error: all query identities do not appear in gallery."}
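
To cross-check that error against the data, here is a minimal sketch that verifies every query person ID also appears in the gallery (test) set. It assumes Market-1501-style file names (e.g. 0001_c1s1_000151_00.jpg) where the person ID is the token before the first underscore; the paths come from the dataset config above, and the naming assumption may not match my custom data.

# Minimal sketch: check that all query person IDs also appear in the gallery set.
# Assumes Market-1501-style file names where the ID is the prefix before the
# first underscore; paths are taken from the dataset section of the config.
from pathlib import Path

def person_ids(folder):
    # Person ID is the token before the first underscore in the file name.
    return {f.name.split("_")[0] for f in Path(folder).glob("*.jpg")}

query_ids = person_ids("/data/sample_query")
gallery_ids = person_ids("/data/sample_test")

missing = query_ids - gallery_ids
print(f"{len(query_ids)} query IDs, {len(gallery_ids)} gallery IDs, "
      f"{len(missing)} query IDs missing from the gallery")
if missing:
    print("Examples:", sorted(missing)[:20])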

Please help me identify where the problem is.

Thanks.
