Please provide the following information when requesting support.
• Hardware (T4/V100/Xavier/Nano/etc) A6000
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc) Segformer
• TLT Version (Please run "tlt info --verbose" and share "docker_tag" here) 5.3
• Training spec file (if you have one, please share it here)
• How to reproduce the issue? (This is for errors. Please share the command line and the detailed log here.)
Hi
I created a new Python env and installed the TAO 5.3 launcher. I ran exactly the same model as under 5.2 (same mounts, same dataset, and same spec file), but training fails.
Looking at the errors and tracing them through the mmseg and NVIDIA TAO sources on GitHub, the failure comes from a call to NumPy's concatenate, which usually means the list of arrays passed to it is empty. The failure appears to happen while the datasets are being loaded, but I have not changed the datasets or the spec file. As a check, I repointed my notebook at the TAO 5.2 kernel and reran without any errors.
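For reference, the ValueError can be reproduced in isolation: mmengine's BaseDataset pickles each loaded sample and concatenates the resulting byte buffers, so an empty sample list ends up calling np.concatenate on an empty list. This is only a rough sketch of that step, not the TAO code, and serialize_samples is just an illustrative name:

```python
import pickle

import numpy as np


# Rough sketch of mmengine's BaseDataset._serialize_data step: each loaded
# sample dict is pickled and the per-sample byte buffers are concatenated.
def serialize_samples(data_list):
    buffers = [np.frombuffer(pickle.dumps(d), dtype=np.uint8) for d in data_list]
    return np.concatenate(buffers)


print(serialize_samples([{"img_path": "a.png"}]).shape)  # fine: one sample found

try:
    serialize_samples([])  # what happens when the dataset finds no samples
except ValueError as err:
    print(err)  # -> need at least one array to concatenate
```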
I know 5.3 is new, so I'm wondering whether this has come up yet?
Cheers
Error below.
/usr/local/lib/python3.10/dist-packages/mmseg/engine/hooks/visualization_hook.py:60: UserWarning: The draw is False, it means that the hook for visualization will not take effect. The results will NOT be visualized or stored.
warnings.warn('The draw is False, it means that the '
/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/cv/segformer/dataloader/loading.py:53: UserWarning: `reduce_zero_label` will be deprecated, if you would like to ignore the zero label, please set `reduce_zero_label=True` when dataset initialized
warnings.warn('`reduce_zero_label` will be deprecated, '
Error executing job with overrides: ['train.num_gpus=2', 'results_dir=/workspace/tao-experiments/results/Ex4']
An error occurred during Hydra's exception formatting:
AssertionError()
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 254, in run_and_report
assert mdl is not None
AssertionError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/cv/segformer/scripts/train.py", line 123, in <module>
main()
File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/core/hydra/hydra_runner.py", line 107, in wrapper
_run_hydra(
File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 389, in _run_hydra
_run_app(
File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 452, in _run_app
run_and_report(
File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 296, in run_and_report
raise ex
File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 213, in run_and_report
return func()
File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 453, in <lambda>
lambda: hydra.run(
File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py", line 132, in run
_ = ret.return_value
File "/usr/local/lib/python3.10/dist-packages/hydra/core/utils.py", line 260, in return_value
raise self._return_value
File "/usr/local/lib/python3.10/dist-packages/hydra/core/utils.py", line 186, in run_job
ret.return_value = task_function(task_cfg)
File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/cv/segformer/scripts/train.py", line 119, in main
raise e
File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/cv/segformer/scripts/train.py", line 106, in main
run_experiment(experiment_config=cfg,
File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/cv/segformer/scripts/train.py", line 85, in run_experiment
runner.train()
File "/usr/local/lib/python3.10/dist-packages/mmengine/runner/runner.py", line 1728, in train
self._train_loop = self.build_train_loop(
File "/usr/local/lib/python3.10/dist-packages/mmengine/runner/runner.py", line 1520, in build_train_loop
loop = LOOPS.build(
File "/usr/local/lib/python3.10/dist-packages/mmengine/registry/registry.py", line 570, in build
return self.build_func(cfg, *args, **kwargs, registry=self)
File "/usr/local/lib/python3.10/dist-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
obj = obj_cls(**args)  # type: ignore
File "/usr/local/lib/python3.10/dist-packages/mmengine/runner/loops.py", line 219, in __init__
super().__init__(runner, dataloader)
File "/usr/local/lib/python3.10/dist-packages/mmengine/runner/base_loop.py", line 26, in __init__
self.dataloader = runner.build_dataloader(
File "/usr/local/lib/python3.10/dist-packages/mmengine/runner/runner.py", line 1370, in build_dataloader
dataset = DATASETS.build(dataset_cfg)
File "/usr/local/lib/python3.10/dist-packages/mmengine/registry/registry.py", line 570, in build
return self.build_func(cfg, *args, **kwargs, registry=self)
File "/usr/local/lib/python3.10/dist-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
obj = obj_cls(**args)  # type: ignore
File "/usr/local/lib/python3.10/dist-packages/mmseg/datasets/basesegdataset.py", line 142, in __init__
self.full_init()
File "/usr/local/lib/python3.10/dist-packages/mmengine/dataset/base_dataset.py", line 307, in full_init
self.data_bytes, self.data_address = self._serialize_data()
File "/usr/local/lib/python3.10/dist-packages/mmengine/dataset/base_dataset.py", line 768, in _serialize_data
data_bytes = np.concatenate(data_list)
File "<__array_function__ internals>", line 200, in concatenate
ValueError: need at least one array to concatenate
/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/cv/segformer/dataloader/loading.py:53: UserWarning: `reduce_zero_label` will be deprecated, if you would like to ignore the zero label, please set `reduce_zero_label=True` when dataset initialized
warnings.warn('`reduce_zero_label` will be deprecated, '
need at least one array to concatenate
Error executing job with overrides: ['train.num_gpus=2', 'results_dir=/workspace/tao-experiments/results/Ex4']
An error occurred during Hydra's exception formatting:
AssertionError()
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 254, in run_and_report
assert mdl is not None
AssertionError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/cv/segformer/scripts/train.py", line 123, in <module>
main()
File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/core/hydra/hydra_runner.py", line 107, in wrapper
_run_hydra(
File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 389, in _run_hydra
_run_app(
File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 452, in _run_app
run_and_report(
File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 296, in run_and_report
raise ex
File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 213, in run_and_report
return func()
File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 453, in <lambda>
lambda: hydra.run(
File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py", line 132, in run
_ = ret.return_value
File "/usr/local/lib/python3.10/dist-packages/hydra/core/utils.py", line 260, in return_value
raise self._return_value
File "/usr/local/lib/python3.10/dist-packages/hydra/core/utils.py", line 186, in run_job
ret.return_value = task_function(task_cfg)
File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/cv/segformer/scripts/train.py", line 119, in main
raise e
File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/cv/segformer/scripts/train.py", line 106, in main
run_experiment(experiment_config=cfg,
File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/cv/segformer/scripts/train.py", line 85, in run_experiment
runner.train()
File "/usr/local/lib/python3.10/dist-packages/mmengine/runner/runner.py", line 1728, in train
self._train_loop = self.build_train_loop(
File "/usr/local/lib/python3.10/dist-packages/mmengine/runner/runner.py", line 1520, in build_train_loop
loop = LOOPS.build(
File "/usr/local/lib/python3.10/dist-packages/mmengine/registry/registry.py", line 570, in build
return self.build_func(cfg, *args, **kwargs, registry=self)
File "/usr/local/lib/python3.10/dist-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
obj = obj_cls(**args)  # type: ignore
File "/usr/local/lib/python3.10/dist-packages/mmengine/runner/loops.py", line 219, in __init__
super().__init__(runner, dataloader)
File "/usr/local/lib/python3.10/dist-packages/mmengine/runner/base_loop.py", line 26, in __init__
self.dataloader = runner.build_dataloader(
File "/usr/local/lib/python3.10/dist-packages/mmengine/runner/runner.py", line 1370, in build_dataloader
dataset = DATASETS.build(dataset_cfg)
File "/usr/local/lib/python3.10/dist-packages/mmengine/registry/registry.py", line 570, in build
return self.build_func(cfg, *args, **kwargs, registry=self)
File "/usr/local/lib/python3.10/dist-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
obj = obj_cls(**args)  # type: ignore
File "/usr/local/lib/python3.10/dist-packages/mmseg/datasets/basesegdataset.py", line 142, in __init__
self.full_init()
File "/usr/local/lib/python3.10/dist-packages/mmengine/dataset/base_dataset.py", line 307, in full_init
self.data_bytes, self.data_address = self._serialize_data()
File "/usr/local/lib/python3.10/dist-packages/mmengine/dataset/base_dataset.py", line 768, in _serialize_data
data_bytes = np.concatenate(data_list)
File "<__array_function__ internals>", line 200, in concatenate
ValueError: need at least one array to concatenate
[2024-04-03 22:41:25,154] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 404) of binary: /usr/bin/python
Traceback (most recent call last):
File "/usr/local/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 351, in wrapper
return f(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 806, in main
run(args)
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 797, in run
elastic_launch(
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
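For completeness, here is the quick sanity check I intend to run inside the 5.3 container to confirm the mounted image and mask folders are visible and non-empty. The paths below are placeholders, not the actual values from my spec file:

```python
import os

# Placeholder paths -- substitute the image/mask directories from the spec file.
img_dir = "/workspace/tao-experiments/data/images/train"
ann_dir = "/workspace/tao-experiments/data/masks/train"

for name, path in (("images", img_dir), ("masks", ann_dir)):
    count = len(os.listdir(path)) if os.path.isdir(path) else 0
    print(f"{name}: {path} -> {count} files")

# If either count is 0 (or a path is missing), the dataset ends up with an
# empty sample list and np.concatenate fails exactly as in the log above.
```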