I am trying to build the base Docker image for tao_pytorch_backend, and the build fails:
(sdgpose) mona@ada:/data/tao_pytorch_backend/docker$ ./build.sh --build
Building base docker ...
[+] Building 5.0s (10/29) docker:default
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 2.73kB 0.0s
=> [internal] load metadata for nvcr.io/nvidia/pytorch:23.12-py3 2.6s
=> [auth] nvidia/pytorch:pull,push token for nvcr.io 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 230B 0.0s
=> [ 1/24] FROM nvcr.io/nvidia/pytorch:23.12-py3@sha256:da3d1b690b9dca1fbf9beb3506120a63479e0cf1dc69c9256055125460eb44f7 0.0s
=> CACHED [ 2/24] COPY docker/requirements-apt.txt requirements-apt.txt 0.0s
=> CACHED [ 3/24] RUN apt-get upgrade && apt-get update && xargs apt-get install -y < requirements-apt.txt && rm requirements-apt.txt && rm -rf /var/lib/apt/lists/* 0.0s
=> CACHED [ 4/24] RUN pip uninstall -y sacrebleu torchtext 0.0s
=> ERROR [ 5/24] RUN pip install parametrized ninja 2.2s
------
> [ 5/24] RUN pip install parametrized ninja:
0.393 Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
0.985 Collecting parametrized
1.282 Downloading parametrized-66.0.3.tar.gz (1.2 kB)
1.286 Preparing metadata (setup.py): started
1.460 Preparing metadata (setup.py): finished with status 'done'
1.461 Requirement already satisfied: ninja in /usr/local/lib/python3.10/dist-packages (1.11.1.1)
1.466 Building wheels for collected packages: parametrized
1.466 Building wheel for parametrized (setup.py): started
1.678 Building wheel for parametrized (setup.py): finished with status 'error'
1.684 error: subprocess-exited-with-error
1.684
1.684 × python setup.py bdist_wheel did not run successfully.
1.684 │ exit code: 1
1.684 ╰─> [47 lines of output]
1.684 /usr/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py:265: UserWarning: Unknown distribution option: 'readme'
1.684 warnings.warn(msg)
1.684 running bdist_wheel
1.684 running build
1.684 /usr/local/lib/python3.10/dist-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
1.684 !!
1.684
1.684 ********************************************************************************
1.684 Please avoid running ``setup.py`` directly.
1.684 Instead, use pypa/build, pypa/installer or other
1.684 standards-based tools.
1.684
1.684 See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
1.684 ********************************************************************************
1.684
1.684 !!
1.684 self.initialize_options()
1.684 installing to build/bdist.linux-x86_64/wheel
1.684 running install
1.684 Traceback (most recent call last):
1.684 File "<string>", line 2, in <module>
1.684 File "<pip-setuptools-caller>", line 34, in <module>
1.684 File "/tmp/pip-install-1oy0uqz6/parametrized_1b64a7f7c3b14ba096077f166889576c/setup.py", line 10, in <module>
1.684 setup(
1.684 File "/usr/local/lib/python3.10/dist-packages/setuptools/__init__.py", line 103, in setup
1.684 return distutils.core.setup(**attrs)
1.684 File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/core.py", line 185, in setup
1.684 return run_commands(dist)
1.684 File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/core.py", line 201, in run_commands
1.684 dist.run_commands()
1.684 File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 969, in run_commands
1.684 self.run_command(cmd)
1.684 File "/usr/local/lib/python3.10/dist-packages/setuptools/dist.py", line 989, in run_command
1.684 super().run_command(command)
1.684 File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 988, in run_command
1.684 cmd_obj.run()
1.684 File "/usr/local/lib/python3.10/dist-packages/wheel/bdist_wheel.py", line 403, in run
1.684 self.run_command("install")
1.684 File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/cmd.py", line 318, in run_command
1.684 self.distribution.run_command(command)
1.684 File "/usr/local/lib/python3.10/dist-packages/setuptools/dist.py", line 989, in run_command
1.684 super().run_command(command)
1.684 File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 988, in run_command
1.684 cmd_obj.run()
1.684 File "/tmp/pip-install-1oy0uqz6/parametrized_1b64a7f7c3b14ba096077f166889576c/setup.py", line 7, in run
1.684 raise RuntimeError("You are trying to install a stub package parametrized. Maybe you are using the wrong pypi?")
1.684 RuntimeError: You are trying to install a stub package parametrized. Maybe you are using the wrong pypi?
1.684 [end of output]
1.684
1.684 note: This error originates from a subprocess, and is likely not a problem with pip.
1.684 ERROR: Failed building wheel for parametrized
1.685 Running setup.py clean for parametrized
1.810 Failed to build parametrized
1.810 ERROR: Could not build wheels for parametrized, which is required to install pyproject.toml-based projects
2.165
2.165 [notice] A new release of pip is available: 23.3.1 -> 24.0
2.165 [notice] To update, run: python -m pip install --upgrade pip
------
Dockerfile:14
--------------------
12 | # uninstall stuff from base container
13 | RUN pip uninstall -y sacrebleu torchtext
14 | >>> RUN pip install parametrized ninja
15 | # Installing custom packages in /opt.
16 | WORKDIR /opt
--------------------
ERROR: failed to solve: process "/bin/sh -c pip install parametrized ninja" did not complete successfully: exit code: 1
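One possible local workaround is sketched below. It is unconfirmed and assumes the Dockerfile meant the PyPI package parameterized. The stub error suggests that parametrized is only a placeholder name on pypi.org. The sketch rewrites just the failing RUN instruction. The patch_dockerfile name is hypothetical.

```shell
#!/usr/bin/env bash
# Workaround sketch. ASSUMPTION (unconfirmed): the intended dependency is the
# PyPI package "parameterized"; "parametrized" resolves to a stub whose
# setup.py raises RuntimeError, as in the build log.
set -euo pipefail

# Rewrite only the exact failing instruction in the given Dockerfile.
patch_dockerfile() {
  local dockerfile="$1"
  local tmp
  tmp="$(mktemp)"
  sed 's/^RUN pip install parametrized ninja$/RUN pip install parameterized ninja/' \
    "$dockerfile" > "$tmp"
  # Copy back with cat (not mv) so the Dockerfile keeps its original permissions.
  cat "$tmp" > "$dockerfile"
  rm -f "$tmp"
}

# Patch the Dockerfile passed as the first argument, if one was given.
if [ "$#" -gt 0 ]; then
  patch_dockerfile "$1"
  # Print the patched line so the change can be reviewed before rebuilding.
  grep -n 'pip install .* ninja' "$1"
fi
```

For example, pass the Dockerfile path as the argument (docker/Dockerfile is an assumed location) and then rerun ./build.sh --build. Other lines stay byte-for-byte unchanged. If the package name really is meant to come from a different index, this patch is the wrong fix.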
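Describe the bug
The base image build stops at step [5/24], "RUN pip install parametrized ninja" (Dockerfile line 14). pip downloads parametrized 66.0.3, which is a stub package. Its setup.py raises "RuntimeError: You are trying to install a stub package parametrized. Maybe you are using the wrong pypi?", so building the wheel fails and docker build exits with code 1.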
Steps/Code to reproduce bug
(sdgpose) mona@ada:/data/tao_pytorch_backend/docker$ git log
commit 9c2d94c0635b1117edfea85a94a6e3d0ead53754 (HEAD -> main, origin/main, origin/HEAD)
Author: Arun George Zachariah <azachariah@nvidia.com>
Date: Fri Mar 8 17:18:29 2024 -0800
TAO 5.3 Release - PyTorch
Run ./build.sh --build from the docker/ directory of tao_pytorch_backend at the commit above (TAO 5.3 release). The build fails at step [5/24] with the log shown at the top of this post.
Expected behavior
The base Docker image builds without errors.
Environment overview
- Environment location: bare-metal (ThinkStation P7 workstation running Ubuntu 22.04)
- Method of TAO Toolkit installation: launcher, installed with pip (nvidia-tao 5.2.0.1). The failing build is the tao_pytorch_backend source repository, built with ./build.sh --build.
- Output of tao info --verbose and pip show nvidia-tao:
(sdgpose) mona@ada:/data/tao_pytorch_backend/docker$ tao info --verbose
Configuration of the TAO Toolkit Instance
task_group:
    model:
        dockers:
            nvidia/tao/tao-toolkit:
                5.0.0-tf2.11.0:
                    docker_registry: nvcr.io
                    tasks:
                        1. classification_tf2
                        2. efficientdet_tf2
                5.0.0-tf1.15.5:
                    docker_registry: nvcr.io
                    tasks:
                        1. bpnet
                        2. classification_tf1
                        3. converter
                        4. detectnet_v2
                        5. dssd
                        6. efficientdet_tf1
                        7. faster_rcnn
                        8. fpenet
                        9. lprnet
                        10. mask_rcnn
                        11. multitask_classification
                        12. retinanet
                        13. ssd
                        14. unet
                        15. yolo_v3
                        16. yolo_v4
                        17. yolo_v4_tiny
                5.2.0-pyt2.1.0:
                    docker_registry: nvcr.io
                    tasks:
                        1. action_recognition
                        2. centerpose
                        3. deformable_detr
                        4. dino
                        5. mal
                        6. ml_recog
                        7. ocdnet
                        8. ocrnet
                        9. optical_inspection
                        10. pointpillars
                        11. pose_classification
                        12. re_identification
                        13. visual_changenet
                5.2.0.1-pyt1.14.0:
                    docker_registry: nvcr.io
                    tasks:
                        1. classification_pyt
                        2. segformer
    dataset:
        dockers:
            nvidia/tao/tao-toolkit:
                5.2.0-data-services:
                    docker_registry: nvcr.io
                    tasks:
                        1. augmentation
                        2. auto_label
                        3. annotations
                        4. analytics
    deploy:
        dockers:
            nvidia/tao/tao-toolkit:
                5.2.0-deploy:
                    docker_registry: nvcr.io
                    tasks:
                        1. visual_changenet
                        2. centerpose
                        3. classification_pyt
                        4. classification_tf1
                        5. classification_tf2
                        6. deformable_detr
                        7. detectnet_v2
                        8. dino
                        9. dssd
                        10. efficientdet_tf1
                        11. efficientdet_tf2
                        12. faster_rcnn
                        13. lprnet
                        14. mask_rcnn
                        15. ml_recog
                        16. multitask_classification
                        17. ocdnet
                        18. ocrnet
                        19. optical_inspection
                        20. retinanet
                        21. segformer
                        22. ssd
                        23. trtexec
                        24. unet
                        25. yolo_v3
                        26. yolo_v4
                        27. yolo_v4_tiny
format_version: 3.0
toolkit_version: 5.2.0.1
published_date: 01/16/2024
(sdgpose) mona@ada:/data/tao_pytorch_backend/docker$ pip show nvidia-tao
Name: nvidia-tao
Version: 5.2.0.1
Summary: NVIDIA's Launcher for TAO Toolkit.
Home-page:
Author: Varun Praveen
Author-email: vpraveen@nvidia.com
License: NVIDIA Proprietary Software
Location: /home/mona/anaconda3/envs/sdgpose/lib/python3.10/site-packages
Requires: certifi, chardet, docker, docker-pycreds, idna, requests, rich, six, tabulate, urllib3, websocket-client
Required-by:
Environment details
- OS version
(base) mona@ada:~$ uname -a
Linux ada 6.5.0-25-generic #25~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Feb 20 16:09:15 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
(base) mona@ada:~$ lsb_release -a
LSB Version: core-11.1.0ubuntu4-noarch:security-11.1.0ubuntu4-noarch
Distributor ID: Ubuntu
Description: Ubuntu 22.04.3 LTS
Release: 22.04
Codename: jammy
- Python version
(sdgpose) mona@ada:/data/tao_pytorch_backend/docker$ python
Python 3.10.0 (default, Mar 3 2022, 09:58:08) [GCC 7.5.0] on linux
- CUDA version
(base) mona@ada:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
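The environment fields in this template can be gathered in one pass with a minimal shell sketch like the one below. It calls uname, lsb_release, and nvcc, which are the tools already used in this post, plus python3. It also calls nvidia-smi to get the GPU driver version. Any tool missing from PATH is reported as "not found". The helper names report, nvcc_release, and collect_env are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the host details this template asks for, one line per field.
# Tools missing from PATH are reported as "not found" rather than aborting.
set -euo pipefail

# report LABEL CMD [ARGS...]: print "LABEL: <first output line of CMD>".
report() {
  local label="$1"
  shift
  if command -v "$1" >/dev/null 2>&1; then
    printf '%s: %s\n' "$label" "$("$@" 2>&1 | head -n 1)"
  else
    printf '%s: not found\n' "$label"
  fi
}

# Extract just "release X.Y" from nvcc's multi-line version banner.
nvcc_release() {
  nvcc --version | grep -o 'release [0-9.]*'
}

collect_env() {
  report "Kernel" uname -srm
  report "OS" lsb_release -ds
  report "Python" python3 --version
  # nvcc_release is a shell function, so check for the nvcc binary explicitly.
  if command -v nvcc >/dev/null 2>&1; then
    report "CUDA (nvcc)" nvcc_release
  else
    printf '%s: not found\n' "CUDA (nvcc)"
  fi
  report "GPU driver" nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
}

collect_env
```

On the host described here, the CUDA line would come from the same nvcc 11.8 banner shown above. Note that this reports the host toolchain, not the CUDA version inside the nvcr.io/nvidia/pytorch:23.12-py3 base image.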
Additional context
- Hardware: ThinkStation P7 with an NVIDIA RTX 6000 Ada Generation GPU
- Network type: CenterPose
- TAO version: 5.2.0.1 launcher (toolkit_version in the tao info --verbose output above). The repository being built is the TAO 5.3 PyTorch release.