Channel: TAO Toolkit - NVIDIA Developer Forums

TAO API 5.3 : How to create experiments that leverage pretrained base_experiments from NGC?


Hello,

  • I have TAO API 5.3 deployed & running on an AWS EKS instance with T4 GPUs.

  • I’m following the TAO API Object Detection workflow notebook, modified to train a custom yolo_v4 with the backbone pretrained_object_detection:resnet18.

  • I follow all steps as in the notebook – create experiment, upload & convert & assign datasets, assign PTM to experiment.

  • I assign the PTM nvidia/tao/pretrained_object_detection:resnet18, which has experiment ID 45a2921e-064d-58d3-9579-17f5326ba5db, to my new experiment.

  • This all works fine via the API so far.

However, when I make an API call to start training and view the training job’s logs, I see in the logs:

Base Experiment file for ID 45a2921e-064d-58d3-9579-17f5326ba5db is not found
  • This ID is the PTM ID corresponding to NGC PTM nvidia/tao/pretrained_object_detection:resnet18. When I retrieve this base experiment’s JSON and examine its fields, I notice:
 base_experiment: []
 base_experiment_pull_complete: "not_present"

Before training, I fetched my custom experiment’s spec and confirmed that it has the resnet18 PTM assigned, i.e.:

 base_experiment: ['45a2921e-064d-58d3-9579-17f5326ba5db']
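The check described above can be sketched in Python. This is only an illustration under assumptions: the helper name `missing_base_experiments` and the `base_lookup` dictionary are hypothetical, the only field names used are the two shown in this post, and "not_present" is the only status value I have actually seen, so any other value is simply treated as "not flagged".

```python
# Minimal sketch: flag assigned base experiments that do not look pulled.
# Assumption: metadata is already fetched and parsed into plain dicts.

def missing_base_experiments(experiment, base_lookup):
    """Return assigned base experiment IDs that are unknown or 'not_present'."""
    missing = []
    for base_id in experiment.get("base_experiment", []):
        base = base_lookup.get(base_id)
        # Unknown ID, or the pull status reported in this post.
        if base is None or base.get("base_experiment_pull_complete") == "not_present":
            missing.append(base_id)
    return missing


PTM_ID = "45a2921e-064d-58d3-9579-17f5326ba5db"

# Values copied from the JSON fields quoted in this post.
experiment = {"base_experiment": [PTM_ID]}
base_lookup = {
    PTM_ID: {"base_experiment": [], "base_experiment_pull_complete": "not_present"}
}

print(missing_base_experiments(experiment, base_lookup))
```

With the values from this post, the resnet18 PTM ID is flagged, which matches the "Base Experiment file ... is not found" message in the training log.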

Question #1: Do I have to pull the resnet18 backbone from the NGC registry to my AWS server (where TAO API is deployed) for training to work? I know that in the Docker container version of TAO you can download it via ngc registry model download-version nvidia/tao/pretrained_object_detection:resnet18 – is there an equivalent API call I must make so that the resnet18 backbone weights are available to user-created experiments?

Question #2: My understanding is that I’d need to repeat this for every pretrained model we’d like to use from NGC. Is this correct?

  • Also, I tried using the /experiments:base API endpoint (“List Experiments that can be used for transfer learning”) to see whether the resnet18 backbone is listed there, but this endpoint returns a 404 error.

  • Endpoint is https://<TAO_IP>/tao-api/api/v1/users/<user_id>/experiments:base

  • Error response is:

<Response [404]>

{'code': 404, 'name': 'Not Found', 'description': 'The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again.'}
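For reference, the request above can be reproduced with something like the following sketch. The URL pattern is the one quoted in this post; the function names are hypothetical, and sending the token as a Bearer `Authorization` header is an assumption about how the deployment authenticates, not something confirmed here.

```python
import json
import urllib.error
import urllib.request


def base_experiments_url(host, user_id):
    """Build the /experiments:base URL using the pattern quoted in this post."""
    return f"https://{host}/tao-api/api/v1/users/{user_id}/experiments:base"


def fetch_json(url, token):
    """GET the URL and return (status_code, body); keeps 4xx/5xx bodies."""
    # Assumption: the API accepts a Bearer token in the Authorization header.
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # A 404 like the one above lands here; return the raw error body.
        return err.code, err.read().decode("utf-8", errors="replace")


# Placeholder values, not a real deployment.
print(base_experiments_url("TAO_IP", "USER_ID"))
```

Calling `fetch_json(base_experiments_url(host, user_id), token)` against my deployment is what produces the 404 body shown above.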

Question #3: Is this a known bug with the /experiments:base API endpoint, and is this endpoint still supported in TAO 5.3?

Many thanks!!
