I am trying to install the TAO AutoML Kubernetes cluster on Azure.
After running bash setup.sh install, I get the following errors:
│ Error: release tao-toolkit-api failed, and has been uninstalled due to atomic being set: timed out waiting for the condition
│
│ with helm_release.tao_toolkit_api,
│ on api-config.tf line 145, in resource "helm_release" "tao_toolkit_api":
│ 145: resource "helm_release" "tao_toolkit_api" {
│
╵
╷
│ Error: StatefulSet default/nvidia-smi is not finished rolling out
│
│ with kubernetes_stateful_set_v1.nvidia_smi,
│ on gpu-operator.tf line 147, in resource "kubernetes_stateful_set_v1" "nvidia_smi":
│ 147: resource "kubernetes_stateful_set_v1" "nvidia_smi" {
• Hardware: T4c
• TLT version (toolkit_version): 5.0.0
I have a long terminal log, but I am not sure which parts would be useful to paste here. Let me know what would help and I will post it.
Thanks