Handling Multi-Person Pose Data for PoseClassificationNet Fine-Tuning

I am fine-tuning PoseClassificationNet as part of the pose classification pipeline, and I need guidance on how to handle multi-person scenarios in video clips during dataset preparation.

Current Workflow:

For single-person action videos, my data processing steps are as follows:

  1. Extract clips from videos with diverse poses and viewpoints.
  2. Run the BodyPose3D model to generate JSON metadata.
  3. Convert 3D points to 2D keypoints.
  4. Convert JSON metadata to NumPy arrays (per video).
  5. Save .pkl files containing the video’s keypoints and corresponding action labels (a simplified sketch of steps 4–5 appears after this list).
  6. Merge arrays and split them into Train, Validation, and Test sets.
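For reference, here is a minimal sketch of steps 4–5 for the single-person case. The JSON schema (`frames`, `keypoints_2d`) is a placeholder for whatever the BodyPose3D output actually contains, and the (C, T, V, M) array layout follows the ST-GCN-style skeleton convention with M = 1; the joint count of 34 is my assumption based on the NVIDIA body-pose format.

```python
import json
import pickle
import numpy as np

NUM_JOINTS = 34      # V: joints per skeleton (assumed NVIDIA body-pose format)
NUM_CHANNELS = 3     # C: x, y, confidence
MAX_PERSONS = 1      # M: single-person case

def video_json_to_array(json_path, num_frames):
    """Step 4: pack one video's 2D keypoints into a (C, T, V, M) array."""
    with open(json_path) as f:
        # Hypothetical schema:
        # {"frames": [{"keypoints_2d": [[x, y, conf], ...]}, ...]}
        meta = json.load(f)
    data = np.zeros((NUM_CHANNELS, num_frames, NUM_JOINTS, MAX_PERSONS),
                    dtype=np.float32)
    for t, frame in enumerate(meta["frames"][:num_frames]):
        kpts = np.asarray(frame["keypoints_2d"], dtype=np.float32)  # (V, 3)
        data[:, t, :, 0] = kpts.T                                   # -> (C, V)
    return data

def save_video_pickle(json_path, label, num_frames, out_path):
    """Step 5: store the array together with its clip-level action label."""
    sample = {"data": video_json_to_array(json_path, num_frames), "label": label}
    with open(out_path, "wb") as f:
        pickle.dump(sample, f)
```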

Concern:

For videos containing two or more persons performing the same action, I would like clarity on:

  • How should I handle the keypoints of each unique person in a video?
  • What should the NumPy array format look like to support multiple persons?
  • Should I create one combined .npy file per video containing all persons, or separate .npy files per person? (See the layout sketch after this list.)
  • How do I assign the correct action label if there are multiple people in a single clip?
  • What is the best practice for splitting Train/Val/Test when multiple persons are present in one video? (My current plan is sketched at the end of this post.)
  • Are there any NVIDIA-recommended guidelines for handling multi-person action clips in the PoseClassificationNet dataset pipeline?
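To make the array-format question concrete, here is the layout I am experimenting with. It assumes the (N, C, T, V, M) convention from the single-person docs carries over simply by raising M, which is exactly the point I would like confirmed; the `person_tracks` structure below is a hypothetical stand-in for per-person tracking output.

```python
import numpy as np

NUM_JOINTS = 34      # V: joints per skeleton (assumed NVIDIA body-pose format)
NUM_CHANNELS = 3     # C: x, y, confidence

def pack_multi_person(person_tracks, num_frames, max_persons=2):
    """Option A: one (C, T, V, M) array per clip, all persons on the M axis.

    person_tracks: {person_id: {frame_idx: (V, 3) keypoints}}, a hypothetical
    tracking output; frames where a person is absent stay zero-padded.
    """
    data = np.zeros((NUM_CHANNELS, num_frames, NUM_JOINTS, max_persons),
                    dtype=np.float32)
    for m, (_pid, track) in enumerate(sorted(person_tracks.items())[:max_persons]):
        for t, kpts in track.items():
            if t < num_frames:
                data[:, t, :, m] = np.asarray(kpts, dtype=np.float32).T  # (C, V)
    return data

def split_per_person(person_tracks, num_frames):
    """Option B: one single-person (C, T, V, 1) array per tracked person,
    each later paired with the same clip-level action label."""
    return {
        pid: pack_multi_person({pid: track}, num_frames, max_persons=1)
        for pid, track in person_tracks.items()
    }
```

Option B duplicates the clip-level label across persons, which seems safer when people perform the action asynchronously, but I am unsure which of the two layouts PoseClassificationNet actually expects.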

Additional Context:

  • I am currently following the dataset preparation documentation written for single-person videos and would like to extend it to multi-person cases while preserving the action context.
  • If there are reference implementations, sample datasets, or scripts for multi-person handling in PoseClassificationNet, I would appreciate pointers to them.
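
Regarding the Train/Val/Test question above: to keep samples from the same video (same people, same background) from leaking across splits, my current plan is a video-level grouped split, sketched here with scikit-learn's GroupShuffleSplit. Whether this matches NVIDIA's recommended practice is part of what I am asking.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

def grouped_split(samples, labels, video_ids, test_size=0.2, seed=0):
    """Split per-person samples so that every sample from a given video
    lands in the same partition (no cross-split identity/scene leakage).

    samples:   (N, C, T, V, M) array of skeleton sequences
    labels:    (N,) array of action labels
    video_ids: (N,) array naming the source video of each sample
    """
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size,
                                 random_state=seed)
    train_idx, test_idx = next(splitter.split(samples, labels, groups=video_ids))
    return ((samples[train_idx], labels[train_idx]),
            (samples[test_idx], labels[test_idx]))
```

Applying the splitter a second time to the training portion would yield the validation set, giving a three-way split that never separates persons from the same clip.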
