Please provide the following information when requesting support.
Hardware: T4
Network type: Facenet, FPEnet
I have ported parts of the TAO facial landmarks sample app (deepstream_tao_apps/apps/tao_others/deepstream-faciallandmark-app at master · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub) to GOLANG.
I’m not interested in annotating video with facial landmarks. I’m interested in “fingerprinting” faces in order to be able to recognize them again.
- My input is a 30 fps HD RTSP video stream
- A DeepStream 7 pipeline consisting of the primary detector running Facenet and a secondary detector running FPENet
- Detected faces are square-aligned (as the C++ app does it), so that the width and height of the crop are identical before it goes into the secondary model
- Since the crop resolution varies, I normalize the landmark coordinates to the current resolution
- In a training process I use 10 seconds of video of a non-moving face (so I don’t have to deal with landmark positions changing over time)
- This finally gives me a “fingerprint” for a person: the 80 X/Y float landmarks, averaged over the captured frames and weighted by confidence
- This can be repeated with different poses to have more fingerprints per person
- Results are stored into a database
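The training step above can be sketched in Go. This is a minimal illustration under my own assumptions: `Point`, `Fingerprint`, and the `[frame][landmark]` layout are hypothetical names, not the actual types of the ported app, and the 80-landmark count comes from the description above.

```go
package main

import "fmt"

// Point is one normalized landmark coordinate with its model confidence.
// (Hypothetical type for illustration, not the app's actual struct.)
type Point struct {
	X, Y, Conf float64
}

// Fingerprint averages each landmark over the captured frames, weighting
// every frame's point by its confidence, as described in the post.
// frames is indexed [frame][landmark]; the result is one averaged shape.
func Fingerprint(frames [][]Point) []Point {
	if len(frames) == 0 {
		return nil
	}
	n := len(frames[0]) // e.g. 80 for FPENet
	fp := make([]Point, n)
	for i := 0; i < n; i++ {
		var sx, sy, sw float64
		for _, f := range frames {
			p := f[i]
			sx += p.X * p.Conf
			sy += p.Y * p.Conf
			sw += p.Conf
		}
		if sw > 0 {
			fp[i] = Point{
				X:    sx / sw,
				Y:    sy / sw,
				Conf: sw / float64(len(frames)), // mean confidence
			}
		}
	}
	return fp
}

func main() {
	// Two frames, one landmark, equal confidence: the weighted
	// average reduces to the plain mean.
	frames := [][]Point{
		{{X: 0.25, Y: 0.5, Conf: 1.0}},
		{{X: 0.75, Y: 0.5, Conf: 1.0}},
	}
	fmt.Println(Fingerprint(frames)[0].X) // prints 0.5
}
```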
Now comes the problem:
- In the recognition process I calculate the Euclidean distance between landmark points (separately for chin, eyebrows, eyes, etc.) and finally average these into a “distance” value between a stored fingerprint and the current test fingerprint (the current landmark tensor)
- If the distance to a given database entry is below a certain threshold, I consider the person “recognized”.
- Unfortunately this gives ambiguous results (meaning: I hold my face into the video and a different person is detected).
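For reference, here is how I read the matching step above, as a Go sketch. The region index ranges are placeholders I made up; the real per-region split of the 80 FPENet landmarks would have to come from the model’s landmark layout.

```go
package main

import (
	"fmt"
	"math"
)

// Pt is one normalized landmark (illustrative type).
type Pt struct{ X, Y float64 }

// regions holds [start, end) index ranges into the landmark tensor,
// one per facial region (chin, eyebrows, ...). These two tiny ranges
// are placeholders, not the actual FPENet layout.
var regions = [][2]int{{0, 2}, {2, 4}}

// Distance implements the described score: average the Euclidean
// point distances within each region, then average the region means
// into a single distance between two fingerprints.
func Distance(a, b []Pt) float64 {
	var sum float64
	for _, r := range regions {
		var d float64
		for i := r[0]; i < r[1]; i++ {
			d += math.Hypot(a[i].X-b[i].X, a[i].Y-b[i].Y)
		}
		sum += d / float64(r[1]-r[0]) // mean distance in this region
	}
	return sum / float64(len(regions)) // mean over regions
}

func main() {
	a := []Pt{{0, 0}, {0, 0}, {0, 0}, {0, 0}}
	b := []Pt{{3, 4}, {3, 4}, {0, 0}, {0, 0}}
	// Region 1 mean = 5, region 2 mean = 0 → overall 2.5.
	fmt.Println(Distance(a, b)) // prints 2.5
}
```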
I have some experience with DLIB, and I thought I had learned that it also uses geometric distances between facial landmarks, but honestly I’m not sure I’m on the right track here. In particular, apart from making sure the face image has equal width and height, I’m not making any attempt to “morph” or “flatten” the face image when it is not an exactly frontal shot.
Is there any information on how to use the facial landmarks, as they come out of FPENet, for recognition?
Does my approach make sense, or is there anything else I should do?