When `detect_video()` finds multiple faces, they do not appear to have a consistent ordering with respect to their position in the video.
For example, in a recorded video call between two people, where one speaker is in a box on the left and the other speaker is in a box on the right, each frame index has two rows in the dataframe returned by `detect_video()`, one for each face. But sometimes the left speaker is the first entry in that frame index, and sometimes the second. This is apparent from the `FaceRectX` value.
For a two-speaker video call, it's easy enough to group by index and order by X value (and multi-speaker calls could do the same thing using both X and Y); maybe consider putting a note in the docs stating that order isn't guaranteed?
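For anyone hitting the same thing, here is a minimal sketch of the workaround described above. It uses a toy pandas dataframe rather than real `detect_video()` output, and it assumes the frame number is available as a column named `frame` (if it is the index, `reset_index()` first). The helper name `order_faces_by_position` and the `face_slot` column are made up for illustration; only `FaceRectX` and `FaceRectY` come from the actual output.

```python
import pandas as pd


def order_faces_by_position(df, frame_col="frame"):
    """Sort faces within each frame left-to-right (ties broken top-to-bottom).

    Hypothetical helper: assumes `df` has a frame column plus
    FaceRectX / FaceRectY columns, as in the detection output.
    """
    ordered = df.sort_values([frame_col, "FaceRectX", "FaceRectY"])
    ordered = ordered.reset_index(drop=True)
    # Position of each face within its frame after sorting: 0 = leftmost.
    ordered["face_slot"] = ordered.groupby(frame_col).cumcount()
    return ordered


# Toy data standing in for detection output: two faces per frame,
# with the left/right speakers listed in a different order per frame.
toy = pd.DataFrame(
    {
        "frame": [0, 0, 1, 1],
        "FaceRectX": [410.0, 52.0, 50.0, 412.0],
        "FaceRectY": [80.0, 82.0, 81.0, 79.0],
    }
)

result = order_faces_by_position(toy)
print(result)
```

After this, `face_slot == 0` should consistently be the left speaker in a side-by-side layout. For grid layouts with more speakers, a plain sort on X then Y may be thrown off by small jitter in X, so binning the coordinates first is probably safer.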
This issue is related to #198, although it's simpler for the video call use case, as heads aren't moving around much and so it doesn't require a latent representation to keep track.