I created a program to track people using OpenVINO's person-reidentification model. It can also count people coming and going depending on their direction of travel. I had a hard time implementing it, so this article summarizes what I built.
The demo video is on YouTube:
[https://youtu.be/Pj6HYWWyucU](https://youtu.be/Pj6HYWWyucU)
The video used for the test was downloaded from Videezy. Since it is a 4K video, I scaled it down to 1920 x 1080.
The code has been uploaded to **GitHub**.
To begin with, Intel's **Pedestrian Tracker Demo** tracks people moving past an escalator; you can watch the demo video on YouTube (note: it has sound) at https://www.youtube.com/watch?v=OKH57mvO9k0.
I tested with the same video (1920 x 1080) as the Intel demo [^1]. The tracking accuracy is worse than the original (a problem with my implementation), but I think it reproduces the demo reasonably well.
My result is also on YouTube; the original video can be downloaded from Pexels Videos [^2].
[https://youtu.be/zIkzlB-Z-vU](https://youtu.be/zIkzlB-Z-vU)
[^1]: If you have OpenVINO installed, you can find the demo program under `[OpenVINO Install Path]\openvino\deployment_tools\open_model_zoo\demos\pedestrian_tracker_demo`. The demo program is written in C++. I tried to port it to Python but decided a faithful reproduction was impossible, so this article only imitates its mechanism. The implementation therefore differs from the original demo program.
[^2]: You can find it by searching for "escalator".
Humans can easily, almost unconsciously, recognize the "same person" across continuous time, but it is not easy to get a computer to do the same. We need a mechanism that recognizes a detected "person" as the "same person" across consecutive frames.
Intel's person-reidentification model is used to recognize the same person. The basic usage is the same as the face-reidentification model used for face re-identification [^3].
This model outputs a feature vector for each detected person, so identity can be judged from the cosine similarity between vectors. However, re-identification breaks down in difficult situations such as **fast or erratic movement and overlapping people**, and those situations have to be handled.
Here is the result of simply thresholding the cosine similarity. When people overlap, "tracking" is lost and they are unintentionally registered as different people. Lowering the threshold (relaxing the condition for judging two detections as the same person) seems like a fix, but then "tracking" fails in the opposite direction: a different person is treated as the same one.
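As a minimal sketch of this simple thresholding (illustrative NumPy code, not the repository's exact implementation; the variable names and sample vectors are mine), the detected feature vector is compared against all registered vectors and matched to the most similar one:

```python
import numpy as np

def cos_similarity(x, y):
    """Cosine similarity between each row of x and each row of y."""
    x_norm = x / np.linalg.norm(x, axis=1, keepdims=True)
    y_norm = y / np.linalg.norm(y, axis=1, keepdims=True)
    return x_norm @ y_norm.T  # shape: (len(x), len(y))

SIM_THLD = 0.6  # the threshold this article ends up using

# toy 3-dim vectors; the real model outputs 256-dim vectors
detected = np.array([[0.2, 0.9, 0.1]])       # feature vector of one detection
registered = np.array([[0.21, 0.88, 0.12],   # person 0 (very similar)
                       [0.90, 0.10, 0.30]])  # person 1 (different)

sims = cos_similarity(detected, registered)
person_id = int(np.argmax(sims, axis=1)[0])  # best-matching registered person
is_same = sims[0, person_id] > SIM_THLD      # accept only above the threshold
```

With only this rule, a low-quality vector (e.g. from overlapping people) can still win the `argmax` and be accepted, which is exactly the failure mode described above.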
[^3]: For details, see Face re-identification with OpenVINO.
The points I considered important in the implementation are described below. They are based on Intel's demo program. [^4]
[^4]: The demo program appears to use the Bounding Box's "shape", "movement", and "time" to judge identity, and also implements a mechanism for "forgetting", so it makes a more detailed judgment than mine.
**1. Cosine similarity threshold**
- If the cosine similarity is high (the detections are very likely the same person), treat them as the "same" person. A threshold of about 0.6 seems to be just right.
**2. Exclusion conditions are important**
- To keep the feature vectors high quality, do not update or register feature vectors while people overlap.
**3. Combine multiple conditions**
- Even if the cosine similarity is low, identity is maintained by combining multiple conditions (updating the feature vector and current position of the "person considered to be the same").
- The additional conditions are the Euclidean distance between the current center point and the nearest center point from the previous frame, and the degree of overlap of the Bounding Boxes (IoU: Intersection over Union).
**4. Do not retain extra information**
- Clear the tracking information of people who leave the predefined tracking range, in order to improve the comparison accuracy of the feature vectors.
**5. Control registration of new people**
- If none of the above conditions are met, register the detection as a new person.
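The geometric conditions used in 3. and 4. (center-point distance, IoU, and the tracking-range check) can be sketched as below. This is an illustration under my own function names, not the repository's code:

```python
import numpy as np

def center_of_box(box):
    """Center point of a bounding box given as (xmin, ymin, xmax, ymax)."""
    return np.array([(box[0] + box[2]) / 2, (box[1] + box[3]) / 2])

def euclidean_distance(p, q):
    """Euclidean distance between two center points."""
    return float(np.linalg.norm(p - q))

def iou(box_a, box_b):
    """Intersection over Union of two (xmin, ymin, xmax, ymax) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def in_tracking_area(center, area):
    """True if a center point lies inside the predefined tracking range."""
    x, y = center
    return area[0] <= x <= area[2] and area[1] <= y <= area[3]
```

In my implementation, a high IoU between two detections means they overlap (condition 2 skips them), a small center distance to a saved track point supports a weak similarity match (condition 3), and a center outside the tracking range triggers clearing (condition 4).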
The cosine similarity threshold in 1. is 0.6, the IoU threshold in 2. is 0.2 (processing is skipped when boxes overlap by 20% or more), and the lower cosine similarity threshold in 3. is 0.3. Here is an excerpt of this part. Even with the above implementation there are still many misdetections (FN/FP), so further work is needed to maintain tracking.
tracker.py

```python
def person_reidentification(self, frame, persons, person_frames, boxes):
    if not person_frames:
        frame = self.draw_params(frame)
        frame = self.draw_couter_stats(frame)
        return frame
    feature_vecs = self.get_feature_vecs(person_frames[:reid_limit])
    # at the first loop
    if self.person_vecs is None:
        self.fisrt_detection(feature_vecs, boxes)
    similarities = cos_similarity(feature_vecs, self.person_vecs)
    similarities[np.isnan(similarities)] = 0
    person_ids = np.nanargmax(similarities, axis=1)
    for det_id, person_id in enumerate(person_ids):
        center = np.array(self.get_center_of_person(boxes[det_id]))
        # get the closest location of the person from track points
        track_points = np.array(
            [track_point[-1] for track_point in self.track_points]
        )
        closest_id, closest_distance = self.closest_distance(center, track_points)
        # get cosine similarity and check if the person frames are overlapped
        similarity = similarities[det_id][person_id]
        is_overlap = self.is_overlap(det_id, boxes)
        # 1. most likely the same person
        if similarity > sim_thld:
            self.update_tracking(
                person_id, feature_vecs[det_id].reshape(1, 256), boxes[det_id]
            )
            frame = self.person_is_matched(
                frame, person_id, boxes[det_id], similarity
            )
        # 2. update or register only when bounding boxes do not overlap
        elif not is_overlap:
            # 3. apply the minimum similarity threshold when the person is at
            # the closest (Euclidean) distance to their latest saved track point
            if similarity >= min_sim_thld and closest_id == person_id:
                self.update_tracking(
                    person_id, feature_vecs[det_id].reshape(1, 256), boxes[det_id]
                )
            # 4. nothing to do when the person is out of the counter area
            elif self.is_out_of_couter_area(center):
                continue
            # 5. finally register a new person
            else:
                self.register_person_vec(
                    feature_vecs[det_id], self.prev_feature_vecs, boxes[det_id]
                )
    self.prev_feature_vecs = feature_vecs
    frame = self.draw_couter_stats(frame)
    frame = self.draw_params(frame)
    return frame
```
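The decision cascade in steps 1 to 5 above can be distilled into a standalone function. This is my own condensed sketch, not code from the repository; the names and default thresholds mirror the values described in this article:

```python
def decide(similarity, is_overlap, closest_id, person_id, in_counter_area,
           sim_thld=0.6, min_sim_thld=0.3):
    """Return which action the tracker should take for one detection."""
    if similarity > sim_thld:
        return "update"    # 1. confident match: update the tracked person
    if is_overlap:
        return "skip"      # 2. boxes overlap: do not touch the vectors
    if similarity >= min_sim_thld and closest_id == person_id:
        return "update"    # 3. weaker match backed by spatial proximity
    if not in_counter_area:
        return "skip"      # 4. outside the tracking range: ignore
    return "register"      # 5. otherwise treat the detection as a new person
```

Writing the logic this way makes it easier to see that a detection is only ever registered as a new person when it clears every exclusion condition first.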
Here is one thing I noticed while implementing this.
**If the results are not stable, tweaking parameters a little at a time is not enough; it is better to review the implementation.**
Once you start changing combinations of parameters, you have to stop at some point or you will run out of time. When things didn't work, I fiddled with the similarity threshold in steps of 0.05 and blindly retried many times, without really thinking anymore. Even when it seemed to work a little, it failed completely on other videos.
I was saved this time because there was a reference program to model mine on, but I still wasted a lot of time. Then again, I think that time was also necessary.