Our solution for Mastek's Deep Blue Season 6, in which our team was one of the finalists.
As input, the demo application takes:
- paths to video files, specified with the `--videos` command-line argument
- indexes of web cameras, specified with the `--cam_ids` command-line argument
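The two arguments above could be parsed with `argparse` along these lines (a minimal sketch; the actual flag handling in the code may differ):

```python
import argparse

# Hypothetical argument parsing for the demo application.
parser = argparse.ArgumentParser(description="Multi-camera people counter")
parser.add_argument("--videos", nargs="+", default=[],
                    help="paths to video files")
parser.add_argument("--cam_ids", nargs="+", type=int, default=[],
                    help="indexes of web cameras")

# Example invocation: --videos a.mp4 b.mp4 --cam_ids 0
args = parser.parse_args(["--videos", "a.mp4", "b.mp4", "--cam_ids", "0"])
print(args.videos)   # ['a.mp4', 'b.mp4']
print(args.cam_ids)  # [0]
```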
To install the required dependencies, run:

    pip3 install -r requirements.txt
Our system reads each video/camera URI and creates an OpenCV capture object for it. Each capture object is then handled in its own thread, and each thread passes frames through 4 modules.
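The per-source threading can be sketched as follows. This is a stand-in, not the actual code: `process_stream` is a hypothetical worker that in the real system would wrap `cv2.VideoCapture(uri)` and run each frame through the four modules.

```python
import threading

results = {}

def process_stream(uri):
    # In the real system this would open cv2.VideoCapture(uri) and loop
    # over frames (detection -> tracking -> re-id -> mannequin detection);
    # here we just record that the source was handled.
    results[uri] = "processed"

# Sources may be file paths (from --videos) or camera indexes (from --cam_ids).
sources = ["video1.mp4", "video2.mp4", 0]
threads = [threading.Thread(target=process_stream, args=(s,)) for s in sources]
for t in threads:
    t.start()
for t in threads:
    t.join()
```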
These modules are as follows:
- Detection module - It uses the TensorFlow Object Detection API (Faster_RCNN_Inception_V2_Coco model) to detect person-like objects, then sends the bounding box of each detected person to the tracking module.
- Tracking module - It uses Intersection over Union (IOU) between bounding boxes from the last frame and the current frame to track each person's movement. This lets us keep the same tag for the same person and avoid the overhead of re-identifying them on every frame.
- Re-Identification module - It uses a custom CNN trained on the CUHK03 dataset to compare previously stored images of all tagged persons against the current frame's bounding box, checking whether the system has seen the person before. We also use the same model to compare people across cameras to avoid counting one person multiple times.
- Mannequin detection module - It uses Intersection over Union (IOU) of bounding boxes from the last N frames to check whether the object has ever moved, and assigns a bit classifying the object as a human or a mannequin. If the object has moved, the bit is set to human and the mannequin check is skipped for that object in future frames. If the object has never moved, the bit is set to mannequin, and on later frames the mannequin detection module re-checks whether the object is actually a human. We settled on this configuration after trying various techniques and models, to keep the system accurate without making it very slow.
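The IOU-based tracking above can be sketched as follows. The `match_tags` helper and its 0.5 threshold are illustrative assumptions, not the project's actual values: each current box inherits the tag of the previous-frame box it overlaps most, if that overlap is high enough.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def match_tags(prev, current, threshold=0.5):
    """prev: {tag: box} from the last frame; current: list of new boxes.
    Returns {index_in_current: tag or None (None = unmatched, new person)}."""
    assignments = {}
    for i, box in enumerate(current):
        best_tag, best_score = None, threshold
        for tag, prev_box in prev.items():
            score = iou(box, prev_box)
            if score > best_score:
                best_tag, best_score = tag, score
        assignments[i] = best_tag
    return assignments
```

A box that overlaps no previous box strongly enough gets `None` and would be handed to the re-identification module.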
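The re-identification comparison can be illustrated like this, assuming the CNN has already mapped each person crop to a fixed-length embedding vector. The cosine-similarity measure and the 0.8 threshold are hypothetical stand-ins for whatever comparison the trained model actually performs:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def is_same_person(emb_a, emb_b, threshold=0.8):
    """Decide whether two crops show the same person (threshold is illustrative)."""
    return cosine_similarity(emb_a, emb_b) >= threshold
```

The same comparison works within one camera (has this person been seen before?) and across cameras (is this the same person another camera already counted?).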
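The movement bit from the mannequin detection module can be sketched as below. N and the 0.95 stationarity threshold are illustrative choices, not the project's actual parameters; the idea is that boxes staying near-identical (IOU close to 1) across the last N frames mean the object never moved.

```python
STATIONARY_IOU = 0.95  # illustrative: boxes this similar count as "not moved"

def _iou(a, b):
    """IOU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def classify_object(last_boxes):
    """last_boxes: the object's bounding boxes from the last N frames.
    Returns 'human' if any consecutive pair of boxes differs enough,
    else 'mannequin' (re-checked on later frames in the real system)."""
    moved = any(_iou(last_boxes[i], last_boxes[i + 1]) < STATIONARY_IOU
                for i in range(len(last_boxes) - 1))
    return "human" if moved else "mannequin"
```

Once an object is classified as human, the check can be skipped for it, which is what keeps this module cheap.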