Abstract
Applications of deep-learning-based methods are growing rapidly, for example to help the blind see the world or to enable the deaf to hear and speak. Re-identification of a person across a set of cameras is one of the latest challenges in computer vision. Person re-identification deals with matching images of the same person over multiple non-overlapping camera views. Commonly, the re-identification task is broken down into three sub-modules: detection, tracking, and matching. Most techniques use manually annotated bounding boxes and focus only on matching between probes and cropped candidate images. This is not desirable in a real-time environment, where the localization of object boundaries is not available; the target person must be identified from the complete image, which may contain many distractors. To address this issue, we investigate how localization and matching of the target person can be performed without any prior bounding-box annotation. Our proposed method is an end-to-end deep learning technique that targets not only matching but also object localization, handling detection and re-identification jointly. This work provides an end-to-end implementation of person tracking across multiple cameras in a surveillance context. The model is tested under diverse conditions and achieves higher retrieval accuracy. The whole network is jointly optimized using the CIFM loss and fine-tuned for better accuracy. The proposed approach outperforms state-of-the-art methods on the PRW dataset, which demonstrates its effectiveness and generalization ability. © 2018 IEEE.