Authors :
Dr. N. Leelavathy; Sudipta Akash; Md. Sohail Ansari; Rayudu Navya Sri
Volume/Issue :
Volume 11 - 2026, Issue 4 - April
Google Scholar :
https://tinyurl.com/45b4pvpa
Scribd :
https://tinyurl.com/bdpspywn
DOI :
https://doi.org/10.38124/ijisrt/26apr358
Note : A published paper may take 4-5 working days from the publication date to appear in PlumX Metrics, Semantic Scholar, and ResearchGate.
Abstract :
Multi-camera target tracking remains a fundamental challenge in intelligent video surveillance, particularly when maintaining consistent identity associations across multiple, often spatially disjoint, camera views with overlapping or non-overlapping fields of view. This paper introduces VisionNet, a unified deep learning framework that integrates state-of-the-art detection, single-camera tracking, and cross-camera re-identification into a cohesive and computationally efficient pipeline. VisionNet employs YOLOv8 for high-accuracy object detection, Deep Simple Online and Realtime Tracking (DeepSORT) for intra-camera trajectory persistence, and the Omni-Scale Network (OSNet) for discriminative cross-camera identity association. The framework leverages GPU-accelerated inference to sustain real-time throughput while preserving tracking fidelity under challenging surveillance conditions such as occlusion, lighting variation, and crowded scenes. A Flask-based monitoring dashboard provides operators with live visualization of tracked targets, inter-camera handoff events, and aggregate behavioral analytics. Extensive evaluation on benchmark datasets including MOT17, DukeMTMC-reID, and Market-1501 demonstrates that VisionNet achieves an IDF1 score of 76.4%, a MOTA of 73.9%, and a Rank-1 re-identification accuracy of 96.2%, outperforming competing baselines on identity-switch minimization. The modular architecture facilitates seamless integration with existing closed-circuit television (CCTV) infrastructure and supports diverse application domains, including intelligent security systems, crowd flow analytics, retail behavior monitoring, and multi-athlete sports event tracking.
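The paper does not publish its matching code, but the cross-camera association step described in the abstract — comparing OSNet appearance embeddings of a newly detected person against a gallery of known identities — can be sketched as a cosine-similarity lookup. The function names, gallery layout, and 0.6 threshold below are illustrative assumptions, not details taken from the paper:

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity between two appearance-embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def associate_identity(query_emb, gallery, threshold=0.6):
    """Match a query embedding against a gallery of (track_id, embedding)
    pairs from other cameras. Returns the best-matching track ID, or None
    when no gallery entry clears the similarity threshold (i.e. the
    detection should open a new identity)."""
    best_id, best_sim = None, threshold
    for track_id, emb in gallery:
        sim = cosine_similarity(query_emb, emb)
        if sim > best_sim:
            best_id, best_sim = track_id, sim
    return best_id
```

In a full pipeline the embeddings would come from an OSNet forward pass on each DeepSORT track crop, and a production system would typically average embeddings over a track's lifetime before matching.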
Keywords :
Multi-Camera Tracking, Person Re-Identification, YOLOv8, DeepSORT, OSNet, Video Surveillance, Deep Learning.
References :
- G. Jocher, A. Chaurasia, and J. Qiu, “Ultralytics YOLOv8,” GitHub repository, 2023. [Online]. Available: https://github.com/ultralytics/ultralytics
- C. Luo, X. Zhao, W. Nie, and Y. Cui, “Graph Neural Network-Based Multi-Camera Multi-Target Tracking,” IEEE Trans. Circuits Syst. Video Technol., vol. 33, no. 5, pp. 2302–2315, May 2023.
- A. Bewley, Z. Ge, L. Ott, F. Ramos, and B. Upcroft, “Simple online and realtime tracking,” in Proc. IEEE Int. Conf. Image Process. (ICIP), Phoenix, AZ, USA, 2016, pp. 3464–3468.
- N. Wojke, A. Bewley, and D. Paulus, “Simple online and realtime tracking with a deep association metric,” in Proc. IEEE Int. Conf. Image Process. (ICIP), Beijing, China, 2017, pp. 3645–3649.
- K. Zhou, Y. Yang, A. Cavallaro, and T. Xiang, “Omni-scale feature learning for person re-identification,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Seoul, Korea, 2019, pp. 3702–3712.
- S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149, Jun. 2017.
- Y. Zhang, C. Wang, X. Wang, W. Zeng, and W. Liu, “FairMOT: On the fairness of detection and re-identification in multiple object tracking,” Int. J. Comput. Vis., vol. 129, no. 11, pp. 3069–3087, Nov. 2021.
- Z. Zhang, W. Cheng, X. Hou, L. Qi, Y. Lu, J. Shi, and C. Change Loy, “ByteTrack: Multi-object tracking by associating every detection box,” in Proc. Eur. Conf. Comput. Vis. (ECCV), Tel Aviv, Israel, 2022, pp. 1–18.
- A. Milan, L. Leal-Taixé, I. Reid, S. Roth, and K. Schindler, “MOT16: A benchmark for multi-object tracking,” arXiv:1603.00831, 2016.
- E. Ristani, F. Solera, R. Zou, R. Cucchiara, and C. Tomasi, “Performance measures and a data set for multi-target, multi-camera tracking,” in Proc. Eur. Conf. Comput. Vis. Workshops (ECCVW), Amsterdam, The Netherlands, 2016, pp. 17–35.
- Y. Sun, L. Zheng, Y. Yang, Q. Tian, and S. Wang, “Beyond part models: Person retrieval with refined part pooling,” in Proc. Eur. Conf. Comput. Vis. (ECCV), Munich, Germany, 2018, pp. 501–518.
- G. Wang, Y. Yuan, X. Chen, J. Li, and X. Zhou, “Learning discriminative features with multiple granularities for person re-identification,” in Proc. ACM Int. Conf. Multimedia (ACM MM), Seoul, Korea, 2018, pp. 274–282.
- L. Zheng, L. Shen, L. Tian, S. Wang, J. Wang, and Q. Tian, “Scalable person re-identification: A benchmark,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Santiago, Chile, 2015, pp. 1116–1124.
- W. Li, R. Zhao, T. Xiao, and X. Wang, “DeepReID: Deep filter pairing neural network for person re-identification,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Columbus, OH, USA, 2014, pp. 152–159.