Smart Video Monitoring: Advanced Deep Learning for Activity and Object Recognition


Authors : Shashikumar D R; Tejashwini N; K N Pushpalatha; Anurag Kumar; Om Chavan; Atharva Mishra

Volume/Issue : Volume 10 - 2025, Issue 3 - March


Google Scholar : https://tinyurl.com/485u88n3

Scribd : https://tinyurl.com/bdzyj3v3

DOI : https://doi.org/10.38124/ijisrt/25mar088

Note : A published paper may take 4-5 working days from the publication date to appear in PlumX Metrics, Semantic Scholar, and ResearchGate.


Abstract : This study explores the integration of Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks for the real-time recognition of human activities in video data. By harnessing the advantages of these two approaches, the system achieves high accuracy in detecting complex human actions: CNNs address the spatial aspects of the task, while LSTMs handle the temporal sequences. A notable feature of the system is its categorization module, which enables users to select an action and identify similar actions, thereby enhancing productivity and usability. Existing models often face challenges related to real-time interaction capabilities and resilience to environmental disturbances. This study tackles these shortcomings by refining the CNN-LSTM framework to support real-time operation and by incorporating preprocessing techniques, such as frame extraction and normalization, to improve input data quality. The system's effectiveness is measured using indicators such as accuracy, recall, and latency, demonstrating its advantages over traditional rule-based and basic deep learning approaches. Early findings are promising, showing significant improvements in performance. Nevertheless, challenges remain, particularly in tracking performance under occlusion or in cluttered environments. Future research should explore the integration of multi-modal data and advanced architectures, such as spatio-temporal graph convolutional networks (STGCN), to further enhance recognition accuracy and system robustness. In conclusion, the proposed CNN-LSTM hybrid architecture for activity recognition demonstrates potential for applications in video surveillance and beyond, including fields like healthcare and sports analytics. The system offers improved automated monitoring capabilities through enhanced accuracy, scalable human action detection, and user-friendly design.
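The preprocessing stage described in the abstract (frame extraction and normalization) can be sketched as below. This is an illustrative outline only, not the authors' implementation: the 16-frame sampling budget, the 64x64 resolution of the dummy clip, and the per-channel standardization step are all assumed values chosen for the example.

```python
import numpy as np

def extract_frames(video, num_frames=16):
    """Uniformly sample a fixed number of frames from a clip.

    `video` has shape (T, H, W, C); the 16-frame budget is an
    assumed value, not taken from the paper.
    """
    total = video.shape[0]
    indices = np.linspace(0, total - 1, num_frames).astype(int)
    return video[indices]

def normalize_frames(frames):
    """Scale pixels to [0, 1], then standardize each channel."""
    frames = frames.astype(np.float32) / 255.0
    mean = frames.mean(axis=(0, 1, 2), keepdims=True)
    std = frames.std(axis=(0, 1, 2), keepdims=True) + 1e-8
    return (frames - mean) / std

# A dummy 100-frame RGB clip stands in for real video input.
clip = np.random.randint(0, 256, size=(100, 64, 64, 3), dtype=np.uint8)
batch = normalize_frames(extract_frames(clip))
print(batch.shape)  # (16, 64, 64, 3)
```

In a CNN-LSTM pipeline like the one the abstract describes, a tensor of this shape would feed the CNN frame-by-frame for spatial features, with the resulting feature sequence passed to the LSTM for temporal modeling.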

