Authors :
Shashikumar D R; Tejashwini N; K N Pushpalatha; Anurag Kumar; Om Chavan; Atharva Mishra
Volume/Issue :
Volume 10 - 2025, Issue 3 - March
Google Scholar :
https://tinyurl.com/485u88n3
Scribd :
https://tinyurl.com/bdzyj3v3
DOI :
https://doi.org/10.38124/ijisrt/25mar088
Abstract :
This study explores the integration of Convolutional Neural Networks (CNNs) and Long Short-Term Memory
(LSTM) networks for the real-time recognition of human activities in video data. By harnessing the advantages of these
two approaches, the system achieves high accuracy in detecting complex human actions. Specifically, CNNs address the
spatial aspects of the task, while LSTMs handle the temporal sequences. A notable feature of the system is its
categorization module, which enables users to select an action and identify similar actions, thereby enhancing productivity
and usability.
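The spatial-then-temporal pipeline described above can be illustrated with a minimal NumPy sketch: a stand-in per-frame "CNN" (here just a random projection with a ReLU, since the abstract does not specify the backbone) feeds a single-layer LSTM whose final hidden state is linearly classified. All dimensions and weights are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions -- the paper does not specify them.
FRAMES, FEAT_DIM, HIDDEN, N_CLASSES = 16, 64, 32, 5

def cnn_features(frames):
    """Stand-in for a per-frame CNN: a fixed random projection mapping
    each flattened frame to a feature vector, followed by a ReLU."""
    W = rng.normal(0, 0.1, size=(frames.shape[1], FEAT_DIM))
    return np.maximum(frames @ W, 0.0)        # shape (FRAMES, FEAT_DIM)

def lstm(features):
    """Minimal single-layer LSTM over the frame-feature sequence."""
    Wx = rng.normal(0, 0.1, size=(FEAT_DIM, 4 * HIDDEN))
    Wh = rng.normal(0, 0.1, size=(HIDDEN, 4 * HIDDEN))
    h = np.zeros(HIDDEN)
    c = np.zeros(HIDDEN)
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    for x in features:
        z = x @ Wx + h @ Wh
        i, f, o, g = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)   # cell state update
        h = sigmoid(o) * np.tanh(c)                    # hidden state
    return h                                           # last hidden state

frames = rng.random((FRAMES, 28 * 28))                 # synthetic flattened frames
logits = lstm(cnn_features(frames)) @ rng.normal(0, 0.1, (HIDDEN, N_CLASSES))
pred = int(np.argmax(logits))                          # predicted activity class
```

In a real system the random projection would be replaced by a trained convolutional backbone and the weights would be learned end to end; the sketch only shows how spatial features per frame become a temporal sequence for the LSTM.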
Existing models often face challenges related to real-time interaction capabilities and resilience to environmental
disturbances. This study tackles these shortcomings by refining the CNN-LSTM framework to support real-time
functionality and incorporating preprocessing techniques, such as frame extraction and normalization, to improve input
data quality. The system's effectiveness is measured using indicators such as accuracy, recall, and latency, demonstrating its
advantages over traditional rule-based and basic deep learning approaches. Early findings are promising, showing
significant improvements in performance.
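As a concrete illustration of the preprocessing and evaluation steps mentioned above, the following sketch samples every Nth frame, normalizes pixel values to [0, 1], and computes a per-class recall. The sampling rate, array shapes, and class labels are illustrative assumptions, not values from the paper.

```python
import numpy as np

def extract_and_normalize(video, every_nth=5):
    """Sample every Nth frame and scale pixel values to [0, 1].

    `video` is an (n_frames, H, W) uint8 array; in a real pipeline the
    frames would come from a video decoder rather than synthetic data."""
    sampled = video[::every_nth]
    return sampled.astype(np.float32) / 255.0

def recall(y_true, y_pred, positive):
    """Per-class recall: fraction of true `positive` samples recovered."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0

video = np.random.default_rng(1).integers(0, 256, size=(100, 8, 8), dtype=np.uint8)
clip = extract_and_normalize(video)
print(clip.shape)                                      # (20, 8, 8)
print(recall([1, 1, 0, 2], [1, 0, 0, 2], positive=1))  # 0.5
```

Latency, the third metric named in the abstract, would be measured in practice by timing the inference path per clip (e.g. with `time.perf_counter`), which is omitted here for brevity.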
Nevertheless, challenges remain, particularly in tracking performance under occlusion or in cluttered environments.
Future research should explore the integration of multi-modal data and advanced architectures, such as spatio-temporal
graph convolutional networks (STGCN), to further enhance recognition accuracy and system robustness.
In conclusion, the proposed CNN-LSTM hybrid architecture for activity recognition demonstrates potential for
applications in video surveillance and beyond, including fields like healthcare and sports analytics. The system offers
improved automated monitoring capabilities through enhanced accuracy, scalable human action detection, and user-
friendly design.
References :
- H. Park, Y. Chung and J.-H. Kim, "Deep Neural Networks-based Classification Methodologies of Speech, Audio and Music, and its Integration for Audio Metadata Tagging," Journal of Web Engineering, vol. 22, no. 1, pp. 1-26, January 2023, doi: 10.13052/jwe1540-9589.2211.
- W. Huang, Y. Liu, S. Zhu, S. Wang and Y. Zhang, "TSCNN: A 3D Convolutional Activity Recognition Network Based on RFID RSSI," 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 2020, pp. 1-8, doi: 10.1109/IJCNN48605.2020.9207590.
- A. M. F and S. Singh, "Computer Vision-based Survey on Human Activity Recognition System, Challenges and Applications," 2021 3rd International Conference on Signal Processing and Communication (ICPSC), Coimbatore, India, 2021, pp. 110-114, doi: 10.1109/ICSPC51351.2021.9451736.
- S. Aarthi and S. Juliet, "A Comprehensive Study on Human Activity Recognition," 2021 3rd International Conference on Signal Processing and Communication (ICPSC), Coimbatore, India, 2021, pp. 59-63, doi: 10.1109/ICSPC51351.2021.9451759.
- C. Zhao, L. Wang, F. Xiong, S. Chen, J. Su and H. Xu, "RFID-Based Human Action Recognition Through Spatiotemporal Graph Convolutional Neural Network," IEEE Internet of Things Journal, vol. 10, no. 22, pp. 19898-19912, 15 November 2023, doi: 10.1109/JIOT.2023.3282680.