Abstract
Human action recognition is a broad research area within computer vision, with applications spanning surveillance systems, video analysis, and human-device interaction systems such as human-computer interfaces. Research on human action recognition dates back to the early 1980s, and current work focuses primarily on learning and recognizing actions from video sequences. This study explores deep learning techniques for recognizing human postures using two network models: VGG-19 and MobileNetV1. We use the COCO dataset together with real-world data for training and evaluation. The VGG-19-based method shows advantages in tracking and simulating hand gestures on clear images, but it struggles with lower-quality images and cannot detect hands independently of the human body. Conversely, the MobileNetV1-based PoseNet model offers faster processing but detects fewer keypoints than OpenPose. Future work will involve experimenting with additional image and video formats and larger datasets, and applying these methods in practical scenarios such as crime surveillance and violence detection.