2025, v.57, No.03, pp. 65-71
Still Image Action Recognition Method Combining ResNet and CBAM
Email: wanfangjie@zzu.edu.cn
DOI: 10.13705/j.issn.1671-6841.2023171
Received:   2023-07-06
Received Year:   2023
Revised:   2024-04-27
Accepted:   2025-06-17
Accepted Year:   2025
Review Duration(Year):   2
Published:   2024-06-24
Publication Date:   2024-06-24
Online:   2024-06-24
Abstract:

Still image action recognition suffers from poor performance because large-scale datasets are lacking and spatiotemporal features cannot be exploited. To address this problem, a model combining a residual neural network (ResNet) with the convolutional block attention module (CBAM) was proposed. Specific data augmentation techniques were employed to extend the dataset. The model was initialized by transfer learning and then fine-tuned to strengthen its feature representation for still image action recognition. The CBAM was embedded into the first convolutional layer of ResNet to adjust the model's attention. The Grad-CAM method was used to extract and visualize the regions of interest in each image, which helped explain the improvement in precision. On the PPMI dataset, the proposed model achieved average precisions of 88.30%, 81.94%, and 77.93% for the instrument-playing, instrument-holding, and overall categories, respectively, which verified the effectiveness of the method.
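The attention step named in the abstract can be illustrated with a minimal NumPy sketch of a CBAM forward pass, following the general formulation of Woo et al.: channel attention from a shared MLP over average- and max-pooled descriptors, then spatial attention from a convolution over channel-wise average and max maps. This is not the authors' implementation. The 64-channel feature map (the width of the standard ResNet stem), the reduction ratio of 16, the 7x7 kernel, and the random weights are all assumptions chosen for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """Channel weights of shape (C, 1, 1) for a feature map of shape (C, H, W).

    w1 (C//r, C) and w2 (C, C//r) form the shared two-layer MLP.
    """
    avg_desc = feat.mean(axis=(1, 2))  # global average pooling -> (C,)
    max_desc = feat.max(axis=(1, 2))   # global max pooling -> (C,)

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # ReLU hidden layer

    return sigmoid(mlp(avg_desc) + mlp(max_desc))[:, None, None]


def spatial_attention(feat, kernel):
    """Spatial weights of shape (1, H, W); kernel has shape (2, k, k), k odd."""
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    h, w = pooled.shape[1:]
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    response = np.zeros((h, w))
    # "Same"-padded cross-correlation, written as a sum over kernel offsets.
    for i in range(k):
        for j in range(k):
            window = padded[:, i:i + h, j:j + w]
            response += (kernel[:, i, j, None, None] * window).sum(axis=0)
    return sigmoid(response)[None]


def cbam(feat, w1, w2, kernel):
    """Apply channel attention first, then spatial attention."""
    feat = feat * channel_attention(feat, w1, w2)
    return feat * spatial_attention(feat, kernel)


# Illustrative (assumed) sizes and random weights; a trained model learns these.
rng = np.random.default_rng(0)
C, H, W, r = 64, 8, 8, 16
feat = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
kernel = rng.standard_normal((2, 7, 7)) * 0.1

refined = cbam(feat, w1, w2, kernel)
print(refined.shape)
```

Because both attention maps lie between 0 and 1, the module only rescales the input features and leaves their shape unchanged, so in principle it can be placed after an existing convolutional layer without altering the layers that follow.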

Basic Information:

China Classification Code:TP391.41

Citation Information:

[1] GAO Han, WAN Fangjie, MA Mingxu. Still Image Action Recognition Method Combining ResNet and CBAM[J]. Journal of Zhengzhou University (Natural Science Edition), 2025, 57(03): 65-71. DOI: 10.13705/j.issn.1671-6841.2023171.

Fund Information:

Henan Provincial Major Special Project (221100210100)
