A Helping Hand: Gestures Recognition using Android Mediapipe

  • Shivam Roy, Department of Computer Science & Engineering, Sharda School of Engineering and Technology, Sharda University, Greater Noida, India
  • Anuj Singh, Department of Computer Science & Engineering, Sharda School of Engineering and Technology, Sharda University, Greater Noida, India
Keywords: Gesture Recognition, Facial Expression Recognition, Bayesian Classifier

Abstract

This paper presents work on multimodal automatic gesture recognition during human interaction. A dedicated database was prepared in which participants completed command-based tasks designed to elicit eight different emotional states. Three primary feature extraction methods were used: facial expression recognition, gesture analysis with MediaPipe, and acoustic analysis. MediaPipe, which relies on machine learning for hand tracking and detection, was central to the gesture analysis: it uses convolutional neural networks (CNNs) to detect key hand landmarks, improving the accuracy of gesture recognition. A Bayesian classifier was then applied to automatically classify the emotions from the extracted data. Three configurations were evaluated: unimodal (a single input), bimodal (two inputs), and multimodal (all three inputs together), with fusion performed either before or after classification. The experiments showed that multimodal fusion improved recognition rates by more than 10% over the best unimodal system. Among the bimodal combinations, gesture-acoustic proved the most effective, and combining all three modalities yielded a further improvement over the best bimodal combination.
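
To make the gesture branch of the pipeline concrete, the following is a minimal sketch of per-frame hand-landmark extraction. It uses the MediaPipe Python Solutions API purely for illustration; the paper targets the Android MediaPipe library, which wraps the same hand-landmark model, and the video path and parameter values here are assumptions rather than the authors' settings.

```python
# Sketch: extracting 21 hand landmarks per frame with MediaPipe Hands.
# Illustrative only; the paper uses the Android MediaPipe library, which
# exposes the same underlying hand-landmark model.
import cv2
import mediapipe as mp
import numpy as np

mp_hands = mp.solutions.hands

def extract_landmark_features(video_path: str) -> np.ndarray:
    """Return one 63-dim feature vector (21 landmarks x 3 coords) per frame."""
    features = []
    cap = cv2.VideoCapture(video_path)
    with mp_hands.Hands(static_image_mode=False,
                        max_num_hands=1,
                        min_detection_confidence=0.5,
                        min_tracking_confidence=0.5) as hands:
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            # MediaPipe expects RGB input; OpenCV delivers BGR.
            result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if result.multi_hand_landmarks:
                lm = result.multi_hand_landmarks[0].landmark
                features.append([c for p in lm for c in (p.x, p.y, p.z)])
    cap.release()
    return np.asarray(features)
```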
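The Bayesian classification and fusion step described in the abstract could look roughly like the sketch below. Gaussian Naive Bayes from scikit-learn stands in for the Bayesian classifier, and the two variants correspond to pre-classification (feature-level) and post-classification (decision-level) fusion; the modality matrices and label array are hypothetical placeholders, not the authors' data.

```python
# Sketch: Bayesian classification of emotional states with feature-level
# (pre-classification) and decision-level (post-classification) fusion.
# GaussianNB is used as one concrete Bayesian classifier; the paper may
# use a different variant.
import numpy as np
from sklearn.naive_bayes import GaussianNB

def fuse_early(modalities: list, labels: np.ndarray) -> GaussianNB:
    """Pre-classification fusion: concatenate per-modality features, train one model."""
    X = np.concatenate(modalities, axis=1)
    return GaussianNB().fit(X, labels)

def fuse_late(modalities: list, labels: np.ndarray):
    """Post-classification fusion: one model per modality, average their posteriors."""
    models = [GaussianNB().fit(X, labels) for X in modalities]

    def predict(test_modalities: list) -> np.ndarray:
        # Average class posteriors across modalities, then take the argmax.
        probs = np.mean([m.predict_proba(X)
                         for m, X in zip(models, test_modalities)], axis=0)
        return models[0].classes_[np.argmax(probs, axis=1)]

    return predict
```

In this sketch, decision-level fusion averages the per-modality class posteriors, which is one common way to combine classifiers after classification; the abstract does not specify the exact fusion rule used in the paper.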

Published: 2025-05-09

How to Cite: Shivam Roy, & Anuj Singh. (2025). A Helping Hand: Gestures Recognition using Android Mediapipe. MATRIX Academic International Online Journal Of Engineering And Technology, 8(1), 1-14. Retrieved from https://maiojet.com/index.php/matrix/article/view/72

Section: Articles