Kinect-based hand gesture recognition using trajectory information, hand motion dynamics and neural networks

Document Type


Date of Original Version



Hand gestures are spatio-temporal patterns that can be characterized by collections of spatio-temporal features. Recognizing a hand gesture amounts to finding recurrences of such patterns through pattern matching. However, accurate recognition of dynamic hand gestures faces many obstacles, including poor lighting conditions, the camera's inability to keep a dynamic gesture in focus, occlusion due to finger movement, and color variations due to lighting. The Microsoft Kinect device provides an effective way to address these issues and also provides the skeleton for more convenient hand localization and tracking. The aim of this study is to develop a new trajectory-based method for hand gesture recognition using Kinect. In the first step, trajectory-based hand gesture features, including the spatial position and direction of the fingertips, are derived from Kinect. These features preserve the properties associated with the hand motion dynamics. In the second step, radial basis function (RBF) neural networks are employed to model and approximate the hand motion dynamics derived from different hand gestures, which represent Arabic numerals (0–9) and English letters (A–Z). The learned approximations of the hand motion dynamics are stored in constant RBF networks. In the last step, a bank of dynamical estimators, in which the constant RBF networks are embedded, is constructed for all the training patterns. Comparing the set of estimators with a test gesture pattern generates a set of recognition errors; the average L1 norms of these errors are taken as the recognition measure, and the test gesture is classified according to the smallest error principle. Finally, experiments are carried out to assess the performance of the proposed method in comparison with other state-of-the-art approaches. With twofold and tenfold cross-validation, the correct recognition rates are reported to be 95.83% and 97.25% for Arabic numerals (0–9) and 91.35% and 92.63% for English letters (A–Z), respectively.
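
The final recognition step described above (average L1 norm of each estimator's error, then the smallest error principle) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy error values, and the choice to average over all time steps of the error signal are assumptions made for illustration, and the abstract does not specify how the errors themselves are produced by the RBF-based estimators.

```python
from typing import Dict, Sequence


def average_l1_norm(errors: Sequence[Sequence[float]]) -> float:
    """Average, over time steps, of the L1 norm of each error vector.

    `errors` is a hypothetical layout: one error vector per time step.
    """
    if not errors:
        raise ValueError("error sequence is empty")
    return sum(sum(abs(e) for e in step) for step in errors) / len(errors)


def recognize(error_bank: Dict[str, Sequence[Sequence[float]]]) -> str:
    """Return the label whose estimator yields the smallest average L1 error."""
    scores = {label: average_l1_norm(errs) for label, errs in error_bank.items()}
    return min(scores, key=scores.get)


# Toy error signals (made-up numbers), one entry per trained gesture estimator.
error_bank = {
    "0": [[0.9, 0.4], [0.7, 0.5]],
    "A": [[0.1, 0.05], [0.2, 0.05]],
}
predicted = recognize(error_bank)
```

In this toy example the estimator for "A" produces the smaller average error, so the test pattern is assigned to "A".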

Publication Title

Artificial Intelligence Review