Real-Time Human Action Recognition Based on Depth Motion Maps

Chen Chen, Kui Liu, and Nasser Kehtarnavaz

Abstract

This paper presents a human action recognition method using depth motion maps. Each depth frame in a depth video sequence is projected onto three orthogonal Cartesian planes. Under each projection view, the absolute difference between two consecutive projected maps is accumulated through the entire depth video sequence, forming a depth motion map. An L2-regularized collaborative representation classifier with a distance-weighted Tikhonov matrix is then employed for action recognition. The developed method is shown to be computationally efficient, allowing it to run in real time. Recognition results on the Microsoft Research (MSR) Action3D dataset indicate superior performance of our method over existing methods.

Download

[PDF] [MATLAB CODE]

Examples of Depth Sequences from MSR-Action3D Dataset

[Video: Tennis Serve]  [Video: Forward Kick]

If the videos cannot be viewed, download them from here: [forward kick] [tennis serve]

Depth Motion Map (DMM)

Each depth frame in a depth video sequence is projected onto three orthogonal Cartesian planes. Under each projection view (f: front view; s: side view; t: top view), the absolute difference between two consecutive projected maps is accumulated through the entire depth video sequence, forming a depth motion map (DMM_v, v ∈ {f, s, t}).
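In equation form, this accumulation over an N-frame sequence can be written as

\mathrm{DMM}_v = \sum_{i=2}^{N} \left| \mathrm{map}_v^{\,i} - \mathrm{map}_v^{\,i-1} \right|, \qquad v \in \{f, s, t\},

where map_v^i denotes the projection of the i-th depth frame onto view v. This simply restates the description above; thresholding and normalization details are left to the paper.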
[Figure: DMMs of Tennis Serve]  [Figure: DMMs of Forward Kick]
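As an illustration of how a DMM can be computed, the NumPy sketch below follows the accumulation rule above under simplifying assumptions: depth values are quantized to a fixed number of bins, the front-view map is the depth image itself, and the side-/top-view maps are binary occupancy projections. It is a minimal sketch, not the released MATLAB code.

```python
import numpy as np

def depth_motion_maps(frames, depth_bins=256):
    """Compute front-, side-, and top-view DMMs from a depth sequence.

    frames: (num_frames, height, width) array of depth values quantized
            to integers in [0, depth_bins); 0 is treated as background.
    Returns (dmm_f, dmm_s, dmm_t).
    """
    frames = np.asarray(frames)
    _, h, w = frames.shape
    dmm_f = np.zeros((h, w))
    dmm_s = np.zeros((h, depth_bins))   # side view: rows vs. depth
    dmm_t = np.zeros((depth_bins, w))   # top view: depth vs. columns
    prev = None
    for frame in frames:
        ys, xs = np.nonzero(frame)          # foreground pixels
        zs = frame[ys, xs].astype(int)      # their depth values
        map_f = frame.astype(float)         # front view: the depth image itself
        map_s = np.zeros((h, depth_bins))
        map_s[ys, zs] = 1.0                 # side view: binary occupancy
        map_t = np.zeros((depth_bins, w))
        map_t[zs, xs] = 1.0                 # top view: binary occupancy
        if prev is not None:
            # Accumulate absolute differences of consecutive projected maps.
            dmm_f += np.abs(map_f - prev[0])
            dmm_s += np.abs(map_s - prev[1])
            dmm_t += np.abs(map_t - prev[2])
        prev = (map_f, map_s, map_t)
    return dmm_f, dmm_s, dmm_t
```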

Experimental Results

The proposed method is evaluated on the MSR-Action3D dataset, which includes 20 actions performed by 10 subjects, with each subject performing each action two or three times. We compare our approach with existing methods under the standard experimental settings.

1. Recognition Results

2. Real-time Operation

There are four main components in our method: projected depth map generation (three views) for each depth frame, DMM feature generation, dimensionality reduction (PCA), and action recognition (L2-regularized collaborative representation classifier). Our real-time action recognition timeline is displayed in the following figure; the numbers in the figure indicate these main components.
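To make the DMM feature generation and PCA components concrete, here is a minimal sketch that resizes each DMM to a fixed size, concatenates the three maps into one feature vector, and reduces the training features with PCA. The target size, the nearest-neighbor resizing, and the number of components are illustrative assumptions, not values from the paper.

```python
import numpy as np

def dmm_feature(dmm_f, dmm_s, dmm_t, size=(50, 25)):
    """Resize each DMM to a fixed size and concatenate into one feature vector."""
    def resize(img, shape):
        # Nearest-neighbor resampling (illustrative; the paper may use another scheme).
        rows = np.linspace(0, img.shape[0] - 1, shape[0]).astype(int)
        cols = np.linspace(0, img.shape[1] - 1, shape[1]).astype(int)
        return img[np.ix_(rows, cols)]
    return np.concatenate([resize(m, size).ravel() for m in (dmm_f, dmm_s, dmm_t)])

def pca_fit(train_features, n_components=100):
    """PCA via SVD on mean-centered training features (rows = samples).

    Returns (projected_train, mean, components); a new feature x is
    projected with (x - mean) @ components.T.
    """
    mean = train_features.mean(axis=0)
    centered = train_features - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    return centered @ components.T, mean, components
```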

The average processing time of each component is listed in the following table. Our code is written in MATLAB, and the processing times reported are for a PC with a 2.67 GHz Intel Core i7 CPU and 4 GB of RAM.
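For the action recognition component, the following Python sketch illustrates one common formulation of an L2-regularized collaborative representation classifier with a distance-weighted Tikhonov matrix, in which the Tikhonov weights are the Euclidean distances between the test feature and each training feature. The regularization value and the residual-based decision rule are illustrative assumptions; the released MATLAB code is the reference implementation.

```python
import numpy as np

def crc_classify(train_feats, train_labels, test_feat, lam=0.001):
    """Classify one test feature with an L2-regularized collaborative
    representation classifier using a distance-weighted Tikhonov matrix.

    train_feats: (d, n) matrix whose columns are training features (e.g., after PCA).
    train_labels: length-n array of class labels.
    test_feat: length-d test feature vector.
    """
    X = np.asarray(train_feats, dtype=float)
    y = np.asarray(test_feat, dtype=float)
    labels = np.asarray(train_labels)
    # Tikhonov weights: distance from the test sample to each training sample.
    w = np.linalg.norm(X - y[:, None], axis=0)
    # Ridge-style closed form: alpha = (X^T X + lam * diag(w)^2)^(-1) X^T y
    alpha = np.linalg.solve(X.T @ X + lam * np.diag(w ** 2), X.T @ y)
    # Assign the class whose training samples best reconstruct the test feature.
    classes = np.unique(labels)
    residuals = [np.linalg.norm(y - X[:, labels == c] @ alpha[labels == c])
                 for c in classes]
    return classes[int(np.argmin(residuals))]
```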
