Visual recognition of human actions is an active area of research. With the recent development of time-of-flight cameras and the upcoming release of Microsoft’s Kinect, we expect depth information to substantially aid action recognition in the near future. However, most existing action recognition techniques do not exploit this information. Two questions remain unanswered: Is depth data alone enough for action recognition? What can we expect from upcoming depth-sensing technologies? We seek to answer these questions.
Our method is based on the bag-of-words model commonly used in object recognition. In general, simple 3D extensions of the feature detectors and descriptors used for object recognition, such as SIFT and SURF, are unsuitable for analyzing sequences in the spatiotemporal domain. We are therefore investigating shape-context-inspired feature descriptors, as well as the responses of Gabor filters convolved with the input sequences, to discriminate between actions. A bank of 1-vs-1 RBF support vector machines serves as the classifier, and a simple voting scheme determines the final label. The sketches below illustrate each stage of this pipeline.
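To make the filtering stage concrete, here is a minimal sketch of one common way to convolve Gabor filters with a video volume: a separable spatial Gaussian combined with a temporal Gabor quadrature pair, whose summed squared responses give a response-energy map. The separable form and the parameters sigma, tau, omega, and length are illustrative assumptions, not the project's exact filter bank.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.signal import convolve

def temporal_gabor_pair(tau, omega, length):
    """Even/odd 1-D Gabor quadrature pair along the time axis."""
    t = np.arange(length) - length // 2
    envelope = np.exp(-t**2 / (2.0 * tau**2))
    return (envelope * np.cos(2 * np.pi * omega * t),
            envelope * np.sin(2 * np.pi * omega * t))

def gabor_energy(video, sigma=2.0, tau=3.0, omega=0.25, length=9):
    """Response energy of a spatial-Gaussian / temporal-Gabor filter.

    video: (T, H, W) float array, e.g. a stack of depth frames.
    Returns a (T, H, W) energy map whose local maxima can serve as
    candidate spatiotemporal interest points.
    """
    # Smooth each frame spatially; leave the time axis untouched.
    smoothed = gaussian_filter(video, sigma=(0, sigma, sigma))
    even, odd = temporal_gabor_pair(tau, omega, length)
    # Convolve along time with the quadrature pair and sum the energies.
    r_even = convolve(smoothed, even[:, None, None], mode='same')
    r_odd = convolve(smoothed, odd[:, None, None], mode='same')
    return r_even**2 + r_odd**2
```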
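For the bag-of-words representation itself, a sketch follows: descriptors pooled from the training clips are clustered into a visual vocabulary, and each clip is then summarized as a normalized histogram of visual-word counts. The choice of k-means and the vocabulary size k=200 are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(train_descriptors, k=200):
    """Cluster descriptors pooled from all training clips into k visual words."""
    return KMeans(n_clusters=k, random_state=0).fit(train_descriptors)

def bag_of_words(clip_descriptors, vocabulary):
    """L1-normalized histogram of visual-word counts for one clip."""
    words = vocabulary.predict(clip_descriptors)
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)
```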
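Finally, a sketch of the classification stage under the stated setup: one RBF SVM is trained per pair of action classes, and a majority vote over the pairwise predictions decides the final label. The hyperparameters C and gamma are placeholders that would be tuned in practice.

```python
import itertools
import numpy as np
from sklearn.svm import SVC

class PairwiseVotingSVM:
    """Bank of 1-vs-1 RBF SVMs; majority vote over pairwise decisions."""

    def __init__(self, C=1.0, gamma='scale'):
        self.C, self.gamma = C, gamma  # placeholder hyperparameters
        self.machines = {}

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        self.classes_ = np.unique(y)
        # One RBF SVM per unordered pair of action classes.
        for a, b in itertools.combinations(self.classes_, 2):
            mask = (y == a) | (y == b)
            clf = SVC(kernel='rbf', C=self.C, gamma=self.gamma)
            self.machines[(a, b)] = clf.fit(X[mask], y[mask])
        return self

    def predict(self, X):
        X = np.asarray(X)
        votes = np.zeros((len(X), len(self.classes_)), dtype=int)
        index = {c: i for i, c in enumerate(self.classes_)}
        # Each pairwise machine casts one vote per sample.
        for clf in self.machines.values():
            for row, label in enumerate(clf.predict(X)):
                votes[row, index[label]] += 1
        return self.classes_[votes.argmax(axis=1)]
```

Note that scikit-learn's SVC already performs this one-vs-one voting internally; the explicit version above simply makes the voting step visible.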
More information about the project and weekly presentation slides can be found on the materials page.
This work was supported by the National Science Foundation through the REU program. I would like to thank my advisor, Dr. Marshall Tappen, for his enormous contributions to this work and unwavering support. A special thanks also goes to Brian Millikin for his assistance. Finally, I would like to thank the rest of the REU faculty and friends for making this summer a fantastic experience.