Unsupervised Learning of Human Action Categories


Juan Carlos Niebles, Hongcheng Wang and Li Fei-Fei




Automatically classifying and localizing actions in video sequences is useful for a variety of tasks, such as video surveillance, object-level video summarization, video indexing, and digital library organization. However, robust action recognition remains challenging for computers because of cluttered backgrounds, camera motion, occlusion, and geometric and photometric variations of objects.

We present a novel unsupervised method for learning human action categories. A video sequence is represented as a collection of spatial-temporal words by extracting space-time interest points. Using a probabilistic Latent Semantic Analysis (pLSA) model, the algorithm automatically learns the probability distributions of the spatial-temporal words and of the intermediate topics, which correspond to human action categories. The learned model is then used to categorize and localize human actions in a novel video by maximizing the posterior of the action category (topic) distributions. The contributions of this work are as follows:

  • Unsupervised learning of actions using a 'video words' representation: We deploy a pLSA model with a 'bag of video words' representation for video analysis;

  • Multiple action localization and categorization: Our approach is not only able to classify different actions, but also to localize different actions simultaneously in a novel and complex video sequence.

Future work includes content-based video retrieval and semantic video synthesis.




1. Juan Carlos Niebles, Hongcheng Wang and Li Fei-Fei, Unsupervised Learning of Human Motion Categories, in Video Proceeding, International Conference on Computer Vision and Pattern Recognition (VPCVPR), New York, 2006

Full Text: PDF (One-page, 209K)

2. Juan Carlos Niebles, Hongcheng Wang and Li Fei-Fei, Unsupervised Learning of Human Motion Categories Using Spatial-Temporal Words, BMVC, 2006

Full Text: PDF (1.9MB)





Video: demo (.avi, DivX 6.2 compressed, ~60MB)

If you cannot play the video, please download the DivX 6.2 codec.





Updated: May 16, 2006