CASEE: A Hierarchical Event Representation for the Analysis of Videos

Asaad Hakeem
Yaser Sheikh

A representational gap exists between low-level measurements (segmentation, object classification, tracking) and high-level understanding of video sequences. We present a novel representation of events in video that bridges this gap by extending the CASE representation of natural language. The proposed representation makes three significant contributions over existing frameworks. First, it represents temporal structure and causality between sub-events. Second, it represents multi-agent and multi-threaded events. Third, it enables automated event detection through subtree pattern matching. To validate the representation and automatically annotate videos, we report experiments on several events in two domains: meeting videos and videos of railroad crossings.
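The detection claim above, subtree pattern matching over a hierarchical event representation, can be illustrated with a minimal sketch. Below, an event is a tree of case frames (a predicate plus semantic-role fillers and ordered sub-events), and detecting an event reduces to matching a query pattern against subtrees of the event tree parsed from a video. All names here (CaseFrame, matches, find_event), the example events, and the simplified order-preserving matching rule are illustrative assumptions, not the paper's actual data structures or algorithm.

    # A sketch of case-frame event trees and subtree pattern matching.
    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class CaseFrame:
        """One node of the event tree: a predicate with semantic roles."""
        predicate: str                                        # e.g. "approach", "enter"
        roles: Dict[str, str] = field(default_factory=dict)   # e.g. {"agent": "person1"}
        sub_events: List["CaseFrame"] = field(default_factory=list)

    def matches(pattern: CaseFrame, event: CaseFrame) -> bool:
        """True if `pattern` matches the subtree rooted at `event`.

        A pattern node matches when predicates agree, every role it
        specifies is present with the same filler, and the pattern's
        sub-events match sub-events of `event` in temporal order.
        """
        if pattern.predicate != event.predicate:
            return False
        if any(event.roles.get(r) != v for r, v in pattern.roles.items()):
            return False
        # Each pattern child must match a distinct event child, preserving order.
        i = 0
        for p_child in pattern.sub_events:
            while i < len(event.sub_events) and not matches(p_child, event.sub_events[i]):
                i += 1
            if i == len(event.sub_events):
                return False
            i += 1
        return True

    def find_event(pattern: CaseFrame, root: CaseFrame) -> bool:
        """Search the whole event tree for any subtree matching `pattern`."""
        return matches(pattern, root) or any(find_event(pattern, c) for c in root.sub_events)

    # Hypothetical event tree parsed from a meeting video:
    video_event = CaseFrame("meeting", sub_events=[
        CaseFrame("stand", {"agent": "person1"}),
        CaseFrame("raise_hand", {"agent": "person1"}),
        CaseFrame("sit", {"agent": "person1"}),
    ])

    query = CaseFrame("meeting", sub_events=[CaseFrame("raise_hand", {"agent": "person1"})])
    print(find_event(query, video_event))  # True: the hand-raise sub-event is present

A fuller system would encode temporal relations (e.g., before, during) and causal links as labeled edges rather than relying on child order alone; the sketch keeps only the ordering to stay short.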
 

Results for different sequences

[Result videos: Hand Raise (PETS video); Stealing (USC video)]

Railroad Monitoring

[Frame sequences for Railroad Monitoring videos: Sequences A23, A26, B03, B06, and B13]

Meeting Videos

[Frame sequences for Meeting videos: Argument, Object Passing, Presentation, and Voting sequences]


Related Publications:

Asaad Hakeem and Mubarak Shah, Learning, Detection and Representation of Multi-Agent Events in Videos, Artificial Intelligence, 171:586-605, 2007.

Asaad Hakeem, Yaser Sheikh, and Mubarak Shah, CASEE: A Hierarchical Event Representation for the Analysis of Videos, Proceedings of the Nineteenth National Conference on Artificial Intelligence (AAAI), 2004.