
COCOA: Alignment, Object Detection, Object Tracking and Indexing of Aerial Videos
Saad Ali, Yaser Sheikh, Min Hu, & Paul Scovanner

 

COCOA is a modular system for motion compensation, moving object detection, object tracking, and indexing of videos taken from a camera mounted on a moving aerial platform (e.g., a UAV). To index a video, COCOA processes it in several stages. In the first stage, the motion of the aerial platform is compensated using one of the several frame-to-frame alignment methods available in COCOA. The second stage detects moving objects with a hybrid approach that combines frame differencing, background modeling, and object segmentation. In the third stage, foreground regions are tracked for as long as they remain within the camera's field of view. Finally, tracks are generated and analyzed with respect to the global mosaic reference frame; interesting events are marked in these trajectories and subsequently used to index the video. COCOA also provides a search feature for retrieving previously indexed videos from the database. COCOA is customizable to different sensor resolutions and can track targets as small as 200 pixels. It works with both visible and thermal imaging modes. The system is implemented in Matlab and processes video in batch mode.

            
The above figure shows the graphical user interface of COCOA.


Results from DARPA VIVID-1 and VIVID-2 Corpus
Sequence 1: Motion Compensation, Object Detection, Object Tracking, Global Tracks

Sequence 2: Motion Compensation, Object Detection, Object Tracking, Global Tracks

Sequence 3: Motion Compensation, Object Detection, Object Tracking, Global Tracks

The following sections describe the modules that are integrated into COCOA.

Ego Motion Compensation

The motion compensation module of COCOA accounts for the continuous motion of the camera mounted on the aerial platform. This module contributes to the system in two ways. First, it helps detect independently moving ground targets: after compensation, the only pixels whose intensity changes between two neighboring frames are those that belong to moving objects. Second, registering the whole video to a single global reference yields a meaningful representation of the object trajectories, which can then be used to describe the entire video by detecting interesting events in them.

In COCOA we integrated three different ways to accomplish the video registration task:

  1. Frame to Frame Alignment using Telemetry Information
  2. Feature-Based Registration
  3. Gradient-Based Registration
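
Whichever method supplies the frame-to-frame alignment, registering the video to one global reference can be pictured as chaining those alignments. The sketch below is illustrative only and is not COCOA's Matlab code: it assumes each frame-to-frame alignment is available as a 3x3 homography, and the function names (`matmul3`, `to_global`, `warp_point`) are hypothetical.

```python
def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def to_global(frame_to_prev):
    """Chain frame-to-frame transforms into frame-to-reference transforms.

    frame_to_prev[i] is assumed to map coordinates of frame i+1 into
    frame i. The returned list G satisfies: G[i] maps frame i into
    frame 0, which plays the role of the global reference here.
    """
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    chained = [identity]
    for h in frame_to_prev:
        # frame i+1 -> frame i (h), then frame i -> frame 0 (chained[-1])
        chained.append(matmul3(chained[-1], h))
    return chained


def warp_point(h, x, y):
    """Apply a homography to a point, including the projective division."""
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (u / w, v / w)


# Two pure translations as stand-ins for estimated alignments.
shift_x = [[1.0, 0.0, 5.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shift_y = [[1.0, 0.0, 0.0], [0.0, 1.0, 3.0], [0.0, 0.0, 1.0]]
G = to_global([shift_x, shift_y])
print(warp_point(G[2], 1.0, 1.0))  # (6.0, 4.0)
```

Mapping every tracked position through the transform of its frame places all trajectories in the same coordinate system, which is what makes them comparable across the whole video.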

            
The above figure shows part of the graphical user interface that is used for performing motion compensation.

More Alignment Results

       
Result 1, Result 2, Result 3

Moving Object Detection

To detect independently moving objects such as cars, trucks, people, or motorbikes, we incorporated two methods into COCOA: Accumulative Frame Differencing and Background Modeling.

In the Accumulative Frame Differencing approach, differencing is performed with respect to a sliding reference coordinate in order to accommodate sequences of hundreds of frames. After the difference of each frame with respect to its p neighboring frames is computed, the evidence is accumulated. The log-evidence at each frame is histogrammed: the large peak in the histogram corresponds to the background pixels and the smaller peak to the foreground pixels.

In Background Modeling, background subtraction is performed hierarchically in two stages: pixel-level and frame-level processing. At the pixel level, a mixture of Gaussian distributions is used to adaptively model each pixel in RGB color space. Frame-level processing handles quick illumination changes. The background subtraction is performed in the reference coordinate of the aligned images.

Sometimes, because of rapid object motion, the foreground regions detected by these methods contain part of the background. This can degrade the tracking module in later stages, because the appearance templates are contaminated by background pixels. To overcome this problem, we integrated a level-set based segmentation approach into COCOA that separates the pixels belonging to the moving objects from the rest of each foreground region. The approach evolves a contour, using the detected foreground regions as the initialization, and uses both texture and color features to cluster the pixels into background and foreground regions.
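
As a rough illustration of the Accumulative Frame Differencing idea described above, the following sketch accumulates absolute differences against the p neighboring frames, histograms the log-evidence, and separates the two peaks. It is not COCOA's implementation: it assumes the frames are already aligned grayscale images stored as nested lists, and the rule for placing the threshold (the emptiest bin between the background peak and the next-largest peak above it) is an assumption, as are the function names and the bin count.

```python
import math


def accumulate_evidence(frames, t, p):
    """Sum |I_t - I_k| over the frames k within p steps of frame t."""
    height, width = len(frames[t]), len(frames[t][0])
    evidence = [[0.0] * width for _ in range(height)]
    for k in range(max(0, t - p), min(len(frames) - 1, t + p) + 1):
        if k == t:
            continue
        for y in range(height):
            for x in range(width):
                evidence[y][x] += abs(frames[t][y][x] - frames[k][y][x])
    return evidence


def foreground_mask(evidence, bins=16):
    """Histogram the log-evidence and threshold between its two peaks."""
    logs = [[math.log1p(v) for v in row] for row in evidence]
    empty = [[False] * len(row) for row in logs]
    top = max(max(row) for row in logs)
    if top == 0:
        return empty  # nothing changed anywhere
    bin_width = top / bins
    hist = [0] * bins
    for row in logs:
        for v in row:
            hist[min(int(v / bin_width), bins - 1)] += 1
    background = hist.index(max(hist))  # large peak: background pixels
    if background == bins - 1:
        return empty
    # Smaller peak: the fullest bin above the background peak (assumed rule).
    foreground = max(range(background + 1, bins), key=lambda b: hist[b])
    if hist[foreground] == 0:
        return empty
    # Place the threshold at the emptiest bin between the two peaks.
    valley = min(range(background + 1, foreground + 1), key=lambda b: hist[b])
    threshold = valley * bin_width
    return [[v >= threshold for v in row] for row in logs]


def make_frame(col):
    """4x4 frame with a uniform background and one bright pixel in row 1."""
    frame = [[10] * 4 for _ in range(4)]
    frame[1][col] = 200
    return frame


frames = [make_frame(0), make_frame(1), make_frame(2)]
mask = foreground_mask(accumulate_evidence(frames, t=1, p=1))
print(mask[1])  # [True, True, True, False]
```

In this toy example the mask also marks the positions the object occupied in the neighboring frames, a familiar side effect of plain frame differencing.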

            
The above figure shows part of the graphical user interface that is used for performing object detection.

More Object Detection Results


Result 1


Result 2

More Object Segmentation Results

           
Result 1, Result 2, Result 3, Result 4

Object Tracking

The goal of the tracking module in COCOA is to track the targets detected by the motion detection stage for as long as they remain visible in the camera's field of view. This is critical for obtaining tracks that reflect the motion characteristics of the tracked object over longer durations. Two tracking methods are incorporated into COCOA for this purpose: a kernel-based object tracker and a blob tracker.

The kernel-based object tracker represents target objects by their color distribution in RGB space rather than by raw image pixels. Each object is represented separately by its own color distribution. Tracking is performed in global coordinates. When a new blob is detected, a new object is initialized and tracked across the sequence until it exits. To keep the mean shift tracker adaptive, the template is updated at short intervals.

The second method uses a blob tracking approach to perform multi-target tracking. Regions of interest, or blobs, in successive frames are supplied by the motion detection module. Each blob is represented by its own appearance and shape models. Temporal relationships between blobs are established using a cost function that takes into account appearance and shape similarity; two blobs are associated if their score is above a threshold.
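
The blob association step can be sketched as follows. This is a minimal illustration under stated assumptions, not the scoring used in COCOA: appearance is taken to be a normalized color histogram compared with the Bhattacharyya coefficient, shape is reduced to blob area, the two similarities are mixed with an equal weight, and matches are picked greedily. The field names `hist` and `area`, the weight, and the threshold are all hypothetical.

```python
import math


def appearance_similarity(hist_a, hist_b):
    """Bhattacharyya coefficient of two normalized histograms (1 = identical)."""
    return sum(math.sqrt(a * b) for a, b in zip(hist_a, hist_b))


def shape_similarity(area_a, area_b):
    """Ratio of the smaller area to the larger one (1 = same size)."""
    return min(area_a, area_b) / max(area_a, area_b)


def associate(prev_blobs, curr_blobs, weight=0.5, threshold=0.7):
    """Match blobs of the previous frame to blobs of the current frame.

    Returns a dict {index in prev_blobs: index in curr_blobs}. A pair is
    only considered when its combined score is above the threshold, and
    each blob takes part in at most one match (best scores first).
    """
    candidates = []
    for i, p in enumerate(prev_blobs):
        for j, c in enumerate(curr_blobs):
            score = (weight * appearance_similarity(p["hist"], c["hist"])
                     + (1.0 - weight) * shape_similarity(p["area"], c["area"]))
            if score > threshold:
                candidates.append((score, i, j))
    candidates.sort(reverse=True)
    matches, taken = {}, set()
    for _, i, j in candidates:
        if i not in matches and j not in taken:
            matches[i] = j
            taken.add(j)
    return matches


prev_blobs = [{"hist": [1.0, 0.0], "area": 100}, {"hist": [0.0, 1.0], "area": 50}]
curr_blobs = [{"hist": [0.0, 1.0], "area": 55},
              {"hist": [1.0, 0.0], "area": 90},
              {"hist": [0.5, 0.5], "area": 10}]
print(associate(prev_blobs, curr_blobs))  # {1: 0, 0: 1}
```

The third blob in the current frame matches nothing above the threshold, so a tracker built on this sketch could treat it as a newly appeared target.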

            
The above figure shows part of the graphical user interface that is used for performing object tracking.

More Tracking Results

       
Result 1, Result 2

Event Detection and Indexing

Event detection in COCOA uses the trajectories generated by the object tracking module. First, the trajectories are smoothed and any outliers produced by detection and tracking are removed. Event detection then proceeds by detecting Primitive Motion and Composite Motion verbs. Primitive motion verbs are the basic action units, whereas a grouping of these action units defines a composite event. Move Forward, Move Left, and Move Right are examples of the primitive motion verbs handled in COCOA. Composite motion verbs describe the event as a whole; for example, a car making an S-turn is a composite event.

Detection of primitive motion verbs begins by segmenting the trajectories with a divide-and-conquer method: a) find the "sharpest" segment in the input track; b) repeat this process on the left side and the right side of the "sharpest" segment.

A model-based approach is adopted to detect composite motion verbs in a given trajectory. For each motion verb, we create several typical models using simulated data. The sequence of primitive motion verbs is then compared with these models to find the best match. Matching is performed by a revised version of Edit Distance.
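
The divide-and-conquer segmentation can be pictured with the sketch below. The page does not define "sharpest", so this example makes an explicit assumption: the sharpest location is the trajectory sample with the largest change of heading, and recursion stops when that change falls below a minimum angle. The function names and the 30-degree default are hypothetical.

```python
import math


def turning_angle(a, b, c):
    """Absolute change of heading (radians) at b on the path a -> b -> c."""
    heading_in = math.atan2(b[1] - a[1], b[0] - a[0])
    heading_out = math.atan2(c[1] - b[1], c[0] - b[0])
    delta = heading_out - heading_in
    # Wrap the difference into [-pi, pi) before taking its magnitude.
    delta = (delta + math.pi) % (2.0 * math.pi) - math.pi
    return abs(delta)


def split_points(track, lo=0, hi=None, min_angle=math.radians(30)):
    """Return sorted indices at which the track is cut into segments.

    Step a) finds the sharpest interior sample between lo and hi;
    step b) repeats the search on the part to its left and to its right.
    """
    if hi is None:
        hi = len(track) - 1
    if hi - lo < 2:
        return []  # no interior sample left to examine
    sharpest = max(range(lo + 1, hi),
                   key=lambda i: turning_angle(track[i - 1], track[i], track[i + 1]))
    if turning_angle(track[sharpest - 1], track[sharpest], track[sharpest + 1]) < min_angle:
        return []  # this stretch is straight enough to stay one segment
    return (split_points(track, lo, sharpest, min_angle)
            + [sharpest]
            + split_points(track, sharpest, hi, min_angle))


# An L-shaped path: two straight runs joined by a 90-degree left turn.
track = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
print(split_points(track))  # [2]
```

Each resulting segment could then be labeled with a primitive motion verb, giving the verb sequence that the model-matching step compares against.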

Once events have been detected in the trajectories, the next step is to index the videos on the basis of these events. The indexing approach in COCOA uses a two-level index:

  1. Primary Indexing (Event Indexing): searches the database for videos that have at least one of the primitive motion verbs of the query trajectory (e.g., S-turn, U-turn), exhibited by the same type of object (e.g., person, vehicle) as in the query trajectory.
  2. Secondary Indexing (Trajectory Indexing): searches the subset returned by the previous step for trajectories similar to the query, for the composite event.

Results are then sorted according to the Edit Distance.
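
A two-level lookup of this kind might look like the sketch below. COCOA uses a revised version of Edit Distance whose details are not given here, so the example substitutes the standard Levenshtein distance over verb sequences. The record layout (`name`, `object`, `verbs`) and the verb labels are hypothetical.

```python
def edit_distance(a, b):
    """Standard Levenshtein distance between two sequences."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        current = [i]
        for j, y in enumerate(b, 1):
            current.append(min(previous[j] + 1,              # delete x
                               current[j - 1] + 1,           # insert y
                               previous[j - 1] + (x != y)))  # substitute
        previous = current
    return previous[-1]


def search(database, query):
    """Primary filter on object type and shared verbs, then rank by distance."""
    query_verbs = set(query["verbs"])
    candidates = [entry for entry in database
                  if entry["object"] == query["object"]
                  and query_verbs & set(entry["verbs"])]
    return sorted(candidates,
                  key=lambda entry: edit_distance(entry["verbs"], query["verbs"]))


database = [
    {"name": "B", "object": "vehicle", "verbs": ["forward", "left", "right", "forward"]},
    {"name": "A", "object": "vehicle", "verbs": ["forward", "left", "forward"]},
    {"name": "C", "object": "person", "verbs": ["forward", "left", "forward"]},
    {"name": "D", "object": "vehicle", "verbs": ["right"]},
]
query = {"object": "vehicle", "verbs": ["forward", "left", "forward"]}
print([entry["name"] for entry in search(database, query)])  # ['A', 'B']
```

Entry C is dropped because the object type differs, and entry D because it shares no verb with the query; the remaining entries are ordered from the closest verb sequence to the farthest.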

 

Event Detection Results

           
       
Related Publications
  1. Saad Ali and Mubarak Shah, "COCOA - Tracking in Aerial Imagery," SPIE Airborne Intelligence, Surveillance, Reconnaissance (ISR) Systems and Applications, Orlando, 2006.
  2. Saad Ali and Mubarak Shah, "COCOA - Tracking in Aerial Imagery," demo presentation at ICCV 2005, Beijing, China.
  3. Alper Yilmaz, Xin Li, and Mubarak Shah, "Contour-Based Object Tracking with Occlusion Handling in Video Acquired Using Mobile Cameras," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 26, No. 11, pp. 1531-1536, 2004.
  4. Omar Javed, Khurram Shafique, and Mubarak Shah, "A Hierarchical Approach to Robust Background Subtraction using Color and Gradient Information," IEEE Workshop on Motion and Video Computing, Orlando, Dec 5-6, 2002.
  5. A. Yilmaz, K. Shafique, T. Olson, N. Lobo, and M. Shah, "Target Tracking in FLIR Imagery Using Mean-Shift and Global Motion Compensation," Proceedings of IEEE Workshop on Computer Vision Beyond Visible Spectrum (CVBVS), Hawaii, 2001.

Keywords: UAV Videos, Motion Compensation, Moving Target Detection, Target Tracking, Event Detection, Airborne Surveillance