Alper Yilmaz, Omar Javed and Mubarak Shah, “Object Tracking: A Survey”, ACM Computing Surveys, December 2006.


The goal of this article is to review the state-of-the-art tracking methods, classify them into different categories, and identify new trends. Object tracking, in general, is a challenging problem. Difficulties in tracking objects can arise due to abrupt object motion, changing appearance patterns of both the object and the scene, nonrigid object structures, object-to-object and object-to-scene occlusions, and camera motion. Tracking is usually performed in the context of higher-level applications that require the location and/or shape of the object in every frame. Typically, assumptions are made to constrain the tracking problem in the context of a particular application. In this survey, we categorize the tracking methods on the basis of the object and motion representations used, provide detailed descriptions of representative methods in each category, and examine their pros and cons. Moreover, we discuss the important issues related to tracking including the use of appropriate image features, selection of motion models, and detection of objects.



Orkun Alatas, Omar Javed and Mubarak Shah, “Video Compression Using Spatiotemporal Regularity Flow”, IEEE Transactions on Image Processing, Vol. 15, No. 12, pp. 3812-3823, December 2006.


We propose a new framework in wavelet video coding that improves the compression rate by exploiting the spatiotemporal regularity of the data. A sequence of images forms a spatiotemporal volume. This volume is said to be regular along the directions in which the pixel values vary the least, and hence the entropy is lowest. The wavelet decomposition of regularized data yields fewer significant coefficients, and thus a higher compression rate. The directions of regularity of an image sequence depend on both its motion content and its spatial structure. We propose representing these directions by a 3-D vector field, which we refer to as the spatiotemporal regularity flow (SPREF). SPREF uses splines to approximate the directions of regularity. The compactness of the spline representation gives SPREF a low storage overhead, a desirable property in compression applications. Once the SPREF directions are known, they can be converted into actual paths along which the data is regular. Directional decomposition of the data along these paths can be further improved by using a special class of wavelet bases, the 3-D orthonormal bandelet basis. SPREF-based video compression not only removes temporal redundancy but also compensates for spatial redundancy. Our experiments on several standard video sequences demonstrate that the proposed method achieves higher compression rates than standard wavelet-based compression.
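The central claim, that decomposing along a direction of regularity leaves fewer significant wavelet coefficients, can be checked on a toy example. The sketch below is not SPREF: it assumes a known, constant, integer-pixel translation in place of a spline-fitted flow field, uses a plain 1-D Haar transform instead of bandelets, and all helper names (`haar`, `translating_volume`, and so on) are hypothetical. It builds a small x-t volume in which a pattern shifts by one pixel per frame, then counts significant coefficients when each pixel's time series is transformed at fixed x versus along the motion path.

```python
import math

def haar(signal):
    """Multilevel orthonormal Haar transform; len(signal) must be a power of two."""
    coeffs, approx = [], list(signal)
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        coeffs.extend((a - b) / math.sqrt(2) for a, b in pairs)  # detail coefficients
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]       # coarser approximation
    return coeffs + approx

def significant(coeffs, threshold=1e-9):
    """Count coefficients whose magnitude exceeds the threshold."""
    return sum(1 for c in coeffs if abs(c) > threshold)

def translating_volume(width=16, frames=8, velocity=1):
    """Toy x-t volume (hypothetical): a 1-D pattern shifting `velocity` px/frame, wrapping."""
    pattern = [2.0 + math.sin(2 * math.pi * x / width) + (x % 3) for x in range(width)]
    return [[pattern[(x - velocity * t) % width] for x in range(width)]
            for t in range(frames)]

def count_along_time(volume):
    """Transform each pixel's time series at a fixed x (no motion compensation)."""
    return sum(significant(haar([row[x] for row in volume]))
               for x in range(len(volume[0])))

def count_along_motion(volume, velocity):
    """Transform along straight paths x0 + velocity * t, the assumed direction of regularity."""
    width = len(volume[0])
    return sum(significant(haar([row[(x0 + velocity * t) % width]
                                 for t, row in enumerate(volume)]))
               for x0 in range(width))

volume = translating_volume()
print(count_along_time(volume), count_along_motion(volume, velocity=1))
```

Along each motion path the samples are constant, so only the final approximation coefficient survives (one per path), whereas the fixed-x decomposition keeps many nonzero detail coefficients. Real sequences have spatially varying, non-integer motion, which is why the paper represents the flow with splines.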



Alper Yilmaz and Mubarak Shah, “Matching actions in presence of camera motion”, Computer Vision and Image Understanding, Vol. 104 (2006), pp. 221-231.


When the camera viewing an action is moving, the motion observed in the video contains not only the motion of the actor but also the motion of the camera. In addition, because the camera moves, a different view of the action is observed at each time instant. In this paper, we propose a novel method to perform action recognition in the presence of camera motion. The proposed method is based on the epipolar geometry between any two views. However, instead of relating two static views using the standard fundamental matrix, we model the motions of independently moving cameras in the equations governing the epipolar geometry and derive a new relation, which we refer to as the “temporal fundamental matrix.” Using the temporal fundamental matrix, a matching score between two actions is computed by evaluating the quality of the recovered geometry. We demonstrate the versatility of the proposed approach for action recognition on a number of challenging sequences.
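The matching score rests on the epipolar constraint x2^T F x1 = 0 between corresponding points in two views. The sketch below is a simplified stand-in, not the paper's temporal fundamental matrix: it assumes calibrated cameras (identity intrinsics), a static first camera, and a purely translating second camera, so each frame's relating matrix is the essential matrix [t]_x; it also assumes these per-frame matrices are already known rather than recovered from the video. Two tracks are scored by their mean algebraic epipolar residual, and every function name is hypothetical.

```python
def cross_matrix(t):
    """Skew-symmetric matrix [t]_x such that [t]_x v equals the cross product t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]

def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def project(X, t):
    """Normalized image point of 3-D point X seen by camera [I | t] (no rotation, K = I)."""
    x, y, z = X[0] + t[0], X[1] + t[1], X[2] + t[2]
    return [x / z, y / z, 1.0]

def epipolar_residual(F, x1, x2):
    """Algebraic error |x2^T F x1|; zero for a perfectly consistent correspondence."""
    return abs(dot(x2, mat_vec(F, x1)))

def matching_score(F_seq, track1, track2):
    """Mean residual over frames, one (assumed known) matrix per frame.
    Lower means the two tracks agree better with the assumed geometry."""
    residuals = [epipolar_residual(F, a, b) for F, a, b in zip(F_seq, track1, track2)]
    return sum(residuals) / len(residuals)

# The second camera drifts over time, so the relating matrix changes frame by frame.
translations = [(1.0, 0.1 * k, 0.0) for k in range(5)]
F_seq = [cross_matrix(t) for t in translations]  # essential matrix [t]_x R with R = I

action = [(0.2 * k, 1.0, 5.0 + 0.1 * k) for k in range(5)]  # one tracked body point
view1 = [project(X, (0.0, 0.0, 0.0)) for X in action]
same = [project(X, t) for X, t in zip(action, translations)]
other = [project((1.0, -0.5 * k, 4.0), t) for k, t in enumerate(translations)]

print(matching_score(F_seq, view1, same), matching_score(F_seq, view1, other))
```

Projections of the same moving point give a residual at floating-point noise level, while an unrelated trajectory does not. Using a different matrix per frame is the part that loosely mirrors the idea of letting the epipolar geometry vary with camera motion.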



Eraldo Ribeiro and Mubarak Shah, “Computer Vision for Nanoscale Imaging”, Machine Vision and Applications Journal, Vol. 17, Issue 3 (July 2006), pp. 147-162.


The main goal of nanotechnology is to analyze and understand the properties of matter at the atomic and molecular level. Computer vision is rapidly expanding into this new and exciting field of application, and considerable research effort is currently being devoted to developing new image-based characterization techniques for analyzing nanoscale images. Nanoscale characterization requires algorithms that perform image analysis under extremely challenging conditions such as low signal-to-noise ratio and low resolution. To achieve this, nanotechnology researchers require imaging tools that can enhance images, detect objects and features, reconstruct 3D geometry, and track objects. This paper reviews current advances in computer vision and related areas applied to imaging nanoscale objects. We categorize the algorithms, describe representative methods, and conclude with several promising directions for future investigation.



Lisa Spencer, Rattan Guha and Mubarak Shah, “Determining Scale and Sea State from Water Video”, IEEE Transactions on Image Processing, Vol. 15, No. 6, 2006.


In most image processing and computer vision applications, real-world scale can be determined only when calibration information is available. Dynamic scenes further complicate the problem. However, some types of dynamic scenes provide useful information that can be used to recover real-world scale. In this paper, we focus on ocean scenes and propose a method for finding sizes in real-world units, as well as the sea state, from an uncalibrated camera. Fourier transforms in the space and time dimensions yield spatial and temporal frequency spectra. For water waves, the dispersion relation defines a square relationship between wavelength and period: the wavelength grows with the square of the period. Our method applies this dispersion relation to recover the real-world scale of an ocean sequence. The sea state (including the peak wavelength and period, the wind speed that generated the waves, and the wave heights) is also determined from the frequency spectrum of the sequence combined with stochastic oceanography models. The process is demonstrated on synthetic and real sequences, and the results are validated against known scene geometry. The method has wide applications in port monitoring and coastal surveillance.
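The scale-recovery step can be illustrated with the deep-water form of the dispersion relation, omega^2 = g k, which gives wavelength = g T^2 / (2 pi). This is a minimal sketch under several assumptions not stated in the abstract: deep water (depth greater than roughly half a wavelength), spectral peaks that have already been located, a known frame rate, and a single uniform image scale (ignoring perspective). It does not attempt the sea-state estimation, and the function names are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)

def deep_water_wavelength(period_s):
    """Deep-water dispersion relation omega^2 = g k, i.e. lambda = g T^2 / (2 pi)."""
    return G * period_s ** 2 / (2.0 * math.pi)

def metres_per_pixel(peak_cycles_per_pixel, peak_cycles_per_frame, fps):
    """Image scale from the peaks of the spatial and temporal spectra (hypothetical helper).

    peak_cycles_per_pixel: dominant spatial frequency along the wave direction.
    peak_cycles_per_frame: dominant temporal frequency; fps converts frames to seconds.
    """
    wavelength_px = 1.0 / peak_cycles_per_pixel
    period_s = 1.0 / (peak_cycles_per_frame * fps)
    # The period fixes the physical wavelength; dividing by its pixel length gives the scale.
    return deep_water_wavelength(period_s) / wavelength_px

# A wave with an 8 s period has a deep-water wavelength of roughly 99.9 m.
print(round(deep_water_wavelength(8.0), 1))
# If it spans 200 pixels and repeats every 240 frames at 30 fps, each pixel covers about 0.5 m.
print(round(metres_per_pixel(1 / 200, 1 / 240, 30.0), 3))
```

The key point is that time is measured in physical units by the camera clock, so the temporal spectrum supplies a real-world period, and the dispersion relation turns that period into a real-world length without any geometric calibration.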



Mubarak Shah, Omar Javed and Khurram Shafique, “Automated Visual Surveillance in Realistic Scenarios”, IEEE Multimedia Magazine, January-March, 2007.


In this article, we present Knight, an automated surveillance system deployed in a variety of real-world scenarios ranging from railway security to law enforcement. We also discuss the challenges of developing surveillance systems, present some solutions implemented in Knight that overcome these challenges, and evaluate Knight’s performance in unconstrained environments.