Scene Understanding by Statistical Modeling of Motion Patterns
Related Publication: Imran Saleemi, Lance Hartung, and Mubarak Shah, Scene Understanding by Statistical Modeling of Motion Patterns, IEEE Conference on Computer Vision and Pattern Recognition 2010, San Francisco, CA.
Abstract:
We present a novel method for the discovery and statistical representation of motion patterns in a scene observed by a static camera. Related methods involving learning of patterns of activity rely on trajectories obtained from object detection and tracking systems, which are unreliable in complex scenes of crowded motion. We propose a mixture model representation of salient patterns of optical flow, and present an algorithm for learning these patterns from dense optical flow in a hierarchical, unsupervised fashion. Using low level cues of noisy optical flow, K-means is employed to initialize a Gaussian mixture model for temporally segmented clips of video. The components of this mixture are then filtered and instances of motion patterns are computed using a simple motion model, by linking components across space and time. Motion patterns are then initialized and membership of instances in different motion patterns is established by using KL divergence between mixture distributions of pattern instances. Finally, a pixel level representation of motion patterns is proposed by deriving conditional expectation of optical flow. Results of extensive experiments are presented for multiple surveillance sequences containing numerous patterns involving both pedestrian and vehicular traffic.
I. Problem
¨ Video sequence:
¤ Static camera
¤ Structured scene
¤ High density crowds
¤ Multiple flows
¨ Goal:
¤ Learn patterns of motion
¤ Statistical distribution
¨ Applications:
¤ Anomaly detection, prior motion model, persistent tracking
Figure: Examples of scenes to be analyzed and desirable patterns
II. Gaussian Mixture Formulation
¨ Compute optical flow
¨ Define
¨ A single Gaussian approximates a motion blob
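One standard way to write such a mixture is sketched here. It assumes each optical flow vector is summarized as a 4-d point X made of pixel location plus flow magnitude and orientation; that layout is an assumption consistent with the "4d space" used for clustering, and the symbols X, w_k, \mu_k, \Sigma_k and K are illustrative rather than taken from the paper.

```latex
X = (x,\, y,\, \rho,\, \theta), \qquad
p(X) = \sum_{k=1}^{K} w_k \, \mathcal{N}(X \mid \mu_k, \Sigma_k), \qquad
w_k \ge 0,\quad \sum_{k=1}^{K} w_k = 1
```

Under this reading, each term \mathcal{N}(X \mid \mu_k, \Sigma_k) plays the role of one motion blob.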
III. Process
¨ Temporal quantization
¨ K-means clustering in 4d space
¨ No optimization
¨ Insensitive to choice of K
¨ Numerous, low variance clusters
¨ Optical flow is noisy
¨ Filter high directional variance components
¨ Sequences of components form spatiotemporal worms (instances)
¨ Pattern instances are temporally bounded
¨ A pattern itself is periodic
¨ Pattern instance occurs over several clips
¨ Two components i and j form an instance if,
¤ i and j are temporally proximal,
¤ j is 'reachable' from i
¨ Define a planar graph G = (V, E)
¤ V = { components from all video clips }
¤ E = { probability value if temporally proximal }
¨ Weakly connected component analysis on G
¨ Connected components are pattern instances
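The linking step above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Component fields, the Gaussian-shaped reach_score, and the max_gap, sigma and min_score parameters are all hypothetical stand-ins for the "temporally proximal" and "reachable" tests. Weak connectivity is obtained by ignoring edge direction in a union-find structure.

```python
from dataclasses import dataclass
from math import exp


@dataclass
class Component:
    clip: int   # index of the video clip this mixture component came from
    x: float    # mean position (pixels)
    y: float
    u: float    # mean flow, assumed here to be in pixels per clip
    v: float


def reach_score(a, b, sigma=10.0):
    """Hypothetical reachability score in (0, 1]: move a along its mean
    flow for the clip gap and compare the predicted position with b."""
    dt = b.clip - a.clip
    px, py = a.x + a.u * dt, a.y + a.v * dt
    d2 = (px - b.x) ** 2 + (py - b.y) ** 2
    return exp(-d2 / (2.0 * sigma ** 2))


def pattern_instances(components, max_gap=1, min_score=0.5):
    """Group component indices into weakly connected components."""
    parent = list(range(len(components)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, a in enumerate(components):
        for j, b in enumerate(components):
            proximal = 0 < b.clip - a.clip <= max_gap
            if proximal and reach_score(a, b) >= min_score:
                # Edge i -> j; direction is dropped for weak connectivity.
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(components)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    comps = [
        Component(0, 0, 0, 10, 0),
        Component(1, 10, 0, 10, 0),
        Component(2, 20, 0, 10, 0),
        Component(0, 100, 100, 0, -10),
        Component(1, 100, 90, 0, -10),
    ]
    print(pattern_instances(comps))
```

In the toy input, three components drift rightward across consecutive clips and two drift upward elsewhere in the frame, so they fall into two separate groups.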
Figure: Left: One instance each from 4 patterns. Right: More instances for each of the 4 patterns.
¨ Multiple instances per pattern
¨ Each instance is a Gaussian mixture
¨ KL divergence defines similarity between instances
¨ Approximate with Monte Carlo sampling
¨ Graph connected component analysis
¨ Compute conditional expected orientation / magnitude given a pixel
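The KL divergence between two Gaussian mixtures has no closed form, which is why sampling is used. A minimal sketch of a Monte Carlo estimate follows, assuming each instance is stored as a (weights, means, covariances) tuple; the function names, the sample count n and the fixed seed are illustrative choices, not details from the paper.

```python
import numpy as np


def gmm_logpdf(x, weights, means, covs):
    """Log density of a Gaussian mixture at each row of x, shape (n, d)."""
    n, d = x.shape
    log_terms = np.empty((len(weights), n))
    for k, (w, mu, cov) in enumerate(zip(weights, means, covs)):
        diff = x - mu
        sol = np.linalg.solve(cov, diff.T).T       # cov^{-1} (x - mu)
        maha = np.sum(diff * sol, axis=1)          # squared Mahalanobis distance
        _, logdet = np.linalg.slogdet(cov)
        log_terms[k] = np.log(w) - 0.5 * (maha + logdet + d * np.log(2 * np.pi))
    # Log-sum-exp over components for numerical stability.
    m = log_terms.max(axis=0)
    return m + np.log(np.exp(log_terms - m).sum(axis=0))


def gmm_sample(n, weights, means, covs, rng):
    """Draw n samples: pick a component by weight, then sample from it."""
    ks = rng.choice(len(weights), size=n, p=weights)
    return np.stack([rng.multivariate_normal(means[k], covs[k]) for k in ks])


def kl_monte_carlo(p, q, n=5000, seed=0):
    """Estimate KL(p || q) as the mean of log p(x) - log q(x) for x ~ p."""
    rng = np.random.default_rng(seed)
    x = gmm_sample(n, *p, rng)
    return float(np.mean(gmm_logpdf(x, *p) - gmm_logpdf(x, *q)))


if __name__ == "__main__":
    eye = np.eye(2)
    p = (np.array([1.0]), [np.zeros(2)], [eye])
    q = (np.array([1.0]), [np.array([1.0, 0.0])], [eye])
    # For these two unit-covariance Gaussians the exact value is 0.5.
    print(kl_monte_carlo(p, q))
```

KL divergence is asymmetric, so a similarity measure built on it may need to combine both directions, for example by summing KL(p || q) and KL(q || p); whether and how to symmetrize is left open in this sketch.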
IV. Experiments