Keyframe Extraction

Benjamin Mears, Amherst College

TRECVID provides participants with shot boundary annotations \cite{ShotBoundaries}. However, the content of a single shot can vary considerably, and frames containing the desired concepts can occur anywhere within it. Many past TRECVID submissions have chosen keyframes directly from the shot and subshot boundaries, for example by sampling the middle frame, while others have used various keyframe extraction algorithms.

We chose to implement a clustering algorithm based on the work of Zhuang et al. ("Video key frame extraction by unsupervised clustering and feedback adjustment"). The frames in each shot are clustered by their hue-saturation (HS) color histograms, computed in the HSV colorspace. Frames are analyzed sequentially, and each is assigned to its nearest cluster. If the similarity between a frame and its nearest cluster center falls below a certain threshold, a new cluster is created instead, with the current frame as its centroid. Once clustering is complete, a representative frame is chosen from each cluster whose size exceeds a second threshold.
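The clustering procedure described above can be sketched as follows. This is a minimal illustration, not our actual implementation: the bin counts, the use of histogram intersection as the similarity measure, the running-mean centroid update, the choice of the member closest to the centroid as the representative frame, and all function names and default thresholds are assumptions made for the example. Frames are represented as lists of RGB pixels with components in [0, 1].

```python
import colorsys

H_BINS, S_BINS = 16, 8  # assumed bin counts, not taken from the text


def hs_histogram(pixels):
    """Normalized hue-saturation histogram of one frame, flattened to a list."""
    hist = [0.0] * (H_BINS * S_BINS)
    for r, g, b in pixels:
        h, s, _v = colorsys.rgb_to_hsv(r, g, b)  # value channel is ignored
        hi = min(int(h * H_BINS), H_BINS - 1)
        si = min(int(s * S_BINS), S_BINS - 1)
        hist[hi * S_BINS + si] += 1.0
    total = len(pixels) or 1  # avoid division by zero on an empty frame
    return [count / total for count in hist]


def similarity(a, b):
    """Histogram intersection: 1.0 for identical normalized histograms."""
    return sum(min(x, y) for x, y in zip(a, b))


def extract_keyframes(histograms, new_cluster_threshold=0.8, min_cluster_size=2):
    """Cluster the frames of one shot sequentially; return keyframe indices."""
    clusters = []  # each cluster: {"centroid": [...], "members": [indices]}
    for idx, hist in enumerate(histograms):
        # Find the nearest (most similar) existing cluster.
        best, best_sim = None, -1.0
        for cluster in clusters:
            sim = similarity(hist, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is None or best_sim < new_cluster_threshold:
            # Too dissimilar: start a new cluster centered on this frame.
            clusters.append({"centroid": list(hist), "members": [idx]})
        else:
            # Assign to the nearest cluster and update its running-mean centroid.
            n = len(best["members"])
            best["centroid"] = [
                (c * n + x) / (n + 1) for c, x in zip(best["centroid"], hist)
            ]
            best["members"].append(idx)

    keyframes = []
    for cluster in clusters:
        # Only clusters above the size threshold contribute a keyframe.
        if len(cluster["members"]) >= min_cluster_size:
            keyframes.append(
                max(
                    cluster["members"],
                    key=lambda i: similarity(histograms[i], cluster["centroid"]),
                )
            )
    return sorted(keyframes)
```

In this sketch, raising `new_cluster_threshold` or lowering `min_cluster_size` yields more keyframes per shot.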

The algorithm we implemented has several parameters that control the number of keyframes extracted, such as the similarity threshold for starting a new cluster and the size threshold for judging a cluster important. Extracting more keyframes, however, increases the computation time. In total, we extracted approximately 300,000 keyframes from the 280 hours of test video.