[Temporal Video Segmentation]
[Video Content Processing and Understanding]
[Content-Based Image/Video Retrieval]
Temporal Video Segmentation [back to top]
Video Content Processing and Understanding [back to top]
Spatiotemporal Visual Attention Detection The human visual system actively seeks out interesting regions in images and videos to reduce the search effort in object detection tasks. Similarly, prominent actions in a video sequence are more likely to attract a viewer's attention than their surroundings. In this project, we developed a spatiotemporal video attention detection framework for locating the attended regions that correspond to both interesting objects and interesting actions in video sequences. Homographies estimated between video frames are used to detect motion saliency, and a hierarchical structure is constructed for color-based spatial saliency computation. The temporal and spatial saliency maps are then fused dynamically, with a bias toward the temporal map, to produce the spatiotemporal visual attention model. The framework has been tested on multiple sequences to highlight target objects and/or activities.
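The dynamic fusion step can be illustrated with a minimal NumPy sketch. This is not the project's implementation; the weighting rule (a base bias toward the temporal map, increased when the temporal map contains strong responses) and the parameter `base_bias` are illustrative assumptions.

```python
import numpy as np

def fuse_saliency(spatial, temporal, base_bias=0.6):
    """Fuse spatial and temporal saliency maps (values in [0, 1]).

    Hypothetical sketch: the temporal map's weight starts at base_bias
    and grows with the temporal map's peak activation, giving the
    dynamic bias toward motion saliency described in the text.
    """
    # Dynamic weight for the temporal map.
    w_t = base_bias + (1.0 - base_bias) * float(temporal.max())
    fused = w_t * temporal + (1.0 - w_t) * spatial
    return np.clip(fused, 0.0, 1.0)
```

With a strong motion response, the fused map is dominated by the temporal term; with a quiet temporal map, spatial color saliency contributes more.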
Semantic Linkage of News Stories In this project, we developed a novel framework for semantically linking news topics. Unlike conventional video content linking methods that operate only on video shots, the proposed framework links news stories across different sources. The semantic linkage between two news stories is computed from their visual and textual similarities. The visual similarity is computed on the story key-frames, both with and without detected faces. The textual similarity is computed from the automatic speech recognition (ASR) output of the video sequences. The output of the story linking method can be used to compute the ranking, or interestingness, of a news story. The method has been tested on a large open-benchmark dataset from TRECVID 2003 by NIST, and very satisfactory results were obtained for both proposed tasks. Yun Zhai and Mubarak Shah, "Tracking News Stories Across Different Sources", ACM Multimedia 2005, Singapore, November 6-12.
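A simplified sketch of the combined similarity is shown below. The cosine bag-of-words measure for ASR text and the linear weight `alpha` are assumptions for illustration, standing in for whatever visual and textual measures the actual system uses.

```python
import math
from collections import Counter

def text_similarity(asr_a, asr_b):
    """Cosine similarity between bag-of-words vectors of two ASR transcripts."""
    va, vb = Counter(asr_a.lower().split()), Counter(asr_b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def story_linkage(visual_sim, asr_a, asr_b, alpha=0.5):
    """Combine visual and textual similarity; alpha is a hypothetical weight."""
    return alpha * visual_sim + (1.0 - alpha) * text_similarity(asr_a, asr_b)
```

Stories from different sources covering the same topic score high on both channels, and the combined score can then be used to rank stories by interestingness.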
Movie Scene Classification Using Finite State Machines Among the many genres of video production, feature films are a vital application area for such tools. Feature films are produced in accordance with "film grammar", a set of rules for how films should be assembled to convey their story lines. In this work, we utilized knowledge of film grammar and modeled movie scenes using finite state machines (FSMs). Three scene categories are modeled: action, dialog, and suspense. The method analyzes the structural information of scenes based on low-level and mid-level features. The framework has demonstrated the usefulness of FSMs in experiments on over 80 movie scenes, achieving high recall and precision.
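To give a flavor of the FSM idea, here is a toy sketch. The rule (a dialog scene approximated as shots alternating between exactly two recurring camera setups) and the shot labels are invented for illustration; the actual models use richer low-level and mid-level features.

```python
def classify_scene(shot_labels):
    """Toy FSM-style rule: clean alternation between two shot setups -> 'dialog'.

    Hypothetical sketch: walks the shot sequence, counting transitions
    between exactly two distinct labels; three or more uninterrupted
    alternations are taken as the dialog pattern, else 'action'.
    """
    if len(set(shot_labels)) == 2:
        alternations = sum(1 for a, b in zip(shot_labels, shot_labels[1:]) if a != b)
        # Every adjacent pair must alternate for a clean A/B/A/B structure.
        if alternations >= 3 and alternations == len(shot_labels) - 1:
            return "dialog"
    return "action"
```

A real FSM for this task would carry explicit states and transitions driven by features such as shot length, motion content, and audio, but the accept/reject structure is the same.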
Content-Based Image/Video Retrieval [back to top]
Relevance Feedback Using Keyword and Region-Based Refinement In this project, we developed an online content-based video retrieval system, PEGASUS. It retrieves relevant video shots from a news database according to user queries. The system indexes the video data using speech features (from ASR) and image features (key-frame regions). In addition, it provides a relevance feedback mechanism based on query expansion and region-based image matching, allowing users to refine their search results through an iterative process. The PEGASUS system has been built on more than 43,000 video shots from news programs and is accessible on the internet at http://pegasus.cs.ucf.edu:8080.
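The query-expansion half of the feedback loop can be sketched as follows. This is a generic illustration, not the PEGASUS implementation: the term-frequency expansion rule and `top_k` parameter are assumptions.

```python
from collections import Counter

def expand_query(query_terms, relevant_docs, top_k=2):
    """Hypothetical query-expansion step for relevance feedback.

    Appends the top_k most frequent terms from user-marked relevant
    documents that are not already in the query, so the next search
    iteration retrieves more shots like the ones the user accepted.
    """
    counts = Counter()
    for doc in relevant_docs:
        counts.update(w for w in doc.lower().split() if w not in query_terms)
    return list(query_terms) + [w for w, _ in counts.most_common(top_k)]
```

Each feedback round re-runs the search with the expanded query; the region-based image matching side would analogously re-weight visual features from the accepted key-frames.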
TRECVID is an annual research forum organized by the US National Institute of Standards and Technology (NIST). It encourages research in the classification, search, and retrieval of news video data, and provides a worldwide platform for researchers to share their ideas. The UCF vision team has participated in the tasks of shot boundary detection, news story segmentation, camera motion classification, high-level semantic concept detection, interactive topic search, and TV rushes exploitation. I was personally deeply involved in all of these tasks except high-level feature detection. We achieved top performance in the "beach" feature detection and story segmentation tasks.