This is a high-level computer vision paper that employs concepts from Natural Language Understanding to solve the video retrieval problem. Our main contribution is the use of semantic word similarity measures (Lin and PMI-IR similarities) for video retrieval. In our approach, we use trained concept detectors and the visual co-occurrence relations between their concepts. We propose two methods for content-based video retrieval: (1) a method for retrieving a new concept (one that is unknown to the system and for which no annotation is available) using semantic word similarity and visual co-occurrence; and (2) a method for retrieving videos based on their relevance to a user-defined text query, using semantic word similarity and the visual content of the videos. For evaluation, we mainly used the automatic search and high-level feature extraction test set of the TRECVID’06 benchmark and the automatic search test set of TRECVID’07. These two data sets consist of 250 hours of multilingual news video captured from American, Arabic, German and Chinese TV channels. Although our method for retrieving a new concept is unsupervised, it outperforms the trained (supervised) concept detectors on 7 of the 20 test concepts, and overall it performs very close to them. In addition, our visual-content-based semantic retrieval method performs 81% better than the text-based retrieval method. This shows that visual content alone can yield good retrieval results.
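The abstract names the Lin and PMI-IR similarity measures without defining them. As a rough illustration only, the sketch below uses their textbook forms. Lin similarity is computed from information-content values, IC(c) = -log P(c), of two concepts and their lowest common subsumer in a taxonomy such as WordNet. Pointwise mutual information is estimated from document or hit counts, which is the idea behind PMI-IR. The function `score_new_concept` and all example numbers are hypothetical. It scores a shot for an unseen concept as a similarity-weighted average of known detectors' confidences and deliberately omits the visual co-occurrence component, so it should not be read as the exact formulation used in this work.

```python
import math
from typing import Dict


def lin_similarity(ic_c1: float, ic_c2: float, ic_lcs: float) -> float:
    """Lin similarity: 2 * IC(lcs) / (IC(c1) + IC(c2)), with IC(c) = -log P(c).

    ic_lcs is the information content of the lowest common subsumer
    of the two concepts in a taxonomy (e.g. WordNet).
    """
    denom = ic_c1 + ic_c2
    if denom == 0.0:
        return 0.0
    return 2.0 * ic_lcs / denom


def pmi(hits_w1: int, hits_w2: int, hits_both: int, total: int) -> float:
    """Pointwise mutual information estimated from hit counts:
    log2( P(w1, w2) / (P(w1) * P(w2)) ).

    Returns -inf when any count is zero (the words never co-occur).
    """
    if hits_both == 0 or hits_w1 == 0 or hits_w2 == 0:
        return float("-inf")
    p_both = hits_both / total
    p1 = hits_w1 / total
    p2 = hits_w2 / total
    return math.log2(p_both / (p1 * p2))


def score_new_concept(similarities: Dict[str, float],
                      detector_scores: Dict[str, float]) -> float:
    """HYPOTHETICAL scoring of one video shot for an unseen concept.

    similarities: word similarity between the unseen concept and each
                  known (trained) concept.
    detector_scores: confidence of each known concept's detector on the shot.
    Returns the similarity-weighted average of detector confidences.
    Visual co-occurrence between concepts is NOT modeled here.
    """
    num = sum(sim * detector_scores.get(c, 0.0) for c, sim in similarities.items())
    den = sum(similarities.values())
    return num / den if den > 0 else 0.0


if __name__ == "__main__":
    print(lin_similarity(5.0, 7.0, 3.0))  # 0.5
    print(pmi(64, 32, 16, 1024))           # 3.0
    # Made-up similarities and detector confidences for an unseen concept.
    print(score_new_concept({"car": 0.8, "road": 0.2},
                            {"car": 0.9, "road": 0.5}))
```

A weighted average keeps the score on the same scale as the detector confidences; in practice one would likely restrict the sum to the few most similar known concepts so that weakly related detectors do not dilute the score.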
Semantic Video Retrieval Using High Level Context by Aytar, Yusuf, M.S., University of Central Florida, 2008, 65 pages.