“Where Am I?!”
Accurate Image
Localization Based on Google Maps Street View
Amir Roshan Zamir, Mubarak Shah
Related Papers:
1.
Amir Roshan Zamir, Mubarak Shah, “Accurate Image Localization Based
on Google Maps Street View”,
European Conference on Computer Vision (ECCV), 2010, [PDF], [BibTeX] – Winner of ECCV’10 Travel Grant
Note: This version contains minor typographical
corrections over the version published in the ECCV10 proceedings.
2.
Gonzalo Vaca, Amir Roshan
Zamir, Mubarak Shah, “City Scale Geo-spatial
Trajectory Estimation of a Moving Camera”,
25th IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2012
DATASET: Please
contact us if you are interested in obtaining an expanded
version of our Street View dataset and test set.
CODE: Please email
us your name and affiliation in order to obtain the code.
PRESENTATION:
The PowerPoint presentation of the paper is available here.
The poster is available here.
·
About
This project proposes a new system for image localization and
location recognition that reports longitude and latitude with an
accuracy comparable to hand-held GPS devices. Our geolocation system
is based on Google Maps Street View.
·
Google Maps Street View Reference Dataset
The reference dataset is based on Google Maps Street View. It contains
~100k images collected from Pittsburgh, PA and Orlando, FL. About 50k
of the reference images were collected automatically from the Street
View website, and the rest were provided by Google.
Reference
Image Place marks in Green and Query Images in
Red
Note: the dataset provided by Google and the automatically captured
images overlap in location but were captured at different times.
There are 4 side views and 5 top views per place mark.
The following figure shows the images of 3 sample place marks in
Pittsburgh, PA.
Sample Reference Images and Their Locations on the Map
To preprocess the reference dataset, SIFT descriptors are computed at
the SIFT interest points of every reference image and stored, along
with their GPS tags, in a k-means tree built with FLANN.
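As an illustration only, the idea of a descriptor index that keeps a GPS tag next to every descriptor can be sketched as below. This is a minimal sketch, not the project's code: it uses a brute-force search as a stand-in for the FLANN k-means tree, the descriptors are 2-D toy vectors rather than 128-D SIFT descriptors, and the names `ReferenceDescriptor` and `nearest_neighbors` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceDescriptor:
    """One reference descriptor and the GPS tag of the image it came from."""
    vector: tuple  # descriptor values (toy 2-D here; real SIFT is 128-D)
    lat: float
    lon: float


def nearest_neighbors(index, query_vector, k):
    """Return the k closest entries as (distance, entry) pairs, nearest first.

    Brute-force search; an approximate index such as a k-means tree would
    replace this loop for a dataset of realistic size.
    """
    scored = [(math.dist(entry.vector, query_vector), entry) for entry in index]
    scored.sort(key=lambda pair: pair[0])
    return scored[:k]


# Toy index: two descriptors tagged in Pittsburgh, one in Orlando.
index = [
    ReferenceDescriptor((0.0, 0.0), 40.4406, -79.9959),
    ReferenceDescriptor((1.0, 0.0), 40.4407, -79.9958),
    ReferenceDescriptor((5.0, 5.0), 28.5383, -81.3792),
]
hits = nearest_neighbors(index, (0.9, 0.1), k=2)
best_distance, best_entry = hits[0]  # best_entry carries the GPS tag to vote for
```

Keeping the GPS tag inside each index entry means a match can cast its vote for a location directly, without a second lookup.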
·
Single Image Localization
The following block diagram shows each step of geolocating a query
image. The input to the system is an image and the output is the
estimated GPS location in terms of longitude and latitude.
First
Row: Single Image Localization Block Diagram, Second Row: The result of each
step for a sample query.
The Geospatial Pruning and Smoothing steps are explained in more
detail in the following sections.
o 3.1 Geospatial Pruning
Geospatial pruning is an essential step in the proposed geolocation
system. It is helpful when reference images overlap in scene (i.e.,
one object is in the view of several reference images) and when there
are repeated structures, such as man-made structures in urban areas
(e.g., the windows of a skyscraper are almost identical). In these
cases, as the following figures illustrate, Lowe’s pruning method
removes most of the detected interest points and their descriptors,
which results in a sparse vote distribution that is not suitable for
reliable geolocation. The proposed geospatial pruning method instead
incorporates the GPS location of each reference descriptor:
incorrectly matched descriptors are removed, while the repeated
structures of urban areas and the overlap between reference images do
not adversely affect the pruned results.
Geospatial Pruning: Two sample correctly-matched descriptors
Geospatial Pruning Equation
The figures above show two sample correctly-matched descriptors.
Lowe’s pruning method removes both interest points from voting,
because it takes the descriptor distance ratio between the 1st and
2nd nearest neighbors (NN). The proposed method retains them, because
it takes the ratio between the 1st and 4th NN.
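The difference between the two ratio tests can be sketched roughly as follows. This is a hedged illustration, not the equation from the figure: it assumes the second term of the ratio is the first neighbor whose place mark lies farther than `min_separation_m` from the place mark of the 1st NN, and it assumes a ratio threshold of 0.8. Both parameters, the local metric coordinates, and the function names are assumptions made for this example.

```python
import math


def keep_descriptor_lowe(neighbors, ratio_threshold=0.8):
    """Classic ratio test: compare the 1st and 2nd nearest neighbors."""
    if len(neighbors) < 2 or neighbors[1][0] <= 0:
        return False
    return neighbors[0][0] / neighbors[1][0] < ratio_threshold


def keep_descriptor_geospatial(neighbors, min_separation_m=50.0, ratio_threshold=0.8):
    """Ratio test against the first neighbor that comes from a different place.

    neighbors: list of (descriptor_distance, (x_m, y_m)), nearest first, where
    (x_m, y_m) is the place-mark position of the matched reference descriptor
    in a local metric frame (an assumption of this sketch).
    """
    if not neighbors:
        return False
    first_distance, first_position = neighbors[0]
    for distance, position in neighbors[1:]:
        if math.dist(position, first_position) > min_separation_m:
            return distance > 0 and first_distance / distance < ratio_threshold
    # Design choice of this sketch: if every retrieved neighbor lies near the
    # 1st NN, nothing contradicts the match, so the descriptor is kept.
    return True


# One object seen from three adjacent place marks, plus an unrelated match.
neighbors = [
    (0.20, (0.0, 0.0)),
    (0.21, (12.0, 0.0)),
    (0.22, (24.0, 0.0)),
    (0.60, (900.0, 400.0)),
]
kept_by_lowe = keep_descriptor_lowe(neighbors)              # ratio 0.20 / 0.21
kept_by_geospatial = keep_descriptor_geospatial(neighbors)  # ratio 0.20 / 0.60
```

In this toy case the ratio ends up being taken between the 1st and 4th NN, because the 2nd and 3rd NN come from place marks next to the 1st.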
o 3.2 Smoothing
Since Street View place marks are about 12 meters apart, one object
in a query image might be in the view of several reference images.
This results in several short, close peaks in the vote function
instead of one tall peak at the correct location. There might also be
some solitary peaks in the vote function that are due to
incorrectly-matched descriptors. In order to amplify several close
peaks and attenuate solitary peaks, the vote function is smoothed
with a Gaussian, as shown in the following figure.
Smoothing
By Gaussian
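The effect of the smoothing step can be sketched with a small example. This is a minimal sketch under stated assumptions: place marks are given in a local metric frame, the kernel is an unnormalized isotropic Gaussian, and the width `sigma_m` is a hypothetical parameter, not a value taken from the project.

```python
import math


def smooth_votes(votes, sigma_m=20.0):
    """Smooth a vote function with a Gaussian kernel.

    votes: dict mapping (x_m, y_m) place-mark positions to raw vote counts.
    Returns a dict with the same keys and the smoothed vote values.
    """
    smoothed = {}
    for center in votes:
        total = 0.0
        for position, count in votes.items():
            squared_distance = (position[0] - center[0]) ** 2 + (position[1] - center[1]) ** 2
            total += count * math.exp(-squared_distance / (2.0 * sigma_m ** 2))
        smoothed[center] = total
    return smoothed


# Three neighboring place marks share the votes for one object;
# a solitary peak far away comes from incorrect matches.
raw = {(0.0, 0.0): 4, (12.0, 0.0): 4, (24.0, 0.0): 4, (500.0, 0.0): 5}
smoothed = smooth_votes(raw)
raw_peak = max(raw, key=raw.get)                 # the solitary peak wins before smoothing
smoothed_peak = max(smoothed, key=smoothed.get)  # the cluster wins after smoothing
```

The kernel is left unnormalized because only the position of the maximum matters here; a normalized kernel would scale every value by the same factor.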
o 3.3 Confidence of Localization (CoL)
This project proposes a parameter called Confidence of Localization
(CoL), which represents the reliability of localizing a query image.
The vote distribution after Gaussian smoothing can be normalized and
treated as a probability distribution function (PDF) whose random
variables are longitude and latitude. The proposed parameter is based
on the kurtosis (normalized fourth central moment) of the vote
distribution, because a more peaked vote distribution corresponds to
a more reliable localization and the kurtosis is a measure of how
peaked a PDF is. The following figure shows how CoL changes with
respect to the vote distribution.
Confidence
of Localization (CoL)
The CoL value is unbounded, since there is no upper limit to the
kurtosis of a PDF, so CoL is most meaningful when used on a
comparative basis. For instance, to find the correct city for a query
image among two candidate cities, the query image can be geolocated
within each one, and the city with the higher associated CoL value
should be selected as the correct one.
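A kurtosis-based confidence value and the city comparison can be sketched as follows. This is a simplified illustration: it treats the smoothed votes as a one-dimensional distribution along a single axis, whereas the project works with longitude and latitude jointly, and the exact CoL formula is not reproduced here. The names `col_from_votes` and `pick_city` are hypothetical.

```python
import math


def col_from_votes(positions, votes):
    """Kurtosis (fourth central moment over squared variance) of a vote distribution."""
    total = sum(votes)
    if total <= 0:
        raise ValueError("votes must sum to a positive value")
    probabilities = [v / total for v in votes]  # normalize the votes into a PDF
    mean = sum(p * x for p, x in zip(probabilities, positions))
    variance = sum(p * (x - mean) ** 2 for p, x in zip(probabilities, positions))
    if variance == 0:
        return math.inf  # all votes on one position: no upper bound applies
    fourth_moment = sum(p * (x - mean) ** 4 for p, x in zip(probabilities, positions))
    return fourth_moment / variance ** 2


def pick_city(col_by_city):
    """Return the city whose localization has the highest CoL."""
    return max(col_by_city, key=col_by_city.get)


positions = list(range(9))
flat_votes = [1, 1, 1, 1, 1, 1, 1, 1, 1]     # no clear winner
peaked_votes = [1, 1, 1, 1, 20, 1, 1, 1, 1]  # one dominant location
col_by_city = {
    "Orlando": col_from_votes(positions, flat_votes),
    "Pittsburgh": col_from_votes(positions, peaked_votes),
}
chosen = pick_city(col_by_city)  # the city with the more peaked distribution
```

Only the ordering of the two values is used, which matches the comparative use of CoL described above.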
·
Image Group Localization
We propose a method for geolocating a group of query images together
instead of geolocating them individually. The proposed method
leverages the adjacency information of the query images. Its
assumption is that the query images are taken within a limited
distance of each other (e.g. 300 meters). The following figure shows
the steps of the proposed method for a group of 3 query images.
First, each query image is geolocated individually. Then, for each
location found this way, the other query images are geolocated within
its neighborhood. The neighborhood with the highest CoL_group value,
together with the locations it gives for each query image, is
selected as the correct one.
Image
Group Localization
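The group procedure can be sketched as a loop over candidate neighborhoods. This is a rough sketch under explicit assumptions: `localize` and `localize_within` are hypothetical callables standing in for single image localization over the whole dataset and over one neighborhood, and the group score is assumed to be the mean of the individual CoL values, which may differ from the definition used in the paper.

```python
import math


def localize_group(queries, localize, localize_within, radius_m=300.0):
    """Return (col_group, locations) for the best-scoring neighborhood.

    localize(q) -> (location, col)                           # hypothetical
    localize_within(q, center, radius_m) -> (location, col)  # hypothetical
    """
    best = None
    for i, anchor in enumerate(queries):
        center, anchor_col = localize(anchor)  # step 1: individual localization
        locations = {anchor: center}
        cols = [anchor_col]
        for j, other in enumerate(queries):
            if j == i:
                continue
            # Step 2: localize the other images only inside this neighborhood.
            location, col = localize_within(other, center, radius_m)
            locations[other] = location
            cols.append(col)
        col_group = sum(cols) / len(cols)  # assumption: mean of the CoL values
        if best is None or col_group > best[0]:
            best = (col_group, locations)
    return best


# Toy stand-ins: image "b" is mislocalized on its own, far from the others.
individual = {"a": ((0.0, 0.0), 2.0), "b": ((1000.0, 0.0), 1.5), "c": ((10.0, 0.0), 3.0)}


def toy_localize(query):
    return individual[query]


def toy_localize_within(query, center, radius_m):
    # Pretend the true scene is at the origin: searching near it is confident.
    near_truth = math.dist(center, (0.0, 0.0)) <= radius_m
    return (center, 4.0 if near_truth else 0.5)


col_group, locations = localize_group(["a", "b", "c"], toy_localize, toy_localize_within)
```

In the toy run the neighborhood around the wrong individual result for "b" scores poorly, so "b" ends up placed in a neighborhood near the other two images.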
·
Results
We test the proposed method on a test set of 521 GPS-tagged
user-uploaded images downloaded from Flickr, Panoramio, Picasa, etc.
Since the GPS tags of user-uploaded images are usually very noisy and
inaccurate, we manually double-checked and adjusted the GPS location
of each test set image.
The following figures show the results of geolocating
the test set images using the proposed methods. The vertical axis shows the
percentage of the test set images geolocated within the distance threshold
(horizontal axis) of the ground truth.
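The quantity plotted in such a curve can be computed as sketched below. This is a minimal sketch: the haversine great-circle distance and the mean Earth radius are assumptions chosen for the example, the error values are made up for illustration and are not results from the project, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, an assumption of this sketch


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_phi = phi2 - phi1
    delta_lambda = math.radians(lon2 - lon1)
    a = (math.sin(delta_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def percent_within(errors_m, thresholds_m):
    """Percentage of images whose localization error is within each threshold."""
    return [100.0 * sum(e <= t for e in errors_m) / len(errors_m) for t in thresholds_m]


# Made-up localization errors in meters, for illustration only.
errors_m = [10.0, 50.0, 120.0, 400.0]
curve = percent_within(errors_m, thresholds_m=[100.0, 250.0, 500.0])
# curve == [50.0, 75.0, 100.0]
```

Each error would come from `haversine_m` applied to the estimated and ground-truth coordinates of one test image.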
Single Image Localization Results
Image Group Localization Results
In order to examine the performance of the proposed Confidence of
Localization parameter, the CoL values obtained from single image
localization of the test set are grouped into 8 bins by value. The
following figure shows the mean error (vertical axis) of each bin
versus the mean CoL value of the bin (horizontal axis). As can be
observed in the figure, higher CoL values correspond to lower error,
meaning the localization is more reliable.
Confidence of Localization vs.
Geolocation Error (m)
(Since the value of the kurtosis is theoretically unbounded, we
normalized the CoL values and show them ranging from 0 to 1 on the
horizontal axis of the plot.)
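The binning behind such a plot can be sketched as follows. This is a hedged sketch: it assumes equal-count bins over the sorted CoL values and min-max scaling to the range 0 to 1, neither of which is stated on this page, and the numbers in the example are made up for illustration.

```python
def bin_by_col(cols, errors_m, n_bins=8):
    """Group (CoL, error) pairs into bins of similar CoL.

    Returns one (mean_col, mean_error_m) pair per non-empty bin, lowest CoL first.
    Assumption of this sketch: bins hold (nearly) equal numbers of images.
    """
    pairs = sorted(zip(cols, errors_m))
    size, extra = divmod(len(pairs), n_bins)
    bins, start = [], 0
    for b in range(n_bins):
        end = start + size + (1 if b < extra else 0)
        chunk = pairs[start:end]
        start = end
        if chunk:
            mean_col = sum(c for c, _ in chunk) / len(chunk)
            mean_error = sum(e for _, e in chunk) / len(chunk)
            bins.append((mean_col, mean_error))
    return bins


def min_max_normalize(values):
    """Scale values linearly so the smallest maps to 0 and the largest to 1."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


# Made-up CoL values and errors in meters, for illustration only.
cols = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
errors_m = [800.0, 700.0, 600.0, 500.0, 400.0, 300.0, 200.0, 100.0]
bins = bin_by_col(cols, errors_m, n_bins=4)
x_axis = min_max_normalize([mean_col for mean_col, _ in bins])  # 0 to 1
y_axis = [mean_error for _, mean_error in bins]                 # mean error per bin
```

Equal-count bins are used here so that no bin is empty even though a few very large kurtosis values can stretch the CoL range.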
The following figures show more localization examples from the test
set using the proposed methods. Each example shows one query image,
the retrieved image, their GPS locations, and the error in meters,
along with the vote distribution for each step of localization.
Please feel free to contact us with your questions,
suggestions and comments.