Second International Workshop on Visual Analysis and Geo-Localization of Large-Scale Imagery

In conjunction with CVPR 2013, Portland, Oregon, June 23, 2013





The recent availability of large-scale geo-tagged images on social and photo-sharing websites has spurred growing interest in automatic methods for image geolocalization. The problem of visual analysis and geo-localization of large-scale imagery arises in a variety of real-world applications. Consumers of imagery may be interested in determining when and where an image or video was taken, who is in it, what objects appear in the depicted scene, and how those objects relate to each other. Local government agencies may be interested in using large-scale imagery to automatically obtain and index useful geographic and geological features and their distributions in a region of interest. Similarly, local businesses may use content statistics to target their marketing based on the ‘where’, ‘what’, and ‘when’ that can be extracted automatically during visual analysis and geo-localization of large-scale imagery.

We believe research on visual analysis and geo-localization of large-scale imagery can benefit greatly from bringing together researchers from computer vision, computer graphics, photogrammetry, computational optimization, geographic information systems, and other related fields. Such a gathering will lay a foundation for an integrated approach to building earth-scale models and exploiting them for automatic geolocalization. In addition, complementary viewpoints and techniques from these diverse areas will provide further insight into the problem domain and spur new research directions. The focus will be on the exchange of ideas on how to develop visual analysis and geo-localization capabilities that make use of the vast amount of contextual information available on the Internet. As a byproduct, the relevant communities will also benefit, as this workshop will lead to improved methods for data-driven modeling and analysis of large-scale imagery.

Papers describing novel and original research are solicited in areas related to visual analysis and geo-localization of large-scale imagery. Topics of interest include, but are not limited to:

  • Visual Feature and Information Extraction from Large-Scale Imagery
  • Understanding and Modeling Uncertainties in Visual and Geospatial Data
  • Semantic Generalization of Visual and Geospatial Data
  • Representation, Indexing, Storage, and Analysis of City-to-Earth Scale Models
  • Automated 3D Modeling Pipelines for Complex Large-Scale Architectures
  • Integrated Processing of Point Clouds, Image, and Video Data
  • Multi-Modal Visual Sensor Data Fusion
  • Control Mechanisms that aid in Visual Analysis and Geo-Localization
  • Rendering and Visualization of Large-Scale Models, Semantic Labels and Imagery
  • Applications of Visual Analysis and Geo-Localization of Large-Scale Imagery
  • Datasets and Model Validation Methods for Analysis and Geo-Localization Research

Call for Papers 



Papers should describe original and unpublished work on the above or closely related topics. Each paper will receive double-blind reviews, moderated by the workshop chairs. Authors should take into account the following:

  • All papers must be written in English and submitted in PDF format.
  • Papers must be submitted online through the CVPR submission CMT system.
  • The maximum paper length is 6 pages, with the option of purchasing 2 additional pages. The workshop paper format guidelines are the same as those for Main Conference papers.
  • Submissions will be rejected without review if they exceed 8 pages, violate the double-blind policy, or violate the dual-submission policy.
  • Authors will have the opportunity to submit up to 10MB of supporting material.

The author kit provides a LaTeX2e template for submissions, and an example paper to demonstrate the format. Please refer to this example for detailed formatting instructions.

A paper ID will be allocated to you during submission. Please replace the asterisks in the example paper with your paper's own ID before uploading your file.

Important Dates

Submissions deadline: March 22, 2013
Author notification: April 22, 2013
May 3, 2013
Workshop: June 23, 2013



General Chairs

Mubarak Shah
Luc Van Gool, ETH, Switzerland


Program Chairs

Asaad Hakeem
Jan-Michael Frahm
Alexei Efros
Khurram Shafique
Omar Javed
ObjectVideo, USA
ObjectVideo, USA
SRI Sarnoff, USA

Program Committee

Shih-Fu Chang
Rama Chellappa
Saad Ali
Robert Pless
Andrew Bagnell
Yaser Sheikh
Nathan Jacobs
Himaanshu Gupta
Arslan Basharat
Serge Belongie
David Crandall
Zeeshan Rasheed
Anthony Hoogs
Jana Kosecka
Daniel Huttenlocher
B. S. Manjunath
Antonio Torralba
Alper Yilmaz
Grant Schindler
Ram Nevatia
Mei Han
Yanlin Guo
Victor Tom
Michael Tarnowski
Amir Roshan Zamir
Columbia Univ., USA
SRI Sarnoff, USA
Washington Univ., USA
Univ. of Kentucky, USA
ObjectVideo, USA
Kitware, USA
Indiana Univ., USA
ObjectVideo, USA
Kitware, USA
Cornell Univ., USA
Georgia Tech, USA
Google Research, USA
BAE Systems, USA


Invited Talks

Marc Pollefeys, ETH, Switzerland
Cordelia Schmid, INRIA, France
James Hays, Brown University, USA
Josef Sivic, INRIA, France
Noah Snavely, Cornell University, USA
Yang Song, Google Research


More invited talk titles and abstracts to be announced.

Visual geo-localization in urban and mountainous regions
Marc Pollefeys, ETH Zurich

Abstract: In this talk I will present our results on geo-localization from images. First, I will focus on urban areas, where facades provide rich visual information that can be used effectively for geo-localization. Then, we will move to more challenging terrain, in particular mountainous areas, and show that visual geo-localization can be achieved there as well. I will also talk about some efforts in indoor navigation assistance.

Short bio: Marc Pollefeys has been a full professor in the Dept. of Computer Science of ETH Zurich since 2007, where he is the head of the Institute for Visual Computing and leads the Computer Vision and Geometry lab. He also remains associated with the Dept. of Computer Science of the University of North Carolina at Chapel Hill, where he started as an assistant professor in 2002 and became an associate professor in 2005. Before this he was a postdoctoral researcher at the Katholieke Universiteit Leuven in Belgium, where he also received his M.S. and Ph.D. degrees in 1994 and 1999, respectively. His main area of research is computer vision. One of his main research goals is to develop flexible approaches to capture visual representations of real-world objects, scenes, and events. Dr. Pollefeys has received several prizes for his research, including a Marr prize. He is the author or co-author of more than 200 peer-reviewed publications. He is the General Chair for the European Conference on Computer Vision 2014 (ECCV) and was a Program Co-Chair for the IEEE Conference on Computer Vision and Pattern Recognition 2009 (CVPR) and several 3D conferences. Prof. Pollefeys is on the Editorial Board of the International Journal of Computer Vision and was an associate editor for the IEEE Transactions on Pattern Analysis and Machine Intelligence. He is a Fellow of the IEEE.

Learning representations for visual place recognition
Josef Sivic, INRIA/Ecole Normale Superieure, Paris

Abstract: We consider the problem of visual place recognition: given a query image of a particular street or building facade, the objective is to find one or more images in a geotagged database depicting the same place and to estimate the camera location of the query.

First, we cast the place recognition problem as a classification task and use the available geotags to train a classifier for each location in the database, in a manner similar to per-exemplar SVMs in object recognition.

Second, we describe a representation of repeated image structures based on a simple modification of their weights in the bag-of-visual-word model and show that appropriate weighting of repeated image elements can significantly improve place recognition performance.

Finally, we describe a compact representation of 3D scenes, where an entire architectural site is represented by a small set of discriminative visual elements that are automatically learnt from rendered views. We demonstrate the learnt 3D visual elements can be used to match and localize historical and non-photographic imagery where the standard image representations based on local invariant features fail.

Results will be shown on collections of Internet images from Google Street View and Flickr as well as historical and non-photographic imagery.

Joint work with M. Aubry, P. Gronat, G. Obozinski, M. Okutomi, T. Pajdla, B. Russell and A. Torii.

Short bio: Josef Sivic received a degree from the Czech Technical University, Prague, in 2002 and the PhD degree from the University of Oxford in 2006. His thesis, dealing with efficient visual search of images and videos, was awarded the British Machine Vision Association 2007 Sullivan Thesis Prize and was shortlisted for the British Computer Society 2007 Distinguished Dissertation Award. His research interests include visual search and object recognition applied to large image and video collections. After spending six months as a postdoctoral researcher in the Computer Science and Artificial Intelligence Laboratory at the Massachusetts Institute of Technology, he now holds a permanent position as an INRIA researcher at the Departement d’Informatique, Ecole Normale Superieure, Paris. He has authored over 40 scientific publications and serves as an Associate Editor for the International Journal of Computer Vision.

Place Graphs
Noah Snavely, Cornell University

Abstract: At the last workshop on geolocation at ECCV 2012, an interesting theme emerged on the representation of places -- from sets of images, to images with relations, to 3D models, to sets of geographically distinctive elements. In our recent work, we have been exploring graphs on images -- where images are nodes with visual connections -- as a more structured way to represent locations. We show how combining these graphs with discriminative learning techniques can yield better performance for location recognition.

Short bio: Noah Snavely is an assistant professor of Computer Science at Cornell University, where he has been on the faculty since 2009. He received a Ph.D. in Computer Science and Engineering from the University of Washington in 2008. Noah works in computer graphics and computer vision, with a particular interest in using vast amounts of imagery from the Internet to reconstruct and visualize our world in 3D. His work was the basis for Microsoft's Photosynth, a tool for building 3D visualizations from photo collections that has been used by many thousands of people. Noah is the recipient of a Microsoft New Faculty Fellowship and an NSF CAREER Award, and has been recognized by Technology Review's TR35.



Program

09:00-09:05 Opening remarks from Workshop Organizers
09:05-09:40 Invited Speaker - Noah Snavely (Cornell)
09:40-10:15 Invited Speaker - Marc Pollefeys (ETH)
10:15-10:45 Break
10:45-11:20 Invited Speaker - James Hays (Brown)
11:20-11:55 Invited Speaker - Cordelia Schmid (INRIA)
12:00-13:30 Lunch
13:30-13:50 Oral Presentation - 3D Point Cloud Reduction using Mixed-integer Quadratic Programming
13:50-14:10 Oral Presentation - User-Driven Geolocation of Untagged Desert Imagery Using Digital Elevation Models
14:10-14:45 Invited Speaker - Yang Song (Google Research)
14:45-15:20 Invited Speaker - Josef Sivic (INRIA)
15:25-15:55 Break
16:00-17:00 Panel Discussion - Rick Szeliski, Avideh Zakhor, Jiri Matas, Jana Kosecka



Past Events

First Workshop on Visual Analysis and Geo-Localization of Large Scale Imagery (with ECCV 2012)
