Abstract:
Over the last few years, there has been a flurry of approaches tackling the problem of urban modeling. Digital mapping of existing cities is reaching new heights, as users can now browse detailed 3D models of cities instead of flat maps. In the entertainment industry, particularly for movies and games, there is an ever-rising need for detailed and realistic models of virtual cities. Manual modeling of individual buildings usually provides good results, but the process is very time-consuming and expensive. Current automatically built models, obtained with Structure from Motion followed by simple plane fitting and texturing, are a good starting point, but provide inadequate 3D visual perception. For instance, artifacts in the 3D shape often show up during unrestricted user movement around the model. Due to the diversity of appearance, the hierarchical structure of scene objects, and the absence of long-range interactions, it appears impossible that improved bottom-up depth extraction and primitive fitting alone can prevent such artifacts from sneaking in. Furthermore, conventional bottom-up models lack any semantic knowledge about the scene. Yet a good understanding of what it is that needs to be modeled is a strong cue, not only to improve the visual and 3D quality of the model, but also to substantially widen its usage (e.g. for animation, where people should walk through doors rather than walls, or for queries such as the average number of floors of the buildings in a street).

In contrast, procedural modeling provides an effective way to create detailed and realistic 3D building models that do come with all the required semantic labels. In urban procedural modeling, the knowledge of the building style and layout is most commonly encoded as a shape grammar, i.e. a set of parametric rules, where each rule adds more detail to the result of the previous one. Models are typically generated by iteratively applying the procedural shape grammar rules to a starting shape, e.g. a building footprint. A specific building can then be represented as a particular derivation, or parse tree, of that grammar. The resulting models support the addition of visually crucial effects, such as making windows reflective or letting balconies protrude. The task of creating procedural models of existing buildings from images or other data thereof has been coined inverse procedural modeling (IPM). These methods need to select the appropriate rules from the style grammar, as well as their parameter settings.

The first part of the talk presents one such approach, namely the automatic 3D reconstruction of Greek Doric temples based on object detectors and a procedural style grammar. Afterwards, we discuss a strategy for automatic grammar selection based on the detected architectural style of the building. As the corresponding search space is huge, IPM methods typically start from a pre-processed version of the raw data. The semantic segmentation of facades - also referred to as facade parsing - is a good example. That said, such accurate labeling of facade elements (such as windows, doors or balconies) is a difficult problem in its own right, given the great diversity of buildings and the interference of factors like shadows, occlusions and reflections in the images. Furthermore, creating style-specific grammars is a tedious and time-consuming process, which is usually performed only by a few experts in the field.
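To make the notion of a shape grammar derivation more concrete, the toy sketch below (not taken from the talk; all symbols, rule names and dimensions are invented for illustration) applies three split-style rules to a facade outline, in the spirit of CGA-like split grammars. The nested shapes it produces correspond to the parse tree of that particular derivation.

    # A minimal sketch of a derivation in a toy split grammar.
    # Rule names, symbols and parameters are invented for illustration only.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Shape:
        symbol: str                 # non-terminal or terminal label, e.g. "Facade", "Window"
        x: float                    # position of the shape's extent on the facade plane
        y: float
        w: float                    # width and height of the extent
        h: float
        children: List["Shape"] = field(default_factory=list)

    def split_floors(facade: Shape, floor_height: float = 3.0) -> None:
        """Rule 1: Facade -> Floor*  (horizontal split into floors)."""
        n = max(1, int(facade.h // floor_height))
        for i in range(n):
            facade.children.append(
                Shape("Floor", facade.x, facade.y + i * facade.h / n, facade.w, facade.h / n))

    def split_tiles(floor: Shape, tile_width: float = 2.5) -> None:
        """Rule 2: Floor -> Tile*  (vertical split into window tiles)."""
        n = max(1, int(floor.w // tile_width))
        for i in range(n):
            floor.children.append(
                Shape("Tile", floor.x + i * floor.w / n, floor.y, floor.w / n, floor.h))

    def insert_window(tile: Shape, rel_size: float = 0.6) -> None:
        """Rule 3: Tile -> Window  (terminal window centered in its tile)."""
        w, h = tile.w * rel_size, tile.h * rel_size
        tile.children.append(
            Shape("Window", tile.x + (tile.w - w) / 2, tile.y + (tile.h - h) / 2, w, h))

    # Derivation: apply the rules top-down, starting from the building outline.
    # The resulting tree of Shape objects is the parse tree of this derivation.
    facade = Shape("Facade", 0.0, 0.0, 10.0, 9.0)
    split_floors(facade)
    for floor in facade.children:
        split_tiles(floor)
        for tile in floor.children:
            insert_window(tile)

    print(len(facade.children), "floors,",
          sum(len(f.children) for f in facade.children), "tiles")

Deciding which rules to apply to a given building, and with which parameter values, is exactly the search problem that inverse procedural modeling has to solve.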
The second part of the talk presents a novel approach which avoids the need for style-specific grammars and uses generic architectural principles instead. In contrast to full procedural grammars, these principles do not encode the entire facade structure and can be formulated explicitly by laymen. Since no style prior is required, the manual construction of style-specific grammars is circumvented.

The last part of the talk demonstrates how procedural rules, and thus shape grammars, can be derived from facade labelings, rather than vice versa. Given a set of labeled positive examples, procedural grammar learning is performed using Bayesian Model Merging. This technique, originally developed in the field of natural language processing, is extended to the domain of two-dimensional languages. The induced grammar can then be used for parsing existing facade imagery, or sampled to create novel buildings of the same architectural style.
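As a rough intuition for this grammar-learning step, the sketch below runs a heavily simplified, one-dimensional variant of model merging on strings of facade-element labels (W for window, D for door): every example is first incorporated as its own set of rules, and preterminals are then merged greedily as long as a simple posterior score (a description-length prior plus a relative-frequency likelihood) improves. The Grammar class, the scoring details and the prior weight ALPHA are illustrative assumptions; the method discussed in the talk operates on two-dimensional split grammars with a proper Bayesian posterior.

    # Toy 1D model merging: S -> sequence of preterminals; preterminal -> terminal.
    # Illustrative only; not the 2D procedural grammar induction from the talk.

    from collections import Counter, defaultdict
    from itertools import combinations
    from math import log

    ALPHA = 0.5  # weight of the "prefer smaller grammars" prior term (illustrative choice)

    class Grammar:
        def __init__(self, examples):
            # Incorporation: one fresh preterminal per (example, position) and one
            # S-rule per example, so every example is generated with probability 1.
            self.examples = examples
            self.expansions = defaultdict(Counter)   # preterminal -> Counter of terminals
            self.s_rules = []                        # one tuple of preterminals per example
            for i, ex in enumerate(examples):
                rhs = []
                for j, tok in enumerate(ex):
                    nt = f"N{i}_{j}"
                    self.expansions[nt][tok] += 1
                    rhs.append(nt)
                self.s_rules.append(tuple(rhs))

        def merge(self, a, b):
            """Return a copy of the grammar in which preterminal b is merged into a."""
            g = Grammar.__new__(Grammar)
            g.examples = self.examples
            g.expansions = defaultdict(Counter)
            for nt, cnt in self.expansions.items():
                g.expansions[a if nt == b else nt].update(cnt)
            g.s_rules = [tuple(a if nt == b else nt for nt in rhs) for rhs in self.s_rules]
            return g

        def log_posterior(self):
            # Prior: penalize the description length (symbol count) of the grammar.
            size = sum(1 + len(cnt) for cnt in self.expansions.values())
            size += sum(1 + len(rhs) for rhs in set(self.s_rules))
            # Likelihood: relative-frequency probabilities for S-rules and expansions.
            s_counts = Counter(self.s_rules)
            ll = 0.0
            for ex, rhs in zip(self.examples, self.s_rules):
                ll += log(s_counts[rhs] / len(self.s_rules))
                for tok, nt in zip(ex, rhs):
                    cnt = self.expansions[nt]
                    ll += log(cnt[tok] / sum(cnt.values()))
            return -ALPHA * size + ll

    # Greedy search: keep applying the single best merge while the posterior improves.
    examples = [list("WWDW"), list("WWWW"), list("WWDW")]  # one label string per floor
    grammar = Grammar(examples)
    while True:
        best, best_score = None, grammar.log_posterior()
        for a, b in combinations(list(grammar.expansions), 2):
            candidate = grammar.merge(a, b)
            score = candidate.log_posterior()
            if score > best_score:
                best, best_score = candidate, score
        if best is None:
            break
        grammar = best

    print(len(grammar.expansions), "preterminals,",
          len(set(grammar.s_rules)), "distinct S-rules")

In this toy setting, merging preterminals that expand to the same label shrinks the grammar without hurting the likelihood, while over-general merges are rejected because the likelihood penalty outweighs the prior gain; the same prior-versus-likelihood trade-off drives the search in the full method.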
CV:
Anđelo Martinović was born in Zadar, Croatia, in July 1987. He received the degree of Master of Science in Computing, with highest honors, from the Faculty of Electrical Engineering and Computing, University of Zagreb, Croatia, in July 2010. He was awarded the bronze plaque "Josip Lončar" as one of the best students in his class. In September 2010, he joined the VISICS (Vision for Industry, Communications and Services) research group at the Department of Electrical Engineering, KU Leuven, Belgium, as a predoctoral student. In September 2011, he started his Ph.D. programme on the topic of "Inverse procedural modeling of buildings". As a research associate, he has been involved in FP7 projects dealing with urban reconstruction (V-City, VarCity) and cultural heritage (3D-COFORM). His main research interests include semantic city modeling, shape grammars and inverse procedural modeling of architecture. Furthermore, he is interested in various machine learning techniques applied to semantic scene segmentation and image parsing. So far, he has contributed several research papers to top-tier international computer vision conferences such as ECCV and CVPR. He is also a teaching assistant for the course 'Digital Electronics and Processors'.