MIT 4.550/4.570 Computation Design Lab

MIT Spring 2024
Final project for 4.550/4.570 Computation Design Lab
Instructors: Takehiko Nagakura, Daniel Tsai
TA: Chili Cheng

Semantic Landscape

Meng-Lun Tu

Abstract

This project develops an interactive map that associates descriptive prompts with spatial qualities for various applications. It integrates zero-shot object detection, customized analysis, and spatialized ordinance, drawing on models such as Grounding DINO, BERT, and Swin Transformer to process and analyze data from satellite images, street views, maps, and social media sources. The pipeline collects and spatializes data, performs object detection and accuracy ranking, and generates an interactive map on which users can find specific types of locations from detailed prompts. A planned extension would add diffusion models that provide location-based images, supporting real-time analysis and dynamic updates. The ultimate goal is a user-friendly platform that delivers semantically relevant information and helps users discover places with the characteristics they describe.

Project Overview

Objective: Develop an interactive map that presents descriptive prompts and associates them with spatial qualities.

Components: Combines zero-shot object detection, customized analysis, spatialized ordinance, and semantic landscape integration.

The project relies on several key technologies and models. Zero-shot object detection is implemented with Grounding DINO, which performs open-set object detection without category-specific training. For customized analysis, the project combines BERT, a language model that encodes the text prompt, with Swin Transformer, a vision backbone that encodes the image; together they are used to generate bounding boxes and segment objects based on input prompts, allowing a more nuanced understanding of spatial data. The spatialized ordinance component uses NLTK, WordNet, Word2Vec, and GLIP (Grounded Language-Image Pre-training) to spatialize and tokenize data collected from diverse sources, including satellite images, maps, and social networking services (SNS). Working together, these tools build a semantic landscape that users can explore through the project's interactive mapping interface.

Pipeline and Workflow

Input Data: Includes satellite images, street views, maps, and social media (SNS) data.

The project uses a data processing pipeline that begins by collecting data from specified areas of interest. The data is then spatialized and tokenized to prepare it for analysis. Object detection runs next, followed by accuracy ranking to ensure the reliability of the detected objects. The pipeline then performs quantity analysis and generates an interactive map, which serves as the primary interface through which users explore and interact with the data. Looking ahead, the project envisions integrating diffusion models to generate location-based images, making interaction with the map more immersive.
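The accuracy-ranking and quantity-analysis steps can be illustrated with a small sketch. It assumes each detection carries a location ID, a label, and a confidence score, and it treats "accuracy ranking" as a simple confidence cutoff followed by a per-location count; the `Detection` class, `rank_locations` function, the 0.35 cutoff, and the toy data are all hypothetical, not the project's actual implementation.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Detection:
    location_id: str  # hypothetical ID of the tile or place the detection came from
    label: str        # phrase matched by the detector, e.g. "tree"
    score: float      # detector confidence in [0, 1]


def rank_locations(detections, min_score=0.35):
    """Filter detections by confidence, then rank locations by count.

    Detections below min_score are dropped (a stand-in for accuracy ranking);
    the survivors are counted per location (quantity analysis). Ties on the
    count are broken by mean confidence.
    """
    kept = defaultdict(list)
    for d in detections:
        if d.score >= min_score:
            kept[d.location_id].append(d.score)
    ranked = [(loc, len(scores), round(mean(scores), 3)) for loc, scores in kept.items()]
    # Highest count first, then highest mean confidence.
    ranked.sort(key=lambda row: (-row[1], -row[2]))
    return ranked


# Toy detections for the prompt "a place where people can hide from the sun".
detections = [
    Detection("plaza_a", "tree", 0.62),
    Detection("plaza_a", "awning", 0.48),
    Detection("plaza_b", "tree", 0.81),
    Detection("plaza_b", "umbrella", 0.20),
    Detection("street_c", "tree", 0.40),
]
print(rank_locations(detections))
```

Ranking by count first favors places with many shade-giving objects; a real system might instead weight counts by area or normalize by tile size.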

The figure shows two panels (Fig.A and Fig.B) that apply the project's methods to the prompt "A place where people can hide from the sun." Fig.A gives a broader view, with numerous bounding boxes and segmentation outlines across an urban landscape, highlighting many potential areas of interest for the prompt; here both the text threshold (T) and the bounding-box threshold (B) are set to 0.25. Fig.B zooms in on an area identified as highly relevant to the prompt, with a more refined bounding box around a particular location where people might find shelter from the sun; here both thresholds are set to 0.5, a stricter selection criterion.
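To make the role of the two thresholds concrete, the sketch below mirrors, as we understand it, the filtering step in the open-source Grounding DINO inference code: a candidate box is kept only if its highest token score exceeds the box threshold, and its phrase label is built from the prompt tokens whose score exceeds the text threshold. The function name, the word-level tokens, and the scores are hypothetical; the real model works on BERT subword tokens and produces its own scores.

```python
def filter_grounded_boxes(token_scores, boxes, tokens, box_threshold, text_threshold):
    """Apply Grounding DINO-style box and text thresholds to candidate boxes.

    token_scores[i][j] is the (sigmoid) similarity between candidate box i and
    prompt token j. A box is kept if its best token score exceeds
    box_threshold; its phrase is built from the tokens whose score exceeds
    text_threshold.
    """
    kept = []
    for scores, box in zip(token_scores, boxes):
        best = max(scores)
        if best <= box_threshold:
            continue  # no token matches this box strongly enough
        phrase = " ".join(tok for tok, s in zip(tokens, scores) if s > text_threshold)
        kept.append((box, best, phrase))
    return kept


# Toy word-level tokens and scores (real models use subword tokens).
tokens = ["shade", "tree", "awning"]
boxes = [(10, 10, 50, 60), (70, 15, 120, 40), (5, 80, 30, 95)]
token_scores = [
    [0.55, 0.62, 0.10],
    [0.30, 0.05, 0.41],
    [0.27, 0.12, 0.08],
]

loose = filter_grounded_boxes(token_scores, boxes, tokens, 0.25, 0.25)  # like Fig.A
strict = filter_grounded_boxes(token_scores, boxes, tokens, 0.5, 0.5)   # like Fig.B
print(len(loose), len(strict))
for box, best, phrase in strict:
    print(box, best, phrase)
```

With the looser 0.25 setting all three toy boxes survive, while the 0.5 setting keeps only the box with the strongest match, which is the same qualitative difference seen between Fig.A and Fig.B.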

The following figure shows two key processes in the project. On the left, "Deepblock" demonstrates how the system extracts structured information from a zoning report based on an input prompt. On the right, "Real Time Analysis" shows how the system embeds information into an aerial map, highlighting significant areas in real time based on the same input prompt. Together, these processes illustrate the project's ability to handle both textual and spatial data.
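Embedding detections into an aerial map requires converting pixel bounding boxes into map coordinates. The sketch below assumes the aerial tile is georeferenced with a six-coefficient affine geotransform in the GDAL convention and emits each detection as a GeoJSON Point at the box center, which most web-map libraries can display. The helper names, the tile origin, and the pixel size are hypothetical; the project's actual embedding method is not described in the source.

```python
import json


def pixel_to_map(geotransform, col, row):
    """Map a pixel position to map coordinates with a GDAL-style affine geotransform.

    geotransform = (origin_x, pixel_width, row_rotation,
                    origin_y, column_rotation, pixel_height)
    """
    gt0, gt1, gt2, gt3, gt4, gt5 = geotransform
    x = gt0 + col * gt1 + row * gt2
    y = gt3 + col * gt4 + row * gt5
    return x, y


def detection_to_feature(geotransform, box, label, score):
    """Turn a pixel bounding box into a GeoJSON Point feature at the box center."""
    x_min, y_min, x_max, y_max = box
    lon, lat = pixel_to_map(geotransform, (x_min + x_max) / 2, (y_min + y_max) / 2)
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},  # GeoJSON order: lon, lat
        "properties": {"label": label, "score": score},
    }


# Hypothetical north-up tile: top-left corner at (-71.0950, 42.3600), 0.0001 degrees per pixel.
gt = (-71.0950, 0.0001, 0.0, 42.3600, 0.0, -0.0001)
feature = detection_to_feature(gt, (100, 200, 140, 260), "awning", 0.62)
collection = {"type": "FeatureCollection", "features": [feature]}
print(json.dumps(collection))
```

The pixel height is negative because row numbers grow downward while latitude grows northward; rotated or projected imagery would simply use different coefficients in the same formula.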

The pipeline lets users search for locations with specific characteristics, for example: "finding an indoor place with a lot of windows, or an outdoor space, close to a park and university, with a relaxing and comforting vibe where people can rest, and that is also clean." Tokenized and spatialized data make such searches more precise. The system combines semantic landscape information with Deepblock and real-time analysis, so that map updates respond dynamically to user input and context. The project also draws on GLIP training sets for grounded language-image pre-training, so that the link between language and visual data can adapt to different training and analysis scenarios.
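A compound prompt like the one above mixes qualitative requirements ("relaxing", "clean"), alternatives ("many windows" or "outdoor"), and spatial constraints ("close to a park and university"). The sketch below shows one way such a prompt could be evaluated once places have been tagged: a set check on the tags plus a haversine distance test against landmark coordinates. The tag vocabulary, the 500 m radius, the place records, and the function names are hypothetical illustrations, not the project's actual query engine.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def near_any(place, landmarks, radius_m):
    """True if the place lies within radius_m of at least one landmark."""
    return any(haversine_m(place["lat"], place["lon"], lat, lon) <= radius_m for lat, lon in landmarks)


def match_places(places, parks, universities, required, any_of, radius_m=500):
    """Return places whose tags satisfy the prompt and that lie near a park and a university."""
    hits = []
    for p in places:
        tags = p["tags"]
        if not required <= tags:  # every required quality must be present
            continue
        if not any_of & tags:     # at least one alternative must be present
            continue
        if near_any(p, parks, radius_m) and near_any(p, universities, radius_m):
            hits.append(p["name"])
    return hits


parks = [(42.3601, -71.0942)]
universities = [(42.3592, -71.0935)]
places = [
    {"name": "cafe_1", "lat": 42.3598, "lon": -71.0940, "tags": {"indoor", "many_windows", "relaxing", "clean"}},
    {"name": "bench_2", "lat": 42.3605, "lon": -71.0950, "tags": {"outdoor", "relaxing"}},
    {"name": "atrium_3", "lat": 42.3700, "lon": -71.1100, "tags": {"indoor", "many_windows", "relaxing", "clean"}},
]
print(match_places(places, parks, universities,
                   required={"relaxing", "clean"}, any_of={"many_windows", "outdoor"}))
```

Here `bench_2` fails the "clean" requirement and `atrium_3` is roughly a kilometer from both landmarks, so only `cafe_1` matches.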

The following diagram outlines a workflow in which spatial data for a specific area is collected through the Google Maps API, place comments are scraped and cleaned with NLTK, and a Word2Vec model is then trained on the cleaned text to find synonyms of input prompts. This process turns raw location data into meaningful insights, enhancing the semantic analysis of spatial data.
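The cleaning step can be sketched with the standard library alone. The code below is a stand-in for the NLTK tokenizer and stopword list, not the project's code: it lowercases each comment, strips URLs, tokenizes with a regular expression, and removes stopwords, yielding one token list per comment. The small stopword set and the sample comments are hypothetical.

```python
import re

# Tiny illustrative stopword list; NLTK ships a much fuller English list.
STOPWORDS = {"a", "an", "the", "and", "or", "is", "are", "was", "it", "this",
             "to", "of", "in", "for", "with", "so", "very", "we"}
URL_RE = re.compile(r"https?://\S+")
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")  # words, keeping simple contractions


def clean_comment(text):
    """Lowercase, strip URLs, tokenize, and drop stopwords and 1-letter tokens."""
    text = URL_RE.sub(" ", text.lower())
    return [t for t in TOKEN_RE.findall(text) if t not in STOPWORDS and len(t) > 1]


def build_corpus(comments):
    """One token list per comment, skipping comments that end up empty."""
    corpus = []
    for comment in comments:
        tokens = clean_comment(comment)
        if tokens:
            corpus.append(tokens)
    return corpus


comments = [
    "The courtyard is SO shady and quiet!",
    "Great spot to read: https://example.com",
    "!!!",
]
print(build_corpus(comments))
```

A list of token lists is the input shape gensim expects: with gensim 4 and a corpus of many more comments than this toy list, it could be passed as `Word2Vec(sentences=corpus, vector_size=100, window=5, min_count=1)`, after which `model.wv.most_similar("shady")` returns nearest-neighbor terms as candidate synonyms.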

Word2Vec and WordNet enrich the original input prompt by expanding the vocabulary and context associated with the search, allowing the system to more effectively locate and identify places that meet the specified criteria. This approach enhances the system's ability to understand and process natural language inputs, making it more adaptable and responsive to user queries.
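A minimal sketch of this query expansion follows. It assumes a synonym table has already been built, for example from WordNet lemmas and Word2Vec nearest neighbors, and scores each place by how many expanded query terms appear in its comment tokens. The table contents, place names, and function names are hypothetical; the point is only to show why expansion can surface places that a literal match would miss.

```python
def expand_query(terms, synonyms):
    """Add every known synonym of each query term to the query."""
    expanded = set(terms)
    for term in terms:
        expanded.update(synonyms.get(term, ()))
    return expanded


def rank_places(place_tokens, query_terms):
    """Score each place by how many query terms appear in its comment tokens."""
    scored = [(name, len(set(tokens) & query_terms)) for name, tokens in place_tokens.items()]
    return sorted(scored, key=lambda item: -item[1])  # stable: ties keep input order


# Hypothetical synonym table standing in for WordNet lemmas and Word2Vec neighbors.
synonyms = {
    "shade": ["shadow", "canopy", "shady"],
    "relaxing": ["calm", "quiet", "peaceful"],
}
place_tokens = {
    "courtyard": ["courtyard", "shady", "quiet", "trees"],
    "parking_lot": ["cars", "hot", "busy"],
    "arcade": ["covered", "shade", "busy"],
}
query = ["shade", "relaxing"]
print(rank_places(place_tokens, set(query)))                     # literal terms only
print(rank_places(place_tokens, expand_query(query, synonyms)))  # expanded query
```

The literal query ranks `arcade` first because only it contains the exact word "shade"; after expansion, `courtyard` rises to the top because its comments say "shady" and "quiet". With NLTK, a table like this could be filled from `wn.synsets(term)` by collecting `lemma.name()` for each lemma in `synset.lemmas()`.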

This project aims to combine various advanced technologies and data sources to create a rich, interactive map that can provide users with detailed and semantically relevant information about different locations.

How does the system determine the most appropriate scenarios to visualize in a space like the courtyard garden in Mumbai? The project adapts to reflect different uses of the same space, capturing both tranquil moments and lively events. This ability to shift between contexts highlights the flexibility and depth of semantic landscape analysis. Could this approach be extended to predict and visualize future changes or trends in community spaces? Future work may explore and refine these capabilities toward that goal.

Conclusion of the Project:

The project successfully integrates semantic analysis, natural language processing, and real-time spatial data processing to create an interactive, context-aware mapping interface. Using Grounding DINO for zero-shot object detection, BERT for NLP-based analysis, and GLIP for language-image pre-training, the system can dynamically generate and update maps based on user input. This allows highly customizable searches and visualizations, letting users explore and interact with semantic landscapes in new ways. The project demonstrates the potential of integrating semantic and spatial data and provides a robust platform for future developments in dynamic urban analysis, planning, and community engagement.