The Pulse of the City
-Computer Vision for Urban Micro-Interventions-

Rodrigo Gallardo & Oz Fishman

Final project for the MIT class 4.550/4.570 Computation Design Lab
Development: February-May, 2025
Instructors: Prof. Takehiko Nagakura & Dr. Daniel Tsai

Overview

This research explores how computer vision can support urban design by treating the city as a network of influential objects. Using existing image datasets, we analyze how everyday urban elements, such as benches, trees, trash cans, or bus stops, co-occur in scenes where people already choose to spend time.

Abstract

We begin from the Urban Field framework of elements, tactics, strategies, and systems, and ask: which combinations of elements contribute to successful public spaces, and how might those patterns be reused elsewhere? To investigate this, we run an object detection model (Grounding DINO) on annotated images from the ADE20K dataset to identify key urban objects, and compute co-occurrence statistics for each one. From these co-occurrence matrices, we derive "suggestion rules" that map a detected anchor object (for example, a bench) to a ranked set of likely companion elements (such as trees, trash cans, or street furniture). The result is a lightweight recommendation engine for micro-interventions that can be connected to AR or camera-based interfaces. In an urban design workflow, a resident or designer points a camera at a street scene; the system detects the most prominent urban element and, drawing on the learned co-occurrence patterns, proposes additional elements that might strengthen the local public realm.
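The abstract says the system picks "the most prominent urban element" as its anchor but does not define prominence. The sketch below is one possible reading, not the project's code: it assumes prominence means the largest bounding box among confident detections, and the `Detection` record, the `most_prominent` helper, and the 0.35 score cutoff are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical detection record; the real detector output format may differ.
@dataclass
class Detection:
    label: str
    score: float                            # detector confidence in [0, 1]
    box: tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


def box_area(d: Detection) -> float:
    x0, y0, x1, y1 = d.box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


def most_prominent(detections: list[Detection], min_score: float = 0.35) -> str | None:
    """Return the label of the confident detection with the largest box.

    Assumption: "prominence" is approximated by bounding-box area; the
    project may define it differently (e.g. centrality or score).
    """
    confident = [d for d in detections if d.score >= min_score]
    if not confident:
        return None
    return max(confident, key=box_area).label


scene = [
    Detection("tree", 0.82, (0, 0, 200, 400)),       # area 80,000
    Detection("bench", 0.91, (250, 300, 450, 380)),  # area 16,000
    Detection("lamp", 0.20, (0, 0, 500, 500)),       # dropped: below cutoff
]
print(most_prominent(scene))  # tree
```

Filtering by score before comparing areas keeps a large but uncertain box (the low-confidence "lamp" above) from hijacking the anchor choice.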

Fig 1. Object Detection & Dataset

Object detection on ADE20K street scenes using a curated vocabulary of urban elements (bench, tree, planter, trash can, sign, lamp, etc.).
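Grounding DINO is an open-vocabulary detector driven by a text prompt, and its reference demos separate category names with periods (e.g. `"chair . person . dog ."`). A minimal sketch, assuming the curated vocabulary is passed in that format and that each image's detections are then reduced to a set of present elements; `build_prompt`, `presence_set`, the `(phrase, score)` detection format, and the 0.35 threshold are illustrative assumptions, and the vocabulary is only the partial list from the caption.

```python
from __future__ import annotations

# Partial vocabulary from the figure caption; the project's list is longer.
VOCAB = ["bench", "tree", "planter", "trash can", "sign", "lamp"]


def build_prompt(vocab: list[str]) -> str:
    """Join class names in Grounding DINO's prompt convention:
    lowercase phrases separated by ' . ' and ending with '.'."""
    return " . ".join(v.lower().strip() for v in vocab) + " ."


def presence_set(detections: list[tuple[str, float]], vocab: list[str],
                 threshold: float = 0.35) -> set[str]:
    """Reduce one image's (phrase, score) detections to the set of
    vocabulary elements present, ignoring low scores and unknown phrases."""
    allowed = {v.lower() for v in vocab}
    return {phrase.lower().strip() for phrase, score in detections
            if score >= threshold and phrase.lower().strip() in allowed}


print(build_prompt(VOCAB))
# bench . tree . planter . trash can . sign . lamp .

dets = [("bench", 0.9), ("bench", 0.7), ("tree", 0.5), ("sign", 0.2), ("car", 0.8)]
print(sorted(presence_set(dets, VOCAB)))
# ['bench', 'tree']
```

Recording presence rather than instance counts means a scene with ten benches contributes the same as a scene with one, which keeps crowded images from dominating the later co-occurrence statistics.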


Fig 2. Co-Occurrence Matrix

Co-occurrence matrix showing how often different urban elements appear together in the same scene. Warmer cells indicate stronger relationships.
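One way to build such a matrix from per-scene element sets is sketched below. It assumes co-occurrence is counted once per scene (presence, not instances); `cooccurrence` is a hypothetical helper and the four toy scenes are invented for illustration, not ADE20K results.

```python
from __future__ import annotations

from collections import Counter
from itertools import combinations


def cooccurrence(scenes: list[set[str]]) -> tuple[list[str], list[list[int]]]:
    """Count, for every pair of elements, how many scenes contain both.
    The diagonal holds how many scenes contain each element at all."""
    counts: Counter = Counter()
    for scene in scenes:
        items = sorted(scene)
        for a in items:
            counts[(a, a)] += 1
        for a, b in combinations(items, 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1  # keep the matrix symmetric
    labels = sorted({x for scene in scenes for x in scene})
    matrix = [[counts[(a, b)] for b in labels] for a in labels]
    return labels, matrix


# Toy scenes for illustration only.
scenes = [
    {"bench", "tree", "trash can"},
    {"bench", "tree"},
    {"tree", "lamp"},
    {"bench", "trash can"},
]
labels, matrix = cooccurrence(scenes)
print(labels)  # ['bench', 'lamp', 'trash can', 'tree']
for name, row in zip(labels, matrix):
    print(f"{name:>10} {row}")
```

In a heat-map rendering like Fig 2, raw counts favor elements that are simply common; dividing each row by its diagonal entry turns it into the conditional frequency of a companion given the row element.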

Fig 3. Suggestion Rules for Micro-Interventions

Example of suggestion rules derived from co-occurrence data. Given a detected anchor object (e.g., a bench), the rules return a ranked list of the companion elements that most often appear alongside it.
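A suggestion rule can be read as a ranking of companions B for an anchor A. The sketch below assumes the ranking score is the conditional frequency P(B | A) = (scenes with A and B) / (scenes with A), with a minimum-support filter; `suggest`, `top_k`, and `min_support` are hypothetical names and the scenes are toy data, so the project's actual scoring may differ.

```python
from __future__ import annotations

from collections import Counter


def suggest(scenes: list[set[str]], anchor: str,
            top_k: int = 3, min_support: int = 2) -> list[tuple[str, float]]:
    """Rank companions of `anchor` by P(companion | anchor).

    Companions seen alongside the anchor in fewer than `min_support`
    scenes are dropped, so a single photo cannot create a rule.
    """
    with_anchor = [s for s in scenes if anchor in s]
    if not with_anchor:
        return []
    pair_counts = Counter(b for s in with_anchor for b in s if b != anchor)
    rules = [(b, n / len(with_anchor))
             for b, n in pair_counts.items() if n >= min_support]
    # Highest conditional frequency first; ties broken alphabetically.
    rules.sort(key=lambda r: (-r[1], r[0]))
    return rules[:top_k]


# Toy scenes for illustration only.
scenes = [
    {"bench", "tree", "trash can"},
    {"bench", "tree", "lamp"},
    {"bench", "trash can", "tree"},
    {"bench", "sign"},
    {"tree", "planter"},
]
print(suggest(scenes, "bench"))  # [('tree', 0.75), ('trash can', 0.5)]
```

Conditional frequency rewards companions that are common everywhere (trees, for instance). If that becomes a problem, dividing by the companion's overall frequency (lift) is a standard alternative that highlights pairings specific to the anchor.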

Closing Thoughts

This project treats the city as a field of relationships rather than a collection of isolated objects. By learning from how urban elements already cluster in real scenes, Responsive Urban Field suggests a way to ground design decisions in observed patterns instead of abstract rules. At the same time, the work exposed clear limits: co-occurrence data cannot capture culture, power, or lived experience, and any recommendation system risks reinforcing existing biases in what gets photographed and labeled as "good" public space. The most promising direction, therefore, is not a fully automated urban designer, but a shared interface where residents, designers, and municipalities can see the same suggestions, annotate and contest them, and gradually build a more responsive, situated understanding of what makes everyday spaces work.

Further Development

NeurIPS publication: https://arxiv.org/abs/2511.06201


References

ADE20K dataset for annotated urban scenes.

Grounding DINO and related object detection models for extracting urban elements.

Python / Jupyter workflow for computing co-occurrence matrices and suggestion rules.

2022 All rights reserved.