MIT 4.550/4.570 Computation Design Lab

MIT Spring 2024
Final project for 4.550/4.570 Computation Design Lab
Instructors: Takehiko Nagakura, Daniel Tsai
TA: Chili Cheng

ChatGPT 4.0 Criticism and Fine-Tuning for Describing Light in Architectural Contexts

Muwen Li | Yining Bei

Abstract

This project aims to improve ChatGPT's ability to describe lighting in architectural contexts by addressing its current limitations in interpreting and narrating the interplay of light and architectural space. It establishes a framework for building a specialized dataset and fine-tuning pipeline, beginning with the identification of ChatGPT's weaknesses and the construction of a dataset drawn from professional architectural sources, including websites, articles, books, and magazines. The ultimate objective is to fine-tune ChatGPT so that it describes lighting in architectural images more accurately and with better contextual understanding. A key envisioned application is a specialized search platform for architects that retrieves images by specific lighting characteristics, supporting design inspiration and detailed architectural research. The project includes an evaluation of ChatGPT's performance across diverse architectural examples, highlighting both its strengths and areas for improvement. Notably, while ChatGPT can accurately classify indoor and outdoor spaces and can serve as a useful filler that links to more detailed information, it struggles with specific architectural terminology, detailed recognition, and contextual accuracy. To address these challenges, the project proposes a refined dataset preparation pipeline that filters the scraped data and uses the CLIP model to score text-image correlation. Overall, the project seeks to make ChatGPT a more reliable tool for architectural narration, particularly in the nuanced domain of lighting.

Goal

The project aims to enhance ChatGPT's ability to describe lighting in architectural images. The first step is to identify its weaknesses and limitations in this specific context by testing the model on various types of images and descriptions, including real versus fake images, human-written versus machine-generated descriptions, and different views of the same space. Next, a pipeline will be developed to build a dataset from professional architectural sources such as websites, articles, books, and magazines, with the goal of compiling at least 1,000 image-caption pairs. This dataset will be used to fine-tune ChatGPT, improving its capacity to narrate light in architectural images. The final step is to evaluate the fine-tuned model, referred to as the "ChatGPT-adapter," and assess how well it describes architectural lighting conditions.

Pipeline for Dataset Preparation:

1. Scraping Data: Collecting data from various sources, including PDFs (such as El Croquis) and websites (such as ArchDaily).

2. Filtering Data: Indoor/Outdoor Classifier: Currently, a pretrained classifier categorizes images as 'indoor' or 'outdoor.' However, it inaccurately groups diverse image types under 'indoor.' A more accurate classifier is needed, potentially with an 'other' category.
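One possible way to add an 'other' category, sketched here under assumptions: keep the existing two-class classifier, but reject predictions whose top probability falls below a confidence threshold. The function name `label_with_other`, the probability dictionary, and the 0.7 threshold are hypothetical and not part of the project's code; a classifier trained with an explicit 'other' class may work better in practice.

```python
def label_with_other(probs, threshold=0.7):
    """Map class probabilities to 'indoor', 'outdoor', or 'other'.

    probs: dict such as {"indoor": 0.9, "outdoor": 0.1}, assumed to come
           from a pretrained indoor/outdoor classifier (hypothetical input).
    threshold: minimum confidence needed to accept the top label
               (illustrative value; would need tuning on real data).
    """
    best_label = max(probs, key=probs.get)
    if probs[best_label] < threshold:
        # Low confidence: likely a drawing, plan, diagram, or similar image.
        return "other"
    return best_label


# Usage: a confident prediction is kept, an ambiguous one becomes 'other'.
print(label_with_other({"indoor": 0.95, "outdoor": 0.05}))
print(label_with_other({"indoor": 0.55, "outdoor": 0.45}))
```

Images labeled 'other' could then be set aside for manual review instead of being counted as interiors.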

CLIP Similarity Scoring: Using the CLIP model to evaluate the correlation between image captions and the actual content of the images. Text is truncated around the keyword "light" to ensure relevance and coherence.
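The truncation and scoring step might look roughly like the sketch below. It is a minimal illustration, not the project's implementation: the function names, the 20-word window, and the use of plain Python lists for embeddings are all assumptions. Truncation also has a practical side, since CLIP's text encoder accepts only a short input (77 tokens in the original release). The embeddings themselves would come from a CLIP model, which is not loaded here.

```python
import math
import re


def truncate_around_keyword(text, keyword="light", window=20):
    """Return up to `window` words on each side of the first word containing
    `keyword`, or None if the keyword does not occur.

    Substring matching means 'lighting' and 'daylight' match, but so do
    unrelated words such as 'slightly'; a stricter pattern may be needed.
    """
    words = text.split()
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    for i, word in enumerate(words):
        if pattern.search(word):
            start = max(0, i - window)
            end = min(len(words), i + window + 1)
            return " ".join(words[start:end])
    return None


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors.

    In the pipeline, `a` and `b` would be the CLIP image and text embeddings
    (assumed to be computed elsewhere).
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


caption = ("The concrete wall is thick and heavy but soft light "
           "enters from a narrow slit above the stair")
print(truncate_around_keyword(caption, window=2))
```

Image-caption pairs whose similarity falls below a chosen cutoff could then be dropped from the dataset; the cutoff would have to be set empirically.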

Possible Application

Architectural Lighting Search Platform: This platform allows architects to search for specific lighting environments in images. Users can input keywords related to desired lighting characteristics, and the system retrieves images matching these criteria from an extensive database. Each image is linked to its original website for further information, supporting design inspiration, and detailed architectural research and planning.
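As a rough illustration of the retrieval idea, the sketch below ranks captioned images by how many query keywords appear in each caption and returns the matching records with their source links. The `ImageRecord` type, the `search` function, and the example.com URLs are hypothetical placeholders; a real platform would more likely rank results with embedding similarity than with plain keyword matching.

```python
from dataclasses import dataclass


@dataclass
class ImageRecord:
    caption: str      # lighting description for the image
    source_url: str   # link back to the original website


def search(records, query):
    """Return records whose captions contain at least one query term,
    ordered by the number of matching terms (most matches first)."""
    terms = query.lower().split()
    scored = []
    for record in records:
        caption = record.caption.lower()
        hits = sum(1 for term in terms if term in caption)
        if hits:
            scored.append((hits, record))
    # Stable sort: records with equal hit counts keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in scored]


records = [
    ImageRecord("Soft diffuse light washes the concrete wall", "https://example.com/a"),
    ImageRecord("Harsh direct sunlight through a skylight", "https://example.com/b"),
    ImageRecord("A timber facade at dusk", "https://example.com/c"),
]
for result in search(records, "soft light"):
    print(result.source_url, "-", result.caption)
```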

Collected 83 works of architecture across Japan, dating from 1970 to 2020, for analysis.

Assessed ChatGPT's performance in various architectural contexts, including recognizing and describing lighting conditions, architectural elements, and specific styles.

One evaluation example tests ChatGPT 4.0's ability to describe architectural spaces, using an image of a module within the Nakagin Capsule Tower, designed by Kisho Kurokawa and completed in 1972. ChatGPT described how the linear and ambient lighting in the room accentuates the rugged textures and geometric shapes, enhancing the depth and character of the space. When asked to analyze the correlation between lighting and architectural space, ChatGPT suggested that the space features an organic or rustic architectural style influenced by natural design elements. The evaluation finds that ChatGPT can effectively serve as a filler that adds links to detailed information, although its architectural analysis tends to be broad and generalized.

ChatGPT is also capable of interpreting an architectural sketch, focusing on how lighting interacts with the design elements. ChatGPT effectively identifies the impact of lighting on the building's angular and vertical structures, noting how this creates deep shadows and contrasts that highlight the design's dynamic and fragmented nature. This evaluation showcases ChatGPT’s capability to understand and articulate the relationship between lighting and architectural form, even when presented with a hand-drawn sketch.

Findings:

ChatGPT as a Filler for Links: It can provide useful links to details but has limitations in specific architectural contexts.

Prediction and Realism: ChatGPT can expand on descriptions but may not always align with reality.

Recognition Limits: It recognizes only rough outlines, not specific details; for example, it cannot reliably identify the architect.

Indoor/Outdoor Classification: ChatGPT accurately determines if a space is indoor or outdoor based on visual elements.

Architectural Terminology: ChatGPT struggles with specific architectural terms and their nuanced meanings.

Lighting Interpretation: It can interpret lighting quality from hand sketches.

Misidentification Risk: It can mistake specific designers and styles, leading to potential misinformation.

Model vs. Actual Projects: The connection between architectural models and actual buildings is not well understood.

Hand-Drawing Recognition: ChatGPT can recognize hand drawings in relation to architectural projects.

Conclusion:

The project aims to fine-tune ChatGPT to improve its ability to generate accurate architectural descriptions, with a particular focus on lighting. A key application envisioned is the development of a specialized search platform for architects, enhancing their design inspiration and research capabilities. The evaluation of ChatGPT highlights both its strengths and limitations, providing valuable insights that will guide future fine-tuning efforts and improvements.