MIT Spring 2024
Final project for 4.550/4.570 Computation Design Lab
Instructors: Takehiko Nagakura, Daniel Tsai
TA: Chili Cheng
Textual Architectures: Exploring Design Through the Manipulation of Words
Jingfei Huang
Abstract
This project explores an innovative approach to architectural design using textual descriptions as a universally accessible and easily modifiable medium. Traditional methods of architectural communication, reliant on visual media such as drawings and models, often require specialized skills, limiting participation to trained professionals. To address this, the project introduces "Textual Architectures," demonstrating how text can be employed to envision and design architectural spaces, thereby making the design process more accessible to individuals without formal training.
The project began with an ambiguous text corpus, which was deconstructed into keywords tied to various architectural elements and reassembled into structured formats. Generative AI tools were then used to produce multiple text versions, each emphasizing different emotional tones or keywords. These versions were fed into Midjourney to generate corresponding images, which were evaluated by both human evaluators and the CLIP model for alignment with the original and recomposed texts. The analysis aimed to identify which text version produced architectural reference images that most closely matched the intended design.
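As an illustration of the decompose-and-reassemble step, the following Python sketch shows how categorized keywords could be joined back into a structured prompt for Midjourney; the element categories and template wording are illustrative assumptions, not the project's exact corpus.

    # Keyword-based recomposition: decomposed elements are reassembled
    # into one structured prompt string for image generation.
    # The categories and template below are illustrative assumptions.
    elements = {
        "subject":  ["pavilion", "courtyard"],
        "material": ["weathered timber", "translucent glass"],
        "size":     ["intimate scale"],
        "spatial":  ["open to the sky", "nested rooms"],
    }

    def recompose(elements):
        """Join the categorized keywords into a single structured prompt."""
        parts = [", ".join(words) for words in elements.values()]
        return "architectural space: " + "; ".join(parts)

    print(recompose(elements))  # the string handed to Midjourney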
This project proposes a comprehensive process for transforming textual content into AI-generated visual representations while preserving the original text's context and sentiment. The process begins by decomposing the text into elements such as size, material properties, spatial relations, and subjects. Sentiment analysis is then applied to capture the emotional tone. The text is recomposed in several stages, including keyword-based restructuring and integration with platforms such as Midjourney to generate images. A generative AI rewrite further refines the text by incorporating both the sentiment and the identified keywords. The final images are evaluated by human subjects, who score them on their alignment with the original text, and by the CLIP model, which quantitatively assesses the image-text correlation. The aim is that the resulting images are not only visually accurate but also contextually and emotionally faithful to the source material.
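The report does not name the model behind the generative rewrite; a minimal sketch of that stage, assuming the OpenAI chat completions API and an illustrative instruction prompt, might look like this:

    # GenAI rewrite: fold the sentiment label and keywords back into the
    # text before image generation. The model choice and instruction
    # wording are assumptions, not the project's documented setup.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def rewrite(original_text, sentiment, keywords):
        instruction = (
            f"Rewrite this architectural description so its tone is {sentiment} "
            f"and it foregrounds these elements: {', '.join(keywords)}.\n\n"
            f"{original_text}"
        )
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": instruction}],
        )
        return response.choices[0].message.content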
The keyword-based decomposition approach yielded the highest human satisfaction, while the combination of GenAI with sentiment and keywords resulted in the best AI alignment as measured by the CLIP model. This comparison highlights the strengths and limitations of each method, contributing valuable insights into how to optimize text-to-image generation for both human and AI assessments.
The methodology converts textual content into AI-generated visuals in four stages. First, the original text is decomposed into elements such as size, material, and spatial relations, and its emotional tone is measured with sentiment analysis in NLTK. Second, the text is recomposed, incorporating these elements into a structured format that guides image generation. Third, a generative AI rewrite integrates the sentiment and keywords so the visual output stays contextually and emotionally aligned with the source material. Finally, the generated images are evaluated by human subjects for their similarity to the imagined visuals and by the CLIP model for their alignment with the text, covering both visual and contextual accuracy.
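Since NLTK is named for the sentiment step, a minimal version of it might use the library's VADER analyzer; the specific analyzer and the sample sentence are assumptions.

    # Sentiment analysis with NLTK; VADER is an assumption, since the
    # report names the library but not a specific analyzer.
    import nltk
    from nltk.sentiment import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)

    sia = SentimentIntensityAnalyzer()
    scores = sia.polarity_scores(
        "A quiet, light-filled courtyard sheltered by weathered timber."
    )
    print(scores)  # {'neg': ..., 'neu': ..., 'pos': ..., 'compound': ...}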
In the experiment, both recomposition methods produced AI-generated visuals that resonate with human imagination and with AI evaluation. Consistent with the results above, the keyword-based decomposition scored slightly higher with human evaluators, while the GenAI rewrite that includes sentiment and keywords achieved the strongest CLIP alignment, suggesting that incorporating sentiment improves the machine-measured correspondence between image and text even when human viewers prefer the plainer keyword structure.
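The CLIP-based alignment check can be reproduced with the openly released CLIP weights via Hugging Face transformers; the checkpoint, file name, and example texts below are assumptions, since the report only says "the CLIP model".

    # Score image-text alignment for each text version with CLIP.
    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    image = Image.open("midjourney_output.png")  # one generated image
    texts = [
        "A quiet, light-filled courtyard sheltered by weathered timber.",   # original
        "courtyard; weathered timber; intimate scale; open to the sky",     # keyword version
        "An intimate, serene courtyard glowing with soft filtered light.",  # GenAI rewrite
    ]

    inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape (1, len(texts))
    print(logits.softmax(dim=-1))  # relative alignment of each text version

Softmax over the per-image logits gives a relative ranking of the text versions, mirroring the comparison reported above.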
Conclusion
Overall, the project demonstrates that text can serve as an accessible and easily modifiable medium for architectural design. By decomposing, recomposing, and regenerating descriptions with generative AI, and by evaluating the resulting images with both human judges and the CLIP model, it shows how a text-to-image workflow can be tuned for human satisfaction (keyword-based decomposition) or for machine-measured alignment (a GenAI rewrite with sentiment and keywords). In doing so, it opens the architectural design process to participants without formal training while expanding the capabilities of AI-assisted visualization.