Museum Semantic Search
https://museum-semantic-search.vercel.app
https://github.com/derekphilipau/museum-semantic-search
A proof of concept for searching museum collections using AI embeddings and AI-generated visual descriptions. Cross-modal search is provided by SigLIP 2 and text embedding search by Jina v3. Deployed on Vercel, with GPU inference on Modal.
This project is meant as a starting point for experimentation and discussion around the use of AI in museum collections search, and should not be taken as advocating a specific approach.
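As a rough illustration of the cross-modal side, below is a minimal sketch of producing SigLIP 2 embeddings with Hugging Face transformers. The checkpoint name, preprocessing, and normalization details are my assumptions, not necessarily what this project's Modal workers actually do:

```python
# Minimal sketch: cross-modal embeddings with SigLIP 2 via Hugging Face
# transformers. Checkpoint name and preprocessing are assumptions; the
# project itself runs inference on Modal GPUs.
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

MODEL_ID = "google/siglip2-base-patch16-224"  # assumed checkpoint

model = AutoModel.from_pretrained(MODEL_ID)
processor = AutoProcessor.from_pretrained(MODEL_ID)

def embed_image(path: str) -> torch.Tensor:
    """Return an L2-normalized image embedding."""
    image = Image.open(path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

def embed_text(query: str) -> torch.Tensor:
    """Return an L2-normalized text embedding in the same space as images."""
    inputs = processor(text=[query], padding="max_length",
                       max_length=64, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_text_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

# Cosine similarity between a text query and an artwork image:
score = (embed_text("woman looking into mirror") @ embed_image("artwork.jpg").T).item()
```

Because both embeddings live in the same vector space, a text query can be scored directly against stored image vectors, which is what makes cross-modal search possible.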
Search Comparison
Despite occasionally questionable results, text embedding search over AI-generated visual descriptions works well in practice. Image embedding search also shows promise, though it is less reliable and often produces strange results.
Below are comparisons of keyword search, text embedding search, and image embedding search for the query "woman looking into mirror".
Search for "woman looking into mirror"
Out of a result set of 20:
The conventional Elasticsearch keyword search over Met Museum metadata produces only 3 results that I consider highly relevant.
Text embedding search using Jina v3 embeddings on combined metadata and AI-generated descriptions returns 13 excellent results, including a number of images where the reflection or mirror is not even visible (a sketch of this combined-text embedding follows this list).
Image embedding search using SigLIP 2 cross-modal embeddings returns 8 highly relevant results, including artworks where there is no actual mirror, but rather the concept of mirroring, for example "Portrait of a Woman with a Man at a Casement" by Fra Filippo Lippi and "Dancers, Pink and Green" by Edgar Degas.
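Here is a sketch of how metadata and an AI-generated description might be combined into a single text and embedded with Jina v3. The record fields, the concatenation scheme, and the task parameter are assumptions for illustration, not this project's actual indexing pipeline:

```python
# Sketch: embedding combined metadata + AI-generated description with
# Jina v3 via its Hugging Face custom code. The record fields and the
# way they are concatenated are illustrative assumptions.
from transformers import AutoModel

model = AutoModel.from_pretrained("jinaai/jina-embeddings-v3",
                                  trust_remote_code=True)

artwork = {  # hypothetical record
    "title": "Woman with a Mirror",
    "artist": "Unknown",
    "ai_description": "A seated woman gazes into a small hand mirror...",
}
combined = " | ".join([artwork["title"], artwork["artist"],
                       artwork["ai_description"]])

# "retrieval.passage" is one of Jina v3's task adapters; documents are
# embedded with it, while search queries would use "retrieval.query".
doc_vector = model.encode([combined], task="retrieval.passage")[0]
```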
Results I found exciting are highlighted in the images below; a number of them I probably would have missed if I were simply browsing through images.
Text embedding search result for "woman looking into mirror". Difficult to see: the woman on the left is looking into a mirror.
Image embedding search result for "woman looking into mirror": "Portrait of a Woman with a Man at a Casement" by Fra Filippo Lippi. Perhaps the woman is not looking into a mirror, but it does feel like a mirroring.
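To make the comparison above concrete, here is a hedged sketch of the two Elasticsearch query shapes: a conventional BM25 keyword query and an approximate kNN query over a dense embedding field. The index name and field names are invented for illustration and are not the project's actual schema:

```python
# Sketch of the two query shapes compared above. Index and field names
# ("artworks", "title", "description", "text_embedding") are assumptions.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# 1) Conventional keyword (BM25) search over metadata text fields.
keyword_hits = es.search(
    index="artworks",
    query={"multi_match": {
        "query": "woman looking into mirror",
        "fields": ["title", "description"],
    }},
    size=20,
)

# 2) Embedding search: embed the query first (e.g. with the Jina v3 or
# SigLIP 2 sketches above), then run approximate kNN over stored vectors.
query_vector = embed_text("woman looking into mirror")[0].tolist()
knn_hits = es.search(
    index="artworks",
    knn={
        "field": "text_embedding",
        "query_vector": query_vector,
        "k": 20,
        "num_candidates": 100,
    },
)
```

The keyword query can only match terms that appear in the catalog metadata, while the kNN query matches against the embedding space, which is why it can surface artworks whose metadata never mentions a mirror.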
Mining Archetypes
Intrigued by the results for "woman looking into mirror", I started searching for other art history archetypes to explore across cultures and time. Although the embedding searches are often inaccurate, the results are interspersed with surprisingly relevant matches that would not have been possible with keyword search alone.
Curated Search Results for Various Archetypes
AI-Generated Emojis
Sometimes strangely accurate, revealing details I missed; at other times questionable and problematic; and often hilarious. Of dubious practical use, but fun.
Related
Musefully (website, GitHub): Search across museums using Elasticsearch and Next.js.
“Accessible Art Tags” GPT: A specialized GPT that generates alt text and long descriptions following the Cooper Hewitt Guidelines for Image Description.
OpenAI CLIP Embedding Similarity: Examples of artwork similarity search using OpenAI CLIP embeddings.
Related Projects
MuseRAG++: A Deep Retrieval-Augmented Generation Framework for Semantic Interaction and Multi-Modal Reasoning in Virtual Museums: A RAG-powered museum chatbot.
National Museum of Norway Semantic Collection Search (Website, Article): Search via embeddings of GPT-4 Vision image descriptions.
Semantic Art Search (GitHub, Website): Explore art through meaning-driven search.
Sketchy Collections (GitHub, Website): A CLIP-based image search tool that lets you explore artworks by drawing or uploading a picture.