Compressing Art to Text (and Back)

For the Museum Semantic Search prototype I used a multimodal LLM to generate visual descriptions of artwork images to improve semantic search with text embeddings. These generated descriptions can be treated as an extreme form of lossy compression: a digital image is reduced to a compact piece of text, and the original pixels cannot be recovered from it.

I wanted to see how far this image‑to‑text compression could go. How small can a faithful description be? And can that description be hydrated back into an image that still resembles the original?
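One rough way to put a number on "how small" is to compare byte sizes. The sketch below is illustrative only: the function name and the example sizes (a 2 MB image file and a 4,000-character description) are hypothetical and are not measurements from these tests.

```python
def compression_ratio(image_bytes: int, description: str) -> float:
    """Return how many times smaller the UTF-8 description is than the image file."""
    text_bytes = len(description.encode("utf-8"))
    return image_bytes / text_bytes


# Hypothetical sizes, chosen only to show the arithmetic.
example_description = "x" * 4000   # stand-in for a 4,000-character description
ratio = compression_ratio(2_000_000, example_description)
print(f"{ratio:.0f}:1")  # prints "500:1"
```

A ratio like this says nothing about fidelity. It only shows how little data the description carries compared with the image file.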

Using a prompt tuned for literal, high‑fidelity detail, I gave an artwork image to a multimodal LLM. I then fed the resulting description into an image generator to hydrate the “compressed” image. This image > text > image cycle can be repeated, iteratively compressing and re-generating.
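The cycle can be sketched as a small loop. This is a minimal sketch, not the code used for these tests: `describe` and `generate` are hypothetical callables standing in for the multimodal LLM and the image generator, and the stand-in functions at the bottom exist only so the example runs without any API access.

```python
from typing import Callable, List, Tuple

# Hypothetical signatures: wrap real model calls in functions of this shape.
Describe = Callable[[bytes, str], str]   # (image bytes, prompt) -> description
Generate = Callable[[str], bytes]        # description -> image bytes


def roundtrip(image: bytes, prompt: str, describe: Describe,
              generate: Generate, rounds: int = 3) -> List[Tuple[str, bytes]]:
    """Run image > text > image `rounds` times, keeping each intermediate result."""
    history: List[Tuple[str, bytes]] = []
    current = image
    for _ in range(rounds):
        description = describe(current, prompt)   # compress: image to text
        current = generate(description)           # hydrate: text to image
        history.append((description, current))
    return history


# Stand-in models (assumptions for illustration, not real APIs).
def fake_describe(image: bytes, prompt: str) -> str:
    return f"an image of {len(image)} bytes"


def fake_generate(description: str) -> bytes:
    return description.encode("utf-8")


history = roundtrip(b"\x00" * 1000, "Describe literally.",
                    fake_describe, fake_generate, rounds=2)
for description, image in history:
    print(description, len(image))
```

Keeping every intermediate description and image makes it possible to compare the rounds side by side and see where drift begins.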

This is a practical test of both components: how well the multimodal LLM captures what matters visually, and how well the image generator renders that description. This is also a test of the prompt itself, and one could imagine a prompt refinement process that uses the image > text > image technique.

In these tests, I used Google Gemini 2.5 Pro with the prompt (below) and an image. I then fed its description to Nano Banana.

Early rounds looked surprisingly close to the originals. With more iterations, drift emerged: the images became progressively more stylized, exposing the limits of the descriptions and the models’ priors and biases.

One note: The models know about many famous artworks and can recognize them from the text descriptions. The result using Jacques Louis David’s The Death of Socrates was suspiciously good. Next time I should use lesser-known or unknown works.

Extreme Compression: Artwork to Caption to Image

We’d expect a short caption to produce a less accurate image than a detailed, full description:

Comparing images generated from a short caption vs. a full description

Artwork to Full Description to Image

Ia Orana Maria (Hail Mary), Paul Gauguin French, 1891

The Death of Socrates, Jacques Louis David French, 1787. I’m suspicious of these results: Gemini immediately knew which painting it was from the text description alone.

The Fitting Room, Kenneth Hayes Miller American, 1931

The Harbor, Josef Presser American, ca. 1943

Breton Fishermen and Their Families, Théodule-Augustin Ribot French, possibly ca. 1880–85

Scholar viewing a waterfall, Ma Yuan Chinese, early 13th century

Full Description Prompt

This prompt was given to Google Gemini 2.5 Pro together with an artwork image:

You are a meticulous visual describer. Produce a reproduction‑grade natural‑language description of the attached artwork. Write clearly and literally in the present tense. Your goal is to enable a renderer to recreate what is visible without guessing.

What to cover, in prose:

  • Overview: aspect ratio and orientation; what the image shows at a glance; obvious cropping.

  • Composition & geometry: horizon (if any); perspective (one/two/three‑point or flat); major axes and shapes; balance and symmetry; approximate placements using fractions of the frame (e.g., “upper third,” “~20% from the left”); notable angles (e.g., “tilts ~15°”).

  • Subjects & objects (back‑to‑front): categories, counts, poses/states, sizes relative to the frame, occlusions/truncations, salient attributes.

  • Color & palette: dominant hues with approximate proportions; name colors plainly and include a close hex when confident (e.g., “deep ultramarine #1f3a93 ~25%”); note accents and overall key/contrast.

  • Lighting: environment (daylight/overcast/spotlight/etc.), direction of key light, shadow hardness, approximate warmth/coolness, presence of specular highlights.

  • Materials & texture: surface qualities that change rendering (matte vs gloss, brushwork density, wood grain, fabric weave, patina, craquelure, film grain).

  • Background & environment: structure and depth cues; pattern/texture; any atmospheric effects.

  • Text in the image: transcribe exactly (case, punctuation, line breaks) and give approximate placement.

  • Negative space & notable absences: call out items whose absence matters (“no clouds,” “no vehicles,” “no jewelry”).

  • Uncertainties: when something is ambiguous, say so and explain why (“appears to…”, “uncertain due to shadow/occlusion”).

Rules

  • Be literal. No metaphors, symbolism, authorship, era, or demographic inferences.

  • Prefer numbers to adjectives: counts; proportions (“occupies ~0.35 of the width”); distances (“~5% from top edge”).

  • If a value can’t be precise, bound it (“~35–45%,” “likely two‑point perspective”).

  • Keep it natural language with short, crisp sentences. Headings are fine; bullet lists only if they help clarity.

Return only the description.

Example Full Description

Example initial description of Ia Orana Maria (Hail Mary) generated by Gemini 2.5 Pro:

Overview

The image is a vertically-oriented, full-frame painting with an aspect ratio of approximately 4:3. It depicts five figures in a lush, tropical landscape with a still life of fruit in the foreground. The style is non-photorealistic with flattened perspective and bold outlines.

Composition & Geometry

The composition is vertically layered into three main horizontal bands. The bottom quarter is the foreground, containing a bowl of fruit. The middle section contains four standing figures. The upper section contains foliage, a fifth figure, and a glimpse of sky and a rooftop. The horizon line is high and obscured by foliage. The perspective is shallow, compressing the sense of depth. A dominant vertical axis is established on the right by a woman carrying a child, who occupies roughly 40% of the canvas's width. This is balanced by two other figures and foliage on the left.

Subjects & Objects (Back-to-Front)

  • Background: In the top 10% of the frame, a sliver of sky is visible, rendered in patches of pale blue and light ochre. Below this, in the upper left, is the corner of a simple, dark brown gabled roof. The majority of the background and mid-ground is filled with dense foliage. This includes tall, slender, greyish tree trunks and a variety of leaves rendered in shades of green, yellow-green, and dark blue-green. In the upper middle section, a tree bears small, white blossoms and round, light green fruit.

  • Figures: There are five human-like figures.

    • Winged Figure: Located in the upper left quadrant, partially obscured by a flowering bush. This figure has large wings, with yellow feathers tipped in dark blue, and appears to be wearing a light-colored garment. It faces right, towards the other figures, with its hands raised and palms together. Its face is indistinct.

    • Two Central Women: Standing in the middle-left of the canvas, just behind a small, leafy tree. Both are bare-chested and wear patterned wraps around their waists. The woman further back and to the left has her dark hair pulled back and looks right. Her wrap is a dark navy blue (#1a234f) with a repeating pattern of yellow, stylized flowers or suns. The woman slightly in front of her has dark, center-parted hair and also looks right. Her wrap is a pale off-white with indistinct grey markings. Both women hold their hands clasped together at chest level.

    • Woman on Right: A woman stands in the right half of the frame, facing forward but looking directly at the viewer. She has dark hair, dark eyes, and medium-brown skin. She wears a bright red (#c73c33) sleeveless dress or wrap decorated with a pattern of white, stylized flowers. Her bare feet are visible at the bottom of her dress. Her hands support a child sitting on her shoulders.

    • Child: A small child with light brown skin sits on the right-hand woman's shoulders. The child is naked and is seen from the waist up. It has short dark hair and a thin, faint yellow circle, like a halo, painted around its head. The child's right arm rests on the woman's head, while its left is bent upwards. The child gazes towards the left side of the painting.

  • Foreground: The entire bottom quarter of the image is dominated by a collection of fruit, mostly held within a large, low-rimmed, dark brown bowl. On the left side of the bowl is a large bunch of bananas in shades of green, yellow, and reddish-orange. On the right side of the bowl is a pile of pale, milky-green, oblong fruits. Behind these, a single round, dark green melon-like fruit is visible. The ground beneath and around the bowl is a textured mix of dark green, ochre, and brown. A small, leafy plant with white blossoms rises from the ground on the left, in front of the two central women.

Color & Palette

The palette is rich and saturated. Dominant hues are the bright red (~20%) of the main figure's dress and the varied greens and earthy browns of the foliage and ground (~50%). The skin tones are rendered in warm browns. Strong accent colors include the deep blue and yellow of the wrap on the left and the pale off-white of the wrap in the center. The overall color key is medium to dark, with high contrast between the bright red dress and the surrounding darker tones.

Lighting

The lighting is diffuse and even, as if on an overcast day or within a jungle canopy, with no single, strong light source. There are no hard-edged cast shadows. The figures and foreground elements are lit frontally, giving them solidity without strong highlights or deep shadows. The overall temperature of the light feels neutral to slightly warm.

Materials & Texture

The artwork has a matte finish. Brushwork is visible throughout, with colors often applied in distinct, flat areas bordered by dark outlines, particularly on the figures and their clothing. There is little to no impasto. The texture of the foliage is created through layered, patterned strokes. The surfaces of the fruit appear smooth, and the fabrics of the wraps are depicted as simple, opaque cloth.

Background & Environment

The setting is a dense, outdoor tropical environment. Depth is suggested through the layering of elements—foreground fruit, mid-ground figures, and background foliage—rather than through linear perspective. The entire scene is filled with pattern, from the leaves on the trees to the designs on the clothing.

Text in the image

In the bottom left corner, a horizontal yellow-ochre rectangle serves as a plaque or label within the painting. On it, the words "IA ORANA MARIA" are painted in a dark, reddish-brown, sans-serif, uppercase font.

Negative Space & Notable Absences

The composition is dense, with very little negative space. The figures do not wear any jewelry, shoes, or accessories other than their clothing wraps and the child's halo. There are no buildings visible besides the partial roof, nor are there any animals or defined pathways.

Uncertainties

The precise species of the flowering plants and fruits are not clearly identifiable. The features of the winged figure in the upper left are indistinct and partially occluded by foliage. The exact material of the foreground bowl is unclear; it could be wood or low-fired pottery.
