StyleGAN 1

The history of ceramics is one of imitation and reproduction.

The apprentice obtains mastery of the craft through repetition, gradually improving their technique. Guided by a lifetime of working in the craft, the master examines each piece made by the student and throws away those deemed unsuitable.

The forger creates replicas and tests them in the marketplace. The connoisseur, informed by decades of experience dealing with antiques, judges the replicas. Those that are mistaken as authentic are sold, and the forger goes on to create even more convincing copies.

The "fake" vessels on this website have been created through a similar process of repetition, examination, and reinforcement. Except in this case, the entire procedure has taken place within machine-learning (ML) software known as a Generative Adversarial Network (GAN).

GANs consist of two parts: the Generator and the Discriminator. In a very general sense, the role of the Generator is similar to that of the apprentice and the forger, while the Discriminator plays the role of the master or connoisseur. In a continuous feedback loop, the Generator creates "fakes" that are judged by the Discriminator as either "real" or "fake", and both parts improve as time goes on. Eventually the Generator becomes a "Master" and can create the images on this website.

As extremely powerful ML software like StyleGAN is released and becomes more user-friendly, artists will have new tools with which to understand their craft and create new work.

Note: I am by no means a machine-learning expert. For those of you who are actually experts in the fields of AI and ML, I apologize in advance for poor generalizations and oversimplifications, and I hope that you will notify me of any mistakes. -Derek

From a book of forms. Jingdezhen, China, 2008.

Imitation qingbai ware. Jingdezhen, China, 2008.

Machine Learning & GANs

Computerphile has a high-level overview of Generative Adversarial Networks (GANs) here.

Perhaps the easiest way to visualize how StyleGAN works is to watch the original video: "A Style-Based Generator Architecture for Generative Adversarial Networks".

Gwern's excellent Making Anime Faces With StyleGAN introduces the original research paper, "A Style-Based Generator Architecture for Generative Adversarial Networks", Karras et al 2018 (paper, video, source), and explains in detail the procedure to install and run the StyleGAN software.

Beginning in February of 2019 with Phillip Wang's This Person Does Not Exist, a number of websites sprouted up to showcase the power of StyleGAN trained on various image datasets: cats, anime characters, Airbnb rooms, etc.

Creating a dataset

StyleGAN requires relatively large datasets of images. Datasets usually consist of images of the same "thing": human faces, cars, bedrooms, cats, anime characters, etc. (The original StyleGAN paper used a dataset of 70,000 high-quality images of human faces.)

I focused on a single form, the "vase", in order to keep the dataset relatively simple. Including all types of "vessels" (cups, bowls, dishes, etc.) would have resulted in far too much variation, especially if I wanted to keep the dataset under a few tens of thousands of images. Vases also have the advantage that they are usually photographed from the same angle (from the front and slightly elevated).

Having said that, there is a huge amount of variation even within vases. I could have limited the dataset even further by including only ceramic vases; however, I'm very interested in seeing the cross-pollination between vases of different materials: porcelain, glass, wood, metal, etc. (For an excellent example of the influence of various craft traditions upon one another, see the Masterpieces of Chinese Precious Metalwork, Early Gold and Silver; Early Chinese White, Green and Black Wares auction from Sotheby's.)

I was worried about having too small a dataset, and about the possibility that the StyleGAN software might just end up memorizing the whole thing. So I scraped a variety of websites until I had around 50k images. I bypassed Google Images for a number of reasons:

  • images of "vases" are too varied, many are filled with flowers or have complicated backgrounds,

  • google-images-download (the only reliable downloader I could find) only seems to be able to download 600 images per query, and breaks down after the first 100 when doing more complicated domain-based searches,

  • I couldn't guarantee I wasn't just downloading a lot of duplicate images on each variation of my search parameters.

Using Flickr as a source had the same issues as Google Images. So instead I focused on museums and auction houses, where I could download entire image sets for "vases" and be assured of high-quality images shot against simple backgrounds. Because each site is quite different, I resorted to a variety of scraping tools, from home-grown shell, Python, and PHP scripts to more powerful tools like Scrapy. All of my scripts simply dump image URLs to text files. Then, a set of shell scripts iterates through each URL (a rough sketch of this loop follows the list below):

  1. Download the image file with wget to a local file with a unique filename.

  2. Use ImageMagick convert to resize the image to exactly 1024 x 1024, fill unused space with a white canvas, and adjust the image's DPI, colorspace, and quality (which can be lowered below 90 to save space). Note that -background white must be set before -extent so the padding is filled with white.
    convert "./$img_filename" -resize 1024x1024 -background white -gravity center -extent 1024x1024 -density 72 -set colorspace sRGB -quality 90 "./$img_filename"

  3. Store the original source URL as an IPTC metadata field in the image file itself using exiftool.
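
For illustration, here is a rough Python rendition of that per-URL loop, shelling out to the same tools (the real scripts are plain shell, and the filenames and urls.txt input here are placeholders):

    # Rough sketch of the per-URL pipeline: download, normalize, tag with source URL.
    import hashlib
    import subprocess

    with open("urls.txt") as f:                      # one image URL per line
        for url in (line.strip() for line in f):
            if not url:
                continue
            # 1. Download to a unique local filename derived from the URL.
            img = hashlib.md5(url.encode()).hexdigest() + ".jpg"
            subprocess.run(["wget", "-q", "-O", img, url], check=True)
            # 2. Resize/pad to exactly 1024x1024 on a white background.
            subprocess.run([
                "convert", img,
                "-resize", "1024x1024", "-background", "white",
                "-gravity", "center", "-extent", "1024x1024",
                "-density", "72", "-set", "colorspace", "sRGB",
                "-quality", "90", img,
            ], check=True)
            # 3. Record the source URL in the image's IPTC metadata.
            subprocess.run(["exiftool", "-overwrite_original",
                            f"-IPTC:Source={url}", img], check=True)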

These sets of images were then manually reviewed, and I tried to clean up the data as best I could. About 20% of the images were removed because they were unrelated (shards, paintings of vases, etc.), of poor quality, or shot from a bad angle. I used dhash (difference hashing) to quickly eliminate duplicate images.
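
For reference, dhash-based filtering looks roughly like the following sketch, here using the Python imagehash and Pillow packages (exact-match hashes catch straight duplicates; catching near-duplicates requires comparing hashes with a small Hamming-distance threshold instead):

    # Minimal sketch of dhash-based duplicate removal; the "dataset" directory
    # name is a placeholder.
    import os
    from PIL import Image
    import imagehash

    seen = {}
    for name in sorted(os.listdir("dataset")):
        path = os.path.join("dataset", name)
        h = imagehash.dhash(Image.open(path))   # 64-bit difference hash
        if h in seen:
            print(f"duplicate: {name} matches {seen[h]}")
            os.remove(path)                      # keep the first copy, drop the rest
        else:
            seen[h] = name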

The "Originals" dataset of photos come from a variety of museum and auction house websites including: Adrian Sassoon, Artcurial, Art Institute of Chicago, Artsy, Bonhams, The British Museum, Bukowskis, China Guardian, Christies, Corning Museum of Glass, Dallas Museum of Art, Dorotheum, Doyle, Freeman's, Freer | Sackler, Harvard Art Museums, The State Hermitage Museum, I.M. Chait, Lyon and Turnbull, Maak London, MAK Vienna, Boston Museum of Fine Arts, The Metropolitan Museum of Art, Minneapolis Institute of Art, Philadelphia Museum of Art, Phillips, Poly Auction, Rijksmuseum, The Smithsonian, Sotheby's, Victoria and Albert Museum, Woolley & Wallis, and Wright.

The final, edited dataset is approximately 38,000 high-quality images. However, due to the amount of variation in vases, I think it would be better to use a larger dataset of perhaps 100k images. Unfortunately I couldn't think of any more museums and auction houses with large collections. If you are aware of other sources for high-quality images of vases (or even other vessels), please contact me.

Running StyleGAN

I ended up running StyleGAN multiple times: first at 512x512px just to test the system, then at 1024x1024px. As noted in gwern's guide, perhaps the most important and time-consuming part of the process is obtaining a large, high-quality, clean dataset. Due to various issues and overlooked complications with the data, I ended up having to completely re-run the 1024px model after manually combing through the images and removing as much junk as I could.

I initially ran StyleGAN on an 8 vCPU, 30 GB RAM Jupyter Notebook (CUDA 10.0) instance with a single NVIDIA Tesla P100, hosted on Google Cloud's AI Platform. Once the resolution reached 1024x1024 and iterations started taking more time (approximately 2 hours between ticks), I stopped the VM and reconfigured it to use dual NVIDIA Tesla P100 GPUs. This configuration costs more but effectively halves the amount of time needed.
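
For reference, getting StyleGAN itself to use both GPUs is a matter of switching between the preset configuration lines near the top of train.py in the official repo; roughly along these lines (the full minibatch schedules are omitted here, so check your own checkout):

    # In train.py of the NVlabs/stylegan repo, the GPU count is chosen by
    # uncommenting one of the preset lines; approximate excerpt only.
    #desc += '-1gpu'; submit_config.num_gpus = 1; sched.minibatch_base = 4
    desc += '-2gpu'; submit_config.num_gpus = 2; sched.minibatch_base = 8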

Before following the StyleGAN guide at Making Anime Faces With StyleGAN, I needed to upgrade Python to version 3.6.x (required for StyleGAN).

As discussed in the post Training at Home, and in the Cloud, training a StyleGAN model from scratch is time-consuming and expensive. By the time I reached 9000 kimg I was approaching my budget limit, and I still needed enough computation time left to generate samples. Also, from 8500-9000 kimg I noticed that progress had drastically slowed, and I was getting the "elephant wrinkles" that gwern describes. Rather than keep going, I hope to acquire a larger, cleaner dataset at a later date and try again.

For those of you who want to try generating samples or transfer learning, the resulting model at 8980 kimg is here: network-snapshot-008980.pkl
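
Generating a sample from that snapshot looks roughly like the following sketch, adapted from the pretrained_example.py pattern in the official StyleGAN repo (run it from inside a checkout of the repo so that dnnlib is importable):

    # Minimal sketch: load the snapshot and render one vessel.
    import pickle
    import numpy as np
    import PIL.Image
    import dnnlib.tflib as tflib

    tflib.init_tf()
    with open('network-snapshot-008980.pkl', 'rb') as f:
        _G, _D, Gs = pickle.load(f)        # Gs = long-term average of the generator

    rnd = np.random.RandomState(42)        # any seed gives a different vessel
    latents = rnd.randn(1, Gs.input_shape[1])
    fmt = dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True)
    images = Gs.run(latents, None, truncation_psi=0.6,
                    randomize_noise=True, output_transform=fmt)
    PIL.Image.fromarray(images[0], 'RGB').save('vessel-sample.png')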

I'm not sure how to share the actual collection of originals due to copyright and size issues. The datasets in StyleGAN's particular .tfrecord format, generated from the original images, are over 150 GB in size.
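
(For reference, those .tfrecord files are produced from a folder of prepared images with StyleGAN's dataset_tool.py, along the lines of the following; the directory names here are placeholders.)

    python dataset_tool.py create_from_images datasets/vessels /path/to/cleaned-images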

Generated Samples

The "truncation trick" applied to 10 random vessels, with ψ range: 1, 0.8, 0.6, 0.4, 0.2, 0, -0.2, -0.4, -0.6, -0.8, -1. As gwern notes, this illustrates "the tradeoff between diversity & quality, and the global average". The "global average" vessel forms the middle column of each image grid.

πœ“ range: 1, 0.8, 0.6, 0.4, 0.2, 0, -0.2, -0.4, -0.6, -0.8, -1

πœ“ range: 1, 0.8, 0.6, 0.4, 0.2, 0, -0.2, -0.4, -0.6, -0.8, -1

Qinghua (Blue & White)

Curious to see the effect of training the model on a more limited dataset, I created a new collection of images that included only Chinese Blue & White (qinghua) vessels. (It's possible that some Dutch Delftware and Japanese Arita-ware snuck in.) This new set was much smaller, only around 2,800 images. Using transfer learning, I started the StyleGAN software from the original vessels .pkl model (network-snapshot-008980.pkl) and trained it against the new, limited Blue & White dataset. After just one round of training I was already getting very good results, but I kept the training going for 10 rounds, until the next network snapshot .pkl file was written. Then, using the new network snapshot, I generated images and videos.
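
In practice, the transfer learning amounted to pointing the training loop at the existing snapshot instead of starting from random weights; one common way to do this with the official code is to edit the resume_* defaults in training/training_loop.py, roughly as follows (the paths and values shown are illustrative):

    # Approximate excerpt of the resume settings in training/training_loop.py.
    resume_run_id   = '/path/to/results/00000-sgan-vessels-2gpu'  # run dir holding the base model
    resume_snapshot = 8980      # picks up network-snapshot-008980.pkl
    resume_kimg     = 8980.0    # continue the schedule where the base model left off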

The Blue & White model at 9180 kimg is here: network-snapshot-009180.pkl

πœ“ range: 1, 0.8, 0.6, 0.4, 0.2, 0, -0.2, -0.4, -0.6, -0.8, -1

πœ“ range: 1, 0.8, 0.6, 0.4, 0.2, 0, -0.2, -0.4, -0.6, -0.8, -1

Diverse

gwern's article describes the ψ/"truncation trick", an important, adjustable parameter for generating images with StyleGAN. While most of the images on this website were generated with ψ set to 0.6 (which gives reasonable if boring results), more diverse and distorted images can be generated with higher values of ψ. The "Diverse" section of this website showcases images generated with ψ set to either 1.0 or 1.2. Although there are more artifacts and unrealistic-looking results, many of these images are more interesting for their artistic possibilities and unusual combinations of influences. At times these more diverse images achieve a nostalgic, dreamlike, and painterly quality that I find very interesting.

Some favorite results from the diverse set.

Some favorite results from the diverse qinghua set.
