“Technology advanced enough is indistinguishable from magic.”
--Arthur C. Clarke (author of 2001: A Space Odyssey)
A 17-chapter series to create images, text, music, figures, and patterns in PyTorch. The series show how to:
- Create a ChatGPT-style large language model from scratch to generate text that can pass as human-written
- Generate images that are indistinguishable from real photos
- Compose music that anyone would think it’s real
- Create patterns such as a sequence of odd numbers, multiples of five, ...
- Generate data that mimic certain shapes: sine curve, cosine shape, hyperbola graph
- Control the latent space to generate images with certain attributes: men with glasses, women with glasses, transitioning gradually from men with glasses to men without glasses, or from women without glases to women with glasses...
- Style transfer: convert a horse image to a zebra...
Most of the generative models in this book belong to a framework called Generative Adversarial Networks (GANs). This chapter introduces you to the basic idea behind GANs and you'll learn to use the framework to generate data samples that form an inverted-U shape. At the end of this chapter, you'll be able to generate data to mimic any shape: sine, cosine, quadratic, and so on.
You'll learn how to use GAN to generate a sequence of numbers with certain patterns. We'll try to generate multiples of five. But you can change the pattern to multiples of two, three, seven, or any number really. This is the output from a trained GAN:
tensor([25, 0, 30, 40, 25, 35, 10, 30, 10, 0], device='cuda:0')
All numbers are multiples of five!
Generate image without using convolutional layers:
Use deep convolutional GAN to generate color images:
and control attributes: here you can transition from red-hair to black-hair:
Use Wasserstein distance to stabilize training, plus add label to generate certain types of images. E.g., faces without glasses over the course of training: https://gattonweb.uky.edu/faculty/lium/ml/noglasses.gif"
Convert horses to zebras:
Train a variational autoencoder (VAE) to generate color images of human faces. Control encodings to generate images with certain attributes: e.g., images that gradually transition from images with glasses to images without glasses. Take the encodings of men with glasses, minus encodings of men without glasses, and add in the encodings of women without glasses, you'll generate images of women with glasses. The whole experience seems like straight out of science fiction, hence the opening quote by the science fiction writer Arthur Clarke: “Technology advanced enough is indistinguishable from magic.”
To give you an idea what the chapter will accomplish, here is the transition from women with glasses to women without glasses: Transition from women without glasses to men without glasses Two examples of encoding arithmetic:
Below is the text generated by the model with prompt "The city of Lexington in the state of Kentucky":
The city of Lexington in the state of Kentucky, is also offering a $300 award to "Owner" PANZER-KATZ FOR BEST LENGTH OF STREET CARS, or just "For the Best Street Car" in any of their three categories:
4WD (3.5 miles or less)
6WD (3.5 miles or more)
FWD (3.5 miles or more)
And this is for the BEST street car in the 4WD category:
The "Neato" (pronounced "Nice")
What is it with those 4WD cars and their "Neato" names? This is probably one of the most well-know 4WD names in the history of 4WD cars. It is so well known that there are a multitude of books dedicated to the design and specifications of "Nice" 4WD cars, such as this one from Michael B. Smith, which is a good read.
But as of right
Train a generative adversarial network (GAN) to produce music. here is a sample of the generated music: https://gattonweb.uky.edu/faculty/lium/ml/MuseGAN_song.mp3
Train a ChatGPT-style transformer to generate music. here is a sample of the generated music: https://gattonweb.uky.edu/faculty/lium/ml/musicTrans.mp3