Generative Adversarial Networks, commonly known as GANs, are computational architectures that pit two neural networks against each other (hence the name “adversarial”) to create new examples of data that share the characteristics of real data. While GANs have many applications, they are typically used for image, video, or speech generation.
In 2014, Ian Goodfellow, Yoshua Bengio, and several colleagues at the University of Montreal published the paper that introduced GANs. Referring to their impressive capabilities, Yann LeCun, then director of AI research at Facebook, called adversarial training “the most interesting idea in the last 10 years in machine learning”.
How GANs Work
To generate new data, Ian Goodfellow’s GAN framework uses two neural networks. One is the generator network, responsible for creating new instances of data. To put it simply, its process is the inverse of the usual classification workflow of a neural network. Instead of taking raw data and mapping it to predetermined outputs, the generator works backwards from an output and attempts to construct input data that would map to that output. A generator network in a GAN, for example, may start off with a matrix of pixels consisting of random noise and try to change them in such a manner that a classifier would categorize them as a cat.
The second neural network in a GAN is called the discriminator. It scores the generator’s output on a scale of 0 to 1, estimating how likely each sample is to be real rather than generated. Whenever the score falls short, the generator adjusts its output and sends new samples back to the discriminator. The GAN repeats this loop at high speed until the generator produces data that the discriminator can no longer reliably distinguish from real data.
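Formally, this tug-of-war is captured by the minimax objective from the original GAN paper, in which the discriminator D tries to maximize the value function while the generator G tries to minimize it:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]$$

Here $z$ is a random noise vector and $G(z)$ is the generated sample that the discriminator must judge.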
Overview of GANs
Generating Realistic Images with GANs
In the original GAN paper, titled “Generative Adversarial Networks”, Ian Goodfellow et al. used three datasets (MNIST, CIFAR-10, and the Toronto Face Database) to generate new images. Generating new samples was the main application of GANs described in the paper.
Overview of GANs: https://arxiv.org/abs/1406.2661
Likewise, in 2015 Alec Radford et al. introduced Deep Convolutional Generative Adversarial Networks in the paper “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks”. They also generated new, realistic images of bedrooms and presented their analysis of how to train GANs stably.
New Examples of Bedroom Images Generated by DCGANs: https://arxiv.org/abs/1511.06434
Let’s Code a DCGAN from Scratch
Let’s not wait any longer and dive into the implementation details of GANs. Here we’ll demonstrate how to use a Deep Convolutional Generative Adversarial Network (DCGAN) to generate realistic images. Our implementation uses TensorFlow and follows some of the principles stated in the DCGAN paper.
We’ll train our DCGAN on the Fashion-MNIST dataset, which comprises 70,000 28×28 grayscale images (60,000 for training and 10,000 for testing) spread across 10 classes of products.
So, before we start, let’s import some necessary packages we’ll use in this implementation.
import tensorflow as tf
import tensorflow.keras as keras
import matplotlib.pyplot as plt
import numpy as np
from IPython import display
We’ll use the following helper function to visualize the results in the later sections.
def visualize_images(sample_images, columns=None):
    '''displaying fake examples'''
    display.clear_output(wait=False)
    columns = columns or len(sample_images)
    rows = (len(sample_images) - 1) // columns + 1
    if sample_images.shape[-1] == 1:
        sample_images = np.squeeze(sample_images, axis=-1)
    plt.figure(figsize=(columns, rows))
    for index, sample_image in enumerate(sample_images):
        plt.subplot(rows, columns, index + 1)
        plt.imshow(sample_image, cmap="binary")
        plt.axis("off")
Next, we’ll download the Fashion-MNIST dataset, apply some basic preprocessing steps, and convert the training images into batches.
# downloading the Fashion-MNIST dataset
from keras.datasets import fashion_mnist
(X_train, _), _ = fashion_mnist.load_data()

# normalizing the images to the [0, 1] range
X_train = X_train.astype(np.float32) / 255

# reshaping to 28x28x1 and rescaling to [-1, 1] to match the generator's tanh output
X_train = X_train.reshape(-1, 28, 28, 1) * 2. - 1.

# defining the batch size
batch_size = 64

# creating batches of tensors before feeding them into the model
data_set = tf.data.Dataset.from_tensor_slices(X_train)
data_set = data_set.shuffle(1000)
data_set = data_set.batch(batch_size, drop_remainder=True).prefetch(1)
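As a quick check (this step is optional and not part of the original walkthrough), we can pull a single batch from the pipeline to confirm it yields tensors of the expected shape:

# optional: inspect a single batch from the dataset pipeline
for batch in data_set.take(1):
    print(batch.shape)  # expected: (64, 28, 28, 1)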
Building the Generator Neural Network
The generator network takes random noise samples as input and gradually transforms them into images with the shape of Fashion-MNIST samples. Let’s build the generator network for our DCGAN by following the steps below.
- We’ll feed the noise samples as input to the dense layer of the network.
- The output of the dense layer will be reshaped into three dimensions: height, width, and number of filters.
- We’ll perform the deconvolution operation with Conv2DTranspose and set the stride at 2.
- The number of filters is halved at each level of the network.
- Upsampling continues until the feature maps match the size of the training images, which in this case is 28 × 28 × 1.
- A Batch Normalization layer is added after every layer except the final deconvolution (output) layer.
- As a best practice, we use the SELU (Scaled Exponential Linear Unit) activation function for the intermediate deconvolution layers.
- For the output layer, tanh is used.
Let’s code everything described above in a block and print out the model summary in order to check the dimensions and shapes at each layer.
from keras.models import Sequential
from keras.layers import Dense, Reshape, BatchNormalization, Conv2DTranspose, Conv2D, Dropout, Flatten

codings_size = 32

def generator_nn(codings_size):
    generator = Sequential([
        Dense(7 * 7 * 128, input_shape=[codings_size]),
        Reshape([7, 7, 128]),
        BatchNormalization(),
        Conv2DTranspose(64, kernel_size=5, strides=2, padding="SAME", activation="selu"),
        BatchNormalization(),
        Conv2DTranspose(1, kernel_size=5, strides=2, padding="SAME", activation="tanh"),
    ])
    return generator

# get the generator network and print out the summary
gen = generator_nn(codings_size)
gen.summary()
Model: "sequential" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= dense (Dense) (None, 6272) 206976 reshape (Reshape) (None, 7, 7, 128) 0 batch_normalization (BatchN (None, 7, 7, 128) 512 ormalization) conv2d_transpose (Conv2DTra (None, 14, 14, 64) 204864 nspose) batch_normalization_1 (Batc (None, 14, 14, 64) 256 hNormalization) conv2d_transpose_1 (Conv2DT (None, 28, 28, 1) 1601 ranspose) ================================================================= Total params: 414,209 Trainable params: 413,825 Non-trainable params: 384 _________________________________________________________________
Building the Discriminator Neural Network
Similarly, we’ll build a discriminator network using the following steps:
- Use convolutional layers with a stride of 2 in order to reduce the spatial dimensions of the input images.
- Use LeakyReLU as the activation function after every convolution operation.
- Flatten the output features and feed them into a dense layer with a single neuron activated by the sigmoid function.
The implementation details, along with the model summary, are given below.
# building the discriminator network
def discriminator_nn():
    discriminator = Sequential([
        Conv2D(64, kernel_size=5, strides=2, padding="SAME",
               activation=keras.layers.LeakyReLU(0.2),
               input_shape=[28, 28, 1]),
        Dropout(0.3),
        Conv2D(128, kernel_size=5, strides=2, padding="SAME",
               activation=keras.layers.LeakyReLU(0.2)),
        Dropout(0.4),
        Flatten(),
        Dense(1, activation="sigmoid")
    ])
    return discriminator

discrim = discriminator_nn()
discrim.summary()
Model: "sequential" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= dense (Dense) (None, 6272) 206976 reshape (Reshape) (None, 7, 7, 128) 0 batch_normalization (BatchN (None, 7, 7, 128) 512 ormalization) conv2d_transpose (Conv2DTra (None, 14, 14, 64) 204864 nspose) batch_normalization_1 (Batc (None, 14, 14, 64) 256 hNormalization) conv2d_transpose_1 (Conv2DT (None, 28, 28, 1) 1601 ranspose) ================================================================= Total params: 414,209 Trainable params: 413,825 Non-trainable params: 384 _________________________________________________________________
Now, let’s combine these two neural networks (the generator and the discriminator) into the complete DCGAN architecture. Here is how we’ll do that.
# combining the generator and discriminator
dcgan = Sequential([gen, discrim])
As the discriminator distinguishes between fake (0) and real (1) images, we’ll use binary cross-entropy to measure the loss. Moreover, we’ll optimize the loss with the RMSprop optimizer. Note that we freeze the discriminator (trainable = False) before compiling the combined model, so that only the generator’s weights are updated when the full DCGAN is trained. Let’s now compile the models for training.
# compiling the discriminator
discrim.compile(loss="binary_crossentropy", optimizer="rmsprop")

# freezing the discriminator's weights inside the combined model
discrim.trainable = False

# compiling the full DCGAN (only the generator is trainable here)
dcgan.compile(loss="binary_crossentropy", optimizer="rmsprop")
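Before training, a quick sanity check (optional, and not part of the original walkthrough) is to push a batch of random noise through the untrained generator and confirm that the output has the expected 28 × 28 × 1 shape:

# optional sanity check: generate images from the untrained generator
test_noise = tf.random.normal(shape=[16, codings_size])
test_images = gen(test_noise)
print(test_images.shape)  # expected: (16, 28, 28, 1)
visualize_images(test_images.numpy(), 8)
plt.show()

At this stage the outputs are just noise; they only start to resemble clothing items once training has run for a few epochs.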
Let’s define a function that trains our GAN on a given batch of images. The process involves two training phases:
- First off, we’ll train our discriminator network in order to distinguish between real and fake images.
- Next, the generator network will be trained to produce fake images that trick the discriminator into classifying them as real.
def train_dcgan(dcgan, data_set, random_normal_dimensions, epochs=20):
    gen, discrim = dcgan.layers
    for epoch in range(epochs):
        print("Epoch {}/{}".format(epoch + 1, epochs))
        for real_image_samples in data_set:
            # infer batch size from the training batch
            batch_size = real_image_samples.shape[0]

            # Phase 1 - training the discriminator network
            # creating random noise samples
            noise_samples = tf.random.normal(shape=[batch_size, random_normal_dimensions])
            # generating fake image samples
            fake_image_samples = gen(noise_samples)
            # concatenating the fake and real images into a single batch
            conc_images = tf.concat([fake_image_samples, real_image_samples], axis=0)
            # creating the labels for the discriminator:
            # 0 for the fake images, 1 for the real images
            discrim_labels = tf.constant([[0.]] * batch_size + [[1.]] * batch_size)
            # set the discriminator to trainable
            discrim.trainable = True
            # training the discriminator on the concatenated images and labels
            discrim.train_on_batch(conc_images, discrim_labels)

            # Phase 2 - training the generator network
            # feeding new noise samples into the DCGAN
            noise_samples = tf.random.normal(shape=[batch_size, random_normal_dimensions])
            # labelling the generated images as real images
            gen_labels = tf.constant([[1.]] * batch_size)
            # freezing the discriminator network
            discrim.trainable = False
            # training the DCGAN with the labels set to real (1)
            dcgan.train_on_batch(noise_samples, gen_labels)

        # plotting fake image samples at the end of each epoch
        visualize_images(fake_image_samples, 16)
        plt.show()
Finally, it’s time to train our DCGAN and see the magic. We’ll call the above function to start the training. While we set the number of epochs to 20, you are free to try a higher number to get better results. A single epoch should take around 30 seconds to execute in a Colab environment. You will see the results (fake images generated by our DCGAN) after every epoch during the training process.
So, let’s start off the training by simply executing the below lines of code.
# training the DCGAN for 20 epochs
train_dcgan(dcgan, data_set, codings_size, 20)
After the training, your final output should look like this.
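Once training is complete, the generator can be used on its own. The snippet below (not part of the original walkthrough) shows one way to sample a fresh batch of fashion items from the trained generator:

# sampling new images from the trained generator
new_noise = tf.random.normal(shape=[16, codings_size])
generated_images = gen(new_noise)

# rescaling the tanh output from [-1, 1] back to [0, 1] before plotting
generated_images = (generated_images.numpy() + 1.) / 2.
visualize_images(generated_images, 8)
plt.show()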
In the machine learning community, GANs have been one of the hottest topics due to their impressive capabilities. These models can take unsupervised learning methods to new heights, broadening the scope of machine learning.
Since their inception, researchers have developed many strategies for training GANs, introducing new, state-of-the-art techniques that can be employed in image and video generation as well as semi-supervised learning.