I recently learned one art technique for drawing heads. This technique is called Loomis Method. It is taught by Andrew Loomis in his book, Drawing the Head & Hands.
Anyway, a picture is worth a thousand words. Here is one of my drawings with the Loomis method (Elif):
Apart from learning this art technique, I wanted to develop a machine learning model to transform a photo into a Loomis head drawing (and the other way around).
For this transformation, I tried two GAN models: pix2pix and cyclegan.
pix2pix is mapping an image in one domain to another. For example, it can colorize a black and white image. In my case, it will take a portrait photo and return a drawing with the Loomis method.
How does it work? I can simply explain as follows:
The architecture has two main components: the generator and the discriminator.
The generator generates an image. It is just random pixels in its first try.
The discriminator, on the other hand, is just predicting whether an image pair is produced by the generator or not. In other words, it is doing a binary classification.
Over time, the generator produces better images, and the discriminator is getting better in distinguishing what is real and what is produced. The whole architecture is designed in a way that the generated images are indistinguishable by the discriminator. This is only possible when the generator produces good images.
To learn more about pix2pix, I would recommend you to watch this video by Two Minutes paper.
pix2pix requires a paired dataset. I created the following pairs. I have nearly 100 pairs. It is still a limited dataset, but I wanted to give it a try.
I trained the model a few hours on the AWS EC2 instance with a good GPU.
The results were not great, but OK for now. I didn't (couldn't) expect more than these.
As you can see in the below illustration, in most of the outputs, the angle of the head doesn't look correct. I assume it is because my training dataset doesn't have enough examples to cover all the possible combinations.
A few outputs:
It is a pain to get such pairs. Luckily, there is another framework for doing image to image translation, and it doesn't require paired examples. It is CycleGAN.
This time I need a bunch of input images (portraits), and a bunch of target images (drawings). No need to have pairs.
There is a constraint in this architecture: Generated images are translated back to the original image so that there will be a meaningful relationship between the input images and the generated images. A good explanation is here.
I picked 600 samples out of 100K StyleGAN generated photos. In other words, they are photos of people who don't exist. This dataset doesn't have sample portrait photos with extreme angles. For simplicity, I picked the ones with natural lighting, soft background, not smiling.
It couldn't translate photo to drawing well. The output is like edge detection. However, drawing to photo translation was much better than I expected. In the paper, and found this statement:
On translation tasks that involve color and texture changes, like many of those reported above, the method often succeeds. We have also explored tasks that require geometric changes, with little success.
One sample for drawing to photo translation:
See the below video to see how it is translating.
Finally, I also wanted to see how it performs with a training set with less variance. So, I trained it with photos with studio lighting, white background.
This task, extracting artistic form from a photo, is requiring a geometric transformation.
pix2pix seems to work; however, it requires paired examples, and it is difficult to find in most cases. I thought it could work better if I had 3D models in both domains so that I could render images in a way that they will be pairs. However, I didn't spend time on that.
CycleGAN works well when there are no pairs. However, it can't do a geometric transformation. It is good if the task is about changing texture or color.
I found both pix2pix and CycleGAN amazing. They would bring much better results if I have had more training examples and if I have trained them longer.
Let me know if you detect a mistake. I will be developing this content.