Humans can easily describe an image. In machine learning, this is a discriminative task: a classification or regression problem that predicts feature labels from an input image. Recent advances in machine learning, especially deep learning models, have begun to solve such tasks well, and some models now match or exceed human performance.
Generating a realistic picture from a text description, however, is hard; for a human it takes years of graphic design training. In machine learning this is a generative task, which is far more challenging than a discriminative one, because a generative model must produce much richer information (such as a full image) from a smaller input.
Although building a generative model is difficult, such models are extremely useful in situations like the following:
Content creation: an advertising company can automatically generate attractive product images; a fashion designer can have an algorithm generate candidate shoe styles and then select the ones that fit; a game can let players create realistic characters from a simple description.
Content-aware intelligent editing: with such a model, a photographer can change the facial expression, wrinkles, or hairstyle in an image with a few mouse clicks, and a filmmaker can turn a cloudy background into a sunny one.
Data augmentation: an autonomous driving company can synthesize realistic images of rare scenarios to enrich its training set; a credit card company can synthesize rare fraud cases to enrich the dataset behind its fraud detection system.
In this article, I will introduce our recent work, the Transparent Latent-space GAN (TL-GAN), which extends current state-of-the-art models with a new way of interacting with them. We will not cover the basics of GANs in detail here, since we have written many related articles; interested readers can read: Learning from Zero: An Introductory Guide to Generative Adversarial Networks (GAN).
Controlling the output of a GAN model
The original GAN and several other mainstream variants (such as DC-GAN and pg-GAN) are unsupervised learning models. After training, the generator network takes random noise as input and produces a photo-realistic image that is almost indistinguishable from the images in the training set. However, we cannot further control the features of the generated images. In many applications, users want to customize samples to their own preferences (such as age, hair color, or facial expression), and ideally to adjust each feature continuously.
To achieve this kind of controlled generation, many GAN variants have been tried. They fall roughly into two categories: style-transfer networks and conditional generators.
Style-transfer networks
Style-transfer networks, represented by CycleGAN and pix2pix, are models that translate images from one domain to another (for example, turning a horse into a zebra, or a sketch into a colored image). Because the translation is learned as an all-or-nothing mapping between two domains, we cannot adjust a feature continuously between two states (such as adding just a little mustache to a face). Moreover, one network handles only one type of translation, so if you want to adjust 10 features, you must build 10 different networks.
Conditional generators
Conditional generators, represented by conditional GAN, AC-GAN, and Stack-GAN, jointly learn images and feature labels during training, which lets them generate images with customized features. The drawback is that whenever you want to add a new adjustable feature, you must retrain the entire GAN model, which costs enormous computing resources and time. In addition, you are limited to a single dataset that must contain all the custom feature labels.
Our TL-GAN model approaches the controlled-generation task from a new angle and solves these problems. It allows users to gradually adjust one or more features using a single network, and new adjustable features can be added in under an hour.
TL-GAN: an efficient approach to controlled synthesis and editing
Making the latent space transparent
Nvidia's pg-GAN is a model that generates high-resolution face images. Every feature of a generated 1024×1024 image is determined by a 512-dimensional noise vector in the latent space. Therefore, if we can understand what the latent space represents, that is, make it transparent, we can fully control the generation process.
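To make this interface concrete, here is a minimal sketch. The `generate` wrapper and the `LATENT_DIM` constant are illustrative assumptions standing in for the real pre-trained pg-GAN generator, which is not reproduced here:

```python
import numpy as np

LATENT_DIM = 512  # dimensionality of pg-GAN's latent space

def generate(z):
    """Hypothetical wrapper around the pre-trained pg-GAN generator:
    maps a 512-d latent vector to a 1024x1024x3 RGB image array."""
    raise NotImplementedError("plug in the pre-trained pg-GAN generator here")

# Every feature of the generated face is determined by this one vector.
z = np.random.randn(LATENT_DIM)
image = generate(z)
```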
Working with the pre-trained pg-GAN, I found that its latent space has two nice properties:
Most points in the space generate plausible images;
It is continuous, meaning that interpolating between two points in the latent space produces a smooth transition between the corresponding images, as the sketch below illustrates.
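The second property is easy to probe: walk a straight line between two latent vectors and generate an image at each step. A sketch, reusing the hypothetical `generate` and `LATENT_DIM` from the snippet above:

```python
import numpy as np

z_start = np.random.randn(LATENT_DIM)
z_end = np.random.randn(LATENT_DIM)

# Images along this straight line should morph smoothly
# from one face into the other.
frames = []
for t in np.linspace(0.0, 1.0, num=10):
    z = (1.0 - t) * z_start + t * z_end
    frames.append(generate(z))
```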
Knowing these two properties, I reasoned that predictable feature directions should exist in the latent space. If we can find them, we can use unit vectors along these directions as feature axes to control the generation process.
Method: extracting the feature axes
To find the feature axes in the latent space, we establish a connection between the latent vector z and the feature label y via supervised learning on paired data (z, y). The question is how to obtain such pairs, since existing datasets contain only images x and their corresponding feature labels y.
Approaches for connecting the latent vector z and the feature label y
Available methods
One approach is to compute, for each image x_real in an existing labeled dataset, the corresponding latent vector z_encode = G^(-1)(x_real). But a GAN generator G is very hard to invert, so this approach is impractical.
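To see why this is hard: G has no closed-form inverse, so recovering z_encode means running a separate, slow optimization for every image. The sketch below uses a deliberately crude random search to make that cost visible; practical attempts back-propagate through G instead, but remain expensive and unreliable:

```python
import numpy as np

def invert(x_real, steps=1000, sigma=0.01):
    """Crude random-search approximation of z_encode = G^(-1)(x_real).
    Illustrative only: one expensive search is needed per image."""
    z_best = np.random.randn(LATENT_DIM)
    loss_best = np.mean((generate(z_best) - x_real) ** 2)
    for _ in range(steps):
        z_try = z_best + sigma * np.random.randn(LATENT_DIM)
        loss_try = np.mean((generate(z_try) - x_real) ** 2)
        if loss_try < loss_best:
            z_best, loss_best = z_try, loss_try
    return z_best
```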
Another approach is to generate a synthetic image x_gen = G(z) from a random latent vector z. The problem here is that synthetic images are unlabeled, so we cannot easily exploit the available labeled dataset.
The key to solving this problem is to train a separate feature extractor model y = F(x) on the existing labeled image dataset (x_real, y_real), and then couple the feature extractor network F with the GAN generator G. Once coupled, we can predict the feature labels y_pred of the synthetic images x_gen, which establishes the connection between z and y.
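Under this scheme, building the paired dataset is a simple loop. In the sketch below, `extract_features` is a hypothetical stand-in for the trained network F, and the sample count is an arbitrary illustrative choice:

```python
import numpy as np

def extract_features(image):
    """Hypothetical feature extractor y = F(x), trained beforehand on
    the labeled real-image dataset (x_real, y_real)."""
    raise NotImplementedError("plug in the trained feature extractor here")

n_samples = 30000  # arbitrary; more samples give more stable axes
zs = np.random.randn(n_samples, LATENT_DIM)                 # latent vectors z
ys = np.stack([extract_features(generate(z)) for z in zs])  # labels y_pred
# (zs, ys) is now the paired dataset linking z to y.
```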
Now that we have paired latent vectors and features, we can train a regression model y = A(z); its regression slopes give all the feature axes that control the image generation process.
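With the paired (z, y) data in hand, fitting the regression and reading off the feature axes can be sketched as follows; scikit-learn's ordinary least squares stands in here for the generalized linear model used in the pipeline:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Regress the features on the latent vectors: y = A(z).
reg = LinearRegression().fit(zs, ys)

# Each row of coef_ is the regression slope for one feature;
# normalizing each row gives a unit feature axis in latent space.
slopes = reg.coef_  # shape: (n_features, LATENT_DIM)
feature_axes = slopes / np.linalg.norm(slopes, axis=1, keepdims=True)
```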
TL-GAN model architecture
As shown in the figure, there are five steps in the TL-GAN model:
Learn the distribution: choose a well-trained GAN model and its generator network. I chose the pre-trained pg-GAN, which produces the highest-quality face images currently available.
Classify: choose a pre-trained feature-extraction model (a convolutional neural network or another computer vision model), or train your own feature-extraction network on a labeled dataset.
Generate: sample many random latent vectors, feed them into the trained GAN generator to produce synthetic images, and then use the trained feature extractor to predict the features of each image.
Connect: use a generalized linear model (GLM) to regress the features on the latent vectors. The regression slopes are the feature axes.
Explore: start from one latent vector and move it along different feature axes to examine how the generated image changes (see the sketch after this list).
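A sketch of this last step, reusing the names from the earlier snippets; the axis index and step size are illustrative choices:

```python
import numpy as np

# Suppose (hypothetically) that row 0 of feature_axes is the "age" axis.
age_axis = feature_axes[0]

z = np.random.randn(LATENT_DIM)
older = generate(z + 2.0 * age_axis)    # step forward along the axis
younger = generate(z - 2.0 * age_axis)  # step backward along the axis
```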
This process is very efficient: once we have a pre-trained GAN model, determining the feature axes takes only about an hour on a single GPU.
Results
Let's see how well this model works. First, I tested whether the feature axes can be used to control the corresponding features of the generated images. It works very well! The gender and age of the image can be changed "perfectly":
The case above also exposes a shortcoming: when I try to reduce the beard, the face inevitably becomes more "feminine". This is because the gender and beard features are correlated, so changing one changes the other.
To solve this problem, I used simple linear algebra: project the beard axis onto the subspace orthogonal to the gender axis, which effectively removes the correlation between the two.
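A minimal sketch of that projection, assuming the beard and gender axes are rows of the feature_axes matrix from earlier (the row indices are hypothetical):

```python
import numpy as np

def disentangle(axis, base):
    """Remove from `axis` its component along `base` (both unit vectors),
    so that moving along the result no longer changes the `base` feature."""
    residual = axis - np.dot(axis, base) * base
    return residual / np.linalg.norm(residual)

beard_axis, gender_axis = feature_axes[1], feature_axes[2]  # hypothetical rows
beard_only_axis = disentangle(beard_axis, gender_axis)
```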
Finally, an animation of a latent vector moving along several feature axes gives a sense of how quickly TL-GAN can control the image generation process.