AIGC (AI-Generated Content) is content produced automatically and efficiently by AI models. The term has gained traction recently as natural language generation (NLG) and large AI models have matured. AIGC can now produce text, images, audio, video, and even 3D models and code.
Content production to date can be divided into four stages:
- Professionally-Generated Content (PGC)
- User-Generated Content (UGC)
- AI-assisted Generated Content
- AI-Generated Content (AIGC)
We are still primarily in the first two stages, with the third as a supplement.
Commercialization matters: successful products fund further research and development. Theory has its place, but concrete implementations greatly accelerate a technology's progress. AIGC currently has three main commercialization directions:
The first is text generation: automatic email writing and advertising copy, for instance, powered by OpenAI's GPT-3 language model. Most AI text-generation projects currently build on it (GPT-4 has already been released, and GPT-5 is expected this year; note that iteration speed).
The recently popular ChatGPT has been commercialized successfully, and its development is expected to accelerate further. Once such a lead is established, it is hard to catch up; ChatGPT has already disrupted search engines, sending even Google into a panic.
There is no need to say much about text generation. The popularity of ChatGPT has led to a large number of related articles and videos, which have greatly popularized the field.
Speaking of which, in the language distribution of GPT-3's training data, Simplified Chinese accounts for just 0.02%. Considering our country's population, that is... well, the Chinese internet is dead (it is now the land of content farms!). Even so, ChatGPT gives surprisingly high-quality answers to Chinese questions, thanks to the translation ability GPT picked up implicitly.
The second direction is image generation. The core technique pairs CLIP, a multimodal model that maps text and images into a shared embedding space, with a denoising diffusion model, so that a few keywords are enough to generate an image.
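To make the CLIP half concrete, here is a toy numpy sketch of how matching works: text and image are each encoded into the same vector space, and cosine similarity scores how well they match. The embedding values below are made up for illustration; real CLIP encoders produce vectors of roughly 512 dimensions.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for CLIP embeddings (real CLIP learns encoders that
# place matching text and images close together in one space).
text_emb = np.array([0.9, 0.1, 0.2])     # e.g. embedding of "a cat"
img_emb_cat = np.array([0.8, 0.2, 0.1])  # embedding of a cat photo
img_emb_car = np.array([0.1, 0.9, 0.3])  # embedding of a car photo

# The image whose embedding lies closest to the text wins.
print(cosine_similarity(text_emb, img_emb_cat))  # high
print(cosine_similarity(text_emb, img_emb_car))  # low
```

A text-to-image system can use this score to steer generation toward images that match the prompt.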
Currently, this direction is trending and has the potential to become the next ChatGPT. Do you remember the wave of 2D drawing caused by the NovelAI leak last year?
I will provide some development information about this below for your reference.
Currently, Stable Diffusion is the mainstream choice for ordinary users; the recently popular model is Chilloutmix, typically combined with LoRAs. If you are interested, look into it; Bilibili is full of results showing how far AI image generation has come.
Commercial success in this field undoubtedly belongs to Midjourney.
The third direction is the development of the underlying AI models behind AIGC.
OpenAI and Stability AI are the leaders here and have attracted the most funding.
OpenAI has Microsoft behind it, reportedly supplying enormous computing resources; OpenAI even disbanded its robotics team to concentrate its resources. Computing power is indeed the critical issue, and it ultimately comes down to high-end chips.
Google is in an awkward spot: although it is one of the largest holders of AI patents and has open-sourced many of the underlying techniques, it still has not beaten others to practical products. Perhaps this is a common ailment of large companies.
Development of AI-generated images
In 2014, Generative Adversarial Networks (GANs) were born, which truly "taught" AI to draw.
A GAN consists of two models: a generator network G and a discriminator network D. G generates images from random noise, while D judges whether an image came from G or from the real world.
The two compete, each improving in the process; training reaches equilibrium when D can no longer distinguish G's images from real ones.
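As a sketch (not any particular implementation), the competition can be written as two binary cross-entropy losses over D's outputs; the function names and toy values below are illustrative:

```python
import numpy as np

def bce(pred, target):
    """Binary cross-entropy on discriminator scores in (0, 1)."""
    eps = 1e-12
    return float(-np.mean(target * np.log(pred + eps)
                          + (1 - target) * np.log(1 - pred + eps)))

def d_loss(d_real, d_fake):
    # D wants real images scored as 1 and G's fakes scored as 0.
    return bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))

def g_loss(d_fake):
    # G wants D to score its fakes as real (the non-saturating form).
    return bce(d_fake, np.ones_like(d_fake))

# At equilibrium D outputs 0.5 everywhere: it can no longer tell.
half = np.full(4, 0.5)
print(d_loss(half, half))  # ≈ 2 * ln 2 ≈ 1.386
print(g_loss(half))        # ≈ ln 2 ≈ 0.693
```

Each training step alternates: update D to lower `d_loss`, then update G to lower `g_loss`, until neither can improve.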
The innovation of GANs lies in this cleverly designed "self-supervised" setup, which escapes supervised learning's dependence on large amounts of labeled data. GANs are widely used for image generation, style transfer, AI art, and colorizing black-and-white photos.
However, its weaknesses stem from the same design: training two models in tandem makes GANs unstable and prone to mode collapse. A related curiosity is the "Helvetica scenario": if G finds a flaw that reliably fools D, it takes the shortcut and keeps submitting that same output, and the whole balance collapses.
Models, it turns out, can also slack off; a suspiciously human trait.
In 2020, an academic paper on diffusion models significantly improved the level of AI drawing.
The principle of diffusion models is "add noise, then remove it." Gaussian noise is gradually applied to a training image until the image is completely destroyed; the model then learns to gradually recover the original image from the noise. Once trained, feeding it random Gaussian noise produces an image "out of nothing."
This design greatly reduces the difficulty of model training, surpasses the limitations of GAN models, and combines realism with diversity, enabling faster and more stable image generation.
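A minimal numpy sketch of the forward ("add noise") half, using the standard closed-form shortcut x_t = sqrt(ᾱ_t)·x_0 + sqrt(1-ᾱ_t)·ε. The linear schedule below is a commonly used default, not taken from any specific codebase; training would then teach a network to predict ε from x_t.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear noise schedule: small betas, with alpha_bar_t = prod(1 - beta).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

def add_noise(x0, t):
    """Jump straight to step t of the forward process in closed form."""
    eps = rng.standard_normal(x0.shape)
    a = alpha_bar[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps

x0 = rng.standard_normal(64)   # stand-in "image"
x_early = add_noise(x0, 10)    # mostly signal
x_late = add_noise(x0, T - 1)  # almost pure Gaussian noise

print(alpha_bar[10], alpha_bar[-1])  # near 1.0 vs near 0.0
```

Because `alpha_bar` decays from ~1 to ~0, early steps barely perturb the image while the final step leaves essentially pure noise, which is exactly the state generation starts from.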
The "takeoff" of diffusion models in the AI industry began in January 2021, when OpenAI unveiled DALL-E, a text-to-image model that can generate lifelike images of things that do not actually exist, causing a sensation. However, because the model operates directly in pixel space, computation is heavy: it remains slow and memory-hungry.
Stable Diffusion, born in the summer of 2022, makes high-end academic theories more accessible.
In August of last year, Stability AI applied the diffusion process in a lower-dimensional latent space (latent diffusion) and released the Stable Diffusion model. Resource consumption drops dramatically: a consumer graphics card (6 GB+ VRAM recommended) is enough to run it, and operation is simple enough that ordinary people can experience the stunning creative power of AI first-hand.
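Back-of-the-envelope arithmetic shows why moving to latent space helps; the sizes below assume the commonly cited Stable Diffusion v1 setup (512×512 RGB images compressed by a VAE to 64×64×4 latents), so treat them as illustrative:

```python
# The denoiser runs at every sampling step, so the size of the tensor
# it processes dominates the cost of generation.
pixel_elems = 512 * 512 * 3   # diffusing the raw image (DALL-E style)
latent_elems = 64 * 64 * 4    # diffusing the compressed VAE latent

print(pixel_elems // latent_elems)  # each step touches ~48x less data
```

The one-time cost of decoding the final latent back to pixels is cheap compared with running the denoiser hundreds of times, which is what puts the model within reach of consumer cards.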
The development team has also open-sourced all the code, models, and weights (some have since been copied elsewhere).
Note: Some resources are not suitable for browsing during work hours. NSFW warning.
Prompt-writing is the core competitive skill here, and it pairs well with ChatGPT.
Popular: Stable Diffusion + Chilloutmix + Koreandolllikeness
- Low-cost experience of generating AI girl photos
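For a feel of what prompt-writing looks like in practice, here is a hypothetical prompt in the common community style: subject first, then style and quality tags, plus a negative prompt listing artifacts to avoid. Every tag below is illustrative, not a tested recipe.

```python
# Sketch of the usual prompt structure for Stable Diffusion-style tools.
prompt = ", ".join([
    "portrait photo of a young woman",   # the subject
    "soft lighting", "detailed skin",    # style tags
    "8k", "masterpiece",                 # quality tags
])
negative_prompt = ", ".join([
    "lowres", "bad anatomy", "extra fingers", "watermark",
])

print(prompt)
print(negative_prompt)
```

Most front-ends accept exactly this pair of comma-separated strings, which is why tag selection, rather than any code, is the skill being traded.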
Recently, Bing also launched its image-creation tool: https://www.bing.com/create. It works quite well in my testing.
Finally: is setting up the environment too much trouble? Is your local compute insufficient?
You can try Google Colab for free; explore it yourself. Some people have also shared one-click scripts. Keywords to search for:
- NovelAILeaks API Backend (4chan Ver.)
I'm not sure whether current text-to-speech quite counts as AI, but it sounds increasingly natural and is already widely commercialized. If voice imitation interests you, see Real-Time Voice Cloning and MockingBird, which claim to clone a voice from just a 5-second audio sample.
Because the technology is relatively mature, it has also been put to use in gray industries such as fraud, so be cautious before trusting a phone call that sounds like a family member.
The technologies above will inevitably affect our work: related industries will need less human labor, and those working in writing or illustration must keep up with the times. As the saying goes, AI will not eliminate all practitioners, only those who cannot use AI.
ChatGPT, for example, can greatly improve your efficiency, provided you know how to ask: how to formulate and describe a good question. As noted earlier, the hardest part of text-to-image generation is choosing the right prompts; positions centered on this skill reportedly pay very well.
Hopefully, we will not isolate ourselves from the new wave and continue to lag behind.
These AI tools noticeably boost productivity once you try them. Some even say the AI singularity has already arrived and that progress from here will be exponential.
OpenAI is impressive, but not every path has been smooth. In AI drawing, for example, despite being an early mover on diffusion models and having its own product, DALL-E, it was Stable Diffusion that became the mainstream. Perhaps this is the competitiveness an open ecosystem brings.
The Chinese internet is a dismal environment of information islands: the so-called internet is not interconnected at all, just apps blocking one another while frantically funneling traffic. The emergence of ChatGPT offers a glimmer of hope. Even those not fluent in English can finally escape the endless pop-ups, embedded ads, and log-in/follow/pay walls guarding low-quality articles, sidestep these "features," and get things done faster.
China's AI technology reserves are actually strong, but the skill tree has mostly been invested in areas such as facial recognition and public-opinion analysis.
Another curiosity: as AI-generated images grow more realistic, they may reshuffle gray industries like paid "welfare girl" photo sellers, who simply cannot compete with AI.
Many people are also working on video generation; I recently came across interesting projects such as real-time face swapping (DeepFaceLive).