AIGC (AI-Generated Content) is content produced automatically and efficiently by AI models. The term has gained traction recently as natural language generation (NLG) and large AI models have matured. AIGC can now produce text, images, audio, video, and even 3D models and code.
Content production to date can be divided into four stages:
- Professionally-Generated Content (PGC)
- User-Generated Content (UGC)
- AI-assisted Generated Content
- AI-Generated Content (AIGC)
We are still primarily in the first two stages, with the third as a supplement.
Commercialization matters: successful products fund further research and development. Theory has its place, but concrete implementations greatly accelerate a technology's progress. AIGC currently has three main commercialization directions:
The first is text generation: automatic email writing and advertising copy, for instance, powered by OpenAI's GPT-3 language model. Most AI text-generation projects currently build on it (GPT-4 has already been released, and GPT-5 is expected this year; note that iteration speed).
The recently popular ChatGPT has been commercialized successfully, and its development is expected to accelerate further. Once such a lead is established, it is hard to catch up; ChatGPT has already disrupted search engines, sending even Google into a panic.
There is no need to say much about text generation. The popularity of ChatGPT has led to a large number of related articles and videos, which have greatly popularized the field.
Speaking of which, in the language distribution of GPT-3's training data, Simplified Chinese accounts for just 0.02%. Considering our country's population, that is... well, the Chinese internet is dead (it is now the land of content farms!). Even so, ChatGPT gives surprisingly high-quality answers to Chinese questions, thanks to the translation ability GPT picked up implicitly.
The second direction is image generation. The core technique pairs CLIP, a multimodal model that maps text and images into a shared embedding space, with a denoising diffusion model, so that a few keywords are enough to generate an image.
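To make the CLIP half concrete, here is a toy numpy sketch of how matching works: text and image are each encoded into the same vector space, and cosine similarity scores how well they match. The embedding values below are made up for illustration; real CLIP encoders produce vectors of roughly 512 dimensions.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for CLIP embeddings (real CLIP learns encoders that
# place matching text and images close together in one space).
text_emb = np.array([0.9, 0.1, 0.2])     # e.g. embedding of "a cat"
img_emb_cat = np.array([0.8, 0.2, 0.1])  # embedding of a cat photo
img_emb_car = np.array([0.1, 0.9, 0.3])  # embedding of a car photo

# The image whose embedding lies closest to the text wins.
print(cosine_similarity(text_emb, img_emb_cat))  # high
print(cosine_similarity(text_emb, img_emb_car))  # low
```

A text-to-image system can use this score to steer generation toward images that match the prompt.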
Currently, this direction is trending and has the potential to become the next ChatGPT. Do you remember the wave of 2D drawing caused by the NovelAI leak last year?
I will provide some development information about this below for your reference.
Currently, Stable Diffusion is the mainstream choice for ordinary users; the recently popular model is Chilloutmix, typically combined with LoRAs. If you are interested, look into it; Bilibili is full of results showing how far AI image generation has come.
Commercial success in this field undoubtedly belongs to Midjourney.
The third direction is the development of the underlying AI models behind AIGC.
OpenAI and Stability AI are the leaders here and have attracted the most funding.
OpenAI has Microsoft behind it, reportedly supplying enormous computing resources; OpenAI even disbanded its robotics team to concentrate its resources. Computing power is indeed the critical issue, and it ultimately comes down to high-end chips.
Google is in an awkward spot: although it is one of the largest holders of AI patents and has open-sourced many of the underlying techniques, it still has not beaten others to practical products. Perhaps this is a common ailment of large companies.
Development of AI-generated images
In 2014, Generative Adversarial Networks (GANs) were born, which truly "taught" AI to draw.
A GAN consists of two models: a generator network G and a discriminator network D. G generates images from random noise, while D judges whether an image came from G or from the real world.
The two compete, each improving in the process; training reaches equilibrium when D can no longer distinguish G's images from real ones.
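As a sketch (not any particular implementation), the competition can be written as two binary cross-entropy losses over D's outputs; the function names and toy values below are illustrative:

```python
import numpy as np

def bce(pred, target):
    """Binary cross-entropy on discriminator scores in (0, 1)."""
    eps = 1e-12
    return float(-np.mean(target * np.log(pred + eps)
                          + (1 - target) * np.log(1 - pred + eps)))

def d_loss(d_real, d_fake):
    # D wants real images scored as 1 and G's fakes scored as 0.
    return bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))

def g_loss(d_fake):
    # G wants D to score its fakes as real (the non-saturating form).
    return bce(d_fake, np.ones_like(d_fake))

# At equilibrium D outputs 0.5 everywhere: it can no longer tell.
half = np.full(4, 0.5)
print(d_loss(half, half))  # ≈ 2 * ln 2 ≈ 1.386
print(g_loss(half))        # ≈ ln 2 ≈ 0.693
```

Each training step alternates: update D to lower `d_loss`, then update G to lower `g_loss`, until neither can improve.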
The innovation of GANs lies in this cleverly designed "self-supervised" setup, which escapes supervised learning's dependence on large amounts of labeled data. GANs are widely used for image generation, style transfer, AI art, and colorizing black-and-white photos.
However, its weaknesses stem from the same design: training two models in tandem makes GANs unstable and prone to mode collapse. A related curiosity is the "Helvetica scenario": if G finds a flaw that reliably fools D, it takes the shortcut and keeps submitting that same output, and the whole balance collapses.
Models, it turns out, can also slack off; a suspiciously human trait.
In 2020, an academic paper on diffusion models significantly improved the level of AI drawing.
The principle of diffusion models is "add noise, then remove it." Gaussian noise is gradually applied to a training image until the image is completely destroyed; the model then learns to gradually recover the original image from the noise. Once trained, feeding it random Gaussian noise produces an image "out of nothing."
This design greatly reduces the difficulty of model training, surpasses the limitations of GAN models, and combines realism with diversity, enabling faster and more stable image generation.
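A minimal numpy sketch of the forward ("add noise") half, using the standard closed-form shortcut x_t = sqrt(ᾱ_t)·x_0 + sqrt(1-ᾱ_t)·ε. The linear schedule below is a commonly used default, not taken from any specific codebase; training would then teach a network to predict ε from x_t.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear noise schedule: small betas, with alpha_bar_t = prod(1 - beta).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

def add_noise(x0, t):
    """Jump straight to step t of the forward process in closed form."""
    eps = rng.standard_normal(x0.shape)
    a = alpha_bar[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps

x0 = rng.standard_normal(64)   # stand-in "image"
x_early = add_noise(x0, 10)    # mostly signal
x_late = add_noise(x0, T - 1)  # almost pure Gaussian noise

print(alpha_bar[10], alpha_bar[-1])  # near 1.0 vs near 0.0
```

Because `alpha_bar` decays from ~1 to ~0, early steps barely perturb the image while the final step leaves essentially pure noise, which is exactly the state generation starts from.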
The "takeoff" of diffusion models in the AI industry began in January 2021, when OpenAI unveiled DALL-E, a text-to-image model that can generate lifelike images of things that do not actually exist, causing a sensation. However, because the model operates directly in pixel space, computation is heavy: it remains slow and memory-hungry.
Stable Diffusion, born in the summer of 2022, makes high-end academic theories more accessible.
In August of last year, Stability AI applied the diffusion process in a lower-dimensional latent space (latent diffusion) and released the Stable Diffusion model. Resource consumption drops dramatically: a consumer graphics card (6 GB+ VRAM recommended) is enough to run it, and operation is simple enough that ordinary people can experience the stunning creative power of AI first-hand.
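Back-of-the-envelope arithmetic shows why moving to latent space helps; the sizes below assume the commonly cited Stable Diffusion v1 setup (512×512 RGB images compressed by a VAE to 64×64×4 latents), so treat them as illustrative:

```python
# The denoiser runs at every sampling step, so the size of the tensor
# it processes dominates the cost of generation.
pixel_elems = 512 * 512 * 3   # diffusing the raw image (DALL-E style)
latent_elems = 64 * 64 * 4    # diffusing the compressed VAE latent

print(pixel_elems // latent_elems)  # each step touches ~48x less data
```

The one-time cost of decoding the final latent back to pixels is cheap compared with running the denoiser hundreds of times, which is what puts the model within reach of consumer cards.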
The development team has also open-sourced all the code, models, and weights (some have since been copied elsewhere).
Note: Some resources are not suitable for browsing during work hours. NSFW warning.
Prompt-writing is the core competitive skill here, and it pairs well with ChatGPT.
Popular: Stable Diffusion + Chilloutmix + Koreandolllikeness
- Low-cost experience of generating AI girl photos
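For a feel of what prompt-writing looks like in practice, here is a hypothetical prompt in the common community style: subject first, then style and quality tags, plus a negative prompt listing artifacts to avoid. Every tag below is illustrative, not a tested recipe.

```python
# Sketch of the usual prompt structure for Stable Diffusion-style tools.
prompt = ", ".join([
    "portrait photo of a young woman",   # the subject
    "soft lighting", "detailed skin",    # style tags
    "8k", "masterpiece",                 # quality tags
])
negative_prompt = ", ".join([
    "lowres", "bad anatomy", "extra fingers", "watermark",
])

print(prompt)
print(negative_prompt)
```

Most front-ends accept exactly this pair of comma-separated strings, which is why tag selection, rather than any code, is the skill being traded.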
Recently, Bing also launched its image-creation tool: https://www.bing.com/create. It works quite well in my testing.
Finally: is setting up the environment too much trouble? Is your local compute insufficient?
You can try Google Colab for free; explore it yourself. Some people have also shared one-click scripts. Keywords to search for:
- NovelAILeaks API Backend (4chan Ver.)
I'm not sure whether current text-to-speech quite counts as AI, but it sounds increasingly natural and is already widely commercialized. If voice imitation interests you, see Real-Time Voice Cloning and MockingBird, which claim to clone a voice from just a 5-second audio sample.
Because the technology is relatively mature, it has also been put to use in gray industries such as fraud, so be cautious before trusting a phone call that sounds like a family member.
The technologies above will inevitably affect our work: related industries will need less human labor, and those working in writing or illustration must keep up with the times. As the saying goes, AI will not eliminate all practitioners, only those who cannot use AI.
ChatGPT, for example, can greatly improve your efficiency, provided you know how to ask: how to formulate and describe a good question. As noted earlier, the hardest part of text-to-image generation is choosing the right prompts; positions centered on this skill reportedly pay very well.
Hopefully, we will not isolate ourselves from the new wave and continue to lag behind.
These AI tools noticeably boost productivity once you try them. Some even say the AI singularity has already arrived and that progress from here will be exponential.
OpenAI is impressive, but not every path has been smooth. In AI drawing, for example, despite being an early mover on diffusion models and having its own product, DALL-E, it was Stable Diffusion that became the mainstream. Perhaps this is the competitiveness an open ecosystem brings.
The Chinese internet is a dismal environment of information islands: the so-called internet is not interconnected at all, just apps blocking one another while frantically funneling traffic. The emergence of ChatGPT offers a glimmer of hope. Even those not fluent in English can finally escape the endless pop-ups, embedded ads, and log-in/follow/pay walls guarding low-quality articles, sidestep these "features," and get things done faster.
China's AI technology reserves are actually strong, but the skill tree has mostly been invested in areas such as facial recognition and public-opinion analysis.
Another curiosity: as AI-generated images grow more realistic, they may reshuffle gray industries like paid "welfare girl" photo sellers, who simply cannot compete with AI.
Many people are also working on video generation; I recently came across interesting projects such as real-time face swapping (DeepFaceLive).