ChatGPT Image Generation Sets the Web Ablaze: Technical Breakthroughs, Copyright Controversy, and a Compute Crunch

AI News · Published 1 year ago by AI Sharing Circle

OpenAI recently integrated its advanced image generation technology directly into ChatGPT, a move that quickly ignited user enthusiasm and set off a series of knock-on effects. The feature draws on the capabilities of the GPT-4o model and shares its technical lineage with the video generation model Sora. It lets users create high-quality still images directly within the familiar chat interface, which makes the technology much easier to use.

The image generation capability is open to all ChatGPT users, including paid subscribers (Plus, Pro, Team) and free users. OpenAI said that free users would initially be limited to about three generations per day, similar to its earlier DALL·E policy, and that the limit would be adjusted dynamically based on demand. The move is likely to accelerate the spread of high-quality AI image generation, and it puts ChatGPT in competition with paid services such as Midjourney and open-source models such as Stable Diffusion for a broader user base.

The Technology Engine: Core Capabilities Driving the Boom

This integration is more than a matter of bolting on another feature; behind it lies a significant advance in image generation technology. One highlight is its handling of the "binding" problem, the association of attributes with objects, which has long plagued AI image generation. In the past, models struggled with instructions such as "blue stars and red triangles" and often mixed up the colors and shapes. According to Gabriel Goh, a research lead at OpenAI, the new model can reliably handle instructions involving 15 to 20 objects and their complex relationships, far beyond the limits of earlier models.

Another key improvement is the quality of text rendering within images. AI models have long had difficulty generating clear, error-free text in images, which has held back many potential applications such as poster and logo design. Goh said that after months of optimization the new model has become quite reliable at text rendering, which greatly broadens its range of uses. This is attributed to the model's autoregressive generation method: drawing the image sequentially, pixel by pixel (for example left to right, top to bottom), gives finer control over detail than diffusion models, which generate the entire image at once, and is especially well suited to rendering text accurately.
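To make the left-to-right, top-to-bottom ordering concrete, here is a minimal toy sketch in Python. It is not OpenAI's method, whose details are not described here. The function names (generate_raster, toy_next_pixel) are hypothetical, and the "model" is a simple random rule standing in for a learned network. The only point it illustrates is that each pixel is produced after, and conditioned on, the pixels that came before it.

```python
import random


def generate_raster(height, width, next_pixel, seed=0):
    """Fill a grid one pixel at a time, left to right, top to bottom."""
    rng = random.Random(seed)
    image = [[None] * width for _ in range(height)]
    order = []  # records the sequence in which pixels were produced
    for row in range(height):
        for col in range(width):
            # The context contains only pixels that already exist.
            left = image[row][col - 1] if col > 0 else None
            up = image[row - 1][col] if row > 0 else None
            image[row][col] = next_pixel(left, up, rng)
            order.append((row, col))
    return image, order


def toy_next_pixel(left, up, rng):
    """Hypothetical stand-in for a learned model: usually copy a neighbor."""
    neighbors = [p for p in (left, up) if p is not None]
    if neighbors and rng.random() < 0.8:
        return rng.choice(neighbors)
    return rng.randint(0, 1)


image, order = generate_raster(4, 6, toy_next_pixel)
for row in image:
    print("".join("#" if p else "." for p in row))
```

Because every pixel is sampled with its predecessors already fixed, earlier decisions constrain later ones, which is the property the article credits for finer control over details such as lettering.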

These advances rest on the omnimodal core of GPT-4o, which was designed to unify text, images, audio, and video. The model also incorporates broad "world knowledge" that lets it understand the logic and common sense behind an image. Jackie Shannon, ChatGPT's multimodal product lead, said that users do not need to over-explain: the model can generate images consistent with the laws of physics and background knowledge, such as a diagram of Newton's prism experiment or a comic strip that keeps its characters consistent.

Twin Consequences: The Resource and Ethical Challenges Behind the Success

These powerful capabilities made the new feature an internet sensation as soon as it launched, but they also immediately confronted OpenAI with two major challenges: enormous pressure on compute resources and the persistent controversy over copyright and ethics.

The first is resources. Huge user demand overwhelmed OpenAI's servers. CEO Sam Altman described the predicament on X with the phrase "our GPUs are melting." To keep the service stable, OpenAI had to introduce rate limits as a matter of urgency. The company had already delayed the full rollout to free users because of high demand, and its confirmation of a limited daily quota for free users (around three) underscores that compute costs and resource bottlenecks remain a stark reality for large-scale deployments of cutting-edge AI applications, even for industry giants.
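A per-user daily quota of the kind described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not OpenAI's implementation. The class name DailyQuota, its methods, and the in-memory dictionary are all assumptions made for the example, and the default limit of three simply mirrors the figure reported for free users.

```python
from datetime import date


class DailyQuota:
    """Hypothetical in-memory counter that caps requests per user per day."""

    def __init__(self, limit_per_day=3):
        self.limit = limit_per_day
        self._used = {}  # maps (user_id, day) to the number of requests served

    def try_consume(self, user_id, today=None):
        """Return True and count the request if the user is under the limit."""
        day = today or date.today()
        key = (user_id, day)
        used = self._used.get(key, 0)
        if used >= self.limit:
            return False
        self._used[key] = used + 1
        return True

    def remaining(self, user_id, today=None):
        """Number of requests the user can still make on the given day."""
        day = today or date.today()
        return self.limit - self._used.get((user_id, day), 0)


quota = DailyQuota(limit_per_day=3)
day = date(2025, 1, 1)
results = [quota.try_consume("free-user", day) for _ in range(4)]
print(results)
```

Keying the counter on the calendar day means the allowance resets automatically when the date changes. A production service would also need shared storage and the dynamic adjustment of limits that OpenAI mentioned, neither of which is modeled here.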

The second is ethics and copyright. Users quickly discovered the new model's powerful ability to imitate styles, and images created in the style of Japanese animator Hayao Miyazaki went viral on social media, sparking a creative frenzy.

However, this charming craze quickly ran up against the sensitive red line of copyright. Just one day later, OpenAI began restricting users from generating images in the styles of specific living artists (particularly the "Miyazaki style") and publicly stated that it was taking a more "conservative" approach. A spokesperson said that the company currently prohibits generating "individual living artist styles" but allows "broader studio styles" and the styles of deceased artists, and that it will continue to adjust its policy based on feedback.

The incident has once again brought to the fore the tension between generative AI's ability to imitate art and the protection of creators' rights. Notably, Hayao Miyazaki himself has long been critical of AI art and once called it "an insult to life itself." Although Studio Ghibli has not responded directly to the incident, OpenAI's rapid reaction shows that drawing the line between technological innovation and respect for the existing art ecosystem remains a challenge the whole industry must take seriously.

Operational Considerations and Future Prospects

While addressing these challenges, OpenAI also explained some operational details of the new feature. On generation speed, Shannon acknowledged that it may be somewhat slow at the moment but emphasized that this is a necessary trade-off for higher image quality, including the knowledge embedded in the images.

As for provenance and ownership, generated images will not carry a visible watermark but will have metadata conforming to the C2PA standard embedded in the file to identify their source. Users have full rights to use the images they generate, subject to platform policies.

OpenAI's integration of powerful image generation into ChatGPT is a major step toward mainstream AI adoption. However, the compute strain and copyright disputes that followed make it clear that the road ahead will not be straight. How to manage resource consumption effectively, clarify ethical boundaries, and balance the interests of all parties while the technology advances rapidly will remain a central question for OpenAI and the AI industry as a whole.