Introduction
There was a time when creating a comic book was a tedious process that required writers, illustrators, and countless hours of effort. Today, artificial intelligence serves as a powerful tool that empowers creative professionals. Imagine handing a short story to an AI and watching it help transform that story into a vivid, visually striking comic book, all while retaining the creator's unique perspective. This is no longer just a fantasy; generative AI models make it possible today. In this blog, we'll explore how CrewAI LLM agents enhance the creative process of comic book creation, and look at the structure and implementation that make it work.
Example: Creating a Storybook from the Panchatantra
To demonstrate this process, let's use a short story from the Panchatantra, a collection of ancient Indian fables known for their wisdom and moral lessons. Consider the story of the Lion and the Hare:
Short story: "Once upon a time, there was a mighty lion named Basuraka who ran roughshod over the jungle. The animals grew tired of his tyranny and agreed to send him one animal as prey every day. One day, it was the turn of the clever rabbit, who devised a plan to get rid of the lion. He lured Basuraka to a deep well and convinced him that another lion lived there. Seeing his reflection in the water, Basuraka roared in anger and jumped into the well, never to return."
Using the CrewAI framework, we will follow the steps below:
- 1. Scriptwriter agent: The scriptwriter breaks the story down into scenes, for example:
- - Scene 1: The lion Basuraka terrorizes the jungle.
- - Scene 2: The animals decide to deliver one prey per day.
- - Scene 3: The rabbit plans to trick the lion.
- - Scene 4: The lion jumps into the well.
- 2. Visual artist agent: The visual artist generates an illustration for each scene, depicting key moments such as the lion roaring through the jungle, the rabbit guiding the lion to the well, and the final scene of the lion jumping into the water.
- 3. Synthesizer agent: Finally, the synthesizer combines all of these scenes and images into a coherent storybook ready to be viewed and shared.
For more background on the Panchatantra, you can refer to external resources such as the Panchatantra article on Wikipedia or a Panchatantra story collection.
Automated Comic Book Generation with LLM Agents
Generative AI agents can be thought of as a digital "team" working together to perform complex creative processes. By assigning specific tasks to individual AI agents, the process of creating an entire comic book becomes efficient and automated. In the following illustration, specialized agents work together:
- 1. Scriptwriter: Responsible for breaking the short story down into detailed scenes.
- 2. Visual artist: Responsible for transforming each scene into a compelling piece of visual art.
- 3. Synthesizer: Responsible for merging all generated scenes and their corresponding images into a coherent and complete comic book. The synthesizer ensures that the narrative flows smoothly and the final product is ready for release.
The synergy between these agents automates the comic book creation process, enabling an efficient and creative workflow. The key lies in the ability to utilize generative language models and image generation AI systems in a coordinated manner.
Architecture Overview
The architecture of this automation is simple but effective. The process starts with a short story and flows as follows:
- 1. Short story input: The process takes a short narrative as input, which serves as the basis for the comic book.
- 2. Scriptwriter agent: The agent breaks the short story into discrete scenes, each capturing an important part of the storyline. In the illustration, this is shown as scenes labeled "Scene 1, Scene 2, Scene 3", etc., until the entire story is broken down into smaller scenes.
- 3. Visual artist agent: The visual artist translates each scene description into a visual representation, effectively illustrating the comic. Images are created to represent scenes such as the lion in the sun, the lion meeting the rabbit, etc.
- 4. Synthesizer agent: Finally, all scenes and their corresponding images are combined by the synthesizer agent to create a complete picture book.
The entire process is designed to transform narratives into engaging comic books with minimal human intervention.
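The flow described above can be sketched in plain Python. This is only an illustration of the data handoffs, not the CrewAI implementation: `run_pipeline`, `naive_scriptwriter`, `placeholder_artist`, and `text_synthesizer` are hypothetical stand-ins, and a real system would call an LLM and an image model where the stubs are.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Scene:
    number: int
    narration: str


@dataclass
class Page:
    scene: Scene
    image_ref: str


def run_pipeline(
    story: str,
    write_script: Callable[[str], List[Scene]],
    illustrate: Callable[[Scene], str],
    synthesize: Callable[[List[Page]], str],
) -> str:
    """Scriptwriter -> visual artist (once per scene) -> synthesizer."""
    scenes = write_script(story)
    pages = [Page(scene=s, image_ref=illustrate(s)) for s in scenes]
    return synthesize(pages)


# Hypothetical stand-ins for the three agents.
def naive_scriptwriter(story: str) -> List[Scene]:
    # Treats every sentence as one scene; an LLM would pick key moments instead.
    sentences = [s.strip() for s in story.split(".") if s.strip()]
    return [Scene(number=i, narration=s + ".") for i, s in enumerate(sentences, start=1)]


def placeholder_artist(scene: Scene) -> str:
    # Returns a made-up file name instead of generating an image.
    return f"scene_{scene.number}.png"


def text_synthesizer(pages: List[Page]) -> str:
    return "\n".join(f"[{p.image_ref}] {p.scene.narration}" for p in pages)


book = run_pipeline(
    "The lion rules the jungle. The rabbit tricks the lion.",
    naive_scriptwriter,
    placeholder_artist,
    text_synthesizer,
)
print(book)
```

Passing the three stages in as callables keeps the orchestration separate from the models behind each stage, so any stage can be swapped without touching the others.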
Implementation Using the CrewAI Framework
To bring this vision to life, we implemented the workflow with the CrewAI framework. Below are the detailed steps of the implementation, with code snippets to help you reproduce the process step by step:
Defining Agents and Tasks: We use the CrewAI framework to define two agents: Agent 1 (scriptwriter) and Agent 2 (visual artist). Each agent has a specific role, and their tasks are interrelated so that the workflow stays efficient.
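## Agent
scriptwriter:
  role: >
    Write scene scripts for children's short stories
  goal: >
    Write simple, clear, and engaging scene scripts for a children's picture book.
  backstory: >
    You are a scriptwriter who specializes in turning children's short stories into scripts for performance or animation.
  llm: llm_model

## Task
scriptwriting:
  description: >
    You will be given a children's short story about learning an important life lesson. The story needs to be turned into a fun picture book for children to read and engage with. You are responsible for breaking the story down into {number_of_scenes} distinct scenes, each focusing on a specific event or moment in the story. Each scene will be turned into an image. You must generate the following information, following the specified pydantic schema:
    - A suitable name for the story
    - A short summary of the story
    - A short background introduction to the story that gives the reader important context.
    - A detailed narration of each scene in the story, at least one or two sentences.
    - The final lesson learned from the story.
    <short_story>
    {story_text}
    </short_story>
  expected_output: >
    The output must strictly follow the pydantic schema. There will be a penalty if it does not.
  agent: scriptwriter

## Agent
visualartist:
  role: >
    Visual illustration for storybooks
  goal: >
    Create engaging picture books.
  backstory: >
    An expert in creating illustrated storybooks.
  llm: llm_model

## Task
illustration:
  description: >
    You will be given a children's short story about learning an important life lesson. The story will be turned into a fun picture book for children to read and engage with. The story has already been broken down into distinct scenes.
    Below is the description of one scene, along with a short summary of the story.
    Generate a prompt that can be used with a text-to-image model to generate an image for the scene. Send the prompt to the provided tool to generate an image whose characters and background match the scene. Characters should be in cartoon style. The prompt should be fewer than 40 words.
    <story_summary>
    {story_summary}
    </story_summary>
    <scene_description>
    {scene_description}
    </scene_description>
  expected_output: >
    The output must strictly follow the pydantic schema. There will be a penalty if it does not.
  agent: visualartist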
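The task descriptions above contain placeholders such as `{story_summary}` and `{scene_description}`, which are filled from the inputs passed when the crew is started. The sketch below shows the idea with Python's `str.format` as a stand-in; it is not CrewAI's internal code, and `render_task_description` and `ILLUSTRATION_TEMPLATE` are hypothetical names used only for illustration.

```python
# A shortened copy of the illustration task's template, for illustration only.
ILLUSTRATION_TEMPLATE = (
    "<story_summary>\n{story_summary}\n</story_summary>\n"
    "<scene_description>\n{scene_description}\n</scene_description>"
)


def render_task_description(template: str, inputs: dict) -> str:
    """Fill every {name} in the template; raises KeyError if an input is missing."""
    return template.format(**inputs)


rendered = render_task_description(
    ILLUSTRATION_TEMPLATE,
    {
        "story_summary": "A clever rabbit outwits a tyrannical lion.",
        "scene_description": "The rabbit leads the lion to a deep well.",
    },
)
print(rendered)
```

A practical consequence is that the keys of each input dictionary need to match the placeholder names in the YAML exactly.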
Crew Configuration: Define structured (pydantic) schemas for the agents' responses, configure the LLM and image models (here, OpenAI chat models and DALL-E), and bind the agents to their tasks.
# Imports are not shown in the original snippet; these are assumed from the names used below
from typing import List

from crewai import Agent, Crew, Process, Task
from crewai.project import CrewBase, agent, crew, llm, task
from crewai_tools import DallETool
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field

# Image-generation tool handed to the visual artist agent
dalle_tool = DallETool(model="dall-e-3", size="1024x1024", quality="standard", n=1)


## Define a class for a single scene
class StoryScene(BaseModel):
    scene_number: int
    narration: str


## Define a class for the list of story scenes
class StoryScenes(BaseModel):
    story_name: str
    summary: str
    background: str
    lesson: str
    scenes: List[StoryScene]


## Define a class for a single scene image
class SceneImage(BaseModel):
    # Note: max_length counts characters, not words
    prompt: str = Field(description="Prompt for a text-to-image model that can be used to generate the image.", max_length=50)
    image_url: str = Field(description="URL of the image generated by the tool")


@CrewBase
class StoryCrew():
    """Story crew"""
    agents_config = 'config/story/agents.yaml'
    tasks_config = 'config/story/tasks.yaml'

    @llm
    def llm_model(self):
        return ChatOpenAI(temperature=0.0,  # set to 0 for deterministic output
                          model="gpt-4o-mini",  # use the GPT-4o mini model
                          max_tokens=8000)

    @agent
    def scriptwriter(self) -> Agent:
        return Agent(
            config=self.agents_config['scriptwriter'],
            output_pydantic=StoryScenes,
            verbose=True
        )

    @task
    def scriptwriting(self) -> Task:
        return Task(
            config=self.tasks_config['scriptwriting'],
            output_pydantic=StoryScenes,
        )

    @crew
    def crew(self) -> Crew:
        """Create the story crew"""
        script_crew = Crew(
            agents=self.agents,  # created automatically by the @agent decorator
            tasks=self.tasks,  # created automatically by the @task decorator
            process=Process.sequential,
            verbose=True,
            # process=Process.hierarchical, # if you want to use this instead, see https://docs.crewai.com/how-to/Hierarchical/
        )
        return script_crew


@CrewBase
class ArtistCrew():
    """Picture book crew"""
    agents_config = 'config/visual/agents.yaml'
    tasks_config = 'config/visual/tasks.yaml'

    @llm
    def llm_model(self):
        return ChatOpenAI(temperature=0.0,  # set to 0 for deterministic output
                          model="gpt-4o-2024-08-06",  # use the GPT-4o model
                          max_tokens=8000)

    @agent
    def visualartist(self) -> Agent:
        return Agent(
            config=self.agents_config['visualartist'],
            tools=[dalle_tool],
            verbose=True
        )

    @task
    def illustration(self) -> Task:
        return Task(
            config=self.tasks_config['illustration'],
            output_pydantic=SceneImage,
            output_file='report.md'
        )

    @crew
    def crew(self) -> Crew:
        """Create the picture book crew"""
        artist_crew = Crew(
            agents=self.agents,  # created automatically by the @agent decorator
            tasks=self.tasks,  # created automatically by the @task decorator
            process=Process.sequential,
            verbose=True,
            # process=Process.hierarchical, # if you want to use this instead, see https://docs.crewai.com/how-to/Hierarchical/
        )
        return artist_crew
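To make the structured output more concrete, here is a rough, stdlib-only sketch of what a scriptwriter result shaped like `StoryScenes` might look like and how it could be parsed. It uses dataclasses as a stand-in for the pydantic models (so, unlike pydantic, it does not check field types), and both `parse_story_scenes` and the sample payload are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class StoryScene:
    scene_number: int
    narration: str


@dataclass
class StoryScenes:
    story_name: str
    summary: str
    background: str
    lesson: str
    scenes: List[StoryScene]


def parse_story_scenes(raw_json: str) -> StoryScenes:
    """Parse model output; a missing or unexpected field raises KeyError or TypeError."""
    data = json.loads(raw_json)
    scenes = [StoryScene(**scene) for scene in data.pop("scenes")]
    return StoryScenes(scenes=scenes, **data)


# Hypothetical model output for the Lion and the Hare story.
raw = json.dumps({
    "story_name": "The Lion and the Hare",
    "summary": "A clever rabbit tricks a cruel lion into jumping into a well.",
    "background": "The lion Basuraka terrorizes the jungle animals.",
    "lesson": "Wit can overcome brute strength.",
    "scenes": [
        {"scene_number": 1, "narration": "Basuraka terrorizes the jungle."},
        {"scene_number": 2, "narration": "The rabbit leads him to a well."},
    ],
})

story = parse_story_scenes(raw)
print(story.story_name, len(story.scenes))
```

Failing loudly on a malformed response is the point of the schema: a downstream step that expects `summary` and `scenes` never has to guess whether they are present.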
Main workflow: Ensure proper handoffs between the two crews. Once the scriptwriter has broken the story into scenes, each scene is passed to the visual artist together with the story summary, ensuring continuity of the workflow.
import agentops

# number_of_scenes and story_text are assumed to be defined earlier (e.g. from user input)
agentops.start_session(tags=['story', 'scripts'])

## Run the scriptwriting crew to break the story into scenes
inputs = {
    'number_of_scenes': int(number_of_scenes),
    'story_text': story_text,
}

scenes_list = StoryCrew().crew().kickoff(inputs=inputs)
agentops.end_session("Success")

if scenes_list is not None:
    print(f"Raw result from script writing: {scenes_list.raw}")
    slist = scenes_list.pydantic
    story_summary = slist.summary
    for scene in slist.scenes:
        print(f"Scene: {scene.narration}")

    # One input dict per scene, matching the placeholders in the illustration task
    scene_input = [{"story_summary": story_summary,
                    "scene_description": scene.narration} for scene in slist.scenes]

    agentops.start_session(tags=['scene', 'illustration'])

    ## Run the artist crew once per scene
    result_images = ArtistCrew().crew().kickoff_for_each(inputs=scene_input)

    # kickoff_for_each returns one result per input, so print each one
    for result in result_images:
        print(f"result_images : {result.raw}")
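The snippets above stop once the images are generated; the synthesizer step is not shown. As one possible sketch of it, assuming the final book is a single HTML page, the function below combines narrations and image URLs. `build_storybook_html` and the example URLs are hypothetical, and in the workflow above the narrations would come from the scriptwriter's scenes and the URLs from the visual artist's results.

```python
from html import escape
from typing import List, Tuple


def build_storybook_html(title: str, lesson: str, pages: List[Tuple[str, str]]) -> str:
    """Combine (narration, image_url) pairs into one HTML fragment, in scene order."""
    parts = [f"<h1>{escape(title)}</h1>"]
    for number, (narration, image_url) in enumerate(pages, start=1):
        parts.append(
            f"<section><h2>Scene {number}</h2>"
            f'<img src="{escape(image_url)}" alt="Scene {number}">'
            f"<p>{escape(narration)}</p></section>"
        )
    parts.append(f"<p><em>Lesson: {escape(lesson)}</em></p>")
    return "\n".join(parts)


# Hypothetical data for illustration only.
html_book = build_storybook_html(
    "The Lion & the Hare",
    "Wit can overcome brute strength.",
    [
        ("Basuraka terrorizes the jungle.", "https://example.com/scene1.png"),
        ("The lion jumps into the well.", "https://example.com/scene2.png"),
    ],
)
print(html_book)
```

Escaping the model-generated text before embedding it keeps stray characters such as `&` or `<` in a narration from breaking the page.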
Conclusion
The power of generative AI lies in its ability to augment and support the creative process, providing content creators with new tools to bring their ideas to life. CrewAI LLM agents help transform simple short stories into engaging comic picture books, assisting storytellers at every stage of the journey. By automating repetitive tasks such as script decomposition and image generation, AI lets artists and writers focus on the core creative elements while preserving their unique artistic style. This implementation demonstrates how generative AI can enhance the creative industry, and offers a vision of a future in which creativity and technology work together.