When we want a large language model to perform a task, we give it a Prompt, described in natural language, to guide its execution. For simple tasks, natural language describes them clearly enough, for example: "Please translate the following into simplified Chinese:", "Please generate a summary of the following:", and so on.
However, for complex tasks, such as requiring the model to generate a specific JSON format, or a task with multiple branches where each branch runs several interrelated sub-tasks, a plain natural-language description is often not enough.
Topic of discussion
Here are two thought-provoking questions to try before reading on:
- You have multiple long sentences. Each needs to be split into shorter sentences of no more than 80 characters, and the result must be output as JSON that clearly describes the correspondence between each long sentence and its short segments.
For example:
```json
[
  {
    "long": "This is a long sentence that needs to be split into shorter sentences.",
    "short": [
      "This is a long sentence",
      "that needs to be split",
      "into shorter sentences."
    ]
  },
  {
    "long": "Another long sentence that should be split into shorter sentences.",
    "short": [
      "Another long sentence",
      "that should be split",
      "into shorter sentences."
    ]
  }
]
```
- An original subtitle text containing only dialog. You need to extract chapters and speakers, then list the dialog by chapter and paragraph. If there are multiple speakers, each dialog should be preceded by the speaker's name, except when the same speaker speaks consecutively. (This is actually a GPT I use myself to organize video transcripts: Video Transcript Organization GPT.)
Example Input:
So I'm going to quote Elon Musk, and I hope you don't mind. I apologize. But he doesn't agree that this is a model for privacy and security. He called the integration "creepy spyware." What's your response to that? Don't care? That's his opinion. Obviously, I don't think so. Neither do we. Mira, thank you for being with us. I know you're probably a little busy. I have a lot of questions for Mira, but we only have 20 minutes. So I'd like to set your expectations first. We're going to focus on a number of topics, including some recent news and some of Mira's areas of responsibility as CTO. Hopefully, we'll be able to dive into some of those topics. I guess my first question is, given that you're extremely busy right now, and the attacks in the news, some good, some bad, you joined the company about six years ago. At that time, it was a very different organization. You were relatively low-profile, not very well known. Do you miss the days when you could concentrate on your work? I'd say we're still fully engaged in our work. It's just that the work has evolved and it's not just about research. It's also because, you know, research has gotten a lot better. It's also about how we can bring this technology into the world in a way that's beneficial and safe, so the mission remains the same, and we've made a lot of progress in research, and the field of work is expanding. There's a lot of public interest in this, which may seem a little unusual for those of us who are working on technology and developing products. But you know, given the importance of what we're doing, this attention is very necessary and it's positive.
Sample Output:
### Introduction

**Host**: So I'm going to quote Elon Musk, I hope you don't mind. I apologize, but he doesn't agree that this is a model for privacy and security. He called this integration "creepy spyware." What's your response to that?

**Mira**: That's his opinion. Obviously, I don't think so. We don't think so either.

### Welcome and Introductions

**Host**: Mira, thanks for being with us. I realize you may be slightly busy. I have a lot of questions for you, but we only have 20 minutes. So I'd like to set your expectations first. We're going to focus on a number of topics, including some recent news and some of the areas that you're responsible for as CTO. Hopefully, we can dive into some of those topics.

### Career Review

**Host**: I guess my first question is, given the fact that you're extremely busy right now, and the attacks in the news, some good and some bad, you joined this company about six years ago. At that time, it was a very different organization. You were relatively low profile and not very well known. Do you miss the days when you could give your full attention to your work?

**Mira**: I would say that we are still fully engaged in our work. It's just that the work has evolved and it's not just about research. It's also because the research has gotten a lot better. It's also about how we can bring this technology into the world in a beneficial and safe way. So the mission remains the same, and we've also made a lot of progress in research, and the areas of work are expanding. There's also a lot of public interest in this, which may feel a little unusual for those of us who are working on the technology and developing the product. But you have to realize that given the importance of what we're doing, this attention is very necessary, and it's positive.
The essence of Prompt
Maybe you've read many articles on Prompt-writing techniques and memorized plenty of Prompt templates. But what is the essence of a Prompt? Why do we need a Prompt at all?
A Prompt is essentially a control instruction to the LLM, described in natural language, that lets the LLM understand our requirements and then turn the inputs into the outputs we want.
For example, the commonly used few-shot technique lets the LLM understand our requirements through examples; it then produces the results we want by analogy with those samples. CoT (Chain of Thought), in turn, manually decomposes the task and constrains the execution process, so that the LLM follows the process and steps we specify instead of wandering or skipping key steps, and thus produces better results.
It is like school: when teaching a math theorem, the teacher gives examples so that we grasp what the theorem means; in a lab class, we are told the steps of the experiment, and even without understanding the underlying principle, following the steps gets us roughly the same result.
Why are a Prompt's results sometimes less than optimal?
Because the LLM cannot accurately understand our requirements. This is limited on one hand by the LLM's ability to understand and follow instructions, and on the other by the clarity and precision of our Prompt.
How to precisely control the output of LLM and define its execution logic with the help of pseudo-code
Since a Prompt is essentially a control instruction to the LLM, we need not limit ourselves to traditional natural-language descriptions when writing one; we can also use pseudo-code to precisely control the LLM's output and define its execution logic.
What is pseudo-code?
Pseudo-code has a long history. It is a formal way of describing algorithms, sitting between natural language and a programming language, used to describe an algorithm's steps and flow. Pseudo-code appears throughout algorithm textbooks and papers; even if you don't know any particular programming language, you can still understand an algorithm's execution process through its pseudo-code.
So how well does an LLM understand pseudo-code? Quite well, actually: LLMs are trained on a large amount of high-quality code and can easily grasp the meaning of pseudo-code.
How to write a pseudo-code Prompt?
Pseudo-code is second nature to programmers, and non-programmers can write simple pseudo-code too by remembering a few basic constructs. For example:
- Variables, used to store data, e.g. specific symbols that represent inputs or intermediate results
- Types, used to define the kind of data, such as strings, numbers, arrays, etc.
- Functions, used to define the execution logic of a particular sub-task
- Control flow, used to control how the program executes, such as loops and conditionals:
  - if-else statement: execute task A if condition A is satisfied, otherwise execute task B
  - for loop: perform a task for each element in an array
  - while loop: keep executing task B as long as condition A is satisfied
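These constructs map directly onto ordinary code. As a minimal Python illustration (the names and the 20-character threshold are invented for the example, not part of any prompt):

```python
# Variables and types: store the inputs and a parameter.
sentences: list[str] = ["A short one.", "A rather long sentence that may need splitting."]
max_length: int = 20  # arbitrary threshold for this example

# Function: encapsulates the logic of one sub-task.
def needs_split(sentence: str) -> bool:
    return len(sentence) > max_length

# Control flow: a for loop over the input, with an if-else branch per element.
results = []
for sentence in sentences:
    if needs_split(sentence):
        results.append((sentence, "split"))
    else:
        results.append((sentence, "keep"))

print(results)
```

In a pseudo-code Prompt, these same constructs carry the control intent to the LLM instead of to an interpreter.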
Now let's write the pseudo-code Prompt, using the previous two reflection questions as an example.
Pseudo-code to output a specific JSON format
The desired JSON format can be described precisely with a piece of pseudo-code resembling a TypeScript type definition:
```
Please split the sentences into short segments, no more than 1 line (less than 80 characters, ~10 English words) each.
Please keep each segment meaningful, e.g. split from punctuations, "and", "that", "where", "what", "when", "who", "which" or "or" etc if possible, but keep those punctuations or words for splitting.
Do not add or remove any words or punctuation marks.
Input is an array of strings.
Output should be a valid json array of objects, each object contains a sentence and its segments:

Array<{ sentence: string; segments: string[] }>
```
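A side benefit of pinning down an exact output shape is that the LLM's reply can be checked mechanically. Here is a minimal Python sketch of such a check; the field names `sentence` and `segments` follow the prompt's description of the objects, and the sample reply is invented:

```python
import json

def validate_reply(reply: str, max_len: int = 80) -> bool:
    """Check that a reply is a JSON array of {"sentence": str, "segments": [str, ...]}
    objects, where every segment fits on one line (< max_len characters)."""
    data = json.loads(reply)
    if not isinstance(data, list):
        return False
    for item in data:
        if set(item) != {"sentence", "segments"}:
            return False
        if not all(isinstance(s, str) and len(s) < max_len for s in item["segments"]):
            return False
    return True

# An invented example reply in the expected format:
reply = ('[{"sentence": "This is a long sentence that needs to be split.",'
         ' "segments": ["This is a long sentence", "that needs to be split."]}]')
print(validate_reply(reply))
```

If validation fails, the reply can simply be rejected and the request retried.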
Organizing Subtitle Texts with Pseudo-Code
The task of organizing subtitle text is relatively complex. If you imagine writing a program for it, there would be many steps: extract the chapters first, then the speakers, and finally organize the dialog content by chapter and speaker. With pseudo-code, we can decompose this task into several sub-tasks without writing concrete code for them; we only need to clearly describe each sub-task's execution logic, have the sub-tasks executed step by step, and finally integrate the results into the output.
We can use variables to store intermediate results, such as `subject`, `speakers`, `chapters`, `paragraphs`, etc.
When outputting, we can use for loops to iterate through chapters and paragraphs, and if-else statements to decide whether the speaker's name needs to be printed.
Your task is to re-organize video transcripts for readability, and recognize speakers for multi-person dialogues. Here is the pseudo-code on how to do it:
```python
def extract_subject(transcript):
  # Find the subject in the transcript and return it as a string.

def extract_chapters(transcript):
  # Find the chapters in the transcript and return them as a list of strings.

def extract_speakers(transcript):
  # Find the speakers in the transcript and return them as a list of strings.

def find_paragraphs_and_speakers_in_chapter(chapter):
  # Find the paragraphs and speakers in a chapter and return them as a list of tuples.
  # Each tuple contains the speaker and their paragraphs.

def format_transcript(transcript):
  # extract the subject, speakers, chapters and print them
  subject = extract_subject(transcript)
  print("Subject:", subject)
  speakers = extract_speakers(transcript)
  print("Speakers:", speakers)
  chapters = extract_chapters(transcript)
  print("Chapters:", chapters)

  # format the transcript
  formatted_transcript = f"# {subject}\n\n"
  for chapter in chapters:
    formatted_transcript += f"## {chapter}\n\n"
    paragraphs_and_speakers = find_paragraphs_and_speakers_in_chapter(chapter)
    for speaker, paragraphs in paragraphs_and_speakers:
      # if there are multiple speakers, print the speaker's name before each paragraph
      if speakers.size() > 1:
        formatted_transcript += f"{speaker}: "
      for paragraph in paragraphs:
        formatted_transcript += f"{paragraph}\n\n"
  return formatted_transcript

print(format_transcript($user_input))
```
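This pseudo-code is written for the LLM, not for an interpreter, so the extractor bodies stay empty. Still, the formatting logic can be exercised by running a Python version with the extraction results passed in as arguments; the stub data below is invented for the example:

```python
def format_transcript(subject, chapters, speakers, paragraphs_by_chapter):
    """Runnable version of the formatting step: extraction sub-tasks are
    replaced by pre-extracted arguments so the control flow can be tested."""
    formatted = f"# {subject}\n\n"
    for chapter in chapters:
        formatted += f"## {chapter}\n\n"
        for speaker, paragraphs in paragraphs_by_chapter[chapter]:
            # With multiple speakers, prefix each block with the speaker's name.
            if len(speakers) > 1:
                formatted += f"{speaker}: "
            for paragraph in paragraphs:
                formatted += f"{paragraph}\n\n"
    return formatted

# Invented stub data standing in for the extract_* sub-tasks.
output = format_transcript(
    subject="Interview",
    chapters=["Introduction"],
    speakers=["Host", "Mira"],
    paragraphs_by_chapter={"Introduction": [("Host", ["Welcome."]), ("Mira", ["Thanks."])]},
)
print(output)
```

The point of the Prompt is exactly this shape of control flow: the LLM fills in the fuzzy extraction sub-tasks that ordinary code cannot.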
Let's see how it works out:
You can also simply use the GPT I built with this Prompt: Transcript Organization GPT.
Make ChatGPT Draw Multiple Images at Once with Pseudo-Code
I also recently learned an interesting usage from a Taiwanese netizen, Mr. Yin Xiangzhi: making ChatGPT draw multiple images at once with pseudo-code.
Normally, when you ask ChatGPT to draw, it only generates one picture at a time. If you want several pictures in one go, you can use pseudo-code to break the generation into multiple sub-tasks, have all the sub-tasks executed at once, and finally integrate the results into the output.
Below is pseudo-code for drawing images. Please follow the logic of the pseudo-code and draw the images with DALL-E:

```
images_prompts = [
  { style: "Kawaii", prompt: "Draw a cute dog", aspectRatio: "Wide" },
  { style: "Realistic", prompt: "Draw a realistic dog", aspectRatio: "Square" }
]

images_prompts.forEach((image_prompt) => {
  print("Generating image with style: " + image_prompt.style +
        " and prompt: " + image_prompt.prompt +
        " and aspect ratio: " + image_prompt.aspectRatio)
  image_generation(image_prompt.style, image_prompt.prompt, image_prompt.aspectRatio)
})
```
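The pattern works regardless of which language the pseudo-code borrows from: one loop fans a list of task specs out into repeated tool calls. Here is the same idea in Python, with `image_generation` stubbed out to record the request (the function and its fields are invented for illustration, not a real DALL-E API):

```python
def image_generation(style, prompt, aspect_ratio):
    # Stand-in for the real image tool; here it just records the request.
    return f"[{style} | {aspect_ratio}] {prompt}"

images_prompts = [
    {"style": "Kawaii", "prompt": "Draw a cute dog", "aspectRatio": "Wide"},
    {"style": "Realistic", "prompt": "Draw a realistic dog", "aspectRatio": "Square"},
]

# One loop fans the task list out into one generation call per spec.
requests = [
    image_generation(p["style"], p["prompt"], p["aspectRatio"])
    for p in images_prompts
]
print(requests)
```

When the LLM follows this logic, it performs one image-generation call per list entry instead of stopping after the first picture.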
Summary
Through the above examples, we can see that pseudo-code lets us control the LLM's output more precisely and define its execution logic, rather than limiting ourselves to natural-language descriptions. When we face complex tasks, or tasks with multiple branches where each branch runs several interrelated sub-tasks, describing the Prompt with pseudo-code is clearer and more precise.
When writing a Prompt, remember that a Prompt is essentially a control instruction to the LLM, described in natural language, that lets the LLM understand our requirements and turn the inputs into the outputs we want. As for the form of the Prompt, there are many options that can be used flexibly: few-shot, CoT, pseudo-code, and so on.
More examples:
Generate "pseudo-code" meta prompts for precise control of output formatting