How to get it right o1: don't write prompts; write briefs, focus on goals: describe what you want tonothingNot what you want.what wayGet it and be aware of the pros and cons of o1!
Since the release of the o1 in October and the announcement of the o1 pro/o3 in December, many people have struggled to make sense of their perceptions, both positive and negative. We took a strong positive stance at the low point of the o1 Pro sentiment and mapped out what it might take for OpenAI to launch a $2,000 per month proxy product (rumored to be coming in the next few weeks). Since then, o1 has been solidly at #1 on all LMArena charts.
Since then, he's launched Dawn Analytics and continues to post unfiltered thoughts about o1 - initially as a loud skeptic and slowly becoming an everyday user. We love the various meanings of people who change their minds and think the same conversation is happening all over the world as people struggle to transition from chat mode to the new world of reasoning and hundreds of dollars a month for specialized AI products, now GA))). Here are our thoughts.
How did I go from hating o1 to using it every day to solve my most important problems?
I learned how to use it.
When the o1 pro was released, I didn't hesitate to subscribe.In order to justify the $200 per month price tag, it only needs to provide 1-2 engineers' hours per month
But at the end of the day of trying to get the model to work, I concluded thatIt's garbage.The
Every time I ask a question, I have to wait 5 minutes and am greeted with a lot of contradictory gibberish with unsolicited architecture diagrams + a list of pros and cons.
Of course, people often go very wild about OpenAI after release (which is the second best strategy for going viral, after negative reviews.)
But this feels different - these perceptions come from people in difficult situations.
As I started talking to people who disagreed with me, the more I realized I was completely wrong:
I use o1 like a chat model - but o1 is not a chat model.
How to use o1 correctly
If o1 is not a chat model, what is it?
I think of it like a "report generator". If you give it enough context and tell it what you want to output, it usually solves the problem once and for all.
swyx's note: OpenAI did publish a proposal for prompting o1, but we think it's incomplete, and in a sense you can think of this article as the "missing manual" for practical experience with o1 and o1 pro in practice.
1. Don't write prompts; write briefs
Provide lots of context. Whatever you think I mean by "a lot" - multiply it by 10.
When you use a program like Claude When modeling chat like 3.5 Sonnet or 4o, you usually start with a simple question and some context. If the model needs more context, it will usually ask you for it (or it will be obvious from the output).
You iterate back and forth with the model, correcting it + extending the requirements until you get the desired output. It's almost like pottery.The chat model essentially extracts context from you through this back and forth. Over time, our problem became faster + lazier - as lazy as possible while still getting good output.
o1 will only accept laziness literally and will not try to extract context from you. Instead, you need toPush as much context as possible to o1The
Even if you're just asking a simple engineering question:
- Explain all the ways you've tried that didn't work
- Add a full dump of all database schemas
- Explain what your company does and how big it is (and define company-specific terms)
In short, treat o1 as a new hire. Note that errors in *o1 include reasoning about how much it should reason. *Sometimes the variance fails to map accurately to task difficulty. For example, if the task is really, really easy, it usually goes down a rabbit hole of reasoning for no apparent reason.Note: The o1 API allows you to specify low/medium/high reasoning_effort, but the ChatGPT Not available to users.
Make it easier for o1 to get contextual hints
- I suggest using your mac/phone on the Voice Memos appI just describe the whole problem space for 1-2 minutes and then paste the text. I just describe the entire problem space for 1-2 minutes and then paste that text.
- I actually have a note where I keep long segments of context to be reused.
- swyx: I use Sarav's Careless in LS Discord. Whisper
- AI assistants that pop up inside the product can often make this extraction easier. For example, if you use Supabase, try asking Supabase Assistant to dump/describe all relevant tables/RPCs etc.
2. Focus on the goal: describe what you wantnothingNot what you want.what wayGet it.
Once you've populated the model with as much context as possible -Focus on explaining what you want the output to be.
For most models, we have gotten used to telling the model that we want it towhat wayAnswer us. For example, "You are a professional software engineer. Think slowly and carefully"
This is the opposite of what I find o1 successful. I don't direct it.what wayDo - only instruct itnothing. Then let o1 take over and plan and solve their own steps. This is the purpose of autonomous reasoning, and may actually be much faster than you can manually review and chat as an "artificial in the loop".
It requires that youReally know exactly what you want.(And you really should ask for a specific output in each prompt - it can only be reasoned about at the beginning!)
Sounds easier than it is! Do I want o1 to implement a specific architecture in production, create a minimal test application, or just explore options and list pros and cons? These are completely different requirements.
o1 usually explains concepts by default using report-style syntax - fully numbered headings and subheadings. If you want to skip the explanation and output the full document - you just need to state it explicitly.
- Professional tips from swyx: Establishing really good criteria for "good" and "bad" helps you toGive the model a way to evaluate its own output and self-improve/fix its own errorsThe
As an added benefit, this will eventually give you LLM as an evaluator tool that you can use for intensive fine-tuning during GA.
Since learning how to use o1, I've been blown away by its ability to generate the right answer the first time. It's actually better in almost every way (except cost/latency).
Here are some of the moments that particularly stand out:
3. Understanding the advantages and disadvantages of o1
o1 Advantages:
- Perfect for generating whole/multiple files at once: So far, this is the most impressive capability of o1. I copy/paste a lot of code, and a lot of context about what I'm building, and it generates the entire file (or multiple files!) in a single pass completely ), usually without errors, and following existing patterns in my codebase.
- Fewer hallucinations: In general, it seems to confuse things less. For example, o1 is really good at customizing query languages (like ClickHouse and New Relic), whereas Claude often confuses the syntax of Postgres.
- **MEDICAL DIAGNOSIS:** My girlfriend is a dermatologist - so whenever any of my friends or any member of my extended family has any skin problems, they send her a pic! For fun, I've started asking o1 at the same time. it's usually very close to the right answer - about 3/5 of the time. It's more useful for medical professionals -It almost always provides an extremely accurate differential diagnosis.
- **Explaining Concepts:** I found it to be very good at explaining very difficult engineering concepts with examples. It's almost like generating an entire article. When I'm dealing with difficult architectural decisions, I'll often have o1 generate multiple plans, each with pros/cons, and even compare those plans. I'll copy/paste the responses as PDFs and compare them - almost as if I'm considering proposals.
- **Reward: assessment. **I've always been skeptical of using LLM as a jury for evaluation, because fundamentally, jury models typically experience the same failure modes as the model that initially generated the output. However, o1 shows great promise - it is usually able to determine if the generation is correct in very little context.
Disadvantages of o1 (for now):
- **Writing in a specific voice/style:** No, I didn't use o1 for this post 🙂 .
I find it very bad at writing anything, especially in terms of a particular voice or style. It has a very academic/corporate reporting style that it wants to follow. I think there's just a lot of reasoning Token Lean the tone in that direction and it's hard to get rid of it.
Here's an example of me trying to get it to write this article - this is after much back and forth - it's just trying to produce a bland school report.
Build the entire application:o1 is very good at generating entire files at once. Still, despite some of the more ...... optimistic ...... demos you might see on Twitter - o1 won't build the entire SaaS for you, at least not after themagnanimousof iterations. But itpossible** Generate almost entire functions at once, especially front-end or simple back-end functionsThe