Just last night, news of Anthropic's upcoming release of a new model spread quickly through the AI community, but not in the way that was previously expected. Claude 4.0, but rather Claude 3.7 Sonnet version.
Early this morning, Anthropic released its latest flagship model right on time, theThe official launch of Claude 3.7 Sonnet, claimed to be the smartest to date and the first hybrid inference model on the market!The
Claude 3.7 Sonnet delivers both fast responses in near real-time and deeper, more detailed step-by-step thinking based on user needs. As Anthropic The description "One model, two ways to think..." refers to the fact that it has both standard and extended modes of thinking. In addition, API users have more fine-grained control over the length of time a model can think.
In addition to the release of Claude 3.7 Sonnet.Anthropic has also launched a parallel command line tool called Claude Code that focuses on smart codingClaude is available as a limited research preview. The tool is currently available as a limited research preview and is designed to allow developers to leave a large number of engineering tasks to Claude directly in the terminal environment.
In terms of coding capabilities, Anthropic has further optimized the coding experience on the Claude.ai platform. Its GitHub integration is now available across all Claude programs, allowing developers to connect their code repositories directly to Claude, and by providing a deeper understanding of personal, work, and open source projects, Claude will become an even more powerful assistant for developers when it comes to bug fixing, feature development, and documentation building in GitHub projects.
Because of this, and benefiting from significant improvements in coding and front-end web development capabilities.Claude 3.7 Sonnet became Anthropic's best encoding model to date.The
Currently, users can experience the latest Claude 3.7 Sonnet model through all Claude plans (including Free, Pro, Team, and Enterprise), as well as platforms such as Anthropic API, Amazon Bedrock, and Google Cloud Vertex AI. In addition to Free users, all paid users can experience its Extended Thinking model.
In the standard and extended thinking modes, thePricing for Claude 3.7 Sonnet remains consistent with the previous generation of Claude 3.5 Sonnet at $3 per million input tokens and $15 per million output tokens (including think tokens).The
As one user commented, "Every new release from Anthropic is surprising and exciting!"
Maximum Claude 3.7 Sonnet
Putting cutting-edge reasoning at your fingertips
Anthropic emphasizes that Claude 3.7 Sonnet was developed with a different philosophy than other reasoning models on the market, arguing that just as the human brain is able to react quickly and think deeply at the same time, AI reasoning should integrate the capabilities of cutting-edge models, rather than separating them from each other. This unified design approach aims to provide a smoother user experience.
In line with this philosophy, the Claude 3.7 Sonnet offers a number of unique advantages.
First.Claude 3.7 Sonnet is unique in that it can be used as a general-purpose LLM but also has powerful reasoning capabilities. Depending on your needs, you can choose to have the model give you a quick answer, or to think more deeply before answering.The Claude 3.7 Sonnet can be seen as an upgrade from the previous Claude 3.5 Sonnet. In standard mode, Claude 3.7 Sonnet can be seen as an upgraded version of its predecessor, Claude 3.5 Sonnet. In Extended Thinking Mode, it reflects on itself before giving an answer, which significantly improves its performance on a wide range of tasks, including math, physics, instruction following, coding, etc. Anthropic officials note that in both modes, the model understands and processes the cue words in a similar way.
Secondly.When calling Claude 3.7 Sonnet using the API, users can also customize the model's "thinking budget". Specifically, the user can set Claude to think in terms of the maximum number of token Number (N). Regardless of the N value, the model caps the number of output tokens at 128K. This allows the user to find the optimal balance between speed (and cost) of response and quality of answer.
Third, in developing its inference model, theInstead of focusing excessively on optimizing model performance on math and computer science competition questions, as others have done, Anthropic focuses on real-world tasks that are more relevant to practical application scenarios in the enterpriseThe
From the Claude 3.7 Sonnet benchmark results, in the SWE-bench Verified benchmark (which was designed to evaluate LLM's ability to solve real software problems on GitHub), theClaude 3.7 Sonnet achieved SOTA-level performance, significantly ahead of models such as Claude 3.5 Sonnet, OpenAI's o3-mini (high) and o1, and DeepSeek R1.The
The Claude 3.7 Sonnet also performed well in the TAU-bench benchmark, a benchmarking platform used to evaluate LLM's ability to interact with the tool in complex, realistic scenarios, achieving SOTA-level performance, outperforming both the Claude 3.5 Sonnet and OpenAI's o1 model.
Claude 3.7 Sonnet demonstrates excellent performance in a number of areas, including instruction adherence, generalized reasoning, multimodal capabilities, and intelligent coding, with significant enhancements in math and science, especially in Extended Thinking Mode. However, in some specific areas, it still falls slightly short of OpenAI's o3-mini (high), Grok-3 Beta, and other models.
It's easy to see that Anthropic has focused on coding capabilities with Claude 3.7 Sonnet, with relatively less prominent improvements in other areas. It is clear that Anthropic intends to position the Sonnet series as an AI model focused on coding (and is actually moving in that direction).
It's worth noting that in addition to excelling in traditional benchmarks, the Claude 3.7 Sonnet even outperformed all previous models in the Pokémon playtest.
Anthropic has already conducted extensive early testing with its partners, and the results have amply demonstrated the leadership of the Claude family of models in terms of encoding capability.
For example, the Cursor team noted that Claude was once again the preferred solution for real-world coding tasks, showing significant improvements in handling complex code bases and using advanced tools, and the Cognition team found that Claude outperformed the other models in code change planning and full-stack update processing. Vercel emphasized Claude's accuracy in complex agent workflows, and Replit successfully used Claude to build complex web applications and dashboards from scratch where other models struggled, while Canva's evaluation showed that Claude consistently produced well-designed, production-ready code with significantly fewer bugs. Significantly reduced error rates.
Claude Code
Intelligent Coding for Easier Development
Since June 2024, the Sonnet family of models has been the go-to choice for developers around the world. Today, theAnthropic has officially released Claude Code, its first intelligent coding tool (currently in a limited research preview), designed to further enhance developer productivity and capabilityThe
Functionally, Claude Code is positioned as a proactive collaboration partner, capable of performing tasks such as code searching and reading, file editing, test writing and running, code committing and pushing to GitHub, and invoking various command line tools.
Let's go through a few examples of Claude Code's application scenarios, such as explaining the structure of a project:
Writing tests:
Build the application:
Although still in early preview, Claude Code has become an indispensable tool for the Anthropic team, especially for test-driven development, debugging complex problems, and large-scale code refactoring.
In early testing, Claude Code has been able to accomplish tasks in a single pass that would normally take more than 45 minutes to complete manually, significantly reducing development time and costs.The
In the coming weeks, Anthropic plans to continue optimizing Claude Code based on feedback from its own usage, including improving the reliability of tool calls, enhancing support for long-running commands, improving in-app rendering, and expanding the depth of Claude's understanding of its own functionality.
The launch of Claude Code is designed to provide a deeper understanding of how developers work with Claude for coding, thus providing a valuable reference for future iterations and upgrades of Anthropic's models. Those who participate in the Claude Code preview experience will have early access to the powerful tools Anthropic uses internally to build and optimize Claude models.
Responsible construction and future perspectives
Anthropic thoroughly tested and evaluated Claude 3.7 Sonnet and worked with external security experts to ensure that the model fully meets the security and reliability standards it sets for itself.
At the same time, Claude 3.7 Sonnet demonstrates finer judgment in distinguishing between harmful and benign requests. Compared to the previous generation model, it has reduced the number of unnecessary rejections by 45%.
CoT fidelity assessment results.
In the Model Card for Claude 3.7 Sonnet, Anthropic details its framework for evaluating responsible AI scaling policies and draws on the hands-on experience of other AI labs and researchers in related work. Additionally, the model card outlines the new types of risks posed by the application of AI technologies, specifically rapid injection attacks, and explains how Anthropic assesses and responds to these potential security vulnerabilities, as well as how it trains the Claude model to defend against and mitigate these risks. In addition to this, the Model Card delves into the potential security benefits that inference models can bring, and examines questions such as "how to understand the model's decision-making process" and "whether the model's inference results are truly trustworthy and reliable".
Anthropic believes that the release of Claude 3.7 Sonnet and Claude Code marks a critical step towards truly empowering humans with AI systems. With superior deep reasoning, autonomous work, and efficient collaboration, Anthropic is bringing us closer to a vision of a future in which AI technology fully enriches and expands human potential.
Anthropic also has an exciting vision for the future: by 2025, they expect Claude to have evolved into an expert intelligence that can work autonomously for hours on end, and by 2027, Anthropic expects Claude to be able to tackle complex problems that would take years for a human team to solve.