Today, Anthropic announced an upgraded Claude 3.5 Sonnet and a new model, Claude 3.5 Haiku. The update improves coding capabilities and also introduces a groundbreaking capability, computer use, which is currently in public beta.
The upgraded Claude 3.5 Sonnet improves on its predecessor across the board, with particularly significant gains in coding, an area in which it was already a leader. Claude 3.5 Haiku matches the performance of Claude 3 Opus, Anthropic's previous largest model, on many evaluations, at the same cost and similar speed as the previous generation of Haiku.
Major Upgrade for Claude 3.5 Sonnet
The upgraded Claude 3.5 Sonnet excels in a number of areas, particularly coding: its score on the SWE-bench Verified benchmark rises from 33.4% to 49.0%, higher than all publicly available models. Sonnet's performance on TAU-bench also improved significantly, from 62.6% to 69.2% in the retail domain and from 36.0% to 46.0% in the airline domain.
Early user feedback shows that Claude 3.5 Sonnet performs well on multi-step software development tasks. GitLab, for example, found that it delivered stronger reasoning (up to 10%) with no increase in latency.
Claude 3.5 Haiku: efficient and economical
The new Claude 3.5 Haiku is the next generation of Anthropic's fastest model, and it performs particularly well on coding tasks, scoring 40.6% on SWE-bench Verified. On many benchmarks it surpasses Claude 3 Opus, the largest model of the previous generation, at the same cost and similar speed as the previous Haiku.
Innovative computer use capability
Claude 3.5 Sonnet is the first frontier AI model to offer computer use in public beta. Through the API, developers can direct Claude to use a computer the way a person does: looking at the screen, moving the cursor, clicking buttons, and typing text. The capability is still experimental, but Asana, Canva, Cognition and others are already using it to carry out complex tasks.
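To make the idea concrete, here is a minimal sketch of the kind of loop such an integration might run: the model proposes an action, the application carries it out, and the result is fed back as the next observation. This is not Anthropic's API. Every name here (`Action`, `FakeDesktop`, `scripted_model`, `run_agent`) is hypothetical, the "model" is a fixed script, and the "desktop" only records what it is asked to do.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    """Hypothetical action format: one of the four steps named in the article."""
    kind: str          # "screenshot", "move", "click", or "type"
    x: int = 0
    y: int = 0
    text: str = ""


@dataclass
class FakeDesktop:
    """Stand-in for a real screen; it records actions instead of performing them."""
    cursor: tuple = (0, 0)
    typed: str = ""
    log: list = field(default_factory=list)

    def apply(self, action: Action) -> str:
        if action.kind == "screenshot":
            result = f"screen with cursor at {self.cursor}"
        elif action.kind == "move":
            self.cursor = (action.x, action.y)
            result = "moved"
        elif action.kind == "click":
            result = f"clicked at {self.cursor}"
        elif action.kind == "type":
            self.typed += action.text
            result = "typed"
        else:
            raise ValueError(f"unknown action: {action.kind}")
        self.log.append(action.kind)
        return result


def scripted_model(observations):
    """Hypothetical stand-in for the model: picks the next action from a fixed script."""
    script = [
        Action("screenshot"),
        Action("move", x=120, y=48),
        Action("click"),
        Action("type", text="hello"),
    ]
    step = len(observations)
    return script[step] if step < len(script) else None


def run_agent(desktop, model, max_steps=10):
    """Ask the model for an action, apply it, feed the result back, and repeat."""
    observations = []
    for _ in range(max_steps):
        action = model(observations)
        if action is None:  # the model signals that the task is finished
            break
        observations.append(desktop.apply(action))
    return observations


if __name__ == "__main__":
    desktop = FakeDesktop()
    for line in run_agent(desktop, scripted_model):
        print(line)
```

The `max_steps` cap is a design choice in this sketch: it bounds how long the loop can run if the model never signals completion.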
Claude is still clumsy at some actions, but its score of 14.9% on the OSWorld evaluation is well above the 7.8% achieved by the next-best AI system. Anthropic says it will continue to improve this capability and take steps to ensure it is used safely and to prevent potential abuse.
Looking forward
As the technology continues to evolve, Anthropic hopes to learn more about the potential and impact of computer use through user feedback. The company encourages developers to explore the new models and is eager to see how they use them to improve productivity.
Anthropic believes that these new developments will open up new possibilities for users to interact with Claude.