Petri - Anthropic's open source AI security auditing framework

堆友AI

What's Petri?

Petri, yes. Anthropic An open source AI security auditing framework developed to systematically assess the security and behavioral alignment of AI models. By simulating real-world scenarios, an automated auditor is allowed to conduct multiple rounds of conversations with a target model, and then a judge agent scores the model's behavior in multiple dimensions.Petri supports a variety of model APIs and provides a rich set of seed commands covering high-risk scenarios such as deception, flattery, and cooperating with harmful requests. Tests were conducted on 14 cutting-edge models, and all models were found to have varying degrees of security alignment risk in different scenarios.

Petri - Anthropic开源的 AI 安全审计框架

Features of Petri

  • Automated audits: Automatically evaluate model behavior by simulating multiple rounds of conversations between users and tools and the target AI system.
  • Multi-dimensional scoring: A multidimensional analysis of the model's behavior, focusing on security-related dimensions.
  • Seed command support: Provides a diverse set of seeding instructions covering a wide range of high-risk scenarios to help researchers start testing quickly.
  • Model Compatibility: Supports a variety of mainstream model APIs to facilitate testing on different models.
  • Visualization of results: Provides clear test results and scores to help researchers quickly identify potential risks to their models.
  • open source and scalable: The code is open-source, making it easy for researchers to customize and extend it according to their needs.

Petri's core strengths

  • Automation and efficiency: By automating the audit process, Petri is able to quickly generate large volumes of test results, significantly improving evaluation efficiency and saving time and labor costs.
  • Comprehensive and multidimensional assessment: Supports multi-dimensional security assessment of AI models, covering a wide range of high-risk behaviors such as spoofing, grooming, and self-protection, providing comprehensive security analysis.
  • Flexibility and ScalabilityThe APIs support a variety of models, allowing researchers to easily expand and customize test scenarios to suit different research needs.
  • Open Source and Community Support: As an open source tool, Petri is supported by an active community where researchers can share test results, improve code, and promote technical exchange and advancement.
  • Systematic and standardizedThe company provides a systematic testing framework and standardized evaluation process to help researchers establish reproducible and comparable testing benchmarks and promote the standardized development of AI security research.

What is Petri's official website?

  • Official website address:: https://www.anthropic.com/research/petri-open-source-auditing
  • Github repository:: https://github.com/safety-research/petri

Who is Petri for?

  • AI researchers: Examining the security, reliability, and behavioral alignment of AI models, systematically tested and analyzed via Petri.
  • Model Developer: Engineers developing large-scale language models or other AI systems use Petri to evaluate and optimize the safety and performance of models.
  • security expert: Professionals concerned about the potential risks of AI technology use Petri to identify and prevent security threats that may be posed by models.
  • Technical Assessment Team: Enterprise or institutional teams responsible for evaluating and auditing AI systems, leveraging Petri for standardized security assessments.
  • academic researcher: Scholars engaged in academic research in the field of AI security conduct experiments and research to advance theory and practice through Petri.
© Copyright notes

Related articles

No comments

You must be logged in to leave a comment!
Login immediately
none
No comments...