General Introduction
NVIDIA Garak is an open source tool that specializes in detecting vulnerabilities in large language models (LLMs). It checks the model for multiple weaknesses such as illusions, data leakage, hint injection, error message generation, harmful content generation, etc. through static, dynamic, and adaptive probing.Garak is similar to nmap in cybersecurity but focuses on the security assessment of LLMs.
Related tools: https://github.com/msoedov/agentic_security
Function List
- vulnerability scan: Detects a variety of potential vulnerabilities in LLM, including phantoms, data leaks, hint injections, etc.
- Generative AI Evaluation: Evaluating the performance of generative AI models in different contexts.
- Dialogue system testing: Test the response of a dialog system under different inputs to identify potential security issues.
- Multi-model support: Supports Hugging Face, OpenAI, Replicate, and many other generative models.
- command-line tool: Operates from the command line and is available for Linux and OSX systems.
- Logging: Detailed records of the scanning process and results for subsequent analysis and improvement.
Using Help
Installation process
- Standard Installation::
- Install from PyPI using pip:
python -m pip install -U garak
- Install from PyPI using pip:
- Development Version Installation::
- Get the latest version from GitHub:
python -m pip install -U git+https://github.com/NVIDIA/garak.git@main
- Get the latest version from GitHub:
- Cloning from source::
- Create a Conda environment and install dependencies:
conda create --name garak "python>=3.10,<3.12" conda activate garak gh repo clone NVIDIA/garak cd garak python -m pip install -e .
- Create a Conda environment and install dependencies:
Usage
- basic usage::
- Garak needs to know which model to scan and defaults to all known probes for that model. Use the following command to view the list of probes:
garak --list_probes
- Specifies the generator type and model name:
garak --model_type huggingface --model_name RWKV/rwkv-4-169m-pile
- Run a specific probe:
garak --model_type openai --model_name gpt-3.5-turbo --probes encoding
- Garak needs to know which model to scan and defaults to all known probes for that model. Use the following command to view the list of probes:
- Read the results::
- Upon completion of each probe, Garak generates a progress bar and outputs the evaluation results upon completion. If any of the prompting attempts result in bad behavior, the response is marked FAIL and a failure rate is given.
- Logs and reports::
- Errors are logged in the garak.log file and runtime details are logged in the .jsonl file. This can be analyzed using the analyse/analyse_log.py script.
- typical example::
- sensing ChatGPT Vulnerability to code injection attacks:
export OPENAI_API_KEY="sk-123XXXXXXXXXXXXXX" python3 -m garak --model_type openai --model_name gpt-3.5-turbo --probes encoding
- sensing ChatGPT Vulnerability to code injection attacks:
- Plug-in Development::
- Inherit the base class and override the necessary methods to test the new code:
import garak.probes.mymodule p = garak.probes.mymodule.MyProbe() python3 -m garak -m test.Blank -p mymodule -d always.Pass
- Inherit the base class and override the necessary methods to test the new code: