Okareo: a tool for model testing and error monitoring for AI developers

Latest AI Resources, released 4 months ago by AI Sharing Circle

General Introduction

Okareo is a platform for AI developers that helps them test AI models, find errors and improve performance. It provides a complete toolset, from data generation to real-time monitoring, for large language models (LLMs), agents and retrieval-augmented generation (RAG) systems. Developers can use it to generate diverse test scenarios, check how models perform in production, and quickly identify and fix problems. Okareo emphasizes real-time operation: it raises alerts when a model makes errors, and it supports team collaboration and large-scale projects. More than 5 million test scenarios have been generated with Okareo, making it well suited to development teams that need reliable AI systems.

Feature List

Error detection: Detects problems in model output, such as hallucinations or inaccurate answers.
Synthetic data generation: Automatically generates diverse test data covering both common and extreme scenarios.
Real-time monitoring: Tracks model behavior in production and raises alerts when anomalies are detected.
Model evaluation: Tests the performance of LLMs, agents or RAG systems and generates detailed reports.
Edge-case testing: Probes the model's limits with complex scenarios to identify potential points of failure.
Optimization tools: Tunes models and retrievers to improve domain-specific performance.
Teamwork: Supports multi-person collaboration to streamline development.
CI/CD integration: Embeds testing in automated development pipelines.

How to Use

Okareo can be used in two ways: through the web interface or through code integration. The steps below take you from registration all the way to optimizing your model.

Register & Login

Visit https://okareo.com/ and click the "Get Started for Free" button. Enter your email address and password to register, then activate your account by clicking the link in the verification email. Sign in at https://app.okareo.com/ to open the console, where you manage projects and view results.

Getting the API key

After logging in, click "Settings > API Token" in the upper right corner to generate a key (shown as YOUR_OKAREO_API_KEY in the examples). The key is used for SDK calls and CLI operations; keep it in a secure location.
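One common way to keep the key out of source code is to store it in an environment variable and read it at startup. The sketch below assumes the variable is named OKAREO_API_KEY, the same name the CI workflow in this guide uses for its secret; the helper name is hypothetical and not part of Okareo's SDK.

```python
import os


def get_okareo_api_key(var_name: str = "OKAREO_API_KEY") -> str:
    """Return the Okareo API key from an environment variable (hypothetical helper)."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(f"Set the {var_name} environment variable to your Okareo API key.")
    return key
```

The returned string can be passed wherever a YOUR_OKAREO_API_KEY placeholder appears.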

Installing the CLI Tool

If you want to operate Okareo from the command line, install the CLI. Download and extract the archive for your system:

macOS:
curl -O -L https://github.com/okareo-ai/okareo-cli/releases/latest/download/okareo_darwin_arm64.tar.gz
tar -xvf okareo_darwin_arm64.tar.gz

Windows (in PowerShell):
Invoke-WebRequest -Uri https://github.com/okareo-ai/okareo-cli/releases/latest/download/okareo_windows_386.tar.gz -OutFile okareo_windows_386.tar.gz
tar -xvf okareo_windows_386.tar.gz

Linux:
curl -O -L https://github.com/okareo-ai/okareo-cli/releases/latest/download/okareo_linux_386.tar.gz
tar -xvf okareo_linux_386.tar.gz

After extracting, move the okareo binary to a directory on your system path (e.g. /usr/local/bin) and run okareo -v to check the version.
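If you script the installation, the right archive can be chosen from the operating system name. This is a minimal sketch that only covers the three archives named in the installation steps above; the macOS archive is an arm64 build, so other architectures may need a different file from the releases page. The function and constant names are hypothetical.

```python
import platform
from typing import Optional

# Archive names from the installation steps, keyed by platform.system() values.
CLI_ARCHIVES = {
    "Darwin": "okareo_darwin_arm64.tar.gz",
    "Windows": "okareo_windows_386.tar.gz",
    "Linux": "okareo_linux_386.tar.gz",
}
RELEASE_BASE = "https://github.com/okareo-ai/okareo-cli/releases/latest/download/"


def okareo_cli_url(system: Optional[str] = None) -> str:
    """Return the CLI download URL for the given OS name, or for the current OS."""
    system = system or platform.system()
    if system not in CLI_ARCHIVES:
        raise ValueError(f"No Okareo CLI archive listed for {system!r}")
    return RELEASE_BASE + CLI_ARCHIVES[system]
```

Printing okareo_cli_url() gives the URL to pass to curl or Invoke-WebRequest on the current machine.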

Initializing a Project

In a terminal, go to your project directory and run:

okareo init

This generates a .okareo folder. Edit config.yml and fill in:

api_key: YOUR_OKAREO_API_KEY

Initialization is complete and the project is ready.
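To confirm the key was actually filled in, you can read it back. This sketch assumes config.yml sits inside the .okareo folder and stores the key on a top-level api_key: line as shown above; it uses a deliberately minimal line parser instead of a full YAML library, and the helper name is hypothetical.

```python
from pathlib import Path
from typing import Optional


def read_api_key(config_path: str = ".okareo/config.yml") -> Optional[str]:
    """Return the top-level api_key value, or None if it is missing or still the placeholder."""
    for line in Path(config_path).read_text(encoding="utf-8").splitlines():
        # Only unindented lines are top-level keys in YAML.
        if line.startswith("api_key:"):
            value = line.split(":", 1)[1].strip().strip("\"'")
            if value and value != "YOUR_OKAREO_API_KEY":
                return value
            return None
    return None
```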

Generating synthetic data

Log in to the website and select "Synthetic Scenario Copilot". Describe your requirement, such as "users complain about product failure", then click "Generate" to create test data and download it as a JSONL file:

{"input": "What should I do if the product is broken?", "expected_output": "Please contact customer service to request a repair."}

Or from the CLI:

okareo generate --scenario "product failure complaint" --output test_data.jsonl

The data can be used for subsequent testing.
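Each line of a JSONL file is one standalone JSON object. The sketch below loads such a file and checks that every record has the input and expected_output fields seen in the sample record; treating those two fields as required is an assumption based on that sample, and the helper name is hypothetical.

```python
import json
from typing import Any, Dict, List

REQUIRED_FIELDS = ("input", "expected_output")


def load_scenarios(path: str) -> List[Dict[str, Any]]:
    """Load a JSONL scenario file, validating each non-empty line."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            if not isinstance(record, dict):
                raise ValueError(f"line {line_no}: expected a JSON object")
            missing = [k for k in REQUIRED_FIELDS if k not in record]
            if missing:
                raise ValueError(f"line {line_no}: missing {', '.join(missing)}")
            records.append(record)
    return records
```

Running this on test_data.jsonl before a test run catches malformed rows early.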

Register and evaluate models

Install the Python SDK:

pip install okareo

Then write eval_model.py:

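import os
from okareo import Okareo
from okareo.model_under_test import OpenAIModel
# Read the Okareo key from the environment instead of hard-coding it.
okareo = Okareo(os.environ["OKAREO_API_KEY"])
# Register the model under test: GPT-3.5 Turbo with deterministic output.
model = okareo.register_model(
    name="MyAgent",
    model=OpenAIModel(model_id="gpt-3.5-turbo", temperature=0),
)
# Run the scenarios in test_data.jsonl as a classification test.
result = model.run_test(scenario_file="test_data.jsonl", test_type="classification")
# Print the link to the web report.
print(result["link"])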

Running the script prints a link to a web report showing accuracy and other metrics.
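For a classification test, accuracy is the share of scenarios whose model output matches the expected output. The function below computes that locally only to illustrate what the metric means; it is not Okareo's scoring code, and comparing labels by exact match after trimming whitespace is an assumption.

```python
from typing import Sequence


def accuracy(expected: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of positions where the prediction matches the expected label."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    if not expected:
        return 0.0
    hits = sum(e.strip() == p.strip() for e, p in zip(expected, predicted))
    return hits / len(expected)
```

For example, accuracy(["refund", "repair", "repair"], ["refund", "repair", "exchange"]) returns 2/3, since two of the three labels match.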

Real-time monitoring and alerts

Production monitoring works by routing model calls through Okareo's proxy. Modify your OpenAI client as follows:

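import os
from openai import OpenAI
# Route requests through the Okareo proxy so they are recorded for monitoring.
client = OpenAI(
    base_url="https://proxy.okareo.com",
    default_headers={"api-key": os.environ["OKAREO_API_KEY"]},  # Okareo key
    api_key=os.environ["OPENAI_API_KEY"],  # your OpenAI key
)
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "How is the product?"}],
)
print(response.choices[0].message.content)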

Each request is then recorded on the "Monitoring" page of the Okareo web app, which shows real-time performance and raises alerts when hallucinations or errors occur.
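Okareo performs this monitoring on its side once traffic passes through the proxy. To show the kind of check involved, here is a hypothetical local wrapper that times each call and records those slower than a threshold; the class name and the 2-second default are arbitrary assumptions, and the code does not talk to Okareo.

```python
import time
from typing import Any, Callable, List


class LatencyMonitor:
    """Times wrapped calls and records those slower than a threshold (hypothetical helper)."""

    def __init__(self, threshold_s: float = 2.0) -> None:
        self.threshold_s = threshold_s
        self.alerts: List[str] = []

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            if elapsed > self.threshold_s:
                # A real system would notify someone; here we only record the event.
                name = getattr(fn, "__name__", "call")
                self.alerts.append(f"{name} took {elapsed:.2f}s")
```

With the client above, monitor.call(client.chat.completions.create, model="gpt-3.5-turbo", messages=...) would return the normal response while logging slow requests.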

Testing Edge Cases

In the web interface, enter a complex scenario such as "the user asks five questions in a row and keeps changing their requirements" to generate multi-turn dialogue data. From the CLI:

okareo generate --scenario "multi-turn changing requirements" --output edge_cases.jsonl

Test the model against this data to check its stability.
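A multi-turn edge case can be stored as one JSONL record holding the whole conversation. The sketch below builds such a record for a user who revises the request on every turn; the messages and expected_behavior field names are assumptions for illustration and may not match the format Okareo actually generates.

```python
import json
from typing import Dict, List


def changing_requirements_case(requests: List[str]) -> str:
    """Return one JSONL line: a conversation where each user turn revises the request."""
    if not requests:
        raise ValueError("need at least one request")
    messages: List[Dict[str, str]] = []
    for i, request in enumerate(requests):
        prefix = "" if i == 0 else "Actually, change that: "
        messages.append({"role": "user", "content": prefix + request})
    record = {
        "messages": messages,
        # A robust model should follow the latest request, not an earlier one.
        "expected_behavior": f"Respond to the final request: {requests[-1]}",
    }
    return json.dumps(record, ensure_ascii=False)
```

Calling it with five requests produces the five-turn scenario described above as a single line that can be appended to edge_cases.jsonl.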

Optimizing Models

The evaluation report highlights problems such as irrelevant retrieved content. After adjusting the prompt or fine-tuning the model, re-run the test; the web interface's comparison feature shows whether the changes improved the results.
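The idea behind comparing two runs can be sketched locally: given per-scenario scores from a baseline and a candidate run, list which scenarios improved and which regressed. The input shape (a dict of float scores keyed by scenario ID) and the function name are hypothetical; Okareo's comparison view works from its own stored results.

```python
from typing import Dict, List, Tuple


def compare_runs(baseline: Dict[str, float], candidate: Dict[str, float],
                 tolerance: float = 0.0) -> Tuple[List[str], List[str]]:
    """Return (improved, regressed) scenario IDs that appear in both runs."""
    improved: List[str] = []
    regressed: List[str] = []
    for scenario_id in sorted(baseline.keys() & candidate.keys()):
        delta = candidate[scenario_id] - baseline[scenario_id]
        if delta > tolerance:
            improved.append(scenario_id)
        elif delta < -tolerance:
            regressed.append(scenario_id)
    return improved, regressed
```

Scenarios present in only one run are skipped, so adding new test cases does not show up as a regression.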

CI/CD Integration

Create the GitHub Actions workflow file .github/workflows/okareo.yml:

name: Okareo CI
on: [push]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: curl -O -L https://github.com/okareo-ai/okareo-cli/releases/latest/download/okareo_linux_386.tar.gz
      - run: tar -xvf okareo_linux_386.tar.gz
      - run: ./okareo run --file flows/test_flow.py
        env:
          OKAREO_API_KEY: ${{ secrets.OKAREO_API_KEY }}

With the API key saved as the repository secret OKAREO_API_KEY, every push runs the tests automatically.
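A common extension is to make the pipeline fail when quality drops. The hypothetical script below returns a non-zero exit code, which GitHub Actions treats as a failed step, when the mean score falls below a threshold; reading the scores from a JSON list in a file, and the 0.8 threshold, are assumptions rather than Okareo features.

```python
import json
from statistics import mean
from typing import List


def gate(scores: List[float], threshold: float = 0.8) -> int:
    """Return a process exit code: 0 if the mean score meets the threshold, 1 otherwise."""
    if not scores:
        print("No scores found; failing the build.")
        return 1
    avg = mean(scores)
    print(f"Mean score {avg:.3f} (threshold {threshold})")
    return 0 if avg >= threshold else 1


def main(argv: List[str]) -> int:
    """Read a JSON list of scores from the file named in argv[1] and apply the gate."""
    with open(argv[1], encoding="utf-8") as f:
        return gate(json.load(f))
```

Saved as a script that ends with sys.exit(main(sys.argv)), it could run as an extra step after the okareo run step.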

Viewing Results and Debugging

Log in to https://app.okareo.com/ and open "Evaluations" to view reports. Each report includes the score and error details for every scenario, which makes debugging easier.
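When a report has many scenarios, it helps to start with the worst ones. This sketch sorts per-scenario results and summarizes the lowest scorers below a pass mark; the record shape (id, score, optional error) and the function name are hypothetical, not Okareo's export format.

```python
from typing import Any, Dict, List


def worst_failures(results: List[Dict[str, Any]], pass_mark: float = 0.5,
                   limit: int = 5) -> List[str]:
    """Return summary lines for the lowest-scoring scenarios below pass_mark."""
    failing = [r for r in results if r["score"] < pass_mark]
    failing.sort(key=lambda r: r["score"])
    return [f"{r['id']}: score={r['score']:.2f} {r.get('error', '')}".rstrip()
            for r in failing[:limit]]
```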

These steps cover the entire process from installation to optimization, and the detailed instructions make it easy to use Okareo.

Application Scenarios

Developing Intelligent Customer Service
You are building a customer-service AI and want to make sure it handles complaints correctly. Use Okareo to generate complaint scenarios, then test and optimize its responses.
Building RAG Applications
Your RAG system needs reliable retrieval and generation quality. Okareo can test retrieval accuracy and help improve the generated content.
Debugging Complex Agents
You are developing a multitasking agent, and Okareo can simulate edge cases to check its robustness.

Q&A

What issues can Okareo monitor?
It detects problems such as hallucinations, inaccurate answers and latency, and alerts you in real time in production.
Which language models are supported?
OpenAI models, custom models and others, as long as they are accessible via an API.
What is the difference between the free and paid versions?
The free version suits small-scale testing; the paid version unlocks more data generation and monitoring features.