General Introduction
ChainForge is an open-source visual programming environment for testing and evaluating prompts for large language models (LLMs). It provides a data-flow prompt engineering environment in which users can quickly explore and analyze how different prompts affect the response quality of LLMs. ChainForge supports a wide range of model providers, including OpenAI, HuggingFace, and Anthropic, allowing users to compare and evaluate multiple models in a single interface. The tool is particularly well suited to early-stage prompt exploration and rapid iteration, helping users tune prompts and model settings for better response quality.
Function List
- Multi-model query: Query multiple LLMs at the same time to quickly test prompt ideas and variants.
- Response quality comparison: Compare response quality across prompts, models, and model settings.
- Visual evaluation: Set up evaluation metrics and instantly visualize results across prompts, parameters, models, and settings.
- Multi-turn conversations: Hold multi-turn conversations with chat models using template parameters, and inspect and evaluate the output of each turn.
- Templated prompts: Template not only prompts but also follow-up chat messages.
- Example evaluation flows: Several example evaluation flows are provided to demonstrate possible usage scenarios.
- Local and online installation: Supports local installations and online trials, providing flexibility of use.
- Multiple model support: Support for OpenAI, HuggingFace, Anthropic, Google PaLM2, Azure OpenAI, and many other model providers.
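To illustrate how templated prompts and multi-model querying combine, the following minimal sketch expands one prompt template over every combination of variable values and pairs each resulting prompt with each model name. It is plain Python, not ChainForge's internal code; the curly-brace `{variable}` placeholder style, the function name `expand_template`, and the model names are assumptions chosen for illustration.

```python
from itertools import product


def expand_template(template, variables):
    """Return one filled-in prompt per combination of variable values.

    `variables` maps a placeholder name to a list of candidate values.
    """
    names = list(variables)
    prompts = []
    for combo in product(*(variables[name] for name in names)):
        prompts.append(template.format(**dict(zip(names, combo))))
    return prompts


# Hypothetical template and values, for illustration only.
template = "Translate '{phrase}' into {language}."
variables = {
    "phrase": ["good morning", "thank you"],
    "language": ["French", "German"],
}
prompts = expand_template(template, variables)

# Hypothetical model names; each prompt would be sent to each model.
models = ["model-a", "model-b"]
jobs = [(model, prompt) for model in models for prompt in prompts]

for model, prompt in jobs:
    print(f"{model}: {prompt}")
```

With two phrases, two languages, and two models, this produces eight queries, which is why a side-by-side view of the responses becomes useful quickly.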
Usage Guide
Installation process
Local installation
- Make sure Python 3.8 or later is installed.
- Run the following command to install ChainForge:
pip install chainforge
- After the installation is complete, run the following command to start the ChainForge server:
chainforge serve
- Open your browser and visit
localhost:8000
You can start using ChainForge now.
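The Python version requirement in the steps above can be checked with a short script before installing. This is a minimal sketch using only the standard library; the helper name `meets_requirement` is an assumption for illustration.

```python
import sys

MINIMUM = (3, 8)  # the minimum version stated in the installation steps


def meets_requirement(version_info=sys.version_info, minimum=MINIMUM):
    """Return True if the given interpreter version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if meets_requirement():
    print("Python version is sufficient for ChainForge.")
else:
    print("Python 3.8 or later is required; found", sys.version.split()[0])
```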
Installing with Docker
- Build the Docker image:
docker build -t chainforge .
- Run the Docker container:
docker run -p 8000:8000 chainforge
- Open your browser and visit
127.0.0.1:8000
You can start using ChainForge now.
Guidelines for use
- Set API keys: Click the Settings icon in the upper right corner and enter your API keys for OpenAI, Anthropic, Google PaLM, and other providers.
- Create a new project: Click the "New Project" button and select the desired model and prompt template.
- Add prompts and models: Add prompt templates and models to the project and set different parameters for testing.
- Run the evaluation: Click the "Run" button; ChainForge automatically queries all selected models and displays their responses.
- Compare and visualize: Use the visualization tools to compare response quality across prompts and models, then select the best prompt and model settings.
- Save and share: Once the project is complete, you can save the evaluation results and generate a shareable link for others.
Example Evaluation Flows
ChainForge provides several example evaluation flows to help users get started quickly. For example, you can use the "Response Length Comparison" example to compare the response lengths of different models given the same prompt. You can also create custom evaluation flows with specific evaluation metrics and visualizations.
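The idea behind a response length comparison can be sketched in a few lines of plain Python. The example below uses hard-coded stand-in responses rather than real model calls, so the model names, the response texts, and the `average_length` helper are all assumptions for illustration; in ChainForge the responses would come from the models selected in the flow.

```python
from statistics import mean

# Stand-in responses to the same prompt, keyed by a hypothetical model name.
responses = {
    "model-a": ["Paris.", "The capital of France is Paris."],
    "model-b": ["Paris is the capital city of France, located on the Seine."],
}


def average_length(texts):
    """Average response length in characters."""
    return mean(len(text) for text in texts)


lengths = {model: average_length(texts) for model, texts in responses.items()}

# Print models from shortest to longest average response.
for model, length in sorted(lengths.items(), key=lambda item: item[1]):
    print(f"{model}: {length:.1f} characters on average")
```

Character count is only one possible metric; the same structure works for word counts or any other per-response score.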
Advanced Features
- Custom evaluation nodes: Users can write Python code to define custom evaluation nodes for more complex response evaluation.
- Multi-turn conversation evaluation: Multi-turn conversations can be evaluated, allowing users to test response quality at different turns.
- Data export: Evaluation results can be exported to an Excel spreadsheet for further analysis.
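As a rough idea of what a custom evaluation node might contain, the sketch below defines an `evaluate` function that scores a single response. It assumes the evaluator is called once per response with an object exposing the response text as a `text` attribute; the `Response` dataclass here is a hypothetical stand-in so the example runs on its own, and the exact interface should be checked against the ChainForge documentation.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """Hypothetical stand-in for the response object passed to an evaluator."""
    text: str


def evaluate(response):
    """Score a response: True if it stays under 20 words and avoids apologies."""
    words = response.text.split()
    apologizes = "sorry" in response.text.lower()
    return len(words) < 20 and not apologizes


# Try the evaluator on two stand-in responses.
samples = [
    Response("The capital of France is Paris."),
    Response("Sorry, I cannot help with that request."),
]
scores = [evaluate(sample) for sample in samples]
print(scores)
```

Returning a boolean or a number per response keeps the scores easy to aggregate and plot across prompts and models.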
ChainForge is a powerful tool that helps researchers, developers, and data scientists optimize prompts and model settings and improve the quality of LLM responses.