General Introduction
No front end: API channels are configured purely through a configuration file. Just write one file and you have your own API station up and running. The documentation includes a detailed configuration guide and is beginner-friendly.
uni-api is a project that unifies the management of large-model APIs, allowing multiple back-end services to be invoked through a single unified API interface, converting them to the OpenAI format, with load balancing support. It is particularly suitable for individual users who do not need a complex front-end interface, and it supports a wide range of models and service providers, including OpenAI, Anthropic, Gemini, Vertex, and others.
Feature List
- Unified API interface: call multiple back-end services through a single unified API interface.
- Format conversion: convert APIs from different service providers to the OpenAI format.
- Load balancing: supports a variety of load balancing strategies, including round-robin, weighted round-robin, and more.
- Multi-provider support: supports OpenAI, Anthropic, Gemini, Vertex, and many other service providers.
- Configuration file management: manage API channels and models through configuration files (a minimal example follows this list).
- Auto-retry: automatically retries the next channel when an API request fails.
- Permission control: supports fine-grained permission control and rate-limit settings.
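To give a feel for how little configuration is needed, here is a minimal sketch of an api.yaml (all keys and URLs below are placeholders; the full annotated example is under Detailed steps below):

```yaml
# Minimal api.yaml sketch: one back-end channel plus one user-facing key.
# All keys and URLs are placeholders.
providers:
  - provider: openai # channel name; any name is fine
    base_url: https://api.openai.com/v1/chat/completions # back-end service API address
    api: sk-xxx # the provider's API key

api_keys:
  - api: sk-your-uni-api-key # the key your clients use when calling uni-api
    model:
      - gpt-4o # models this key is allowed to request
```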
Usage Guide
1. Lightweight Deployment
After clicking the one-click deploy button above, set the environment variable CONFIG_URL to the direct link of your configuration file, set DISABLE_DATABASE to true, and click Create to create the project.
2. serv00 Remote Deployment
First, log in to the serv00 panel and click the Run your own applications tab under Additional services to allow running your own applications, then go to Port reservation in the panel and open a random port.
If you don't have your own domain, go to WWW websites in the panel and delete the default domain, then create a new Domain using the name of the domain you just deleted. Click Advanced settings, set the Website type to Proxy domain name, point the Proxy port to the port you just opened, and do not check Use HTTPS.
Log in to the serv00 server via SSH and execute the following commands:
```bash
git clone --depth 1 -b main --quiet https://github.com/yym68686/uni-api.git
cd uni-api
python -m venv uni-api
tmux new -s uni-api
source uni-api/bin/activate
export CFLAGS="-I/usr/local/include"
export CXXFLAGS="-I/usr/local/include"
export CC=gcc
export CXX=g++
export MAX_CONCURRENCY=1
export CPUCOUNT=1
export MAKEFLAGS="-j1"
CMAKE_BUILD_PARALLEL_LEVEL=1 cpuset -l 0 pip install -vv -r requirements.txt
```
Press ctrl+b then d to detach from tmux. Wait a few hours for the installation to complete, and once it has finished, execute the following commands:
```bash
tmux attach -t uni-api
source uni-api/bin/activate
export CONFIG_URL=http://file_url/api.yaml
export DISABLE_DATABASE=true
# Modify the port: xxx is the port, change it yourself; it corresponds to the port you just opened in the panel's Port reservation.
sed -i '' 's/port=8000/port=xxx/' main.py
sed -i '' 's/reload=True/reload=False/' main.py
python main.py
```
Press ctrl+b then d to detach from tmux so the program keeps running in the background. At this point you can use uni-api in other chat clients. curl test script:
```bash
curl -X POST https://xxx.serv00.net/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer sk-xxx' \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "hello"}]}'
```
3. Docker Local Deployment
Launch Container
```bash
# If you have mounted a local configuration file, you do not need to set CONFIG_URL.
# If you have set CONFIG_URL, you do not need to mount the configuration file.
# If you don't want to save statistics, you don't need to mount the uniapi_db folder.
docker run --user root -p 8001:8000 --name uni-api -dit \
  -e CONFIG_URL=http://file_url/api.yaml \
  -v ./api.yaml:/home/api.yaml \
  -v ./uniapi_db:/home/data \
  yym68686/uni-api:latest
```
Or, if you want to use Docker Compose, here's a docker-compose.yml example:
```yaml
services:
  uni-api:
    container_name: uni-api
    image: yym68686/uni-api:latest
    environment:
      - CONFIG_URL=http://file_url/api.yaml # If you have mounted a local configuration file, you do not need to set CONFIG_URL.
    ports:
      - 8001:8000
    volumes:
      - ./api.yaml:/home/api.yaml # If you have set CONFIG_URL, you do not need to mount the configuration file.
      - ./uniapi_db:/home/data # If you don't want to save statistics, you don't need to mount this folder.
```
CONFIG_URL is a direct link from which the configuration file can be downloaded automatically. For example, if it is inconvenient to modify the configuration file on a given platform, you can upload the configuration file to a hosting service that provides a direct link for uni-api to download; CONFIG_URL is that direct link. If you use a locally mounted configuration file, you do not need to set CONFIG_URL; it is meant for situations where mounting a configuration file is inconvenient.
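To make the remote-configuration variant concrete, here is a sketch of a compose file that relies on CONFIG_URL alone and mounts nothing (the URL is a placeholder; DISABLE_DATABASE is optional and shown only because no data folder is mounted):

```yaml
# docker-compose.yml sketch using only a remotely hosted configuration file.
# uni-api downloads api.yaml from CONFIG_URL at startup; nothing is mounted.
services:
  uni-api:
    container_name: uni-api
    image: yym68686/uni-api:latest
    environment:
      - CONFIG_URL=http://file_url/api.yaml # direct link to the hosted configuration file
      - DISABLE_DATABASE=true # skip statistics storage since no data folder is mounted
    ports:
      - 8001:8000
```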
Running the Docker Compose container in the background
```bash
docker-compose pull
docker-compose up -d
```
Docker Build
```bash
docker build --no-cache -t uni-api:latest -f Dockerfile --platform linux/amd64 .
docker tag uni-api:latest yym68686/uni-api:latest
docker push yym68686/uni-api:latest
```
Restarting a Docker image with a single click
```bash
set -eu
docker pull yym68686/uni-api:latest
docker rm -f uni-api
docker run --user root -p 8001:8000 -dit --name uni-api \
  -e CONFIG_URL=http://file_url/api.yaml \
  -v ./api.yaml:/home/api.yaml \
  -v ./uniapi_db:/home/data \
  yym68686/uni-api:latest
docker logs -f uni-api
```
RESTful curl test
```bash
curl -X POST http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${API}" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}], "stream": true}'
```
Installation Process
- Prepare the configuration file: create a file named api.yaml and fill in the service provider information, API addresses, and API keys (see the configuration file example under Detailed steps below).
- Upload the configuration file: upload it to a cloud drive or file-hosting service to obtain a direct link to the file.
- Start the Docker container:
  - Use the CONFIG_URL environment variable to set the URL of the configuration file.
  - Set DISABLE_DATABASE to true.
  - Start the container with the Docker command:

```bash
docker run -d --name uni-api -e CONFIG_URL=http://file_url/api.yaml -e DISABLE_DATABASE=true yym68686/uni-api:latest
```
- Configure the port: open a random port in the panel and point it to the Docker container.
Usage Process
- Invoke the API: call back-end services through the unified API interface; multiple models and service providers are supported.
- Load balancing: requests are automatically assigned to different channels based on the load balancing policy in the configuration file.
- Permission control: the scope of each API key and its rate limits are controlled by the permission settings in the configuration file.
- Auto-retry: high availability is ensured by automatically retrying the next available channel when a request fails (see the sketch after this list).
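All of these behaviors are driven by the api_keys section of the configuration file. A condensed sketch (the key is a placeholder; every option shown is explained in the full example under Detailed steps):

```yaml
# Sketch: per-key load balancing, rate limiting, and retry settings.
api_keys:
  - api: sk-xxx # placeholder user-facing key
    model:
      - gpt-4o # models this key may request
    preferences:
      SCHEDULING_ALGORITHM: round_robin # load balancing policy for this key
      RATE_LIMIT: 60/min # at most 60 requests per minute for this key
      AUTO_RETRY: true # fail over to the next available channel on error
```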
Detailed steps
- Configuration file example:
```yaml
providers:
  - provider: provider_name # Service provider name, e.g. openai, anthropic, gemini, openrouter, deepbricks; any name is fine. Required
    base_url: https://api.your.com/v1/chat/completions # API address of the back-end service. Required
    api: sk-YgS6GTi0b4bEabc4C # The provider's API key. Required
    model: # Optional; if no model is configured, all available models are fetched automatically through the /v1/models endpoint using base_url and api.
      - gpt-4o # Usable model name. Required
      - claude-3-5-sonnet-20240620: claude-3-5-sonnet # Rename the model: claude-3-5-sonnet-20240620 is the provider's model name, claude-3-5-sonnet is the new name; you can use a concise name instead of the original complex one. Optional
      - dall-e-3

  - provider: anthropic
    base_url: https://api.anthropic.com/v1/messages
    api: # Multiple API keys are supported; multiple keys automatically enable round-robin load balancing. At least one key. Required
      - sk-ant-api03-bNnAOJyA-xQw_twAA
      - sk-ant-api02-bNnxxxx
    model:
      - claude-3-5-sonnet-20240620: claude-3-5-sonnet # Rename the model: claude-3-5-sonnet-20240620 is the provider's model name, claude-3-5-sonnet is the new name; you can use a concise name instead of the original complex one. Optional
    tools: true # Whether tools are supported, e.g. generating code or documents. Default true. Optional

  - provider: gemini
    base_url: https://generativelanguage.googleapis.com/v1beta # base_url supports v1beta/v1; for Gemini models only. Required
    api: AIzaSyAN2k6IRdgw
    model:
      - gemini-1.5-pro
      - gemini-1.5-flash-exp-0827: gemini-1.5-flash # After renaming, the original model name gemini-1.5-flash-exp-0827 cannot be used; to keep the original name available, add it on its own line as below.
      - gemini-1.5-flash-exp-0827 # With this line, both gemini-1.5-flash-exp-0827 and gemini-1.5-flash can be requested.
    tools: true

  - provider: vertex
    project_id: gen-lang-client-xxxxxxxxxxxxxxxxx # Description: your Google Cloud project ID. Format: a string, usually made of lowercase letters, digits, and hyphens. How to get it: find your project ID in the project selector of the Google Cloud Console.
    private_key: "-----BEGIN PRIVATE KEY-----\nxxxxx\n-----END PRIVATE" # Description: the private key of the Google Cloud Vertex AI service account. Format: a JSON-formatted string containing the service account's private key. How to get it: create a service account in the Google Cloud Console, generate a JSON key file, and set its content as this value.
    client_email: xxxxxxxxxx@xxxxxxx.gserviceaccount.com # Description: the email address of the Google Cloud Vertex AI service account. Format: usually a string like "service-account-name@project-id.iam.gserviceaccount.com". How to get it: generated when creating the service account, or found in the service account details under "IAM & Admin" in the Google Cloud Console.
    model:
      - gemini-1.5-pro
      - gemini-1.5-flash
      - claude-3-5-sonnet@20240620: claude-3-5-sonnet
      - claude-3-opus@20240229: claude-3-opus
      - claude-3-sonnet@20240229: claude-3-sonnet
      - claude-3-haiku@20240307: claude-3-haiku
    tools: true
    notes: https://xxxxx.com/ # You can put the provider's URL, notes, or official documentation here. Optional

  - provider: cloudflare
    api: f42b3xxxxxxxxxxxxq4aoGAh # Cloudflare API Key. Required
    cf_account_id: 8ec0xxxxxxxxxxxxxxxxxxe721 # Cloudflare Account ID. Required
    model:
      - '@cf/meta/llama-3.1-8b-instruct': llama-3.1-8b # Rename the model: @cf/meta/llama-3.1-8b-instruct is the provider's original model name and must be wrapped in quotes, otherwise it is a YAML syntax error; llama-3.1-8b is the new, concise name. Optional
      - '@cf/meta/llama-3.1-8b-instruct' # The model name must be wrapped in quotes, otherwise it is a YAML syntax error.

  - provider: other-provider
    base_url: https://api.xxx.com/v1/messages
    api: sk-bNnAOJyA-xQw_twAA
    model:
      - causallm-35b-beta2ep-q6k: causallm-35b
      - anthropic/claude-3-5-sonnet
    tools: false
    engine: openrouter # Force a specific message format; currently supports gpt, claude, gemini, openrouter native formats. Optional

api_keys:
  - api: sk-KjjI60Yf0JFWxfgRmXqFWyGtWUd9GZnmi3KlvowmRWpWpQRo # API key: users need an API key to use this service. Required
    model: # Models this API key can use. Required. Channel-level round-robin load balancing is enabled by default; models are requested in the order configured here, independent of the original channel order under providers, so each API key can request in a different order.
      - gpt-4o # Usable model name: gpt-4o models from all providers can be used.
      - claude-3-5-sonnet # Usable model name: claude-3-5-sonnet models from all providers can be used.
      - gemini/* # Usable model names: only models offered by the provider named gemini can be used; gemini is the provider name and * stands for all its models.
    role: admin

  - api: sk-pkhf60Yf0JGyJxgRmXqFQyTgWUd9GZnmi3KlvowmRWpWqrhy
    model:
      - anthropic/claude-3-5-sonnet # Usable model name: only the claude-3-5-sonnet model offered by the provider named anthropic can be used; claude-3-5-sonnet models from other providers cannot. This spelling does not match a model named anthropic/claude-3-5-sonnet offered by other-provider.
      - <anthropic/claude-3-5-sonnet> # Wrapping the model name in angle brackets means uni-api does not look for claude-3-5-sonnet in the channel named anthropic; instead, the whole string anthropic/claude-3-5-sonnet is treated as the model name. This spelling matches a model named anthropic/claude-3-5-sonnet offered by other-provider, but not the claude-3-5-sonnet model under anthropic.
      - openai-test/text-moderation-latest # When message moderation is enabled, the text-moderation-latest model under the channel named openai-test can be used for moderation.
    preferences:
      SCHEDULING_ALGORITHM: fixed_priority # When SCHEDULING_ALGORITHM is fixed_priority, fixed-priority scheduling is used: the first channel that has the requested model is always executed. Enabled by default; SCHEDULING_ALGORITHM defaults to fixed_priority. Possible values: fixed_priority, round_robin, weighted_round_robin, lottery, random.
      # When SCHEDULING_ALGORITHM is random, random load balancing is used: a random channel that has the requested model is requested.
      # When SCHEDULING_ALGORITHM is round_robin, round-robin load balancing is used: the channels that have the user's model are requested in order.
      AUTO_RETRY: true # Whether to retry automatically by moving on to the next provider: true enables auto-retry, false disables it. Default true.
      RATE_LIMIT: 2/min # Rate limiting: the maximum number of requests per time unit. Can be set as, e.g., 2/min (2 per minute), 5/hour (5 per hour), 10/day (10 per day), 10/month (10 per month), 10/year (10 per year). Default 60/min. Optional
      ENABLE_MODERATION: true # Whether to enable message moderation: true enables it, false disables it. Default false. When enabled, users' messages are moderated and an error message is returned if inappropriate content is found.

  # Channel-level weighted load balancing configuration example
  - api: sk-KjjI60Yd0JFWtxxxxxxxxxxxxxxxxxxxwmRWpWpQRo
    model:
      - gcp1/*: 5 # The weight comes after the colon; only positive integers are supported.
      - gcp2/*: 3 # The larger the number, the higher the probability of a request going to that channel.
      - gcp3/*: 2 # In this example the channel weights add up to 10, so out of every 10 requests, 5 go to the gcp1/* model, 3 to the gcp2/* model, and 2 to the gcp3/* model.
    preferences:
      SCHEDULING_ALGORITHM: weighted_round_robin # Only when SCHEDULING_ALGORITHM is weighted_round_robin and the channels above have weights are they requested in weighted order: weighted round-robin load balancing requests the channels that have the requested model in weighted order. When SCHEDULING_ALGORITHM is lottery, lottery load balancing requests the channels that have the requested model at random, according to their weights. Channels without weights automatically fall back to round_robin load balancing.
      AUTO_RETRY: true

preferences: # Global configuration
  model_timeout: # Model timeout in seconds. Default 100 seconds. Optional
    gpt-4o: 10 # The timeout for model gpt-4o is 10 seconds; gpt-4o is the model name, and when requesting a model such as gpt-4o-2024-08-06 the timeout is also 10 seconds.
    claude-3-5-sonnet: 10 # The timeout for model claude-3-5-sonnet is 10 seconds; when requesting a model such as claude-3-5-sonnet-20240620 the timeout is also 10 seconds.
    default: 10 # If a model is not listed in model_timeout, its timeout defaults to 10 seconds; if default is not set, uni-api uses the timeout from the environment variable TIMEOUT, which defaults to 100 seconds.
    o1-mini: 30 # The timeout for model o1-mini is 30 seconds; when requesting a model whose name starts with o1-mini, the timeout is 30 seconds.
    o1-preview: 100 # The timeout for model o1-preview is 100 seconds; when requesting a model whose name starts with o1-preview, the timeout is 100 seconds.
  cooldown_period: 300 # Channel cooldown time in seconds. Default 300 seconds. Optional. When a model request fails, the channel is automatically excluded and cooled down for this period and is not requested again; after the cooldown ends, the model is automatically restored until a request fails again, at which point it is cooled down again. When cooldown_period is set to 0, the cooldown mechanism is disabled.
```
- Launch the container:
```bash
docker run -d --name uni-api -e CONFIG_URL=http://file_url/api.yaml -e DISABLE_DATABASE=true yym68686/uni-api:latest
```
- Invoke the API:
```bash
curl -X POST https://api.your.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-Pkj60Yf8JFWxfgRmXQFWyGtWUddGZnmi3KlvowmRWpWpQxx" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]}'
```