General Introduction
Xorbits Inference (Xinference for short) is a powerful and versatile library focused on providing distributed deployment and serving of language models, speech recognition models, and multimodal models. With Xorbits Inference, users can easily deploy and serve their own models or built-in advanced models with a single command. Whether in the cloud, on a local server, or on a personal computer, Xorbits Inference runs efficiently. The library is especially suited for researchers, developers, and data scientists to help them realize the full potential of cutting-edge AI models.
Feature List
- Distributed deployment: Supports distributed scenarios, allowing model inference tasks to be seamlessly distributed across multiple devices or machines.
- Model serving: Streamlines the process of serving large language models, speech recognition models, and multimodal models.
- Single-command deployment: Deploy and serve models with a single command, for both experimental and production environments.
- Heterogeneous hardware utilization: Intelligently utilizes heterogeneous hardware, including GPUs and CPUs, to accelerate model inference tasks.
- Flexible APIs and interfaces: Provides a variety of interfaces for interacting with models, supporting RPC, RESTful API (compatible with the OpenAI API), CLI, and WebUI (see the sketch after this list).
- Built-in advanced models: Ships with a wide range of cutting-edge open-source models that users can run directly for experiments.
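Because the RESTful API is advertised as OpenAI-compatible, a standard OpenAI client can usually be pointed at a running Xinference server. The following is a minimal sketch assuming the openai Python package (v1+); the base URL, port, and model name are placeholders for your own deployment, not values defined by this document.

```python
# Minimal sketch: calling an Xinference server through its OpenAI-compatible
# RESTful API. The base_url, port, and model name are assumptions; replace
# them with the endpoint and model of your own deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed server endpoint
    api_key="not-needed",                 # placeholder; no real key assumed
)

response = client.chat.completions.create(
    model="my-model",  # hypothetical name of a model already being served
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```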
Getting Started
Installation
- Environment preparation: Ensure that Python 3.7 or later is installed.
- Install Xorbits Inference:
pip install xinference
- Verify the installation: After installation completes, confirm it succeeded with the following command:
xinference --version
Usage Guide
Deploying Models
- Load a model: Use the following command to load a pre-trained model:
xinference load-model --model-name <model_name>
Example:
xinference load-model --model-name gpt-3
- Start the service: After the model is loaded, start the service:
xinference serve --model-name <model_name>
Example:
xinference serve --model-name gpt-3
- Invoke the API: Once the service has started, it can be called via the RESTful API:
curl -X POST http://localhost:8000/predict -H 'Content-Type: application/json' -d '{"input": "Hello"}'
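The same call can be made from Python. The sketch below mirrors the curl example above; the host, port, and /predict path are taken from that example and may differ in your actual setup.

```python
# Sketch: invoking the service's RESTful API from Python, mirroring the curl
# example above. The URL, port, and /predict path come from that example and
# are assumptions about your deployment.
import requests

resp = requests.post(
    "http://localhost:8000/predict",
    json={"input": "Hello"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```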
Using Built-in Models
Xorbits Inference ships with built-in support for a wide range of advanced models that users can run directly for experiments (a launch sketch follows this list). Examples:
- Language models: e.g., GPT-3, BERT
- Speech recognition models: e.g., DeepSpeech
- Multimodal models: e.g., CLIP
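Built-in models can also be launched programmatically. The following is a minimal sketch assuming Xinference's Python client (xinference.client.Client); the endpoint and the model name "llama-2-chat" are illustrative assumptions, not values prescribed by this document.

```python
# Sketch: launching a built-in model via Xinference's Python client.
# The endpoint and model name are assumptions; adjust for your deployment.
from xinference.client import Client

client = Client("http://localhost:8000")  # assumed server endpoint
# Launch a built-in model by name; "llama-2-chat" is a hypothetical example.
model_uid = client.launch_model(model_name="llama-2-chat")
print(f"Model launched with UID: {model_uid}")
```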
Distributed Deployment
Xorbits Inference supports distributed deployment, allowing users to seamlessly distribute model inference tasks across multiple devices or machines. The steps are described below:
- Configure the distributed environment: Install Xorbits Inference on each node and configure network connectivity between the nodes.
- Start the distributed service: On the master node, start the service:
xinference serve --distributed --nodes <node_list>
Example:
xinference serve --distributed --nodes "node1,node2,node3"
- Call the distributed API: As with a single-node deployment, invoke the service via the RESTful API:
curl -X POST http://<master_node_ip>:8000/predict -H 'Content-Type: application/json' -d '{"input": "Hello"}'
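Because clients only ever talk to the master node, sending many concurrent requests is enough to exercise the whole cluster; the work is spread across nodes by the deployment itself, not by client code. A minimal sketch, assuming the same /predict endpoint as above and a hypothetical master address:

```python
# Sketch: issuing concurrent requests to the master node of a distributed
# deployment. The master address and /predict path are assumptions based on
# the examples above; distributing the inference work across worker nodes
# is handled by the cluster, not by this client code.
from concurrent.futures import ThreadPoolExecutor

import requests

MASTER = "http://master-node:8000"  # hypothetical master node address

def predict(text: str) -> dict:
    resp = requests.post(f"{MASTER}/predict", json={"input": text}, timeout=60)
    resp.raise_for_status()
    return resp.json()

with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(predict, ["Hello", "How are you?", "Tell me a joke"]))
print(results)
```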
Frequently Asked Questions
- How do I update a model? Use the following command to update the model:
xinference update-model --model-name <model_name>
- How do I view logs? Use the following command to view the service logs:
xinference logs --model-name <model_name>