AI Personal Learning
and practical guidance

GPUStack: Managing GPU clusters to run large language models and quickly integrate common inference services for LLMs.

General Introduction

GPUStack is an open source GPU cluster management tool designed for running Large Language Models (LLMs). It supports a wide range of hardware, including Apple MacBooks, Windows PCs, and Linux servers, making it easy to scale the number of GPUs and nodes to meet growing computing demands. GPUStack provides distributed inference, covering single-node multi-GPU and multi-node inference and serving. It is compatible with the OpenAI API, simplifies user and API key management, and monitors GPU performance and utilization in real time. It ships as a lightweight Python package with minimal dependencies and operational overhead, which suits developers and researchers.



 

Feature List

  • Supports a wide range of hardware: Compatible with Apple Metal, NVIDIA CUDA, Ascend CANN, Moore Threads MUSA and more.
  • Distributed inference: Supports single-node multi-GPU and multi-node inference and serving.
  • Multiple inference backends: Supports llama-box (llama.cpp) and vLLM.
  • Lightweight Python package: Minimal dependencies and operational overhead.
  • OpenAI Compatible API: Provides API services that are compatible with the OpenAI standard.
  • User and API Key Management: Simplifies user and API key management.
  • GPU Performance Monitoring: Monitor GPU performance and utilization in real time.
  • Token usage and rate monitoring: Efficiently manage token usage and rate limiting.

 

Usage Guide

Installation process

Linux or macOS

  1. Open the terminal.
  2. Run the following command to install GPUStack:
   curl -sfL https://get.gpustack.ai | sh -s -
  3. After installation, GPUStack runs as a service under systemd (Linux) or launchd (macOS).

Windows

  1. Run PowerShell as an administrator (avoid using PowerShell ISE).
  2. Run the following command to install GPUStack:
   Invoke-Expression (Invoke-WebRequest -Uri "https://get.gpustack.ai" -UseBasicParsing).Content
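
Once either installer finishes, a quick way to confirm that the service came up is to check whether something is accepting TCP connections on the UI port. The following is a minimal sketch, not part of GPUStack itself: the function name `is_listening` is hypothetical, and the host `localhost` and port 80 are assumptions based on the plain `http://` URL used later in this guide. It only proves that a port is open, not that GPUStack is the process behind it.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumed values: adjust host and port to match your installation.
    print("GPUStack UI port open:", is_listening("localhost", 80))
```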

Guidelines for use

Initial setup

  1. Access the GPUStack UI: open http://myserver in a browser.
  2. Log in with the default username admin and the initial password. To get the initial password:
    • Linux or macOS: run cat /var/lib/gpustack/initial_admin_password
    • Windows: run Get-Content -Path "$env:APPDATA\gpustack\initial_admin_password" -Raw
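
The two password locations above can also be handled by one cross-platform script. This is a minimal sketch using only the paths quoted in this guide; the function names `initial_password_path` and `read_initial_password` are hypothetical helpers, not part of GPUStack. On Linux or macOS the file may only be readable with elevated privileges.

```python
import os
import platform
from pathlib import PureWindowsPath
from typing import Optional

# Path quoted in this guide for Linux and macOS.
UNIX_PASSWORD_PATH = "/var/lib/gpustack/initial_admin_password"


def initial_password_path(system: str, appdata: Optional[str] = None) -> str:
    """Return the initial admin password path for the given OS name.

    `system` is a value as returned by platform.system(),
    for example "Linux", "Darwin", or "Windows".
    """
    if system == "Windows":
        if not appdata:
            raise ValueError("APPDATA is not set")
        # Mirrors $env:APPDATA\gpustack\initial_admin_password
        return str(PureWindowsPath(appdata) / "gpustack" / "initial_admin_password")
    return UNIX_PASSWORD_PATH


def read_initial_password() -> str:
    """Read the password on the current machine (may require root or admin)."""
    path = initial_password_path(platform.system(), os.environ.get("APPDATA"))
    with open(path, encoding="utf-8") as handle:
        return handle.read().strip()
```

Calling `read_initial_password()` on the server should return the same value as the `cat` or `Get-Content` command for that platform.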

Creating API Keys

  1. After logging into the GPUStack UI, click on "API Keys" in the navigation menu.
  2. Click the "New API Key" button, fill in the name and save it.
  3. Copy the generated API key and store it safely; it is shown only once, at creation time.

Using the API

  1. Set the environment variable:
   export GPUSTACK_API_KEY=myapikey
  2. Use curl to access the OpenAI-compatible API:
   curl http://myserver/v1-openai/chat/completions \
     -H "Content-Type: application/json" \
     -H "Authorization: Bearer $GPUSTACK_API_KEY" \
     -d '{
       "model": "llama3.2",
       "messages": [
         {"role": "system", "content": "You are a helpful assistant."},
         {"role": "user", "content": "Hello!"}
       ],
       "stream": true
     }'
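
The same request can be sent from Python using only the standard library. This is a sketch under stated assumptions: the helper names `build_chat_request`, `delta_text`, and `stream_chat` are hypothetical, and the stream parsing assumes GPUStack emits chunks in the OpenAI streaming format (`data: {...}` lines ending with `data: [DONE]`), which this guide implies but does not spell out.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, model, user_message, stream=True):
    """Build the POST request equivalent to the curl call in this guide."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
        "stream": stream,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1-openai/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def delta_text(sse_line):
    """Return the text fragment carried by one stream line, or ''.

    Assumes OpenAI-style chunks: choices[0].delta.content.
    """
    line = sse_line.strip()
    if not line.startswith("data:"):
        return ""  # blank separator lines and comments carry no text
    body = line[len("data:"):].strip()
    if body == "[DONE]":
        return ""  # end-of-stream sentinel
    choices = json.loads(body).get("choices") or []
    if not choices:
        return ""
    return choices[0].get("delta", {}).get("content") or ""


def stream_chat(request):
    """Send the request and yield text fragments as they arrive."""
    with urllib.request.urlopen(request) as response:
        for raw_line in response:
            yield delta_text(raw_line.decode("utf-8"))
```

To use it, build a request with `build_chat_request("http://myserver", os.environ["GPUSTACK_API_KEY"], "llama3.2", "Hello!")` (after `import os`) and print each fragment yielded by `stream_chat(...)` with `end=""`.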

Run and Chat

  1. Run the following command in a terminal to chat with the llama3.2 model:
   gpustack chat llama3.2 "tell me a joke."
  2. Alternatively, click "Playground" in the GPUStack UI to chat interactively.

Monitoring and Management

  1. Monitor GPU performance and utilization in real time.
  2. Manage users and API keys, and track token usage and rates.
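
Token usage can also be tallied on the client side, independently of the GPUStack UI. The sketch below is not GPUStack's own monitoring; it assumes that non-streaming responses from the OpenAI-compatible endpoint include the standard OpenAI `usage` object with `prompt_tokens`, `completion_tokens`, and `total_tokens`. The function name `tally_usage` is hypothetical.

```python
USAGE_FIELDS = ("prompt_tokens", "completion_tokens", "total_tokens")


def tally_usage(responses):
    """Sum OpenAI-style usage counters over decoded JSON responses.

    Responses without a `usage` object contribute zero.
    """
    totals = {field: 0 for field in USAGE_FIELDS}
    for response in responses:
        usage = response.get("usage") or {}
        for field in USAGE_FIELDS:
            totals[field] += int(usage.get(field, 0))
    return totals


if __name__ == "__main__":
    # Illustrative values only, not real measurements.
    sample = [
        {"usage": {"prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15}},
        {"usage": {"prompt_tokens": 3, "completion_tokens": 4, "total_tokens": 7}},
    ]
    print(tally_usage(sample))
```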

Supported models and platforms

  • Supported models: LLaMA, Mistral 7B, Mixtral MoE, Falcon, Baichuan, Yi, Deepseek, Qwen, Phi, Grok-1, and others.
  • Supported multimodal models: Llama3.2-Vision, Pixtral, Qwen2-VL, LLaVA, InternVL2, and others.
  • Supported platforms: macOS, Linux, Windows.
  • Supported accelerators: Apple Metal, NVIDIA CUDA, Ascend CANN, Moore Threads MUSA, with future plans to support AMD ROCm, Intel oneAPI, Qualcomm AI Engine.

Documentation and Community

May not be reproduced without permission: Chief AI Sharing Circle » GPUStack: Managing GPU clusters to run large language models and quickly integrate common inference services for LLMs.
