General Introduction
Exo is an open source project for running your own AI cluster on everyday devices (e.g. iPhone, iPad, Android, Mac, Linux). Through dynamic model partitioning and automatic device discovery, exo unifies multiple devices into a single powerful GPU that supports models such as LLaMA, Mistral, LLaVA, Qwen, and DeepSeek. Exo also provides a ChatGPT-compatible API that lets users easily run models on their own hardware.
Feature List
- Broad model support: runs models such as LLaMA, Mistral, LLaVA, Qwen, and DeepSeek.
- Dynamic model partitioning: optimizes how a model is split based on the current network topology and available device resources.
- Automatic device discovery: discovers other devices without any manual configuration.
- ChatGPT-compatible API: provides a ChatGPT-compatible API for running models on your own hardware.
- Device equality: devices connect to each other peer-to-peer rather than through a master-slave architecture.
- Multiple partitioning strategies: supports several strategies, such as ring memory-weighted partitioning.
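The feature list mentions ring memory-weighted partitioning. As a rough illustration of the idea only (a sketch, not exo's actual implementation), the following splits a model's contiguous layers across devices in proportion to each device's available memory:

```python
def ring_memory_weighted_partition(num_layers: int, memories: list[int]) -> list[range]:
    """Assign contiguous layer ranges to devices in proportion to memory.

    Sketch of the memory-weighted partitioning idea: a device with twice
    the memory receives roughly twice the layers. Not exo's actual code.
    """
    total = sum(memories)
    bounds = [0]
    acc = 0
    for mem in memories:
        acc += mem
        # Cumulative share of memory determines the cumulative layer boundary.
        bounds.append(round(num_layers * acc / total))
    return [range(bounds[i], bounds[i + 1]) for i in range(len(memories))]


# Example: 32 layers across devices with 16 GB, 8 GB, and 8 GB of memory.
parts = ring_memory_weighted_partition(32, [16, 8, 8])
```

In a ring topology, each device would then run its layer range and pass activations to the next device in the ring.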
Usage Guide
Installation process
- Prerequisites:
  - Make sure your Python version is >= 3.12.0.
  - On Linux with an NVIDIA GPU, install the NVIDIA driver, CUDA toolkit, and cuDNN library.
- Install from source:
  - Clone the project:
    git clone https://github.com/exo-explore/exo.git
  - Enter the project directory:
    cd exo
  - Install the dependencies:
    pip install -e .
  - Or install in a virtual environment:
    source install.sh
Operation flow
- Running a model:
  - Run on multiple macOS devices:
    - Device 1:
      exo
    - Device 2:
      exo
    Exo automatically discovers the other devices and starts a ChatGPT-like WebUI (powered by tinygrad tinychat) at http://localhost:52415.
  - Run on a single device:
    - Use the command:
      exo run llama-3.2-3b
    - Use a custom prompt:
      exo run llama-3.2-3b --prompt "What is the meaning of exo?"
- Model storage:
  - By default, models are stored in ~/.cache/huggingface/hub.
  - Set the HF_HOME environment variable to change the model storage location.
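As an illustration of how the storage location is resolved, the sketch below mimics the HF_HOME lookup (the exact precedence used by Hugging Face tooling may differ; `model_cache_dir` is a hypothetical helper, not part of exo):

```python
import os
from pathlib import Path


def model_cache_dir(env=os.environ) -> Path:
    """Resolve the model cache directory, honoring HF_HOME if set.

    Sketch of the lookup: falls back to ~/.cache/huggingface when
    HF_HOME is absent, then appends the "hub" subdirectory.
    """
    hf_home = env.get("HF_HOME", str(Path.home() / ".cache" / "huggingface"))
    return Path(hf_home) / "hub"
```

For example, launching with `HF_HOME=/mnt/models exo` would place downloaded models under /mnt/models/hub.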
- Debugging:
  - Use the DEBUG environment variable (0-9) to enable debug logging:
    DEBUG=9 exo
  - For the tinygrad inference engine, use the separate TINYGRAD_DEBUG flag (1-6):
    TINYGRAD_DEBUG=2 exo
- Formatting code:
  - exo uses yapf for code formatting.
  - Install the formatting requirements:
    pip3 install -e '.[formatting]'
  - Run the formatting script:
    python3 format.py ./exo
Usage
- Start exo:
  exo
  Exo automatically discovers and connects to other devices without additional configuration.
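Exo's actual discovery protocol is not described here; as a generic illustration of configuration-free discovery on a local network, the sketch below sends and receives a small UDP presence message (`announce`, `listen_once`, and the port number are all hypothetical, not exo's API):

```python
import json
import socket

DISCOVERY_PORT = 54545  # arbitrary example port, not exo's actual port


def announce(name: str, host: str = "255.255.255.255", port: int = DISCOVERY_PORT) -> None:
    """Broadcast a small JSON presence message so peers can find this node."""
    msg = json.dumps({"node": name}).encode()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(msg, (host, port))


def listen_once(port: int = DISCOVERY_PORT, timeout: float = 5.0) -> dict:
    """Wait for one presence announcement and return the decoded payload."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind(("", port))
        s.settimeout(timeout)
        data, _addr = s.recvfrom(1024)
    return json.loads(data)
```

A real discovery loop would announce periodically and maintain a table of peers seen recently, dropping nodes whose announcements stop arriving.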
- Run a model:
  - Use the default model:
    exo run llama-3.2-3b
  - Use a custom prompt:
    exo run llama-3.2-3b --prompt "What is the meaning of EXO?"
- API usage example:
  - Send a request:
    curl http://localhost:52415/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "llama-3.2-3b",
        "messages": [{"role": "user", "content": "What is the meaning of EXO?"}],
        "temperature": 0.7
      }'
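The same request can be sent from Python using only the standard library. This is a minimal sketch that assumes an exo node is already serving on localhost:52415; `build_chat_request` and `chat` are illustrative helpers, not part of exo:

```python
import json
import urllib.request

API_URL = "http://localhost:52415/v1/chat/completions"


def build_chat_request(model: str, prompt: str, temperature: float = 0.7) -> dict:
    """Build a ChatGPT-style request body for the compatible endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }


def chat(model: str, prompt: str) -> str:
    """POST a chat request to a running exo node and return the reply text."""
    body = json.dumps(build_chat_request(model, prompt)).encode()
    req = urllib.request.Request(
        API_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    # ChatGPT-compatible responses carry the reply under choices[0].message.
    return data["choices"][0]["message"]["content"]
```

Because the API follows the ChatGPT wire format, existing OpenAI-compatible client libraries should also work by pointing their base URL at the local node.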
Performance optimization
- macOS users:
  - Upgrade to the latest version of macOS.
  - Run ./configure_mlx.sh to optimize GPU memory allocation.
Common problems
- SSL errors: on some macOS/Python versions, certificates are not installed correctly. Run the following command to fix it:
  /Applications/Python 3.x/Install Certificates.command
- Debug logging: enable debug logging with:
  DEBUG=9 exo