General Introduction
Petals is an open-source project developed by the BigScience Workshop for running Large Language Models (LLMs) through distributed computing. Users can run and fine-tune LLMs such as Llama 3.1, Mixtral, Falcon, and BLOOM at home using consumer-grade GPUs or Google Colab. Petals uses a BitTorrent-like approach that spreads different parts of a model across many users' devices, enabling efficient inference and fine-tuning.
Feature List
- Running large language models: supports Llama 3.1 (up to 405B), Mixtral (8x22B), Falcon (40B+), and BLOOM (176B).
- Distributed inference: runs models over a distributed network, with single-batch inference speeds of up to 6 tokens/sec for Llama 2 (70B) and up to 4 tokens/sec for Falcon (180B).
- Fast fine-tuning: supports quickly fine-tuning models for a variety of tasks.
- Community-driven: relies on a community of users sharing GPU resources; anyone can contribute a GPU to increase Petals' total compute.
- Flexible API: provides a flexible, PyTorch- and Transformers-compatible API, with support for custom execution paths and inspecting hidden states (see the sketch after this list).
- Privacy: by default, data is processed through the public swarm; users handling sensitive data can set up a private swarm.
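As an illustration of the flexible API, the following is a minimal sketch of multi-step generation with an inference session, following the pattern used in the Petals tutorials (the model name is just an example, and the `session=` keyword reflects tutorial usage rather than a guaranteed stable API). The session keeps attention caches on the servers, so each `generate()` call continues the same sequence instead of re-processing it from scratch.

```python
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_name = "meta-llama/Meta-Llama-3.1-405B-Instruct"  # example model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

# Server-side attention caches live for the whole session, so each call
# extends the same sequence rather than starting over.
with model.inference_session(max_length=128) as sess:
    for prompt in ["A cat sat", " on a mat", " and then"]:
        prefix = tokenizer(prompt, return_tensors="pt")["input_ids"]
        outputs = model.generate(prefix, max_new_tokens=8, session=sess)
        print(tokenizer.decode(outputs[0]))
```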
Usage Guide
Installation and Usage
- Installing dependencies:
  - Linux + Anaconda:
    ```bash
    conda install pytorch pytorch-cuda=11.7 -c pytorch -c nvidia
    pip install git+https://github.com/bigscience-workshop/petals
    python -m petals.cli.run_server meta-llama/Meta-Llama-3.1-405B-Instruct
    ```
  - Windows + WSL: see the project Wiki.
  - Docker:
    ```bash
    sudo docker run -p 31330:31330 --ipc host --gpus all --volume petals-cache:/cache --rm \
        learningathome/petals:main \
        python -m petals.cli.run_server --port 31330 meta-llama/Meta-Llama-3.1-405B-Instruct
    ```
  - macOS + Apple M1/M2 GPUs:
    ```bash
    brew install python
    python3 -m pip install git+https://github.com/bigscience-workshop/petals
    python3 -m petals.cli.run_server meta-llama/Meta-Llama-3.1-405B-Instruct
    ```
- Running the model:
  - Select any available model, for example:
    ```python
    from transformers import AutoTokenizer
    from petals import AutoDistributedModelForCausalLM

    model_name = "meta-llama/Meta-Llama-3.1-405B-Instruct"

    # The tokenizer and embeddings run locally; the transformer blocks
    # are served by other peers in the distributed swarm.
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

    inputs = tokenizer("A cat sat", return_tensors="pt")["input_ids"]
    outputs = model.generate(inputs, max_new_tokens=5)
    print(tokenizer.decode(outputs[0]))
    ```
- Contributing a GPU:
  - Users can increase Petals' total compute by connecting their own GPU; the models currently being served are listed on the Model Hub. If a GPU cannot host its share of a full model, it can serve just part of one, as sketched below.
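A minimal sketch of serving only a subset of a model's transformer blocks, using the `--num_blocks` flag from the Petals README (the value 16 is an arbitrary example; size it to your GPU's memory):

```bash
# Host 16 of the model's transformer blocks instead of a full share.
python -m petals.cli.run_server meta-llama/Meta-Llama-3.1-405B-Instruct --num_blocks 16
```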
Main Feature Workflow
- Select a model: visit the Petals website and choose the model you want to run.
- Load the model: load and run the model following the installation steps above.
- Fine-tune the model: use the Petals API to adapt the model to a variety of tasks (see the sketch after this list).
- Generate text: generate text over the distributed network for chatbots and other interactive applications.
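To make the fine-tuning step concrete, here is a minimal sketch of prompt tuning in the style of the Petals tutorials. The `tuning_mode="ptune"` and `pre_seq_len` arguments follow those tutorials; the one-sentence batch, learning rate, and single optimizer step are illustrative assumptions, not a training recipe.

```python
import torch
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_name = "meta-llama/Meta-Llama-3.1-405B-Instruct"  # example model
tokenizer = AutoTokenizer.from_pretrained(model_name)

# "ptune" trains a small set of prompt embeddings locally while the
# distributed transformer blocks stay frozen on the swarm.
model = AutoDistributedModelForCausalLM.from_pretrained(
    model_name, tuning_mode="ptune", pre_seq_len=16
)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Toy single-example batch, purely for illustration.
batch = tokenizer("A cat sat on a mat.", return_tensors="pt")
loss = model(input_ids=batch["input_ids"], labels=batch["input_ids"]).loss
loss.backward()   # gradients flow back through the swarm to the local prompts
optimizer.step()
optimizer.zero_grad()
print(f"loss: {loss.item():.3f}")
```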