The LLM Engineer's Toolkit: A Selection of 120+ Useful Libraries

Large language model (LLM) technology is changing rapidly, and new tool libraries keep emerging. To help developers meet the challenges of LLM development, this article collects more than 120 useful LLM libraries and groups them by function so that engineers can quickly find and apply them.

Quick navigation

To make it easier for readers to quickly locate the resources they need, here are quick links to the tool libraries in each category:


🚀 LLM Training and Fine-Tuning	🧱 LLM Application Development	🩸 LLM Retrieval-Augmented Generation (RAG)
🟩 LLM Inference	🚧 LLM Serving	📤 LLM Data Extraction
🌠 LLM Data Generation	💎 LLM Agents	⚖️ LLM Evaluation
🔍 LLM Monitoring	📅 LLM Prompt Engineering	📝 LLM Structured Output
🛑 LLM Security	💠 LLM Embedding Models	❇️ Others

LLM Training and Fine-Tuning

Library	Description	Link
unsloth	Fine-tune LLMs faster with less memory.	link (on a website)
PEFT	State-of-the-art library for parameter-efficient fine-tuning.	link (on a website)
TRL	Train transformer language models with reinforcement learning.	link (on a website)
Transformers	Transformers provides thousands of pre-trained models for performing tasks in different modalities such as text, vision and audio.	link (on a website)
Axolotl	A tool designed to simplify post-training of various AI models.	link (on a website)
LLMBox	A comprehensive LLM library, including a unified training pipeline and comprehensive model evaluation.	link (on a website)
LitGPT	Quickly pretrain and fine-tune LLMs.	link (on a website)
Mergoo	A library for easily merging multiple LLM experts and efficiently training the merged LLM.	link (on a website)
Llama-Factory	Simple and efficient LLM fine-tuning tool.	link (on a website)
Ludwig	Low-code framework for building custom LLMs, neural networks, and other AI models.	link (on a website)
Txtinstruct	A framework for training instruction-tuned models.	link (on a website)
Lamini	An integrated LLM inference and tuning platform.	link (on a website)
XTuring	XTuring provides fast, efficient and easy fine-tuning of open source LLMs such as Mistral, LLaMA, GPT-J and others.	link (on a website)
RL4LMs	A modular RL library for fine-tuning language models to human preferences.	link (on a website)
DeepSpeed	DeepSpeed is a deep learning optimization library that makes distributed training and inference simple, efficient and effective.	link (on a website)
torchtune	A PyTorch-native library designed specifically for fine-tuning LLMs.	link (on a website)
PyTorch Lightning	A library that provides a high-level interface for pre-training and fine-tuning LLMs.	link (on a website)

LLM Application Development

Frameworks

Library	Description	Link
LangChain	LangChain is a framework for developing applications powered by large language models (LLMs).	link (on a website)
LlamaIndex	LlamaIndex is a data framework for LLM applications.	link (on a website)
Haystack	Haystack is an end-to-end LLM framework that allows users to build applications powered by LLMs, Transformer models, vector search, and more.	link (on a website)
Prompt flow	A set of development tools designed to simplify the end-to-end development cycle of LLM-based AI applications.	link (on a website)
Griptape	A modular Python framework for building AI-driven applications.	link (on a website)
Weave	Weave is a toolkit for developing generative AI applications.	link (on a website)
Llama Stack	Build Llama apps.	link (on a website)

Multiple API Access

Library	Description	Link
LiteLLM	A library for calling 100+ LLM APIs in the OpenAI format.	link (on a website)
AI Gateway	A fast AI gateway with integrated guardrails. Routes to 200+ LLMs and applies 50+ AI guardrails through one fast, friendly API.	link (on a website)

Routers

Library	Description	Link
RouteLLM	Framework for serving and evaluating LLM routers, aimed at saving LLM costs without compromising quality. A drop-in replacement for the OpenAI client that routes simpler queries to cheaper models.	link (on a website)
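
The idea of routing simpler queries to cheaper models can be illustrated with a minimal sketch. This is not RouteLLM's API: the model names, the `difficulty_score` heuristic, and the threshold are all hypothetical placeholders, and a real router would typically use a trained classifier rather than keyword matching.

```python
from dataclasses import dataclass

# Hypothetical model names, used only for illustration.
CHEAP_MODEL = "small-model"
STRONG_MODEL = "large-model"

# Hypothetical keywords that hint at a harder, reasoning-heavy query.
HARD_KEYWORDS = {"prove", "derive", "debug", "optimize", "explain"}


@dataclass
class Route:
    model: str
    score: float


def difficulty_score(query: str) -> float:
    """Toy difficulty estimate in [0, 1] (an assumption, not a real router)."""
    words = query.lower().split()
    # Longer queries count as harder, capped at 50 words.
    length_part = min(len(words) / 50.0, 1.0)
    # Any reasoning keyword pushes the score up.
    has_keyword = any(w.strip("?.,!") in HARD_KEYWORDS for w in words)
    return 0.5 * length_part + 0.5 * (1.0 if has_keyword else 0.0)


def route(query: str, threshold: float = 0.3) -> Route:
    """Send the query to the strong model only if it looks hard enough."""
    score = difficulty_score(query)
    model = STRONG_MODEL if score >= threshold else CHEAP_MODEL
    return Route(model=model, score=score)


if __name__ == "__main__":
    for q in ["What is 2+2?", "Prove that the sum of two even numbers is even"]:
        print(q, "->", route(q).model)
```

The threshold is the cost/quality knob: raising it sends more traffic to the cheap model.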

Memory

Library	Description	Link
mem0	Memory layer for AI applications.	link (on a website)
Memoripy	An AI memory layer with short- and long-term storage, semantic clustering, and optional memory decay for context-aware applications.	link (on a website)

Interface

Library	Description	Link
Streamlit	A faster way to build and share data applications. Streamlit lets users turn Python scripts into interactive web applications in minutes.	link (on a website)
Gradio	Build and share delightful machine learning applications all in Python.	link (on a website)
AI SDK UI	Building chat and generative user interfaces.	link (on a website)
AI-Gradio	Create AI applications supported by a variety of AI providers.	link (on a website)
Simpleaichat	Python package for easily interfacing with chat apps, with robust features and minimal code complexity.	link (on a website)
Chainlit	Build production-ready conversational AI apps in minutes.	link (on a website)

Low Code

Library	Description	Link
LangFlow	LangFlow is a low-code application builder for RAG and multi-agent AI applications. It is Python-based and agnostic to any model, API, or database.	link (on a website)

Cache

Library	Description	Link
GPTCache	A library for creating semantic caches for LLM queries. The project advertises 10x lower LLM API costs and 100x faster responses. Fully integrated with LangChain and LlamaIndex.	link (on a website)
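
A semantic cache returns a stored answer when a new query is close enough in meaning to an earlier one, which is where the cost and latency savings come from. The sketch below shows the concept only and is not GPTCache's API: the bag-of-words `embed` function stands in for a real embedding model, and `call_llm` is a hypothetical callable supplied by the caller.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold=0.8):
        self.threshold = threshold  # minimum similarity for a cache hit
        self.entries = []  # list of (vector, answer) pairs

    def get(self, query):
        """Return the best stored answer above the threshold, else None."""
        vec = embed(query)
        best_score, best_answer = 0.0, None
        for stored_vec, answer in self.entries:
            score = cosine(vec, stored_vec)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def put(self, query, answer):
        self.entries.append((embed(query), answer))


def answer_with_cache(cache, query, call_llm):
    """Check the cache first; call the (hypothetical) LLM only on a miss."""
    cached = cache.get(query)
    if cached is not None:
        return cached
    answer = call_llm(query)
    cache.put(query, answer)
    return answer
```

A linear scan is fine for a sketch; a production cache would normally keep the vectors in a vector index instead.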

LLM RAG

Library	Description	Link
FastGraph RAG	The streamlined and promptable Fast GraphRAG framework is designed for interpretable, highly accurate, agent-driven retrieval workflows.	link (on a website)
Chonkie	RAG chunking library, lightweight, extremely fast and easy to use.	link (on a website)
RAGChecker	A fine-grained framework for diagnosing RAG.	link (on a website)
RAG to Riches	Build, scale, and deploy advanced retrieval-augmented generation applications.	link (on a website)
BeyondLLM	Beyond LLM provides an all-in-one toolkit for experimentation, evaluation, and deployment of Retrieval Augmented Generation (RAG) systems.	link (on a website)
SQLite-Vec	A vector search SQLite extension that runs anywhere!	link (on a website)
fastRAG	fastRAG is a research framework for efficient and optimized retrieval-augmented generation pipelines, combining advanced LLMs and information retrieval techniques.	link (on a website)
FlashRAG	Python toolkit for efficient RAG research.	link (on a website)
Llmware	A unified framework for building enterprise RAG pipelines using small, specialized models.	link (on a website)
Rerankers	Lightweight unified API for various reranking models.	link (on a website)
Vectara	Build agentic RAG applications.	link (on a website)

LLM Inference

Library	Description	Link
LLM Compressor	Transformers-compatible library for applying various compression algorithms to LLMs to optimize deployment.	link (on a website)
LightLLM	Python-based LLM inference and serving framework known for its lightweight design, ease of scaling, and high-speed performance.	link (on a website)
vLLM	High-throughput and memory-efficient inference and serving engine for LLMs.	link (on a website)
torchchat	Run PyTorch LLMs locally on servers, desktops, and mobile devices.	link (on a website)
TensorRT-LLM	TensorRT-LLM is a library for optimizing Large Language Model (LLM) inference.	link (on a website)
WebLLM	High-performance in-browser LLM inference engine.	link (on a website)

LLM Serving

Library	Description	Link
Langcorn	Serve LangChain LLM applications and agents automatically with FastAPI.	link (on a website)
LitServe	Extremely fast serving engine for any AI model of any size. It extends FastAPI with features such as batching, streaming, and GPU autoscaling.	link (on a website)

LLM Data Extraction

Library	Description	Link
Crawl4AI	Open-source, LLM-friendly web crawler and scraper.	link (on a website)
ScrapeGraphAI	A web scraping Python library that uses LLMs and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.).	link (on a website)
Docling	Docling parses documents and exports them easily and quickly to the desired format.	link (on a website)
Llama Parse	GenAI native document parser that can parse complex document data for any downstream LLM use case (RAG, agent).	link (on a website)
PyMuPDF4LLM	The PyMuPDF4LLM library makes it easier for users to extract PDF content in the formats required by LLM & RAG environments.	link (on a website)
Crawlee	A web crawler and browser automation library.	link (on a website)
MegaParse	Parser for every type of document.	link (on a website)
ExtractThinker	Document intelligence library for LLMs.	link (on a website)

LLM Data Generation

Library	Description	Link
DataDreamer	DataDreamer is a powerful open-source Python library for prompting, synthetic data generation, and training workflows.	link (on a website)
fabricator	A flexible open source framework for generating datasets using large language models.	link (on a website)
Promptwright	Synthetic dataset generation library.	link (on a website)
EasyInstruct	An easy-to-use instruction processing framework for large language models.	link (on a website)

LLM Agents

Library	Description	Link
CrewAI	A framework for orchestrating role-playing, autonomous AI agents.	link (on a website)
LangGraph	Build resilient language agents as graphs.	link (on a website)
Agno	Build AI agents with memory, knowledge, tools, and reasoning capabilities. Chat with them using a beautiful agent UI.	link (on a website)
AutoGen	An open source framework for building AI agent systems.	link (on a website)
Smolagents	Library for building powerful agents in a few lines of code.	link (on a website)
Pydantic AI	Python agent framework for building production-grade applications using generative AI.	link (on a website)
gradio-tools	A Python library for converting Gradio applications into tools that can be utilized by LLM-based agents to accomplish their tasks.	link (on a website)
Composio	Production-ready toolset for AI agents.	link (on a website)
Atomic Agents	Build AI agents atomically.	link (on a website)
Memary	An open source memory layer for autonomous agents.	link (on a website)
Browser Use	Make websites accessible to AI agents.	link (on a website)
OpenWebAgent	An open toolkit for enabling web agents on large language models.	link (on a website)
Lagent	A lightweight framework for building LLM-based agents.	link (on a website)
LazyLLM	A low-code development tool for building multi-agent LLM applications.	link (on a website)
Swarms	An enterprise-class production-ready multi-agent orchestration framework.	link (on a website)
ChatArena	ChatArena is a library that provides a multi-agent language game environment and facilitates research on autonomous LLM agents and their social interactions.	link (on a website)
Swarm	An educational framework exploring ergonomic, lightweight multi-agent orchestration.	link (on a website)
AgentStack	The fastest way to build powerful AI agents.	link (on a website)
Archgw	Intelligent gateway for agents.	link (on a website)
Flow	A lightweight task engine for building AI agents.	link (on a website)
AgentOps	Python SDK for AI agent monitoring.	link (on a website)
Langroid	Multi-agent framework.	link (on a website)
Agentarium	A framework for creating and managing simulations populated with AI-driven agents.	link (on a website)
Upsonic	A framework for reliable AI agents that supports MCP.	link (on a website)

LLM Evaluation

Library	Description	Link
Ragas	Ragas is the ultimate toolkit for evaluating and optimizing Large Language Model (LLM) applications.	link (on a website)
Giskard	Open source evaluation and testing tools for ML & LLM systems.	link (on a website)
DeepEval	LLM evaluation framework.	link (on a website)
Lighteval	An all-in-one toolkit for evaluating LLMs.	link (on a website)
Trulens	Evaluation and tracking tools for LLM experiments.	link (on a website)
PromptBench	A unified evaluation framework for large language models.	link (on a website)
LangTest	Deliver safe and effective language models. Over 60 test types for comparing LLM and NLP models in terms of accuracy, bias, fairness, robustness, and more.	link (on a website)
EvalPlus	Rigorous evaluation framework for LLM4Code.	link (on a website)
FastChat	An open platform for training, serving, and evaluating chatbots based on large language models.	link (on a website)
judges	A small library of LLM judges.	link (on a website)
Evals	Evals is a framework for evaluating LLMs and LLM systems, together with an open-source registry of benchmarks.	link (on a website)
AgentEvals	Evaluators and utilities for evaluating agent performance.	link (on a website)
LLMBox	A comprehensive LLM library, including a unified training pipeline and comprehensive model evaluation.	link (on a website)
Opik	An open source end-to-end LLM development platform that also includes LLM evaluation.	link (on a website)

LLM Monitoring

Library	Description	Link
MLflow	An open source end-to-end MLOps/LLMOps platform for tracking, evaluating and monitoring LLM applications.	link (on a website)
Opik	An open source end-to-end LLM development platform that also includes LLM monitoring.	link (on a website)
LangSmith	Provides tools for logging, monitoring, and improving LLM applications.	link (on a website)
Weights & Biases (W&B)	W&B provides features for tracking LLM performance.	link (on a website)
Helicone	Open source LLM observability platform for developers. One-line integration for monitoring, metrics, evaluation, agent tracing, prompt management, playgrounds, and more.	link (on a website)
Evidently	An open source ML and LLM observability framework.	link (on a website)
Phoenix	An open source AI observability platform designed for experimentation, evaluation, and troubleshooting.	link (on a website)
Observers	A lightweight library for AI observability.	link (on a website)

LLM Prompt Engineering

Library	Description	Link
PCToolkit	Unified plug-and-play prompt compression toolkit for large language models.	link (on a website)
Selective Context	Selective Context compresses the user's prompts and context to allow the LLM (e.g. ChatGPT) to process 2x more content.	link (on a website)
LLMLingua	Library for compressing prompts to accelerate LLM inference.	link (on a website)
betterprompt	A suite for testing LLM prompts before pushing them to the production environment.	link (on a website)
Promptify	Solve NLP problems with LLM and easily generate different NLP task prompts for popular generative models such as GPT, PaLM, etc. with Promptify.	link (on a website)
PromptSource	PromptSource is a toolkit for creating, sharing and using natural language prompts.	link (on a website)
DSPy	DSPy is an open source framework for programming (not prompting) language models.	link (on a website)
Py-priompt	Prompt design library.	link (on a website)
Promptimizer	Prompt optimization library.	link (on a website)

LLM Structured Output

Library	Description	Link
Instructor	Python library for processing structured output from large language models (LLMs). Built on top of Pydantic, it provides a simple, transparent, and user-friendly API.	link (on a website)
XGrammar	An open source library for efficient, flexible, and portable structured generation.	link (on a website)
Outlines	Robust (structured) text generation.	link (on a website)
Guidance	Guidance is an efficient programming paradigm for steering language models.	link (on a website)
LMQL	A language for constraint-guided and efficient LLM programming.	link (on a website)
Jsonformer	A bulletproof way to generate structured JSON from language models.	link (on a website)
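
One common way to obtain structured output is to validate the model's reply against a schema and retry with the error message on failure. The sketch below shows that loop using only the standard library. It is not the API of any library in the table: `generate` is a hypothetical callable that takes a prompt and returns text, and `REQUIRED_FIELDS` is an illustrative schema. Libraries that constrain decoding itself use a different technique that this sketch does not cover.

```python
import json

# Illustrative schema (an assumption): field name -> required Python type.
REQUIRED_FIELDS = {"name": str, "age": int}


def validate(raw):
    """Parse raw text as JSON and check the required fields and types."""
    data = json.loads(raw)  # raises json.JSONDecodeError, a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            raise ValueError(
                f"field {field!r} is missing or not {expected_type.__name__}"
            )
    return data


def structured_call(generate, prompt, max_retries=2):
    """Call the (hypothetical) generate function until its output validates."""
    last_error = None
    for _ in range(max_retries + 1):
        # Feed the previous validation error back so the model can correct it.
        suffix = f"\nPrevious error: {last_error}" if last_error else ""
        raw = generate(prompt + suffix)
        try:
            return validate(raw)
        except ValueError as err:
            last_error = err
    raise ValueError(
        f"no valid output after {max_retries + 1} attempts: {last_error}"
    )
```

Passing the validation error back in the prompt gives the model a concrete hint about what to fix on the next attempt.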

LLM Security

Library	Description	Link
JailbreakEval	A collection of automated evaluators for evaluating jailbreak attempts.	link (on a website)
EasyJailbreak	An easy-to-use Python framework for generating adversarial jailbreak prompts.	link (on a website)
Guardrails	Adding guardrails to large language models.	link (on a website)
LLM Guard	A security toolkit for LLM interaction.	link (on a website)
AuditNLG	AuditNLG is an open source library that can help reduce the risks associated with using generative AI systems for language.	link (on a website)
NeMo Guardrails	NeMo Guardrails is an open source toolkit for easily adding programmable guardrails to LLM-based dialog systems.	link (on a website)
Garak	LLM Vulnerability Scanner	link (on a website)

LLM Embedding Models

Library	Description	Link
Sentence-Transformers	State-of-the-art text embeddings.	link (on a website)
Model2Vec	Fast, state-of-the-art static embeddings.	link (on a website)
Text Embedding Inference	High-speed inference solution for text embedding models. TEI implements high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE, and E5.	link (on a website)

Others

Library	Description	Link
Text Machina	A modular and extensible Python framework designed to help create high-quality, unbiased datasets for building robust models for MGT-related tasks such as detection, attribution, and boundary detection.	link (on a website)
LLM Reasoners	A library for advanced large language model reasoning.	link (on a website)
EasyEdit	An easy-to-use knowledge editing framework for large language models.	link (on a website)
CodeTF	A one-stop Transformer library for advanced code LLMs.	link (on a website)
spacy-llm	This package integrates large language models (LLMs) into spaCy, with a modular system for rapid prototyping and prompting, and turns unstructured responses into robust outputs for a variety of NLP tasks.	link (on a website)
pandas-ai	Chat with your database (SQL, CSV, pandas, polars, MongoDB, NoSQL, etc.).	link (on a website)
LLM Transparency Tool	An open source interactive toolkit for analyzing the inner workings of Transformer-based language models.	link (on a website)
Vanna	Chat with your SQL database. Accurate text-to-SQL generation via LLMs using RAG.	link (on a website)
mergekit	Tools for merging pre-trained large language models.	link (on a website)
MarkLLM	An LLM watermarking open source toolkit.	link (on a website)
LLMSanitize	An open source library for contamination detection in NLP datasets and large language models (LLMs).	link (on a website)
Annotateai	Automatically annotate papers using LLM.	link (on a website)
LLM Reasoner	Make any LLM think like OpenAI o1 and DeepSeek R1.	link (on a website)