
Yek: Read text files from a Git repository and chunk them quickly for use with large language models

General Introduction

Yek is a fast, Rust-based tool that reads text files from a repository or directory, chunks them, and serializes them for use with large language models (LLMs). By default it applies .gitignore rules to skip unwanted files and uses Git history to infer which files are important. Yek can chunk content by approximate "token" count or by byte size, and it automatically detects whether its output is being piped. It can process multiple directories in a single command and is configured through a yek.toml file.
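
To make "using Git history to infer important files" more concrete, here is a minimal Python sketch of one possible heuristic: rank files by how many commits touched them. The article does not say how Yek actually scores files, so the function names (rank_by_commit_count, changed_paths) and the commit-count heuristic itself are assumptions for illustration only.

```python
import subprocess
from collections import Counter


def rank_by_commit_count(git_log_output: str) -> list[str]:
    """Rank paths by how many commits touched them, most-touched first.

    Expects one path per line, as printed by
    `git log --name-only --pretty=format:` (blank lines separate commits).
    This heuristic is an assumption, not Yek's documented behaviour.
    """
    counts = Counter(
        line.strip() for line in git_log_output.splitlines() if line.strip()
    )
    # Sort by descending count, then by path so that ties are deterministic.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [path for path, _ in ordered]


def changed_paths(repo_dir: str = ".") -> str:
    """Return the paths touched by every commit, one path per line."""
    result = subprocess.run(
        ["git", "log", "--name-only", "--pretty=format:"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Inside a Git checkout, rank_by_commit_count(changed_paths()) would list the most frequently changed files first; a serializer could then decide where those files are placed in its output.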


Feature List

  • Uses .gitignore rules to skip unwanted files
  • Uses Git history to infer which files are important
  • Infers additional ignore patterns (e.g., binary files, large files)
  • Chunks content by approximate "token" count or by byte size
  • Automatically detects whether the output is piped
  • Handles multiple directories in a single command
  • Is configured via a yek.toml file
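
As an illustration of chunking by byte size or by approximate token count, the following Python sketch packs files greedily into chunks under a size budget. It is not Yek's implementation: the names chunk_files, byte_size, and approx_tokens are hypothetical, and the "about four characters per token" estimate is an assumption chosen only to show how the two modes can share one code path.

```python
def byte_size(text: str) -> int:
    """Exact size of the text in bytes when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def approx_tokens(text: str) -> int:
    """Rough token estimate: about four characters per token (an assumption)."""
    return (len(text) + 3) // 4  # ceiling division by 4


def chunk_files(files, max_size, measure=byte_size):
    """Greedily pack (path, content) pairs into chunks of at most max_size.

    A file that is larger than max_size on its own still gets a chunk to
    itself instead of being split; a real tool may handle that differently.
    Returns a list of chunks, each a list of paths.
    """
    chunks, current, used = [], [], 0
    for path, content in files:
        cost = measure(content)
        # Start a new chunk when adding this file would exceed the budget.
        if current and used + cost > max_size:
            chunks.append(current)
            current, used = [], 0
        current.append(path)
        used += cost
    if current:
        chunks.append(current)
    return chunks
```

Passing measure=approx_tokens switches the same packing loop from a byte budget to a token budget, which mirrors the two chunking modes described in the list above.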

 

Usage Guide

Installation

Unix-like systems (macOS, Linux)

curl -fsSL https://bodo.run/yek.sh | bash

Windows (PowerShell)

irm https://bodo.run/yek.ps1 | iex

Build from source

git clone https://github.com/bodo-run/yek.git
cd yek
cargo build --release

Usage

Yek ships with reasonable defaults: run yek in a directory to serialize the entire repository. By default it serializes all files in the repository into 10MB chunks, writes those chunks to a temporary directory, and prints the paths of the written files to the console.
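
The difference between writing to a temporary directory and streaming into a pipe can be sketched in Python as follows. This is only a guess at the general pattern, not Yek's code: the emit function, the "chunks-" directory prefix, and the chunk-N.txt file names are all hypothetical.

```python
import io
import sys
import tempfile
from pathlib import Path
from typing import Optional


def emit(chunks: list[str], stream: Optional[io.TextIOBase] = None) -> list[Path]:
    """Write chunks to the stream if it is piped, else to a temp directory.

    Returns the paths of the files written (empty when streaming).
    """
    if stream is None:
        stream = sys.stdout
    if not stream.isatty():
        # Output is redirected or piped: send the content itself downstream.
        for chunk in chunks:
            stream.write(chunk)
        return []
    # Interactive terminal: write one file per chunk and print only the paths.
    out_dir = Path(tempfile.mkdtemp(prefix="chunks-"))
    paths = []
    for index, chunk in enumerate(chunks):
        path = out_dir / f"chunk-{index}.txt"
        path.write_text(chunk, encoding="utf-8")
        paths.append(path)
        print(path, file=stream)
    return paths
```

The isatty() check is the usual way for a command-line program to tell an interactive terminal apart from a pipe or a redirected file.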

Typical examples

  • Process the current directory and write to a temporary directory:
yek
  • Pipe the output to the clipboard (macOS):
yek src/ | pbcopy
  • Limit the maximum size to 128K tokens and process only the src directory:
yek --max-size 128K --tokens src/
  • Limit the maximum size to 100KB, process only the src directory, and write to a specific directory:
yek --max-size 100KB --output-dir /tmp/yek src/
  • Process multiple directories:
yek src/ tests/

CLI Reference

yek --help

Yek is a repository content chunking and serialization tool for LLM consumption.

Usage

yek [OPTIONS] [directories]...

Arguments

  • directories: Directories to process [default: .]

Options

  • --max-size <max-size>: Maximum size of each chunk (e.g. '10MB', '128KB', '1GB') [default: 10MB]
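
Size strings such as '10MB', '128KB', or '128K' have to be converted to a number before they can be used as a budget. The Python sketch below shows one way to do that. Both the parse_size name and the choice of 1024 as the default multiplier are assumptions; the article does not state whether Yek treats K as 1000 or 1024, so the base is left as a parameter.

```python
import re

# Power of the base applied for each unit prefix.
_EXPONENTS = {"": 0, "K": 1, "M": 2, "G": 3}


def parse_size(text: str, base: int = 1024) -> int:
    """Convert a size string such as '10MB' or '128K' to a plain integer.

    The trailing 'B' is optional and case is ignored. The default base of
    1024 is an assumption; pass base=1000 for decimal multiples.
    """
    match = re.fullmatch(r"(\d+)([KMG]?)B?", text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, prefix = match.groups()
    return int(number) * base ** _EXPONENTS[prefix]
```

For example, parse_size("10MB") gives 10 * 1024 * 1024, while parse_size("128K", base=1000) gives 128000, which may be the more natural reading when the limit is counted in tokens.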


 
