
Yek: Reading Text Files from a Git Repository and Quickly Chunking Them for Large Language Models

General Introduction

Yek is a fast, Rust-based tool that reads text files from a repository or directory, chunks them, and serializes them for use with large language models (LLMs). By default it uses .gitignore rules to skip unwanted files and Git history to infer which files are important. Yek can chunk content by approximate "token" count or by byte size, and it automatically detects whether its output is being piped. It can process multiple directories in a single command and is configured through a yek.toml file.


Features

  • Uses .gitignore rules to skip unwanted files
  • Uses Git history to infer which files are important
  • Infers additional ignore patterns (e.g., binary files, large files)
  • Chunks content by approximate "token" count or byte size
  • Automatically detects whether the output is piped
  • Processes multiple directories in a single command
  • Configured via a yek.toml file
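
To make the chunking feature concrete, here is a minimal Python sketch of greedy chunking by a size budget. It is not Yek's actual implementation (Yek is written in Rust); the names chunk_files, size_in_bytes, and approx_tokens are hypothetical, and the "about 4 characters per token" estimate is an assumption chosen only for illustration.

```python
from typing import Callable, Iterable, List, Tuple


def size_in_bytes(text: str) -> int:
    """Size of the text when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def approx_tokens(text: str) -> int:
    """Rough token estimate, assuming about 4 characters per token."""
    return (len(text) + 3) // 4


def chunk_files(
    files: Iterable[Tuple[str, str]],
    max_size: int,
    measure: Callable[[str], int] = size_in_bytes,
) -> List[List[str]]:
    """Greedily pack (path, content) pairs into chunks of at most max_size.

    A file that is larger than max_size on its own is not split; it simply
    ends up alone in an oversized chunk.
    """
    chunks: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for path, content in files:
        size = measure(content)
        # Start a new chunk when adding this file would exceed the budget.
        if current and current_size + size > max_size:
            chunks.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        chunks.append(current)
    return chunks


files = [("a.rs", "x" * 40), ("b.rs", "y" * 40), ("c.rs", "z" * 40)]
print(chunk_files(files, max_size=100))                        # budget in bytes
print(chunk_files(files, max_size=10, measure=approx_tokens))  # budget in tokens
```

Passing the size measure as a parameter keeps the packing loop identical for both modes; only the way a file's size is counted changes.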

 

Usage Guide

Installation

Unix-like systems (macOS, Linux)

curl -fsSL https://bodo.run/yek.sh | bash

Windows (PowerShell)

irm https://bodo.run/yek.ps1 | iex

Build from source

git clone https://github.com/bodo-run/yek.git
cd yek
cargo build --release

Usage

Yek has sensible defaults: run yek in a directory to serialize the entire repository. By default, it serializes all files in the repository into 10MB chunks, writes them to a temporary directory, and prints the paths of the written files to the console.
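
The default behavior and the automatic pipe detection can be pictured with a short Python sketch. It is an illustration under assumptions, not Yek's code: the emit function, the "yek-sketch-" directory prefix, and the chunk-N.txt file names are all hypothetical. The idea is that when standard output is not a terminal the content is streamed directly, and otherwise the chunks are written to a temporary directory and their paths are printed.

```python
import sys
import tempfile
from pathlib import Path
from typing import List, Optional, TextIO


def emit(chunks: List[str], stream: Optional[TextIO] = None) -> List[Path]:
    """Write chunks to the stream if it is piped, else to temporary files.

    Returns the list of files written (empty when the output was piped).
    """
    stream = sys.stdout if stream is None else stream
    if not stream.isatty():
        # Output is piped or redirected: send the content itself downstream.
        for chunk in chunks:
            stream.write(chunk)
        return []
    # Interactive terminal: write one file per chunk and print each path.
    out_dir = Path(tempfile.mkdtemp(prefix="yek-sketch-"))
    paths = []
    for index, chunk in enumerate(chunks):
        path = out_dir / f"chunk-{index}.txt"  # hypothetical naming scheme
        path.write_text(chunk, encoding="utf-8")
        stream.write(f"{path}\n")
        paths.append(path)
    return paths


if __name__ == "__main__":
    emit(["fn main() {}\n"])
```

Run directly in a terminal, the script prints a file path; run with its output piped into another command, it prints the chunk content instead.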

Typical examples

  • Process the current directory and write to a temporary directory:
yek
  • Pipe the output to the clipboard (macOS):
yek src/ | pbcopy
  • Limit the maximum size to 128K tokens and process only the src directory:
yek --max-size 128K --tokens src/
  • Limit the maximum size to 100KB, process only the src directory, and write to a specific directory:
yek --max-size 100KB --output-dir /tmp/yek src/
  • Process multiple directories:
yek src/ tests/

CLI Reference

yek --help

Yek is a repository content chunking and serialization tool for LLM consumption.

Usage

yek [OPTIONS] [directories]...

Arguments

  • directories: Directories to process [default: .]

Options

  • --max-size: Maximum size of each chunk (e.g. '10MB', '128KB', '1GB') [default: 10MB]
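
As a rough illustration of how size strings such as '10MB' or '128KB' can be turned into a byte count, here is a small Python sketch. The parse_size function and the UNITS table are hypothetical, and treating K, M, and G as powers of 1024 is an assumption; Yek's own parser may use different multipliers, and the sketch covers byte sizes only.

```python
import re

# Assumed multipliers: binary units (1 KB = 1024 bytes).
UNITS = {
    "": 1, "B": 1,
    "K": 1024, "KB": 1024,
    "M": 1024 ** 2, "MB": 1024 ** 2,
    "G": 1024 ** 3, "GB": 1024 ** 3,
}


def parse_size(text: str) -> int:
    """Convert a string like '10MB' or '128K' into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]*)\s*", text)
    if match is None:
        raise ValueError(f"not a size: {text!r}")
    number, unit = match.groups()
    unit = unit.upper()
    if unit not in UNITS:
        raise ValueError(f"unknown unit in size: {text!r}")
    return int(number) * UNITS[unit]


for example in ("10MB", "128KB", "1GB"):
    print(example, "->", parse_size(example), "bytes")
```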


May not be reproduced without permission:Chief AI Sharing Circle " Yek: reading git repository text files and quickly chunking them for use in large models
