Rankify: A Python Toolkit for Information Retrieval and Reranking

General Introduction

Rankify is an open-source Python toolkit developed by the Data Science Group at the University of Innsbruck, Austria. It provides a unified framework for information retrieval, reranking, and retrieval-augmented generation (RAG). The toolkit ships with 40 pre-retrieved benchmark datasets, supports 7 retrieval techniques and 24 reranking models, and includes a variety of RAG methods. Rankify is modular and easily extensible, which makes it suitable for researchers and developers who run experiments and benchmarks. The code is open and well documented, and it requires Python 3.10 or above.

Function List

Seven retrieval techniques are available: BM25, DPR, ColBERT, ANCE, BGE, Contriever, and HyDE.
Supports 24 reranking models, such as MonoT5, RankGPT, and Sentence Transformer, to improve the accuracy of search results.
Integrates retrieval-augmented generation (RAG), with GPT, LLaMA, T5, and other models generating the responses.
Ships with 40 pre-retrieved datasets covering question answering, dialogue, entity linking, and other scenarios.
Provides evaluation tools that compute metrics for retrieval, reranking, and generation results, such as Top-K accuracy, EM (exact match), and Recall.
Supports pre-built indexes (e.g. Wikipedia and MS MARCO), so you do not need to build your own.
Modular structure that lets users plug in custom datasets, retrievers, and models.

Usage Guide

Rankify is straightforward to install and use. Below are detailed steps and instructions to help you get started quickly.

Installation process

Rankify requires Python 3.10 or above. It is recommended to install it in a virtual environment to avoid dependency conflicts.

Creating a virtual environment (recommended)
Create an environment using Conda:

conda create -n rankify python=3.10
conda activate rankify

Or use Python's built-in venv module:

python -m venv rankify_env
source rankify_env/bin/activate  # Linux/Mac
rankify_env\Scripts\activate    # Windows
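
Before installing anything into the new environment, it can help to confirm that its interpreter meets the Python 3.10 minimum stated above. The following is a minimal sketch using only the standard library; the helper name meets_minimum is hypothetical and not part of Rankify.

```python
import sys

MIN_VERSION = (3, 10)  # minimum Python version stated in this guide


def meets_minimum(version=None, minimum=MIN_VERSION) -> bool:
    """Return True if a (major, minor, ...) version satisfies the minimum."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(map(str, sys.version_info[:3]))
    status = "OK" if meets_minimum() else "too old for Rankify"
    print(f"Python {current}: {status}")
```

Run it with the interpreter of the activated environment; if it reports a version below 3.10, recreate the environment with a newer Python.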

Install PyTorch (recommended version 2.5.1)
If you have a GPU, install the version with CUDA 12.4:

pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124

If you do not have a GPU, install the CPU version:

pip install torch==2.5.1
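
After installing PyTorch, a quick check can show whether the build in the environment actually sees a GPU. This is a small sketch, not part of Rankify: describe_torch_device is a hypothetical helper, and it only relies on the standard torch.cuda.is_available() and torch.cuda.get_device_name() calls.

```python
def describe_torch_device() -> str:
    """Report which device the installed PyTorch build would use."""
    try:
        import torch  # imported lazily so the check also works before installation
    except ImportError:
        return "torch is not installed"
    if torch.cuda.is_available():
        # Index 0 is the first visible CUDA device.
        return f"cuda ({torch.cuda.get_device_name(0)})"
    return "cpu"


print(describe_torch_device())
```

If this prints "cpu" on a machine with a GPU, the CPU-only wheel was probably installed, and reinstalling from the CUDA index URL above is the usual fix.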

Basic installation
Install Rankify's core functionality:

pip install rankify

Complete installation (recommended)
Install all features:

pip install "rankify[all]"

On-demand installation (optional)
Install only the retrieval functionality:

pip install "rankify[retriever]"

Install only the reranking functionality:

pip install "rankify[reranking]"

Install the latest version from GitHub (optional)
Get the development version:

git clone https://github.com/DataScienceUIBK/Rankify.git
cd Rankify
pip install -e ".[all]"

Installing the ColBERT retriever (optional)
ColBERT requires additional configuration:

conda install -c conda-forge gcc=9.4.0 gxx=9.4.0
conda install -c conda-forge libstdcxx-ng
export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH
export CC=gcc
export CXX=g++
rm -rf ~/.cache/torch_extensions/*

Rankify is ready to use once installation is complete.

Feature Guide

1. Using pre-retrieved datasets

Rankify offers 40 pre-retrieved datasets that can be downloaded from Hugging Face.

Steps:

1. Import the dataset module.
2. Select the retriever and dataset.
3. Download or load the data.

Sample code:

from rankify.dataset.dataset import Dataset
# List the available datasets
Dataset.available_dataset()
# Download the nq-dev dataset pre-retrieved with BM25
dataset = Dataset(retriever="bm25", dataset_name="nq-dev", n_docs=100)
documents = dataset.download(force_download=False)
# Load a dataset from a local file
documents = Dataset.load_dataset('./bm25_nq_dev.json', 100)

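2. Using the retrieval function

Rankify supports a variety of retrieval methods, such as BM25 and DPR.

Steps:
1. Initialize the retriever.
2. Provide the questions as documents.
3. Get the retrieval results.

Sample code:

from rankify.dataset.dataset import Document, Question, Answer
from rankify.retrievers.retriever import Retriever
# Retrieve from Wikipedia with BM25
retriever = Retriever(method="bm25", n_docs=5, index_type="wiki")
# Questions are wrapped in Document objects, as in the project README
docs = [Document(question=Question("What is the sun?"), answers=Answer(["A star"]), contexts=[])]
results = retriever.retrieve(docs)
print(results)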
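
To make the retrieval step more concrete, here is a self-contained sketch of Okapi BM25 scoring over a tiny in-memory corpus. It only illustrates the ranking idea behind the "bm25" method; it is not Rankify's implementation, and the function names, whitespace tokenizer, and the k1 and b defaults are assumptions chosen for the example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization penalizes documents longer than average.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def top_n(query, docs, n=5):
    """Return the n highest-scoring (document, score) pairs."""
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [(docs[i], scores[i]) for i in ranked[:n]]


corpus = [
    "the sun is a star at the center of the solar system",
    "the moon orbits the earth",
    "paris is the capital of france",
]
results = top_n("what is the sun", corpus, n=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

The n argument plays the same role as n_docs in the Retriever call above: it caps how many passages come back per question.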

3. Using the reranking function

Reranking reorders the initial retrieval results by relevance. Multiple models are supported.

Steps:
1. Prepare the initial retrieval results.
2. Initialize the reranking model.
3. Rerank the documents.

Sample code:

from rankify.models.reranking import Reranking
from rankify.dataset.dataset import Document, Question, Answer, Context
# Prepare the data
question = Question("What is the sun?")
answers = Answer(["A star"])
contexts = [Context(text="The sun is a star.", id=1), Context(text="The moon is not a star.", id=2)]
doc = Document(question=question, answers=answers, contexts=contexts)
# Rerank; the new order is stored on the document in reorder_contexts
reranker = Reranking(method="monot5", model_name="monot5-base-msmarco")
reranker.rank([doc])
for ctx in doc.reorder_contexts:
    print(ctx.text)
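
The reranking step can be pictured as "score every passage against the question, then sort". The sketch below shows that pattern with a toy token-overlap score standing in for a neural model such as MonoT5; the Passage class, overlap_score, and rerank are hypothetical names for illustration and do not come from Rankify.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    id: int
    text: str


def overlap_score(question: str, text: str) -> float:
    """Toy relevance score: fraction of question tokens found in the passage."""
    q_tokens = set(question.lower().split())
    p_tokens = set(text.lower().split())
    return len(q_tokens & p_tokens) / len(q_tokens) if q_tokens else 0.0


def rerank(question, passages, score_fn=overlap_score):
    """Return passages sorted by descending score; the input list is not modified."""
    return sorted(passages, key=lambda p: score_fn(question, p.text), reverse=True)


passages = [
    Passage(1, "the moon is not a star"),
    Passage(2, "the sun is a star"),
]
reranked = rerank("what is the sun", passages)
print([p.id for p in reranked])
```

Swapping score_fn for a stronger scorer changes the quality of the ordering but not the shape of the pipeline, which is the sense in which reranking models are interchangeable.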

4. Using the RAG function

RAG combines retrieval with generation: the generator answers the question using the retrieved contexts.

Steps:
1. Prepare the documents and questions.
2. Initialize the generator.
3. Generate answers.

Sample code:

from rankify.dataset.dataset import Document, Question, Answer, Context
from rankify.generator.generator import Generator
doc = Document(question=Question("What is the capital of France?"), answers=Answer(["Paris"]), contexts=[Context(text="The capital of France is Paris.", id=1)])
generator = Generator(method="in-context-ralm", model_name="meta-llama/Llama-3.1-8B")
answers = generator.generate([doc])
print(answers)  # e.g. ["Paris"]
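
In-context retrieval-augmented generation generally works by placing the retrieved passages in front of the question before the language model is called. The sketch below shows one way such a prompt might be assembled. The template and the build_prompt helper are assumptions for illustration only; the prompt Rankify actually sends to the model may look different.

```python
def build_prompt(question: str, contexts: list[str], max_contexts: int = 3) -> str:
    """Prepend the top retrieved passages to the question."""
    lines = [f"[{i}] {text}" for i, text in enumerate(contexts[:max_contexts], start=1)]
    passages = "\n".join(lines)
    return f"Context:\n{passages}\n\nQuestion: {question}\nAnswer:"


prompt = build_prompt(
    "What is the capital of France?",
    ["Paris is the capital of France.", "Berlin is the capital of Germany."],
    max_contexts=1,  # keep only the highest-ranked passage
)
print(prompt)
```

Because only the first max_contexts passages reach the model, the order produced by the reranker directly affects what the generator gets to read.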

5. Evaluating results

Rankify has built-in evaluation tools for checking performance.

Sample code:

from rankify.metrics.metrics import Metrics
# documents is the list downloaded or loaded in step 1
metrics = Metrics(documents)
retrieval_metrics = metrics.calculate_retrieval_metrics(ks=[1, 5, 10])
print(retrieval_metrics)
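
To clarify what the reported numbers mean, here is a simplified, self-contained sketch of two of the metrics named earlier: Top-K accuracy for retrieval and exact match (EM) for generated answers. The data layout and helper names are assumptions for the example, and Rankify's own normalization rules may differ in detail.

```python
import string


def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def top_k_accuracy(examples, k: int) -> float:
    """Share of questions whose top-k passages contain at least one gold answer."""
    hits = 0
    for ex in examples:
        top = [normalize(p) for p in ex["contexts"][:k]]
        golds = [normalize(a) for a in ex["answers"]]
        if any(g in p for p in top for g in golds):
            hits += 1
    return hits / len(examples)


def exact_match(prediction: str, answers) -> bool:
    """True if the normalized prediction equals any normalized gold answer."""
    return normalize(prediction) in {normalize(a) for a in answers}


examples = [
    {"answers": ["Paris"],
     "contexts": ["Berlin is in Germany.", "Paris is the capital of France."]},
    {"answers": ["1879"],
     "contexts": ["Edison patented the light bulb in 1879.", "Unrelated text."]},
]
for k in (1, 2):
    print(f"top-{k} accuracy: {top_k_accuracy(examples, k):.2f}")
print("EM:", exact_match("paris.", ["Paris"]))
```

The values in ks=[1, 5, 10] above correspond to the k cutoff here: a larger k can only keep or raise Top-K accuracy, since it widens the window of passages that are checked.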

Notes

GPU users should make sure the installed PyTorch build supports CUDA.
Machines with plenty of memory are recommended for large datasets.
For more details, see the official documentation at http://rankify.readthedocs.io/.

Application scenarios

Academic research
Researchers can use Rankify to test retrieval and reranking algorithms and analyze their performance.
Question answering (Q&A)
Developers can use RAG to build Q&A systems that answer user questions.
Search optimization
Reranking improves the relevance of search results, which makes it useful for improving search engines.

FAQ

Which operating systems does Rankify support?
Windows, Linux, and macOS are supported, as long as Python 3.10+ is installed.
Do I need an internet connection?
Core functionality works offline, but datasets and some models must be downloaded first.
Does it support Chinese?
Yes, but the pre-built indexes are mainly in English (Wikipedia and MS MARCO).
