General Introduction
Rankify is an open-source Python toolkit developed by the Data Science Group at the University of Innsbruck, Austria. It provides a unified framework for information retrieval, re-ranking, and retrieval-augmented generation (RAG). The toolkit ships with 40 pre-retrieved benchmark datasets, supports 7 retrieval techniques and 24 re-ranking models, and includes a variety of RAG methods. Rankify is modular and easily extensible, which makes it suitable for researchers and developers running experiments and benchmarks. The code is open and well documented, and it supports Python 3.10 and above.
Feature List
- Seven retrieval techniques are available: BM25, DPR, ColBERT, ANCE, BGE, Contriever, and HyDE.
- Supports 24 re-ranking models, such as MonoT5, RankGPT, and Sentence Transformer, to improve the accuracy of search results.
- Integrates retrieval-augmented generation (RAG), with support for generating responses with GPT, LLaMA, T5, and other models.
- Includes 40 pre-retrieved datasets covering Q&A, dialog, entity linking, and other scenarios.
- Provides evaluation tools that calculate metrics for retrieval, re-ranking, and generation results, such as Top-K, EM, and Recall.
- Supports pre-built indexes (e.g. Wikipedia and MS MARCO), eliminating the need to build your own indexes.
- Modular structure that allows users to customize datasets, retrievers and models.
Usage Guide
Rankify is straightforward to install and use. Below are detailed steps and instructions to help you get started quickly.
Installation process
Rankify requires Python 3.10 or above. It is recommended to install it in a virtual environment to avoid dependency conflicts.
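Before installing, it can help to confirm that the active interpreter meets this requirement. The following is a minimal sketch using only the standard library; the helper name `python_is_supported` is made up for illustration and is not part of Rankify.

```python
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in this guide


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the stated minimum."""
    return tuple(version_info[:2]) >= minimum


current = ".".join(str(part) for part in sys.version_info[:3])
status = "OK" if python_is_supported() else "too old for Rankify"
print(f"Python {current}: {status}")
```

Run it with the same interpreter that will be used inside the virtual environment.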
- Create a virtual environment (recommended)
Create an environment using Conda:
conda create -n rankify python=3.10
conda activate rankify
Or use Python's built-in venv module:
python -m venv rankify_env
source rankify_env/bin/activate # Linux/Mac
rankify_env\Scripts\activate # Windows
- Install PyTorch (recommended version 2.5.1)
If you have a GPU, install the version with CUDA 12.4:
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
If you do not have a GPU, install the CPU version:
pip install torch==2.5.1
- Basic installation
Install Rankify's core functionality:
pip install rankify
- Complete installation (recommended)
Install all features:
pip install "rankify[all]"
- On-demand installation (optional)
Install only the retrieval components:
pip install "rankify[retriever]"
Install only the re-ranking components:
pip install "rankify[reranking]"
- Install the latest version from GitHub (optional)
Get the development version:
git clone https://github.com/DataScienceUIBK/Rankify.git
cd Rankify
pip install -e ".[all]"
- Install the ColBERT retriever (optional)
ColBERT requires additional configuration:
conda install -c conda-forge gcc=9.4.0 gxx=9.4.0
conda install -c conda-forge libstdcxx-ng
export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH
export CC=gcc
export CXX=g++
rm -rf ~/.cache/torch_extensions/*
Rankify is ready to use once installation is complete.
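One way to confirm that the packages were actually installed into the active environment is to query their metadata. This is a small sketch with the standard library's `importlib.metadata`; the helper `installed_version` is a hypothetical name, and the check only reports what pip registered, not whether every optional feature works.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the two packages installed in the steps above.
for name in ("rankify", "torch"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```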
Feature Guide
1. Using pre-retrieved datasets
Rankify offers 40 pre-retrieved datasets that can be downloaded from Hugging Face.
- Steps:
- Import the dataset module.
- Select the retriever and dataset.
- Download or load the data.
- Sample code:
from rankify.dataset.dataset import Dataset
# List the available datasets
Dataset.available_dataset()
# Download the nq-dev dataset retrieved with BM25
dataset = Dataset(retriever="bm25", dataset_name="nq-dev", n_docs=100)
documents = dataset.download(force_download=False)
# Load a local dataset
documents = Dataset.load_dataset('./bm25_nq_dev.json', 100)
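To make the role of the second argument (the number of documents kept per question) more concrete, here is a standalone sketch that does not use Rankify at all. It assumes a DPR-style JSON layout with `question`, `answers`, and `ctxs` keys; that layout, the sample data, and the helper `load_pre_retrieved` are assumptions for illustration, so check a downloaded file for the actual structure.

```python
import json
import tempfile
from pathlib import Path


def load_pre_retrieved(path, n_docs):
    """Load records and keep at most `n_docs` contexts per question."""
    with open(path, encoding="utf-8") as handle:
        records = json.load(handle)
    return [
        {**record, "ctxs": record.get("ctxs", [])[:n_docs]}
        for record in records
    ]


# Tiny hand-written sample in the assumed layout (not real dataset content).
sample = [
    {
        "question": "What is the capital of France?",
        "answers": ["Paris"],
        "ctxs": [
            {"id": "1", "text": "Paris is the capital of France."},
            {"id": "2", "text": "Lyon is a city in France."},
            {"id": "3", "text": "Berlin is the capital of Germany."},
        ],
    }
]

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "sample.json"
    sample_path.write_text(json.dumps(sample), encoding="utf-8")
    loaded = load_pre_retrieved(sample_path, n_docs=2)

# Only the first two contexts survive the truncation.
print([ctx["id"] for ctx in loaded[0]["ctxs"]])
```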
2. Using the retrievers
Rankify supports a variety of retrieval methods, such as BM25 and DPR.
- Steps:
- Initialize the retriever.
- Provide the questions.
- Get the retrieval results.
- Sample code:
from rankify.retrievers.retriever import Retriever
# Retrieve from Wikipedia with BM25
retriever = Retriever(method="bm25", n_docs=5, index_type="wiki")
docs = [{"question": "What is the sun?"}]
results = retriever.retrieve(docs)
print(results)
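For readers who want to see what a BM25 retriever computes, the following is a minimal pure-Python sketch of BM25 scoring over a toy corpus. It is not Rankify's implementation (which works against a pre-built index); the function names, the whitespace tokenizer, and the parameter defaults `k1=1.5` and `b=0.75` are illustrative assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    """Very rough tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def bm25_rank(query, corpus, n_docs=5, k1=1.5, b=0.75):
    """Return indices of the top `n_docs` documents by BM25 score."""
    docs = [tokenize(doc) for doc in corpus]
    avg_len = sum(len(doc) for doc in docs) / len(docs)
    # Number of documents that contain each term at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if term not in term_freq:
                continue
            # IDF with +1 inside the log so it is never negative.
            idf = math.log(
                1 + (len(docs) - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5)
            )
            tf = term_freq[term]
            length_norm = 1 - b + b * len(doc) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return ranked[:n_docs]


corpus = [
    "The sun is a star at the center of the solar system",
    "The moon orbits the earth",
    "Stars produce light through nuclear fusion",
]
top = bm25_rank("what is the sun", corpus, n_docs=2)
print([corpus[i] for i in top])
```

The document about the sun ranks first because it matches the rare query terms, while the moon document only matches the common word "the".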
3. Using the re-ranking function
Re-ranking refines the order of retrieval results and supports multiple models.
- Steps:
- Prepare the initial retrieval results.
- Initialize the re-ranking model.
- Re-rank the results.
- Sample code:
from rankify.models.reranking import Reranking
from rankify.dataset.dataset import Document, Question, Context
# Prepare the data
question = Question("What is the sun?")
contexts = [Context(text="The sun is a star.", id=1), Context(text="The moon is not a star.", id=2)]
doc = Document(question=question, contexts=contexts)
# Re-rank
reranker = Reranking(method="monot5", model_name="monot5-base-msmarco")
reranker.rank([doc])
for ctx in doc.reorder_contexts:
    print(ctx.text)
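Conceptually, a re-ranker scores each (question, context) pair and sorts the contexts by that score. The sketch below shows this pattern with a toy word-overlap score standing in for a neural model such as MonoT5; the functions `overlap_score` and `rerank` and the dict-based contexts are hypothetical and unrelated to Rankify's classes.

```python
import re


def overlap_score(question, text):
    """Toy relevance score: fraction of question words found in the text."""
    q_tokens = set(re.findall(r"\w+", question.lower()))
    t_tokens = set(re.findall(r"\w+", text.lower()))
    return len(q_tokens & t_tokens) / len(q_tokens) if q_tokens else 0.0


def rerank(question, contexts, score_fn=overlap_score):
    """Return a new list of contexts, most relevant first."""
    return sorted(
        contexts,
        key=lambda ctx: score_fn(question, ctx["text"]),
        reverse=True,
    )


# Initial order as it might come back from a first-stage retriever.
contexts = [
    {"id": 2, "text": "The moon is not a star."},
    {"id": 1, "text": "The sun is a star."},
]
reordered = rerank("What is the sun?", contexts)
for ctx in reordered:
    print(ctx["id"], ctx["text"])
```

Swapping `score_fn` for a call into a real model is the only change needed to turn this into a model-based re-ranker, which is the design that makes re-rankers interchangeable.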
4. Using the RAG function
RAG combines retrieval and generation to produce more accurate answers.
- Steps:
- Prepare the documents and questions.
- Initialize the generator.
- Generate the answers.
- Sample code:
from rankify.dataset.dataset import Document, Question, Context
from rankify.generator.generator import Generator
doc = Document(question=Question("What is the capital of France?"), contexts=[Context(text="The capital of France is Paris.", id=1)])
generator = Generator(method="in-context-ralm", model_name="meta-llama/Llama-3.1-8B")
answers = generator.generate([doc])
print(answers)  # Output: ["Paris"]
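The general idea behind in-context retrieval augmentation is to prepend the retrieved passages to the question before it reaches the language model. The sketch below only builds such a prompt; the template and the function `build_prompt` are assumptions for illustration and may differ from what Rankify's generator does internally.

```python
def build_prompt(question, contexts, max_contexts=3):
    """Prepend up to `max_contexts` retrieved passages to the question."""
    passages = "\n".join(
        f"[{rank}] {text}"
        for rank, text in enumerate(contexts[:max_contexts], start=1)
    )
    return f"Context:\n{passages}\n\nQuestion: {question}\nAnswer:"


prompt = build_prompt(
    "What is the capital of France?",
    ["The capital of France is Paris.", "Lyon is a city in France."],
    max_contexts=1,
)
print(prompt)
```

Limiting the number of passages keeps the prompt within the model's context window, which is why re-ranking before generation matters: only the top passages get through.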
5. Evaluating results
Rankify includes a built-in evaluation tool for checking performance.
- Sample code:
from rankify.metrics.metrics import Metrics
# `documents` is the list loaded in step 1
metrics = Metrics(documents)
retrieval_metrics = metrics.calculate_retrieval_metrics(ks=[1, 5, 10])
print(retrieval_metrics)
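To clarify what metrics such as Top-K and EM measure, here is a small standalone sketch. It assumes the common open-domain QA definitions (Top-K counts a question as a hit if any gold answer string appears in the first k contexts; EM compares normalized strings); the helper names and the toy data are illustrative, and Rankify's own normalization may differ.

```python
import re
import string


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, answers):
    """True if the normalized prediction equals any normalized gold answer."""
    return any(normalize(prediction) == normalize(ans) for ans in answers)


def top_k_accuracy(examples, k):
    """Fraction of examples with a gold answer inside the top-k contexts."""
    hits = 0
    for example in examples:
        top = example["contexts"][:k]
        if any(
            normalize(ans) in normalize(ctx)
            for ctx in top
            for ans in example["answers"]
        ):
            hits += 1
    return hits / len(examples)


# Toy examples, not taken from any real dataset.
examples = [
    {
        "answers": ["Paris"],
        "contexts": ["Lyon is a city in France.", "Paris is the capital of France."],
    },
    {
        "answers": ["Berlin"],
        "contexts": ["Berlin is the capital of Germany.", "Munich is in Bavaria."],
    },
]
for k in (1, 2):
    print(f"Top-{k}: {top_k_accuracy(examples, k)}")
print("EM:", exact_match("The Paris.", ["Paris"]))
```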
Notes
- GPU users need to make sure their PyTorch build supports CUDA.
- A machine with ample memory is recommended for large datasets.
- For more details, see the official documentation at http://rankify.readthedocs.io/.
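The first note above can be checked from Python. This is a minimal sketch that uses PyTorch's `torch.cuda.is_available()` and guards the import so it also runs where PyTorch is missing; the helper name `cuda_status` is made up for illustration.

```python
def cuda_status():
    """Describe whether PyTorch is importable and can see a CUDA device."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if torch.cuda.is_available():
        return f"CUDA available: {torch.cuda.get_device_name(0)}"
    return "torch installed, CPU only"


print(cuda_status())
```

If the result is "CPU only" on a machine with a GPU, the installed PyTorch wheel was most likely built without CUDA support.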
Application Scenarios
- Academic research
Researchers can use Rankify to test retrieval and re-ranking algorithms and analyze their performance.
- Intelligent question answering (Q&A)
Developers can use RAG to build Q&A systems that answer user questions.
- Search optimization
Re-ranking improves the relevance of search results, which makes it useful for improving search engines.
FAQ
- What systems does Rankify support?
Windows, Linux, and macOS are supported, as long as Python 3.10+ is installed.
- Do I need an internet connection?
Core functionality is available offline, but datasets and some models need to be downloaded.
- Does it support Chinese?
Yes, but the pre-built indexes are mainly in English (Wikipedia and MS MARCO).