ColiVara: Visual Embedding Based Document Storage and Retrieval Service



General Introduction

ColiVara is a document storage and retrieval service built on visual embeddings. It eliminates the need for Optical Character Recognition (OCR) or text extraction, and so avoids the broken tables and lost images that text extraction tends to produce. ColiVara supports more than 100 file formats, including PDF, DOCX, and PPTX, and can automatically capture and index screenshots of web pages. This makes it well suited to storing and retrieving documents that are rich in visual information. ColiVara provides an API and SDKs for Python and TypeScript, and you do not need to manage a vector database yourself (pgVector runs in the background). It also offers detailed documentation and quick start guides for running locally or in the cloud, and it uses late-interaction embeddings to improve retrieval accuracy. Best of all, ColiVara is completely open source.
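The embedding approach referred to above is usually called late interaction. As a rough illustration of the general idea (not ColiVara's own code, whose internals are not described here), each query and each page is represented by several vectors, and a page's score is the sum, over query vectors, of the best-matching page vector. The function names and the toy 2-D vectors below are invented for this sketch.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], page_vecs: Sequence[Vector]) -> float:
    """Late-interaction score: for each query vector, take its best match
    among the page vectors, then sum those best matches."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


# Toy 2-D embeddings, invented purely for illustration.
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[0.9, 0.1], [0.1, 0.8]]  # has a good match for both query vectors
page_b = [[0.9, 0.1], [0.8, 0.2]]  # matches only the first query vector well

# Rank pages from highest to lowest score.
ranked = sorted(
    {"page_a": page_a, "page_b": page_b}.items(),
    key=lambda item: maxsim_score(query, item[1]),
    reverse=True,
)
print([name for name, _ in ranked])
```

Because every query vector is matched independently, a page that covers all parts of the query scores higher than one that matches a single part very well.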

Function List

Document Storage: Upload and store documents in more than 100 file formats.
Document Retrieval: Search and retrieve documents efficiently using visual embeddings.
Auto Screenshot: Automatically capture screenshots of web pages and index them.
Metadata Management: Attach metadata to documents for easier categorization and retrieval.
API and SDKs: Python and TypeScript SDKs make it easy for developers to integrate.
Collection Management: Group documents into collections to keep them organized.
Multimodal Search: Search across the visual and textual content of documents.
No Vector Database Management: pgVector runs in the background, so users do not need to manage a vector database.
Open Source: ColiVara is completely open source, and users are free to use and modify it.

Usage Guide

Installation and Configuration

Get an API key: Visit the ColiVara website, sign up, and get a free API key.
Install the SDK:
- Python: pip install colivara-py
- TypeScript: npm install colivara-ts
Configure the client:

   from colivara_py import ColiVara

   # Replace the placeholder with the API key from your ColiVara account.
   client = ColiVara(api_key='YOUR_API_KEY')
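Hard-coding the key is fine for a quick test, but it is usually safer to read it from the environment. The following is a minimal sketch of that pattern; the helper name load_api_key and the variable name COLIVARA_API_KEY are assumptions made for this example, not conventions defined by the SDK.

```python
import os


def load_api_key(env_var: str = "COLIVARA_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise a clear error.

    The variable name is an assumption for this sketch, not an SDK convention.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your ColiVara API key"
        )
    return key


# Usage (requires the colivara-py package and a real key):
#   from colivara_py import ColiVara
#   client = ColiVara(api_key=load_api_key())
```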

Document Upload

Upload a document from a URL:

   document = client.upsert_document(
       name="sample_document",
       document_url="https://example.com/sample.pdf",
       metadata={"author": "John Doe"},
       collection_name="user_1_collection",
       wait=True
   )

Upload a document from a local file path (a Base64-encoded file can be uploaded as well):

   document = client.upsert_document(
       name="sample_document",
       document_path="/path/to/sample.pdf",
       metadata={"author": "John Doe"},
       collection_name="user_1_collection",
       wait=True
   )
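To upload a whole folder, the single-file call above can be wrapped in a loop. This is a sketch only: upsert_folder is a hypothetical helper, the "source_file" metadata key is illustrative, and the client is any object exposing upsert_document with the keyword arguments shown in this guide.

```python
from pathlib import Path
from typing import Any, List


def upsert_folder(
    client: Any, folder: str, collection_name: str, pattern: str = "*.pdf"
) -> List[Any]:
    """Upload every file in `folder` matching `pattern`, one call per file."""
    uploaded = []
    # Sort the paths so that the upload order is deterministic.
    for path in sorted(Path(folder).glob(pattern)):
        document = client.upsert_document(
            name=path.stem,                       # file name without extension
            document_path=str(path),
            metadata={"source_file": path.name},  # illustrative metadata key
            collection_name=collection_name,
            wait=True,
        )
        uploaded.append(document)
    return uploaded


# Usage with a configured client:
#   documents = upsert_folder(client, "/path/to/reports", "user_1_collection")
```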

Document Search

Simple Search:

   results = client.search("what is 1+1?")

Search by collection name:

   results = client.search("what is 1+1?", collection_name="user_1_collection")

Filter search by metadata:

   results = client.search(
       "what is 1+1?",
       query_filter={"on": "document", "key": "author", "value": "John Doe", "lookup": "key_lookup"}
   )
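When several searches share the same kind of filter, building the dictionary in one place helps avoid typos in its keys. The helper below is a hypothetical convenience, not part of the SDK; its defaults simply mirror the values used in the example above.

```python
from typing import Any, Dict


def metadata_filter(
    key: str, value: Any, on: str = "document", lookup: str = "key_lookup"
) -> Dict[str, Any]:
    """Build a query_filter dict in the shape used by the search example."""
    return {"on": on, "key": key, "value": value, "lookup": lookup}


author_filter = metadata_filter("author", "John Doe")
print(author_filter)

# Usage with a configured client:
#   results = client.search("what is 1+1?", query_filter=author_filter)
```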

Collection Management

Create a collection:

   collection = client.create_collection(name="user_1_collection")

Get the list of collections:

   collections = client.list_collections()
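The two calls above can be combined into a "create only if missing" step. This sketch assumes that each item returned by list_collections() exposes a name attribute, which is not confirmed by this guide; ensure_collection is a hypothetical helper name.

```python
from typing import Any


def ensure_collection(client: Any, name: str) -> Any:
    """Return the collection called `name`, creating it only if it is missing.

    Assumes each item returned by list_collections() has a `.name` attribute.
    """
    for collection in client.list_collections():
        if getattr(collection, "name", None) == name:
            return collection
    return client.create_collection(name=name)


# Usage with a configured client:
#   collection = ensure_collection(client, "user_1_collection")
```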

API Reference

Document Upload: upsert_document(name, document_url or document_path, metadata, collection_name, wait)
Document Retrieval: search(query, collection_name, query_filter)
Collection Management: create_collection(name), list_collections()