Comprehensive Introduction NV Ingest (NVIDIA Ingest) is a suite of early access microservices designed for parsing hundreds of thousands of complex, messy unstructured PDFs and other enterprise documents. It can convert these documents into metadata and text for embedding into retrieval systems. NVIDIA Ingest supports...
Comprehensive Introduction Always-On AI Assistant is an innovative AI assistant project that creates a powerful and permanently online AI assistant system by integrating advanced technologies such as Deepseek-V3, RealtimeSTT and Typer. The project is especially optimized for engineering development scenarios, providing a complete...
Enable Builder smart programming mode for unlimited use of DeepSeek-R1 and DeepSeek-V3, with a smoother experience than the overseas version. Enter your commands in Chinese, and even a novice programmer can build their own apps with no barrier to entry.
Comprehensive Introduction STAR (Spatial-Temporal Augmentation with Text-to-Video Models) is an innovative video super-resolution framework jointly developed by Nanjing University, ByteDance and Southwest University. The project is dedicated to solving key problems in real-world video super-resolution processing by...
General Introduction ImBD (Imitate Before Detect) is a pioneering machine-generated text detection project that was presented at the AAAI 2025 conference. With the widespread use of Large Language Models (LLMs) such as ChatGPT, recognizing AI-generated text content is becoming increasingly challenging. The ImBD project proposes...
Comprehensive Introduction Browser Use Web UI is an innovative open source project that gives AI agents a graphical interface for browser interaction. The project is built on top of the browser-use core framework and uses Gradio to provide a user-friendly web interface, making it easy for AI agents to ...
Comprehensive Introduction This is a structured report generation blueprint project co-developed by LangChain and NVIDIA, showcased in a Jupyter notebook tutorial on GitHub. The project utilizes advanced AI techniques, specifically the Llama-3.3-70b model, to automate the generation of professional technical reports. The core features of the project ...
General Introduction BrownChat is a real-time audio chat application based on Large Language Model (LLM) technology. Developed by GitHub user sugarforever, the project aims to enhance the user's communication experience through advanced natural language processing technology. BrownChat provides an open source platform where users...
Comprehensive Introduction Lecca is a powerful AI platform that allows users to configure and deploy Large Language Models (LLMs) with multiple tools and workflows. Users can easily build, customize and automate their AI agents. Lecca offers a wide selection of AI providers and models, supports tool integration and workflow...
Comprehensive Introduction Ollama OCR is a powerful Optical Character Recognition (OCR) toolkit that uses state-of-the-art vision language models provided by the Ollama platform to extract text from images. The project is available both as a Python package and as a user-friendly Streamlit web application. It supports multiple ...
Comprehensive Introduction FitDiT is a high-fidelity virtual try-on system based on Diffusion Transformers (DiT). Developed by Tencent AI Lab, the project aims to address the limitations of traditional virtual try-on systems in displaying garment details. FitDiT innovatively proposes a new algorithmic architecture that can...
General Introduction Thin-Plate-Spline-Motion-Model is a groundbreaking image animation project presented at CVPR 2022. The project is based on thin-plate spline transformations and can produce high-quality animations from still images guided by driving videos. The project uses an end-to-end unsupervised learning framework ...
General Introduction DUIX (Dialogue User Interface System) is an AI-driven digital human interaction platform created by Silicon Intelligence. With its open source digital human interaction features, developers can easily integrate large language models, automatic speech recognition (ASR) and text-to-speech (TTS) to achieve the same level of interaction with digital...
Comprehensive Introduction Fay is an open source 3D virtual digital human framework that integrates language models and digital characters for a variety of application scenarios, such as virtual shopping guides, virtual anchors, assistants, waiters, teachers, and voice- or text-based mobile assistants. The Fay framework supports full offline use, providing milliseconds back...
General Introduction MOFA-Video is an advanced image animation tool that uses generative motion field adaptation techniques to convert static images into dynamic videos. It was developed jointly by the University of Tokyo and Tencent AI Lab and will be presented at the European Conference on Computer Vision (ECCV) 2024. MOFA-Vi...
General Introduction Amurex is an open source AI meeting assistant developed by The Personal AI Company that aims to improve meeting efficiency through intelligent features. Amurex is able to provide real-time advice, generate intelligent summaries, record meeting content, and automatically send follow-up emails. Its design focuses on transparency, security and...
General Introduction E2B Open Computer Use is an open source project that aims to provide a secure cloud-based Linux computer use experience through the E2B Desktop Sandbox. The E2B Sandbox provides a desktop graphical environment that users can connect to any Large Language Model (LLM) to control their computers, supporting...
Comprehensive Introduction Agent Laboratory is an end-to-end autonomous research workflow designed to help researchers realize their research ideas. The system consists of dedicated agents driven by large language models that support the entire research workflow - from conducting literature reviews and developing plans to executing experiments and writing synthesis...
Comprehensive Introduction Kokoro-FastAPI is a Docker-based FastAPI package designed to serve the Kokoro-82M text-to-speech model. The project supports NVIDIA GPU acceleration and provides queue processing and automatic stitching to make speech output for long-form text more efficient and coherent. The project ...
General Description CoolCline is a powerful coding assistant that combines the best features of Cline, Roo Cline and Bao Cline. It works seamlessly with your command line interface (CLI) and editor to bring you the most powerful AI development experience. CoolCline is an open source project...