General Introduction Genspark is an AI-powered search tool. It was founded in 2023 by a former Baidu executive and is based in Palo Alto, California. Unlike traditional search engines, Genspark uses multiple AI agents to generate customized search result pages in real time, called "Sparkpage...
General Introduction DeepSite is an AI-based website generation tool that lets users quickly generate a live, runnable front-end web page by entering a short text description. Developed by Hugging Face community member enzostvs, it relies on the DeepSeek V3 (0324) model, which combines natural...
I previously tried converting speech to multi-speaker subtitles with Gemini 2.0 for free, and the results were quite good. This time I tried again with Gemini 2.5 Pro. First, I found a sample standard SRT subtitle file as a reference benchmark (the speech-to-text conversion was done in advance using a mainstream model on the market): 00...
Enable Builder smart programming mode for unlimited use of DeepSeek-R1 and DeepSeek-V3, with a smoother experience than the overseas version. Just enter commands in Chinese; even a novice programmer can write their own apps with no barrier to entry.
General Introduction uniOCR is an open source text recognition tool developed by the mediar-ai team. It is built with Rust and supports macOS, Windows and Linux. Users can use it to extract text from images; it is free and simple to operate. uniOCR's core feature is cross-platform support...
General Introduction Serena is a free and open source programming tool developed by the Oraios AI team and hosted on GitHub. It is a code assistant that works directly in your codebase to help developers analyze, edit, and execute code. Serena is implemented through the Language Server Protocol (LSP)...
General Introduction AudioX is an open source project by Zeyue Tian et al. on GitHub, with an official paper published on arXiv (No. 2503.10522). It is based on diffusion transformer technology and takes text, video, images, audio and other inputs to generate high-quality ...
General Introduction EasyControl is an open source project based on the Diffusion Transformer (DiT) architecture that provides efficient and flexible control over image generation. Ghibli Control LoRA is one of its featured functions; using only 100 Asian faces and their GPT-4o-generated Ghibli-style images...
YOLOE is an open source project developed by the Multimedia Intelligence Group (THU-MIG) at Tsinghua University School of Software, with the full name "You Only Look Once Eye". It is built on the PyTorch framework and extends the YOLO series to detect and segment any object in real time. The project is hosted on GitHub, ...
General Introduction Open-VoiceCanvas is an open source speech synthesis platform developed by the ItusiAI team. It supports more than 50 languages, can turn text into natural speech, and can also clone personalized voices by uploading audio. The project integrates OpenAI TTS, AWS Polly and MiniMax three...
Libra is a tool from Greenbit.ai whose core function is to generate AI agents that can run locally through natural language conversations. Called the "Vibe Agent", it allows users to quickly create their own agents by describing their needs in simple terms, performing web searches, data...
General Introduction VideoMind is an open source multimodal AI tool focused on reasoning, Q&A and summary generation for long videos. It was developed by Ye Liu of the Hong Kong Polytechnic University and a team from Show Lab at the National University of Singapore. The tool mimics the way humans understand video by splitting tasks into planning,...
General Introduction SuperCoder is an intelligent tool that runs in the terminal and is designed for programmers. It uses AI to help users search code, view project structure, edit files, and fix bugs. The project is open sourced by huytd on GitHub and supports Linux, macOS, and Windows...
General Introduction Emigo is an open source AI programming assistant designed for Emacs and developed by MatthewZMD on GitHub. It helps programmers with code analysis, generation, modification and other tasks in Emacs by integrating a large language model (LLM).
General Introduction SegAnyMo is an open source project developed by a team of researchers at UC Berkeley and Peking University, including members such as Nan Huang. This tool focuses on video processing and can automatically recognize and segment arbitrary moving objects in a video, such as people, animals or vehicles. It combines TAP...
General Introduction GeminiCode is an AI programming assistant that runs in a terminal, built by its developers in their spare time on weekends. It is based on Google's Gemini 2.5 Pro model and can read and modify files in the current directory of your computer. The tool is inspired by Anthropic's Claude Co...
General Introduction GenXD is an open source project developed by teams from the National University of Singapore (NUS) and Microsoft. It focuses on generating arbitrary 3D and 4D scenes, addressing the difficulty of real-world 3D and 4D generation caused by insufficient data and complex model design. The project analyzes the camera and object motion, kn...
General Introduction ChatAnyone is an innovative project developed by the HumanAIGC team. It utilizes artificial intelligence techniques to generate digital human portrait videos with upper body movements from a single photo and audio input. The project is based on a hierarchical motion diffusion model that generates head movements, gestures and expressions for...
General Introduction Search-R1 is an open source project, developed by PeterGriffinJin on GitHub, built on the veRL framework. It uses reinforcement learning (RL) techniques to train large language models (LLMs), allowing the models to autonomously learn to reason and invoke search engines to solve problems. The project supports Qwen2.5...
General Introduction OctoComics is an online platform that focuses on helping users quickly generate BL comics with AI, while also supporting other types of comics and community sharing. Users can enter text to generate BL-themed comics, original serials, or OC character comics, with a variety of drawing styles and flexible sub-scene editing. It is suitable for BL ...