GLM-4V Series The GLM-4V series contains 3 models, each suited to different application scenarios. GLM-4V-Plus: With excellent multimodal understanding capability, it can process up to 5 images simultaneously and supports video content understanding, making it suitable for complex multimedia analysis scenarios. GLM-4V: Focuses on image content understanding...
General Introduction VideoFX is an innovative video generation tool from Google Labs designed to help users easily produce creative, visually stunning video content. Built on advanced Veo 2.0 technology, the tool offers a wide range of video effects and editing features suitable for a variety of creative needs. Whether for personal use...
General Introduction ImageFX is a powerful image generation tool from Google Labs. Users can transform ideas into high-quality images with simple text input. The tool utilizes advanced artificial intelligence technology to support image generation in a variety of styles and themes for designers, artists...
General Introduction Whisk is an innovative AI image generation tool from Google Labs that lets users mix different themes, scenes and styles by uploading multiple images. Unlike traditional image generation tools that rely on text prompts, Whisk primarily uses images as input, allowing users to create more intuitive...
Earlier this year, Google launched its video generation model Veo and its newest image generation model, Imagen 3. Since then, it's been exciting to see people bring their ideas to life with these models: YouTube creators are exploring the creation of video backgrounds for YouTube Shorts...
Recently, GenmoAI open-sourced the video generation model mochi-1-preview (10B), which offers high-fidelity motion and strong prompt-following capabilities and currently supports 480p video generation. Today, SiliconFlow's SiliconCloud went live with an inference-accelerated version of mochi-1-preview (priced at ¥2.8/Video...
In today's competitive e-commerce market, making your product stand out from a crowd of choices is a challenge that every brand and business must face. Visual merchandising is one of the key factors in e-commerce success, and its importance cannot be overstated. An appealing and professional presentation of product images doesn't...
Comprehensive Introduction Leffa is a unified framework for controllable person image generation, enabling precise manipulation of a person's appearance (e.g., virtual try-on) and pose (e.g., pose transfer). The framework significantly reduces distortion of fine-grained details by guiding the target query to attend to the correct reference key in the attention layer, while preserving...
General Introduction MMAudio is an open source project that aims to generate high-quality synchronized audio through joint multimodal training. Developed by Ho Kei Cheng et al. at the Chinese University of Hong Kong, the project's main function is to generate synchronized audio based on video and/or text input. The core innovation of MMAudio is...
General Introduction H2O GPT is an open source project that aims to provide private chat and document processing capabilities. The project is released under the Apache 2.0 license and supports a variety of GPT models, including LLaMa2, Mistral, Falcon and so on. Users can use H2O GPT to achieve local documents (such as PDF, E...