AI Personal Learning
and practical guidance
Total 15 articles

Tags: vision target detection

YOLOE: an open source tool for real-time video detection and segmentation of objects

YOLOE is an open source project developed by the Multimedia Intelligence Group (THU-MIG) at the Tsinghua University School of Software; its full name is given as "You Only Look Once Eye". Built on the PyTorch framework, it extends the YOLO series to detect and segment arbitrary objects in real time. The project is hosted on GitHub, ...

SegAnyMo: open source tool to automatically segment arbitrary moving objects from video

General Introduction: SegAnyMo is an open source project developed by a team of researchers from UC Berkeley and Peking University, including Nan Huang. The tool focuses on video processing and automatically identifies and segments arbitrary moving objects in a video, such as people, animals, or vehicles. It combines TAP...

YOLOv12: Open source tool for real-time image and video object detection

General Introduction: YOLOv12 is an open source project developed by GitHub user sunsmarterjie that focuses on real-time object detection. It builds on the YOLO (You Only Look Once) series of frameworks and introduces an attention mechanism to improve on the performance of traditional convolutional neural networks (CNNs), not only in the detection of ...

HealthGPT: A Large Medical Model Supporting Medical Image Analysis and Diagnostic Q&A

General Introduction: HealthGPT is a state-of-the-art medical large vision-language model designed to unify medical visual understanding and generation through heterogeneous knowledge adaptation. The project aims to integrate both capabilities into a single autoregressive framework, significantly enhancing the medical image processing...

MedRAX: An Agent for Chest X-ray Analysis Using Multimodal Large Models

General Introduction: MedRAX is a state-of-the-art AI agent designed specifically for chest X-ray (CXR) analysis. It integrates state-of-the-art CXR analysis tools with a multimodal large language model to dynamically handle complex medical queries without additional training. MedRAX, through its modular design and strong technological base,...

CogVLM2: Open Source Multimodal Model with Support for Video Understanding and Multi-Turn Dialogue

General Introduction: CogVLM2 is an open source multimodal model developed by the Tsinghua University Data Mining Research Group (THUDM). It is based on the Llama3-8B architecture and designed to deliver performance comparable to or even better than GPT-4V. The model supports image understanding, multi-turn dialogue, and video understanding, and can handle content up to 8K long...

Twelve Labs: Multimodal AI for Understanding Video Content, with Video Search, Generation, and Embedding API Services

General Introduction: Twelve Labs is a multimodal AI company focused on video understanding, dedicated to helping users understand and process large amounts of video content through advanced AI technologies. Its core technologies include video search, generation, and embedding, which can extract key features from video such as actions, objects, on-screen text,...
