InternVLA-M1: Shanghai AI Lab's Open-Source Dual-System "Brain" for Embodied Manipulation

Latest AI Resources | Published 6 months ago | AI Sharing Circle

What is InternVLA-M1?

InternVLA-M1 is an open-source embodied manipulation "brain" from Shanghai Artificial Intelligence Laboratory: a dual-system large model built for instruction following. It forms a complete "think-act-learn" closed loop and handles high-level spatial reasoning and task planning. The model is trained in two stages. Spatial perception pre-training first strengthens spatial reasoning and planning; action post-training then uses implicit spatial reasoning to learn actions efficiently. Because post-training needs only "spatial planning hints", training cost drops significantly. InternVLA-M1 is reported to achieve leading results on public manipulation benchmarks such as SimplerEnv, with instruction following and generalization to unseen objects clearly ahead of comparable models. Large-scale pre-training relies on the self-developed simulation platform InternData-M1, which makes the model suitable for complex scenes and long-horizon tasks.
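To make the dual-system idea concrete, here is a toy sketch of a planner that emits a spatial hint and a separate action step that follows it. It is not InternVLA-M1's implementation or API: `SpatialHint`, `plan`, `act`, and `run_episode` are hypothetical names, the "planner" is a keyword lookup rather than a vision-language model, and the "action" is a bounded 2D step rather than a learned policy.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]


@dataclass
class SpatialHint:
    """Hypothetical planner output: a 2D target point in the workspace."""
    target: Point


def plan(instruction: str, objects: Dict[str, Point]) -> SpatialHint:
    """Toy 'slow' system: find the first known object named in the instruction.

    Object names are assumed to be lowercase. A real planner would reason
    over images and language instead of matching substrings.
    """
    text = instruction.lower()
    for name, position in objects.items():
        if name in text:
            return SpatialHint(target=position)
    raise ValueError(f"no known object mentioned in: {instruction!r}")


def act(gripper: Point, hint: SpatialHint, max_step: float = 0.05) -> Point:
    """Toy 'fast' system: move at most max_step toward the hinted target."""
    dx = hint.target[0] - gripper[0]
    dy = hint.target[1] - gripper[1]
    dist = math.hypot(dx, dy)
    if dist <= max_step:
        return hint.target  # close enough: snap exactly onto the target
    scale = max_step / dist
    return (gripper[0] + dx * scale, gripper[1] + dy * scale)


def run_episode(instruction: str, objects: Dict[str, Point],
                start: Point, max_steps: int = 200) -> List[Point]:
    """Plan once, then act repeatedly until the target is reached."""
    hint = plan(instruction, objects)
    gripper = start
    trajectory = [gripper]
    for _ in range(max_steps):
        if gripper == hint.target:
            break
        gripper = act(gripper, hint)
        trajectory.append(gripper)
    return trajectory
```

Calling `run_episode("Pick up the red cup", {"red cup": (0.3, 0.4)}, (0.0, 0.0))` returns the list of gripper positions, ending at the cup's location. The only point of the split is that the action step never sees the instruction text; it depends solely on the spatial hint, which mirrors the role the article assigns to "spatial planning hints".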

Functional Features of InternVLA-M1

High-level spatial reasoning: Accurately perceives and reasons about complex spatial environments and plans manipulation paths and action sequences.
Two-stage training strategy: Combines spatial perception pre-training with action post-training to improve adaptability and generalization across tasks.
Efficient training and cost optimization: Spatial planning hints make training efficient, significantly reducing training cost and time and making the model more practical.
Instruction following and generalization: Parses and executes natural-language instructions accurately and generalizes well to unseen objects and new tasks.
Autonomous learning and closed-loop control: Builds a complete "think-act-learn" closed loop, so the model can keep learning and improving in practice and adapt to dynamic environments.
Complex scene adaptability: Performs well in complex real-robot scenes and long-horizon tasks, and suits practical applications such as industrial automation, logistics, and warehousing.
Open source and community support: The data and code are open source, giving researchers and developers resources for innovation and application development.

Core Benefits of InternVLA-M1

Efficient instruction following and generalization: Accurately understands natural-language commands, generates executable action sequences, and generalizes strongly to unseen objects and new tasks.
Innovative dual-system architecture: Combines spatial perception pre-training with action post-training to close the loop from perception to manipulation, improving stability and adaptability.
Spatial-planning-driven training strategy: Introducing spatial planning hints makes training more efficient and improves model performance.
Large-scale simulation data: The self-developed simulation platform InternData-M1 generates a large volume of high-quality training data, which strengthens generalization and task adaptability.
Open source and community support: The code and data are open source, giving researchers and developers resources for innovation and application development.
Leading performance: Achieves leading results on several public manipulation benchmarks, especially in complex scenes and long-horizon tasks.
Multi-scenario applicability: Applies to home, industrial, logistics, education, and other settings, laying a solid technical foundation for general-purpose robots in real-world use.

What is InternVLA-M1's official website?

Project website: https://internrobotics.github.io/internvla-m1.github.io/
GitHub repository: https://github.com/InternRobotics/InternVLA-M1
Hugging Face model collection: https://huggingface.co/collections/InternRobotics/internvla-m1-68c96eaebcb5867786ee6cf3
Hugging Face dataset: https://huggingface.co/datasets/InternRobotics/InternData-M1
Technical paper: https://github.com/InternRobotics/InternVLA-M1/blob/InternVLA-M1/assets/InternVLA_M1.pdf
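The Hugging Face links above can also be fetched programmatically. The sketch below assumes the `huggingface_hub` package is installed and uses its `snapshot_download` function; `parse_hf_url` and `download_repo` are helper names invented for this example. The article does not describe the dataset's file layout, so the `*.md` filter in the usage note is only a guess at a cheap first download. The model link is a collection page rather than a single repository, so the helper rejects it instead of guessing which model to pull.

```python
from urllib.parse import urlparse


def parse_hf_url(url):
    """Split a huggingface.co repo URL into (repo_type, repo_id)."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if parts and parts[0] == "collections":
        # A collection groups several repos; it is not downloadable itself.
        raise ValueError("collection pages are not downloadable repositories")
    if parts and parts[0] in ("datasets", "spaces"):
        repo_type = parts[0][:-1]  # "datasets" -> "dataset", "spaces" -> "space"
        parts = parts[1:]
    else:
        repo_type = "model"
    if len(parts) < 2:
        raise ValueError(f"cannot find an owner/name pair in: {url!r}")
    return repo_type, "/".join(parts[:2])


def download_repo(url, local_dir, allow_patterns=None):
    """Download (part of) a Hub repo and return the local folder path.

    huggingface_hub is imported lazily so that URL parsing still works
    when the package is not installed.
    """
    from huggingface_hub import snapshot_download

    repo_type, repo_id = parse_hf_url(url)
    return snapshot_download(
        repo_id=repo_id,
        repo_type=repo_type,
        local_dir=local_dir,
        allow_patterns=allow_patterns,  # e.g. ["*.md"] to fetch docs only
    )
```

For example, `download_repo("https://huggingface.co/datasets/InternRobotics/InternData-M1", "./interndata_m1", allow_patterns=["*.md"])` would request only Markdown files from the dataset repository, which is a reasonable first step before committing to a full download of what is likely a large dataset.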

Who is InternVLA-M1 for?

Artificial intelligence and robotics researchers: Researchers in embodied intelligence, robot manipulation, and vision-language modeling can use InternVLA-M1 to explore new technical approaches and application scenarios.
Robotics system development engineers: Engineers who develop, integrate, and optimize robotic systems can use InternVLA-M1 to improve a robot's manipulation and instruction-following abilities in complex tasks.
Teachers and students at universities and research institutes: Faculty and students in computer science, automation, robotics engineering, and related disciplines can use InternVLA-M1 as a teaching and research tool for hands-on projects and academic research.
Industrial automation and intelligent manufacturing enterprises: Companies that want smarter, more flexible robotic solutions on their production lines can use InternVLA-M1 to upgrade automation and improve efficiency.
Logistics and warehousing practitioners: Companies and professionals working on logistics automation and warehouse optimization can use InternVLA-M1 for intelligent cargo picking and handling.
Service robot developers: Teams developing home and commercial service robots can use InternVLA-M1 to enhance interaction and task execution and to broaden their robots' range of applications.