
Kimi launches a visual counterpart to o1 that thinks through and solves problems from images

Everyone is using AI tools, and we have watched AI evolve step by step. Until now we have mostly talked to these tools in text, and there are times when Kernel wonders how nice it would be if they could also reason well about pictures.

After trying out a number of AI tools, I later used Kimi and found its reasoning capabilities impressive.


At the time, I wondered whether those capabilities could be extended to multimodal input, so that sending it a picture or a video would lead to reflective reasoning and, finally, a reliable answer.

Unexpectedly, Kimi then shipped an update that added excellent image recognition. I tried it without high expectations and was surprised even by its text recognition.

 

Kimi Intelligent Assistant has been updated again! Not long after the release of the Math Edition that I introduced last time, Kimi has built on it and launched the fun and useful K1 model. The corresponding product is nicknamed Kimi Glasses-Wearing Edition!

Its official name is Kimi Visual Thinking Edition.

 

This model can recognize complex image content and give detailed answers to math and science questions with step-by-step logical reasoning. In a number of tests it exceeded OpenAI's o1 model. Its handwriting recognition is also very strong, and it can handle pictures taken in a wide variety of scenarios.

 

 

It looks pretty good, so let's get right down to business. The first thing to show is its remarkable text recognition: Kimi can recognize even complex mathematical notation, and Chinese text is simpler still. Take the picture below, which it handles without any trouble.

 

Kimi's recognition results

 

Commonly used screenshot tools such as PixPin can also recognize text, but here the upper half of the paragraph was not recognized at all, and the text that was recognized contained errors.

The screenshot tool's recognition results

 

Higher recognition accuracy alone would be one thing; after all, these are not the same kind of tool, so some differences are not surprising. But Kimi is not a rigid recognition tool! It even corrects and "fact-checks" the text in the original image, practically "analyzing every pixel".

The boxed text below was corrected by Kimi

 


 

Correct Standing Posture

 

How is this not a crushing blow to OCR tools?

Beyond text recognition, it can also answer questions.

First, let's try a simple picture-reasoning question: find the pattern in the figure below and choose the correct option. This is a sample graphical reasoning question from the civil service exam. Off you go!

The answer is in the red box. It was not shown to Kimi.

 

If you have not seen questions like this before, you may be a bit confused at first and need to think for a while. Kimi, by contrast, produced a long analysis, laid out its process in detail at each step, and finally gave the correct answer.

 

 

The points mentioned in the answer, straight lines versus curves and whether the figure is closed, are all considered in Kimi's reasoning.

 

 

Basic reasoning does not stump it, so let's try a question that also requires calculation.

 

Kimi answered quickly and correctly, and for the sake of rigor it checked its answer three times, thinking through other possible mistakes. You can use this as a reference when solving problems yourself, to see whether your own reasoning falls into the same reflexive errors that Kimi checks for.

 

 

This is the type of content that is easier for Kimi.

Let's look at a more advanced one.

Coding questions are even more squarely in Kimi's area of expertise. I picked a problem on LeetCode and threw a screenshot of it straight at Kimi.

 

 

 

A gripe about this problem

 

 

Kimi's answer:

 

The final solution passed the tests. The next time you meet a problem you cannot solve, you can have Kimi teach you how to do it and learn its approach along the way. A real person whose submission beats only 5% might jokingly call that "very strong", while Kimi's first attempt beat 77%.

 

In addition to solving problems, Kimi can also analyze the various tables you encounter day to day.

 

Don't assume a question like the one above is too easy: hand it to another AI and it may not manage a peep.

 

This time, Kimi Visual Thinking Edition also comes with no usage limits. From now on, any data you have that can be converted into an image can be handed to Kimi to unlock more information.

Looking at Kimi's updates, the pattern is to unlock a new skill only after doing one thing to an excellent level, rather than shipping a whole bunch of features that do not work well. That leaves some anticipation for stronger products to follow, such as tools for generating video and operating software.

May not be reproduced without permission: Chief AI Sharing Circle » Kimi launches visual version of o1 to think and solve problems visually
