Kimi launches a visual counterpart to o1 that thinks through and solves problems from images

AI News · published 1 yr ago · AI Sharing Circle


Everyone is using AI tools now, and we have watched AI evolve step by step. Until recently we mostly talked to these tools in text, and Kernel has sometimes wondered how nice it would be if they could reason well about pictures too.

After trying out a number of AI tools, I later used Kimi and found its reasoning capabilities impressive.

At the time, I wondered whether those capabilities could be extended to multimodal input, so that sending a picture or a video would trigger the same reflective reasoning and end in a reliable answer.

To my surprise, Kimi promptly shipped an update that adds excellent image recognition. I tried it out and did not expect to be impressed by something as basic as text recognition.

Kimi Intelligent Assistant has been updated again! Not long after the release of the Math Edition that I introduced last time, Kimi has upgraded from the Math Edition and gone live with the fun and useful K1 model. The corresponding product is the Kimi "Glasses-Wearing Edition"!

Its real name is Kimi Visual Thinking Edition.

This model can recognize complex image content and give detailed answers to math and science questions and logical reasoning problems. In a number of tests it exceeded OpenAI's o1 model. Its handwriting recognition is also very strong, and it can handle photos taken in a variety of scenarios.

It looks pretty good, so let's get right down to business. First up is its outrageous text recognition: Kimi can recognize even complex mathematical symbols, and Chinese is simpler still. Take the picture below; it is a no-brainer.

Kimi's recognition results

Commonly used screenshot tools such as PixPin can also recognize text, but here the upper half of the paragraph was not recognized at all, and the part that was recognized contains errors.

Recognition results from the screenshot tool

Accuracy aside (after all, these are not exactly the same kind of tool, so some difference is not surprising), Kimi is not a rigid recognition tool! It even corrects and "fact-checks" the text in the original image, literally "analyzing every pixel".

The boxed text below was corrected by Kimi

Correct Standing Posture

How is this not a crushing blow to OCR tools?

Beyond text recognition, it can also answer questions.

First, let's try a simple picture-reasoning question: find the pattern in the picture below and choose the correct option. It is a sample graphical-reasoning question from the civil service exam. Off you go~

The answer is in the red box. It was not shown to Kimi.

If you have not seen questions like this before, you might be a bit confused at first and need to think for a while. Kimi, by contrast, wrote out a long analysis, laid out its process in detail at each step, and finally gave the correct answer.

The points mentioned in the reference answer are straight lines versus curves and whether each figure is closed, and Kimi's reasoning considered them accordingly.

Basic reasoning is no trouble for it, so let's try a problem that also requires calculation.

Kimi answered quickly and correctly, and for the sake of rigor it checked its answer three times, considering other possible mistakes. You can use this as a reference when solving problems in the future, to see whether your own reasoning has the same kinds of errors Kimi checks for while reflecting.

This type of content is on the easier side for Kimi.

Let's look at a more advanced one.

Coding problems are an even better professional fit for Kimi. I found a problem on LeetCode (力扣) and threw a screenshot of it straight at Kimi.

A quick gripe about this problem

Kimi's answer:

The final solution passed the tests normally. From now on, when you hit a problem you cannot solve, you can have Kimi teach you how to do it and pick up its approach along the way. A real person whose submission beats 5% of others jokingly calls themselves "very strong", while Kimi beat 77% on its first try.

In addition to solving problems, Kimi can also analyze the various tables and forms you run into day to day.

Don't assume a question like the one above is too easy: hand it to another AI and it may not manage a peep.

This time, Kimi Visual Thinking Edition also comes with no usage limits. From now on, any data you have that can be converted into an image can be handed to Kimi to unlock more information.

Looking at Kimi's updates, it feels more like unlocking a new skill after doing one thing to an excellent level, rather than doing a whole bunch of things, none of which work well. That leaves some anticipation for stronger products to follow, such as tools for generating videos and operating software.