
Transcript: extracting JSON data from 35 seconds of recorded video relying on Google Gemini multimodal capabilities

The other day, I found myself needing to add up some values scattered across twelve different emails.

I didn't want to copy and paste all the numbers one by one, so I decided to try something different: could I record the screen while browsing my Gmail account and then use Google Gemini to extract the numbers from that video?


As it turns out, this method works really well.

 

AI Studio and QuickTime

I used QuickTime Player on my Mac to record the video: File -> New Screen Recording. I drew a box on the screen framing part of my Gmail account, then clicked on each email in turn, pausing on each one for a few seconds.

Then I uploaded the recorded file directly to Google's AI Studio tool and entered the following prompt:

Convert it to a JSON array, where each item contains the date in yyyy-mm-dd format and the floating-point amount for that date

The result... was a success. It output a JSON array that looks like this:

[
  {
    "date": "2023-01-01",
    "amount": 2...
  },
  ...
]
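
Since the goal was to add up the amounts, an array of this shape can be totalled with a short script. This is a minimal sketch rather than part of the original workflow: the sample values are invented for illustration (the real output is elided above), and `total_amounts` is a hypothetical helper name.

```python
import json
from datetime import date
from decimal import Decimal

# Hypothetical sample data in the same shape as the model's output.
# These values are invented for illustration only.
SAMPLE = """
[
  {"date": "2023-01-01", "amount": 10.50},
  {"date": "2023-02-01", "amount": 20.25},
  {"date": "2023-03-01", "amount": 5.00}
]
"""


def total_amounts(raw_json: str) -> Decimal:
    """Validate each item's date and return the sum of all amounts."""
    # parse_float=Decimal keeps the amounts exact instead of binary floats
    items = json.loads(raw_json, parse_float=Decimal)
    total = Decimal("0")
    for item in items:
        date.fromisoformat(item["date"])  # raises ValueError on a malformed date
        total += Decimal(item["amount"])
    return total


if __name__ == "__main__":
    print(total_amounts(SAMPLE))
```

Parsing the dates as well as the amounts gives a cheap sanity check on the model's output, though it does not replace checking the numbers against the video.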


I wanted to paste it into Numbers, so I proceeded to type:

Convert it to a csv that can be copied and pasted

It gave me the same data in CSV format.
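
The same conversion could also be done locally instead of with a follow-up prompt. The following is a minimal sketch using Python's standard `csv` module; the `json_to_csv` name and the sample rows are assumptions made for illustration, not the real data from the video.

```python
import csv
import io
import json


def json_to_csv(raw_json: str) -> str:
    """Convert a JSON array of {"date", "amount"} objects to CSV text."""
    rows = json.loads(raw_json)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["date", "amount"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)  # raises ValueError if a row has unexpected keys
    return buffer.getvalue()


if __name__ == "__main__":
    # Invented sample values, for illustration only
    sample = '[{"date": "2023-01-01", "amount": 10.5}, {"date": "2023-02-01", "amount": 20.25}]'
    print(json_to_csv(sample))
```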

You should never fully trust these tools to not make mistakes, so I rewatched this 35 second video and manually checked all the numbers. It was all correct.

Originally I was going to use Gemini 1.5 Pro, Google's best model... but it turns out I forgot to select a model, and I actually used the much cheaper Gemini 1.5 Flash 002 for the whole process.

 

How much did it cost?

According to AI Studio, I used 11,018 tokens, of which 10,326 were for the video.

Gemini 1.5 Flash is priced at $0.075 per million tokens (the price was reduced in August).

11018/1000000 = 0.011018
0.011018 * $0.075 = $0.00082635

Therefore, this entire process should cost less than 1/10th of a cent!
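
The arithmetic above can be reproduced in a few lines of Python. This is a sketch that, like the calculation above, prices every token at the single $0.075 rate; the variable names are my own, and `Decimal` is used only to avoid floating-point rounding noise.

```python
from decimal import Decimal

TOTAL_TOKENS = 11_018                 # total reported by AI Studio
VIDEO_TOKENS = 10_326                 # portion attributed to the video
PRICE_PER_MILLION = Decimal("0.075")  # USD per million tokens (Gemini 1.5 Flash)

cost_usd = Decimal(TOTAL_TOKENS) / Decimal(1_000_000) * PRICE_PER_MILLION
cost_cents = cost_usd * 100           # convert dollars to cents explicitly

print(f"Cost in dollars: {cost_usd}")
print(f"Cost in cents:   {cost_cents}")
print(f"Video share of tokens: {VIDEO_TOKENS / TOTAL_TOKENS:.1%}")
```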

In fact, it was free. Google AI Studio currently remains free of charge in all supported regions, even if you have billing set up. I believe that means they can train on your data, though, which is something their paid APIs don't do.

 

The other alternatives aren't really that good

Let's look at other alternatives here.

  • I can click on the emails one by one and copy the data manually. This is error prone and quite boring. Processing 12 emails is fine, but 100 would be a real pain.
  • Programmatically access my Gmail data. This gets harder every year. It's still possible via IMAP as long as you set up a dedicated app password, but that's a lot of work for a one-off scraping task. The official API isn't much fun either.
  • Use some sort of browser automation tool (like Playwright or similar) to automatically click through my Gmail account. Even with a large language model helping to write the code, this still requires more work, and it doesn't solve the problem of differences in email formatting: I'd still have to handle the email parsing step separately.
  • Use some sort of more advanced existing AI tool that can access my email. Another Google product (also called Gemini) can do this if you grant it access, but so far I haven't been particularly happy with the results. AI tools are inherently unpredictable. I'm also reluctant to give any tool full access to my email account because of risks such as prompt injection.

 

Video capture technology is very powerful

The great thing about this video capture technique is that it works with _anything_ you can see on your screen... and you have complete control over what you expose to the AI model.

There is no website authentication or anti-scraping technology that prevents me from recording screen video while clicking through web applications.

The results I get depend entirely on how carefully I plan the screen capture area and the clicking action.

There's absolutely no setup cost for this process - just log into the site, hit record, browse at your leisure, and drop the video into Gemini.

The cost was so low that I had to recalculate three times to make sure I hadn't miscalculated.

I expect I will be using this technique more in the future. It also has applications in data journalism, where there is often a need to scrape data from sources that don't want to be scraped.

 

Plus: a price calculator for large language models

While writing up this experiment, I got tired of calculating token prices by hand. I usually outsource this to ChatGPT Code Interpreter, but I've caught it making mistakes when converting from dollars to cents, so I always had to double check its results.

So I had Claude 3.5 Sonnet build this price calculator tool for me using Claude Artifacts (the source code is here).


You can manually set the price of the input/output Token, or click on the preset buttons to automatically populate the prices of different existing models (as of October 16, 2024 - I don't promise to keep them up to date in the future!)
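
The core of such a calculator is small. The sketch below is my own illustration, not the source code of the tool Claude built: the function names are hypothetical, and the prices in the usage example are placeholder values, not any provider's real rates.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)


def token_cost_usd(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Return the cost in US dollars; prices are dollars per million tokens."""
    # Going through str() keeps float prices like 0.075 exact as decimals
    input_cost = Decimal(input_tokens) / MILLION * Decimal(str(input_price_per_m))
    output_cost = Decimal(output_tokens) / MILLION * Decimal(str(output_price_per_m))
    return input_cost + output_cost


def usd_to_cents(usd):
    """Convert dollars to cents in one obvious place."""
    return usd * 100


if __name__ == "__main__":
    # Placeholder prices chosen for easy mental checking, not real rates
    cost = token_cost_usd(1_000_000, 500_000, "1.00", "2.00")
    print(f"${cost} = {usd_to_cents(cost)} cents")
```

Keeping the dollars-to-cents step in its own function makes the conversion that tripped up the earlier approach easy to check in isolation.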

The entire calculator was written by Claude. Here is the full transcript of the conversation: we iterated through 10 different versions over 19 minutes.

Instead of looking up all the prices myself, I took screenshots of each model provider's pricing page and pasted them directly into the conversation with Claude.


May not be reproduced without permission: Chief AI Sharing Circle » Transcript: extracting JSON data from 35 seconds of recorded video relying on Google Gemini multimodal capabilities