Week in review: a startling week for AI in early 2024

AI News · 1 yr ago · published by AI Sharing Circle

One story you may have missed: this week, Nvidia's market capitalization surpassed those of Amazon and Google's parent company, Alphabet, making it the world's third most valuable company, with a staggering $1.83 trillion market cap. Fun fact: the last time Nvidia was worth more than Amazon was back in 2002. 🤯 What a rise for AI!

Now, let's take a look at some of the heavy hitters.

OpenAI revolutionizes the video production world

Just under a year ago, AI text-to-video technology was exceptionally bad (remember that Will Smith video?). But just yesterday, OpenAI unveiled Sora, its first video-generation model, and in a single day it turned the public's perception of AI video on its head.

In short: Sora is an AI model that can produce up to 60 seconds of video from a text prompt. It is a diffusion model that builds on OpenAI's previous research on DALL-E and GPT models.

What sets Sora apart is that it creates extremely realistic, high-quality scenes, with clips more than ten times longer than those of existing video generators. It renders all kinds of details accurately and understands how they exist in the real world.

But there's more: it can also generate images (Midjourney beware), generate videos based on images, edit videos with text prompts, merge two videos, and even create infinite loops.

What's the catch? OpenAI has shared the model for "research purposes" (or to generate buzz), but it is not publicly available while a red team completes its risk assessment.

OpenAI also recognizes the model's shortcomings: Sora sometimes has problems capturing spatial details and physical laws. Sometimes it produces completely illogical results, such as generating a video of a jogger running backwards on a treadmill.

Try it out: while there is no way to use Sora directly right now, you can experience the video generation simulator in OpenAI's research paper. Or you can join the crowd of people sending prompt requests to Sam Altman on X and see the technology in action that way (here's a personal favorite example).

The big picture: OpenAI's breakthroughs in AI video are nothing short of mind-boggling. With so much progress in just one year, who can say what heights video generation technology will reach by 2025?

Google launches upgraded Gemini 1.5

Gemini 1.5 Pro demonstrates reasoning by analyzing 402 pages of transcripts

A week after Google launched the more powerful Gemini Ultra, the company followed up with Gemini 1.5, a multimodal model that sets a new standard.

How does it work? Gemini 1.5 owes its efficiency to a Mixture-of-Experts (MoE) architecture: for each query, it activates only the most relevant part of the model instead of the whole model.
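
To make the idea of "activating only part of the model" concrete, here is a minimal sketch of generic top-k expert routing. It is an illustration only: Google has not published the routing details of Gemini 1.5, and the names `make_expert`, `moe_forward`, and `gate_weights`, as well as the toy "experts" that merely scale their input, are all hypothetical.

```python
import math
import random


def softmax(scores):
    """Turn raw router scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    """Hypothetical expert: a stand-in for a full feed-forward sub-network."""
    return lambda x: [scale * v for v in x]


def moe_forward(x, gate_weights, experts, top_k=2):
    """Route input x to the top_k highest-scoring experts and mix their outputs."""
    # Router: one score per expert (dot product of x with that expert's gate vector).
    scores = [sum(w * v for w, v in zip(row, x)) for row in gate_weights]
    probs = softmax(scores)

    # Only the top_k experts run; the rest are skipped entirely.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)

    output = [0.0] * len(x)
    for i in chosen:
        weight = probs[i] / norm  # renormalize over the chosen experts
        expert_out = experts[i](x)
        output = [o + weight * e for o, e in zip(output, expert_out)]
    return output, chosen


random.seed(0)
num_experts, dim = 8, 4
experts = [make_expert(i + 1) for i in range(num_experts)]
gate_weights = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(num_experts)]

output, chosen = moe_forward([0.5, -1.0, 2.0, 0.1], gate_weights, experts, top_k=2)
print(f"experts used: {sorted(chosen)} out of {num_experts}")
```

With `top_k=2` and eight experts, only a quarter of the expert computation runs for each input, which is where the efficiency gain in this kind of design comes from.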

Why is it so important? Gemini 1.5 can process a huge amount of information at once - it has a context window of up to 1 million tokens, to be exact. That means it can handle roughly 750,000 words of input, 11 hours of audio, 1 hour of video, or tens of thousands of lines of code.
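
The word count follows from simple arithmetic on the token budget. The sketch below assumes about 0.75 English words per token, a common rule of thumb rather than a figure from Google; real ratios vary by tokenizer and language. The function names are hypothetical.

```python
import math

# Assumption: roughly 0.75 English words per token (rule of thumb, not an official figure).
WORDS_PER_TOKEN = 0.75


def tokens_to_words(tokens, words_per_token=WORDS_PER_TOKEN):
    """Estimate how many words fit in a given token budget."""
    return int(tokens * words_per_token)


def fits_in_context(word_count, context_tokens, words_per_token=WORDS_PER_TOKEN):
    """Check whether a document of word_count words fits in the context window."""
    tokens_needed = math.ceil(word_count / words_per_token)
    return tokens_needed <= context_tokens


for window in (128_000, 1_000_000):
    print(f"{window:>9,} tokens is about {tokens_to_words(window):,} words")

# A 500,000-word archive overflows the 128K window but fits in the 1M window.
print(fits_in_context(500_000, 128_000), fits_in_context(500_000, 1_000_000))
```

Under this assumption, the 128,000-token window holds about 96,000 words, while the 1-million-token window holds about 750,000.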

Performance in practice: Gemini 1.5 has been shown to understand and reason about the 402 pages of transcripts from the Apollo 11 mission to the moon, to accurately analyze the numerous plots and events of a 44-minute silent movie, and to modify and interpret up to 100,000 lines of code.

The catch: it's not available to the public yet, but Google will soon introduce 1.5 Pro with a standard context window of 128,000 tokens, and eventually scale it up to 1 million tokens.

ChatGPT can finally remember

Ever chatted with ChatGPT and felt stuck in an endless loop of "Wait, who are you?" Now OpenAI has a solution: ChatGPT has a memory feature.

What's new: the Memory feature (still in beta) allows ChatGPT to store and recall information shared in previous chats, so you no longer need to start over in every conversation.

How it works: you can explicitly ask ChatGPT to remember a certain detail, or let it pick up and remember information on its own. For example:

You tell ChatGPT about your wheat-free bakery, and when you ask for brownie recipes, it will only recommend wheat-free recipes for you.

You tell ChatGPT that you want meeting notes formatted with bullet points and bold headings, and it will apply this format to all future meeting summaries.

What about privacy issues? OpenAI offers a range of options to give users control over what ChatGPT remembers:

Users can view the contents of the memories stored in ChatGPT and selectively delete some of the information.

Using Temporary Chat, users can start a conversation that neither draws on previous memories nor creates new ones.
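
The remember, review, delete, and memory-free behaviors described above can be pictured with a toy model. This is only an illustration of the concept, not OpenAI's implementation; the `MemoryStore` class, its methods, and the prompt format are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    """Toy model of a per-user memory (hypothetical, for illustration only)."""

    facts: list = field(default_factory=list)

    def remember(self, fact):
        """Save a detail the user shared, skipping exact duplicates."""
        if fact not in self.facts:
            self.facts.append(fact)

    def forget(self, fact):
        """Let the user selectively delete one stored detail."""
        self.facts = [f for f in self.facts if f != fact]

    def build_prompt(self, user_message, temporary=False):
        """Prepend saved memories to the message, unless memories are bypassed."""
        if temporary or not self.facts:
            return user_message
        notes = "\n".join(f"- {f}" for f in self.facts)
        return f"Known about the user:\n{notes}\n\nUser: {user_message}"


memory = MemoryStore()
memory.remember("Runs a wheat-free bakery")
memory.remember("Wants meeting notes as bullet points with bold headings")

print(memory.build_prompt("Suggest a brownie recipe"))
print(memory.build_prompt("Suggest a brownie recipe", temporary=True))

memory.forget("Runs a wheat-free bakery")
print(memory.facts)
```

The first prompt carries the bakery detail, so a model answering it could stick to wheat-free recipes; the second call shows a conversation that ignores stored memories.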

The big picture: ChatGPT's new Memory feature reduces the hassle of typing the same thing over and over again, saving users time and frustration. But this feature is about much more than convenience - it's a big step toward more human-like interaction with AI.

Cashing in on your voice with ElevenLabs

ElevenLabs has just launched the Voice Actor Payment Program, a brand new opportunity for anyone to make money with AI.

Details: the Voice Actor Payment Program allows voice professionals (anyone, really) to create and share digitally cloned versions of their own voices.

Users simply upload a 30-minute voice sample and provide descriptive details (e.g., accent and gender).

Once uploaded to ElevenLabs' voice library, your voice can be used around the world for voiceover and narration projects.

To prevent abuse, ElevenLabs administrators keep track of projects that use your voice and flag any inappropriate use. You can also enable automatic filters for extra protection.

The big picture: there's a lot of fear about AI taking away creative jobs. But ElevenLabs is an example of AI's potential to open new, financially rewarding opportunities for creatives and creators.

Meta introduced V-JEPA, a way to help train AI models about the real world through video.
Sam Altman is looking for $7 trillion (yes, with a "t") for a new AI chip project.
A Pakistani political candidate used AI to manage his campaign - from prison.
Nvidia has introduced a personalized chatbot that runs locally on your PC.
Apple has just launched a new image animation tool called Keyframer.
AI had its mainstream moment at this year's Super Bowl.
Amazon researchers have developed the largest text-to-speech model to date - with promising results.
Microsoft outlined the top three AI trends to watch for in 2024.

Article copyright AI Sharing Circle. All rights reserved; please do not reproduce without permission.