The most popular AI product of 2024 may well be NotebookLM. It has been a hit since September and stayed hot through the end of the year.
In December, NotebookLM was updated with a new feature: Join. Users can now take part in the podcast themselves.
The feature isn't entirely new. The NotebookLM team showed it off at the Google developer conference some time ago, but only recently has it become available in beta.
Attention:
Access is heavily restricted for users in some regions, so check your network settings!
The "Join" function is unstable, so be patient.
The "Join" function currently only supports English speech, but the language of uploaded text is not limited.
It is currently web-only; there is no mobile version.
Not only can you generate podcasts with one click, you can also join the conversation
NotebookLM's first focus was on smart notes: after you upload a file, an overview summary is generated automatically. Users can then type questions about the uploaded content directly into a chat box.
This is a nice feature in its own right, but what really took the product mainstream was the podcast-style conversation, the Audio Overview. The Audio Overview now also offers a "Join" feature, which you can click to join the conversation directly.
Asking questions is the highlight of this product. Anyone who has sat in a class knows how much this matters: asking a question requires you to know the content, and more importantly, to take part at all you have to actively follow the pace of the lecture and use your brain. This effectively promotes understanding of the text and material.
NotebookLM's "Join" is like the hand-raising function in Tencent Meeting. After you click and speak your question, there is a delay of about a second and a half before the AI host responds, saying something like "Our listeners have something to say" as a transition.
For now it takes some time to respond, but it picks up the conversation very naturally. Only English is supported, so you need to ask questions in English. Even if you ask in Chinese, though, it politely thanks you for participating rather than leaving you hanging.
NotebookLM's support for long texts is impressive: works as long as War and Peace can be uploaded. From the generated audio, however, you can sense that only some chapters were excerpted for analysis, and the total length is only 11 minutes.
It's understandable. The whole book could take hours.
I used the Chinese version of War and Peace for the test, and the returned audio overview was in English. At the beginning, though, the two "hosts" made a point of noting that it was a Chinese translation, and said that reading a story in different languages can offer different perspectives on it. Very true!
When I asked exactly which chapters were used for the analysis, the hosts were a bit vague, saying only that they had chosen some key episodes. The audio is also organized by character rather than by storyline.
However, the analysis can be customized: click Customize below and enter your requirements before generating the audio overview. For example, after uploading another novel, I asked for the plot development to be the main focus, and the audio generated afterwards followed the exact order of the plot.
During the question session, I found some problems with its voice recognition. For example, in one question I asked how the characters in this "novel" were portrayed, but it was recognized as how the characters in the "Nobel" were portrayed.
It didn't check with me either, a lapse in the usual style of large models: confidently heading off in the wrong direction. The understanding was clearly off, yet the discussion carried on in all seriousness.
Another problem is the interaction design. NotebookLM means well: each project can hold more than one source, so that different materials can be combined to generate notes. However, the interface is not clear enough; even a label like "Go back to all items" would make it much better.
Another interaction that doesn't work well: after you enter interactive mode, there is no progress bar for the audio. It's hard to tell how far along the program is, you can't rewind to hear a question again after it's finished, and the question itself isn't included in the audio file. I can only say that this is still a beta version, and I am looking forward to subsequent upgrades.
Long-form articles seem to work best so far. The generation time is friendlier and the entire text can be read. A large book like War and Peace can be uploaded, but generation took a really long time, and at one point I thought it was stuck.
Not only can the model digest a long article in full, but long articles also make the best use of this interaction.
For the long-article test, I ran a 2,000-3,000-word article discussing potential problems with AI chatbots. The full audio is 22 minutes long, but that includes several questions.
Anything shorter may not be very informative, and anything longer will inevitably make people impatient. Twenty minutes with interaction is arguably a more appropriate length.
I have to say that the naturalness of this interaction is still amazing. Not only in the voice but also in the content, the two "hosts" understood the questions very accurately and comprehensively.
However, the questions I asked have no direct answers in the original article. If you want a question answered from the original content, it is clearer to phrase it explicitly with "in this article".
But it is precisely this going beyond the original text that demonstrates the strength of the model behind it. The model has to understand the question, determine whether the original text supports an answer, generate an appropriate response when it does not, convert that response to speech, and package it all into a natural, smooth voice interaction.
It's hard to say whether chatbots in general are this human-like, but these two hosts strike me as really quite strong.
How does painless learning work?
Raiza Martin, Product Manager at NotebookLM, said in an interview that she's a little surprised at how popular it's become. At first, it wasn't a tool built for the general public, but more for reading enthusiasts.
There's some real science here.
A recent study published in the journal NeuroImage may explain why it's so popular: people who love to read also happen to be more sensitive to sound.
You may find it a bit confusing: shouldn't reading be about 'seeing', about visualizing?
Yes, but not entirely. The ability to read is tied to the anterior part of the temporal lobe in the left hemisphere of the brain, and that part also processes sounds. Awareness of speech sounds develops as early as childhood, when we learn the sounds of language and then pair words with speech.
After testing more than 1,000 volunteers, researchers found that one area in the left hemisphere was thicker in people with better speaking and reading skills. This area, in turn, contains the auditory cortex.
That means a thicker auditory cortex is associated with more skillful reading. This is not entirely innate; our brains are constantly being altered by our environment. The more we read, the more reading slowly changes the shape of the cerebral cortex.
Of course, the fun and lively nature of the podcast format is another important reason for its popularity: NotebookLM combines the two modalities of sound and text in a way that is accessible and easy to understand rather than obscure. Beyond analyzing text, netizens have developed all sorts of surprising uses for NotebookLM: revising resumes, reviewing each other's papers, and simulating group discussions. Reading is simply the most basic use.
However, at this stage NotebookLM still has a lot of room for improvement. For example, the voice still lags sometimes, the generation time is long, and it fails to load from time to time. I hope the team optimizes it soon and lives up to everyone's expectations.