Convert audio into multi-speaker subtitles for free with Gemini 2.0!


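The transcribed subtitles can be tagged with speaker labels and second-level timestamps. The model can also recognize sounds such as laughter and ringing bells and identify songs that are playing. Its maximum output length is enough to transcribe roughly 15 minutes of audio per request. This post shows how to try it quickly in Google AI Studio and provides free Colab code that converts speech into subtitle text with the free Gemini 2.0 model.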
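Because one request covers only about 15 minutes of audio, a longer recording has to be split before transcription. The sketch below is a hypothetical helper (`chunk_ranges` is not part of any SDK) that only computes the cut points; actually cutting the file, for example with ffmpeg, is left out. The 15-minute default is taken from the figure above and is an assumption, since the real limit depends on how much speech the audio contains.

```python
# Hypothetical helper: split a recording into windows of at most ~15 minutes
# so each request stays within the model's output-length budget.
def chunk_ranges(duration_s: float, max_chunk_s: float = 15 * 60) -> list[tuple[float, float]]:
    """Return (start, end) offsets in seconds that cover [0, duration_s]."""
    duration_s = float(duration_s)
    if duration_s <= 0:
        return []
    ranges = []
    start = 0.0
    while start < duration_s:
        end = min(start + max_chunk_s, duration_s)  # last window may be shorter
        ranges.append((start, end))
        start = end
    return ranges


# A 40-minute file becomes three windows: 0-15, 15-30 and 30-40 minutes.
print(chunk_ranges(40 * 60))  # [(0.0, 900.0), (900.0, 1800.0), (1800.0, 2400.0)]
```

Each chunk is then transcribed on its own. The model's timestamps presumably restart at [00:00] for every chunk, so add the chunk's start offset to them before merging the pieces.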

Prompt

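Generate a transcript of this audio. Include timestamps and label the speakers.
Speakers are:
- Xiaomei (小美)
Example:
[00:00] Brady: Hello there.
[00:02] Tim: Hi Brady.
Be sure to use the correct speaker names, using the names you identified earlier. If you really cannot determine a speaker's name, use a letter instead; for example, one unknown speaker can be labeled 'A' and another unknown speaker 'B'.
If music or a short jingle is playing, mark it like this:
[01:02] [MUSIC] or [01:02] [JINGLE]
If you can identify the name of the music or jingle, use that name instead, for example:
[01:02] [Firework by Katy Perry] or [01:02] [The Sofa Shop jingle]
If some other sound is playing, try to identify it, for example:
[01:02] [Bell ringing]
Keep each caption short, a few short sentences at most.
Mark the end of the program with [END].
Do not use any Markdown formatting, such as bold or italics.
Only use English letters unless you are sure characters from another language are needed.
Make sure to use the correct words and spell them accurately. Use the context of the podcast to help.
If the hosts discuss a movie, book, or celebrity, make sure the name of the movie, book, or celebrity is spelled correctly.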
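The model does not always follow every instruction, so it can help to check its output against the format the prompt asks for. Below is a minimal sketch, assuming the `[MM:SS] Speaker: text` shape from the example, with an optional hour field for longer files (that part is an assumption, not something the prompt specifies). `check_transcript` is a hypothetical helper that reports a missing [END], lines it cannot parse, timestamps that go backwards, and leftover Markdown bold markers.

```python
import re

# "[MM:SS] Speaker: text" or "[MM:SS] [SOUND]"; "[H:MM:SS]" is also accepted.
LINE_RE = re.compile(r"^\[(?:(\d+):)?(\d{1,2}):(\d{2})\]\s+(.*)$")


def check_transcript(text: str) -> list[str]:
    """Return a list of problems; an empty list means the format looks right."""
    problems = []
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[-1].endswith("[END]"):
        problems.append("missing [END] marker on the last line")
    last = -1
    for ln in lines:
        if ln == "[END]":  # a bare end marker carries no timestamp
            continue
        m = LINE_RE.match(ln)
        if not m:
            problems.append(f"unparsed line: {ln!r}")
            continue
        h, mm, ss, body = m.groups()
        t = int(h or 0) * 3600 + int(mm) * 60 + int(ss)
        if t < last:
            problems.append(f"timestamp goes backwards: {ln!r}")
        last = t
        if "**" in body or "__" in body:
            problems.append(f"markdown formatting: {ln!r}")
    return problems


sample = "[00:00] Brady: Hello there.\n[00:02] Tim: Hi Brady.\n[01:02] [Bell ringing]\n[END]"
print(check_transcript(sample))  # []
```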

Colab code

%pip install google-genai jinja2

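import os

from google import genai
from jinja2 import Template

# Create the client. The key is read from the GEMINI_API_KEY environment variable;
# "xxx" is only a placeholder, so set the variable or replace it with your own key.
api_key = os.getenv("GEMINI_API_KEY", "xxx")
client = genai.Client(api_key=api_key)
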
# path to the file to upload
file_path = "../assests/porsche.mp3"  # Replace with your own file path

# Upload the file to the File API
file = client.files.upload(file=file_path)

# Prompt template (Jinja2); the speaker list is filled in when the template is rendered
prompt_template = Template("""Generate a transcript of the episode. Include timestamps and identify speakers.

Speakers are: 
{% for speaker in speakers %}- {{ speaker }}{% if not loop.last %}\n{% endif %}{% endfor %}

eg:
[00:00] Brady: Hello there.
[00:02] Tim: Hi Brady.

It is important to include the correct speaker names. Use the names you identified earlier. If you really don't know the speaker's name, identify them with a letter of the alphabet, eg there may be an unknown speaker 'A' and another unknown speaker 'B'.

If there is music or a short jingle playing, signify like so:
[01:02] [MUSIC] or [01:02] [JINGLE]

If you can identify the name of the music or jingle playing then use that instead, eg:
[01:02] [Firework by Katy Perry] or [01:02] [The Sofa Shop jingle]

If there is some other sound playing try to identify the sound, eg:
[01:02] [Bell ringing]

Each individual caption should be quite short, a few short sentences at most.

Signify the end of the episode with [END].

Don't use any markdown formatting, like bolding or italics.

Only use characters from the English alphabet, unless you genuinely believe foreign characters are correct.

It is important that you use the correct words and spell everything correctly. Use the context of the podcast to help.
If the hosts discuss something like a movie, book or celebrity, make sure the movie, book, or celebrity name is spelled correctly.""")

# Define the speakers and render the prompt
speakers = ["John"]
prompt = prompt_template.render(speakers=speakers)

# Send the rendered prompt together with the uploaded audio file
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[prompt, file],
)

print(response.text)
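To use the result as an actual subtitle file, the `[MM:SS]` lines in `response.text` can be converted to SubRip (.srt). This is a minimal sketch, assuming second-resolution timestamps as in the prompt; `to_srt` and `fmt` are hypothetical helpers, each cue ends where the next one begins, and the last cue gets a fixed 3-second duration, which is an arbitrary choice.

```python
import re

# "[MM:SS] Speaker: text" or "[H:MM:SS] [SOUND]"; any other line is skipped.
LINE_RE = re.compile(r"^\[(?:(\d+):)?(\d{1,2}):(\d{2})\]\s+(.*)$")


def fmt(seconds: int) -> str:
    """Format whole seconds as an SRT timestamp, e.g. 00:01:02,000."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d},000"


def to_srt(transcript: str, last_cue_s: int = 3) -> str:
    cues = []
    for line in transcript.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # e.g. a bare "[END]" or stray text
        h, mm, ss, body = m.groups()
        if body.strip() == "[END]":
            continue
        cues.append((int(h or 0) * 3600 + int(mm) * 60 + int(ss), body))
    blocks = []
    for i, (start, body) in enumerate(cues):
        # each cue ends when the next one starts; the last one gets a fixed duration
        end = cues[i + 1][0] if i + 1 < len(cues) else start + last_cue_s
        end = max(end, start + 1)  # avoid zero-length cues when timestamps repeat
        blocks.append(f"{i + 1}\n{fmt(start)} --> {fmt(end)}\n{body}\n")
    return "\n".join(blocks)
```

For example, `print(to_srt(response.text))` shows the subtitles, and writing the same string to a file such as `porsche.srt` (opened with `encoding="utf-8"`) gives a file most video players can load. Speaker names and sound tags like [Bell ringing] stay in the cue text.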