Do you often need to transcribe meeting recordings or interviews? Writing a verbatim transcript by hand is slow and tedious, so it makes sense to let an AI tool convert the audio into text. This article introduces Whisper, an automatic speech recognition (ASR) system from OpenAI. According to OpenAI's description on GitHub, Whisper is an open-source speech recognition model that currently recognizes about 96 languages and converts speech into text, and its accuracy on Chinese is high. Because Whisper is open source, all you need to set it up is a Google account and a few commands. Once it is installed, you can use Whisper for speech recognition and transcription free of charge and without developer restrictions.
Whisper installation code:
!pip install git+https://github.com/openai/whisper.git
ffmpeg installation code:
!sudo apt update && sudo apt install -y ffmpeg
Speech-to-text execution code:
!whisper "filename (needs replacing).mp3" --model medium
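If you would rather stay in Python than call the `!whisper` command, the same transcription can be sketched with Whisper's Python interface. This is a minimal sketch, assuming Whisper and ffmpeg are already installed; the helper names `format_timestamp`, `segments_to_lines`, and `transcribe_file` are illustrative and not part of Whisper itself.

```python
def format_timestamp(seconds: float) -> str:
    """Convert a time offset in seconds to HH:MM:SS (fractions are dropped)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def segments_to_lines(segments):
    """Turn Whisper-style segments (dicts with 'start' and 'text') into timestamped lines."""
    return [
        f"[{format_timestamp(seg['start'])}] {seg['text'].strip()}"
        for seg in segments
    ]


def transcribe_file(audio_path: str, model_name: str = "medium"):
    """Transcribe one audio file and return timestamped lines of text."""
    # Imported here so the helpers above work even where Whisper is not installed.
    import whisper

    model = whisper.load_model(model_name)   # downloads the model on first use
    result = model.transcribe(audio_path)    # dict with "text", "segments", "language"
    return segments_to_lines(result["segments"])
```

In a notebook cell, `print("\n".join(transcribe_file("filename (needs replacing).mp3")))` would print one line per recognized segment.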
Step 1: Sign in to your Google account, open Google Drive, click "+New" in the upper left corner, go to "More" at the bottom of the menu, and then click "Connect more apps".
Step 2: The Google Workspace Marketplace opens the first time you do this. Type "Google Colaboratory" in the search bar and select it.
Step 3: Click "Install", then select "Continue". You will be asked to sign in with your Google account; follow the instructions to complete the installation.
Step 4: Go back to the Google Drive home page, click "+New" in the upper left corner again, and select "Google Colaboratory" under "More".
Step 5: Once the notebook opens, rename the file so that you can quickly find and reuse it later.
Step 6: Click "Runtime" in the top menu bar and select "Change runtime type".
Step 7: Here you can choose the runtime type and the hardware accelerator. Select "Python 3" and "T4 GPU", then click "Save".
Step 8: Find "Connect" in the upper right corner of the window, click it, and wait for the connection to succeed.
Step 9: Once connected, you can see the resources of the assigned machine, including GPU, memory, and disk information.
Step 10: Next, to install Whisper, enter the Whisper installation code and the ffmpeg installation code on the first and second lines of the code cell in the middle of the page, respectively, and click Run.
Step 11: After the installation completes, click the folder icon on the left side and select "Upload Files" to upload the MP3 files you want to transcribe.
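When several recordings have been uploaded, typing the execution command once per file gets repetitive. The sketch below finds the MP3 files and builds the same `whisper ... --model medium` command for each. It assumes the uploaded files sit in the notebook's current working directory and that Whisper is already installed; the function names `find_audio_files`, `build_whisper_command`, and `run_all` are made up for this example.

```python
from pathlib import Path
import shlex


def find_audio_files(folder=".", extensions=(".mp3",)):
    """Return the audio files in `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in extensions
    )


def build_whisper_command(audio_path, model="medium"):
    """Build the argument list for the whisper command-line tool."""
    # Passing a list avoids quoting problems with spaces in file names.
    return ["whisper", str(audio_path), "--model", model]


def run_all(folder="."):
    """Transcribe every MP3 file in `folder`, one after another."""
    import subprocess

    for audio in find_audio_files(folder):
        cmd = build_whisper_command(audio)
        print("Running:", shlex.join(cmd))
        subprocess.run(cmd, check=True)  # raises if whisper exits with an error
```

Calling `run_all()` in a cell would process every uploaded MP3 in turn. Building the command as a list, rather than as one string, is a deliberate choice so that file names containing spaces or parentheses need no manual quoting.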