WeChat voice messages can be handled like this? Even a beginner can use Devbox to add voice-to-text to a WeChat official account!

Many people prefer to use WeChat's voice input directly, since speaking is always faster than typing.

Unlike the common .mp3 and .wav formats, WeChat voice input uses the .amr format by default.


Below is a webhook that the developer's server received from WeChat, indicating that a user has sent a voice message to the official account. The payload shows that the format is .amr.

Many STT (speech-to-text) services support only the former, which creates a requirement: how do we convert speech in the .amr format into the .mp3 format?
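As a minimal sketch of the check this requirement implies, the snippet below reads the MsgType, MediaId, and Format fields from a voice-message payload and decides whether conversion is needed. The sample XML follows the shape WeChat documents for official-account voice messages, but every value in it is a placeholder. The names parseVoiceMessage, needsConversion, and SUPPORTED_BY_STT are hypothetical, introduced only for illustration, and are not taken from the project.

```typescript
// Hypothetical types and names for illustration; not from the liubai project.
interface VoiceMessage {
  msgType: string;
  mediaId: string;
  format: string;
}

// Assumption: the STT service in use accepts only these formats.
const SUPPORTED_BY_STT = new Set(["mp3", "wav"]);

// Read the text of one XML element, with or without a CDATA wrapper.
function readTag(xml: string, tag: string): string | undefined {
  const pattern = new RegExp(
    `<${tag}>\\s*(?:<!\\[CDATA\\[([\\s\\S]*?)\\]\\]>|([^<]*))\\s*</${tag}>`
  );
  const match = pattern.exec(xml);
  if (!match) return undefined;
  return (match[1] ?? match[2] ?? "").trim();
}

// Returns undefined when the payload is not a complete voice message.
function parseVoiceMessage(xml: string): VoiceMessage | undefined {
  const msgType = readTag(xml, "MsgType");
  const mediaId = readTag(xml, "MediaId");
  const format = readTag(xml, "Format");
  if (msgType !== "voice" || !mediaId || !format) return undefined;
  return { msgType, mediaId, format: format.toLowerCase() };
}

// True when the audio must be converted before it is sent to the STT service.
function needsConversion(message: VoiceMessage): boolean {
  return !SUPPORTED_BY_STT.has(message.format);
}

// Sample payload; all values are placeholders.
const sampleXml = `<xml>
  <ToUserName><![CDATA[gh_placeholder]]></ToUserName>
  <FromUserName><![CDATA[openid_placeholder]]></FromUserName>
  <CreateTime>1700000000</CreateTime>
  <MsgType><![CDATA[voice]]></MsgType>
  <MediaId><![CDATA[media_id_placeholder]]></MediaId>
  <Format><![CDATA[amr]]></Format>
  <MsgId>1234567890123456</MsgId>
</xml>`;

const parsedMessage = parseVoiceMessage(sampleXml);
console.log(
  parsedMessage ? needsConversion(parsedMessage) : "not a voice message"
);
```

A regular expression is enough for a flat payload like this one; a real service would more likely use a proper XML parser.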

 

Solution

At first I wanted to use Laf, but then I found that Laf is positioned as a function-as-a-service platform and does not support using file-system modules such as fs to manipulate files on the server.

Then I came across an approach on GitHub[2]: start an express service, use fluent-ffmpeg to convert the .amr file to .mp3, and store the result temporarily on the server for the caller to use.

This approach assumes that FFmpeg is already installed on the server; otherwise fluent-ffmpeg will not work.
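The core of that approach is a single FFmpeg invocation. As a rough sketch, the helper below builds the argument list for converting an .amr file to .mp3. In a Node service the array could be passed to child_process.spawn("ffmpeg", convertArgs), or the same conversion could be expressed through fluent-ffmpeg. The name buildConvertArgs and the file paths are hypothetical, and this is not the code from the GitHub solution.

```typescript
// Hypothetical helper: builds CLI arguments equivalent to
//   ffmpeg -y -i <input>.amr <output>.mp3
function buildConvertArgs(inputPath: string, outputPath: string): string[] {
  if (!inputPath.toLowerCase().endsWith(".amr")) {
    throw new Error(`expected an .amr input, got: ${inputPath}`);
  }
  if (!outputPath.toLowerCase().endsWith(".mp3")) {
    throw new Error(`expected an .mp3 output, got: ${outputPath}`);
  }
  return [
    "-y", // overwrite the output file if it already exists
    "-i",
    inputPath, // input file
    outputPath, // output format is inferred from the .mp3 extension
  ];
}

// Placeholder paths; a real service would choose its own temporary location.
const convertArgs = buildConvertArgs(
  "/tmp/voice-1700000000000.amr",
  "/tmp/voice-1700000000000.mp3"
);
console.log(["ffmpeg", ...convertArgs].join(" "));
// prints: ffmpeg -y -i /tmp/voice-1700000000000.amr /tmp/voice-1700000000000.mp3
```

Either way, the ffmpeg binary itself has to be present on the machine, which is why the installation step matters.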

"This isn't simple function-as-a-service anymore," I thought to myself. I'm a beginner at back-end and ops work: I used to go all in on Serverless and focus on front-end interaction with users, so this was a bit of a challenge for me.

Then I remembered Devbox, which Sealos launched a while back. Its promotion seemed to be precisely about making up for the shortcomings of function-as-a-service platforms such as Laf.

"Couldn't I deploy the service described above on Devbox?" And so the rework (and the stumbling into pitfalls) began.

 

The Devbox experience

For what Devbox is, see the introduction: Sealos Devbox Released: Deploying a Cloud Native Development Environment

Here I want to focus on its development experience, because it strongly shapes how you understand Devbox and explains the detours I took. So I'm putting this usage insight, or rather this conclusion, up front.

On Devbox, all development happens directly on the remote server.

I found this very uncomfortable at first, but, as Claude's explanation of the feature makes clear, it works just like the Connect to SSH Host function in a local IDE.

Devbox spares you from entering the server IP or domain name, the SSH port number, and the server username and password (or SSH key). The whole process is a single button click.

I use Windsurf. After I clicked the button, Windsurf connected to the remote server. From then on, every change made in the IDE actually operates on the remote server.

That includes running pnpm i and any other command.

More importantly, when you finally click release version in the Devbox web UI, everything you have done (again, including the commands you ran) is packaged into a Docker image, which is equivalent to a snapshot of the machine's current state.

So once a version is released, none of the dependencies and underlying software we installed in the Devbox development environment (such as FFmpeg) has to be installed again in production. Everything works out of the box, with all prerequisites already in place.

Once you understand this, you finally see why Devbox's promotion has always said that we no longer have to worry about dependency problems and version conflicts in production: the production environment is a complete mirror of the development environment!

 

Hands-on implementation

With that understanding of Devbox in place, the hands-on part is relatively easy.

 

1. Select the express template on Devbox

2. Open with an IDE

As mentioned above, use VS Code, Cursor, or Windsurf to connect to the server for development.

After connecting, select all files with Cmd + A and delete the default template files.

 

3. Download the code

Because we chose the express template, git, node, and pnpm are all installed by default.

Now open a terminal and download the code with git.

git clone https://github.com/yenche123/liubai.git

A tip: when you type commands in an IDE connected through SSH Host, you will notice a slight lag. That is because you are really operating a terminal on the remote server, so some network latency is to be expected.

Previously, you developed locally, then packaged the code and uploaded it to the server. With Devbox, everything happens on the server: development is deployment.

 

4. Install FFmpeg

Continue by entering the following commands in the terminal to install FFmpeg.

sudo apt update && sudo apt upgrade # press Y to continue
sudo apt install ffmpeg # press Y to continue
ffmpeg -version # verify if installed successfully

The last line allows you to verify that the installation was successful.

Again, once this installation is done, there is no need to repeat it in production, because the software is included directly in the image. Pretty amazing, isn't it?

 

5. Write entrypoint.sh

In the root directory, at the same level as the liubai/ folder, create an entrypoint.sh file with the following content.

#!/bin/bash
# Change into the service folder (stop if it is missing), then start the express service
cd /home/devbox/project/liubai/liubai-backends/liubai-ffmpeg || exit 1
pnpm dev

This file tells the production server how to start our service once the machine boots.

Here we tell the machine to first change into the target folder liubai-ffmpeg and then run pnpm dev to start the express service.

 

6. Make entrypoint.sh executable

Still in the root directory, run the following command to add execute permission to entrypoint.sh.

chmod +x entrypoint.sh

 

7. Install dependencies

Change into the liubai-ffmpeg directory and install the required dependencies:

cd /home/devbox/project/liubai/liubai-backends/liubai-ffmpeg
pnpm i

 

8. Start the service in the development environment

Back in the root directory, simulate how the service will be started:

cd /home/devbox/project
bash entrypoint.sh

 

Once the startup message is printed in the terminal, the amr-to-mp3 service is running on the development machine!

 

Go back to the Devbox web UI and copy the public address.

Then paste it into your browser's address bar and append /hello. If the browser shows a response from the service, it has started successfully.

Now append /new?url=your amr file&id=current timestamp in milliseconds to the public address instead.

And there you have an amr-to-mp3 service!
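Because the url parameter is itself a URL, it should be percent-encoded before it is appended. The snippet below is a minimal sketch of building such a request URL. The base address and the audio URL are placeholders, buildConvertUrl is a hypothetical name, and the response format of /new is left out because it is not described here.

```typescript
// Hypothetical helper: builds "<base>/new?url=<encoded amr url>&id=<timestamp in ms>".
function buildConvertUrl(
  baseUrl: string,
  amrUrl: string,
  id: number = Date.now() // current timestamp in milliseconds by default
): string {
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  const query = `url=${encodeURIComponent(amrUrl)}&id=${id}`;
  return `${base}/new?${query}`;
}

// Placeholder addresses; substitute the public address copied from the Devbox web UI.
const requestUrl = buildConvertUrl(
  "https://your-devbox-address.example/",
  "https://files.example/voice.amr?token=abc",
  1700000000000
);
console.log(requestUrl);
// prints: https://your-devbox-address.example/new?url=https%3A%2F%2Ffiles.example%2Fvoice.amr%3Ftoken%3Dabc&id=1700000000000
```

Without the encoding, a query string inside the audio URL (such as ?token=abc) would be read as parameters of the /new request itself.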

 

9. Deploy to production

In the Devbox web UI, click release version.

It is normal for your IDE to disconnect briefly during the release process.

After the release finishes, click the button to go online, keep the minimum configuration, and click Finish.

After a few minutes, you will have another publicly accessible link, which is the production environment's amr-to-mp3 service!

 

git commit on Devbox

After developing on Devbox, we may want to commit and push with git, which requires the Devbox server to have push access to the remote repository.

Here is what I ran into, using GitHub as an example.

When you run git push origin followed by your branch name in the terminal, a GitHub page opens in your browser and asks you to enter the authorization code shown in the IDE.

However, this authorization may fail. In that case a notification pops up in the lower right corner of the IDE, letting you authorize with a personal access token instead.

Once authorization is complete, run git push again and it should work.

 

Try it now

The service described above is already deployed for "White Note". Follow the "White Note" WeChat official account and send it a voice message, and it will call the amr-to-mp3 service described above.

Among the "seven tigers" of today's large-model vendors, MiniMax is the one that lets developers pass mp3-format base64 directly in messages. Now that you have read this far, why not go try it and use multimodal natively in WeChat!

 

To sum up, we used Devbox to build an amr-to-mp3 service.

The heart of this article is the section on the Devbox development experience. There we briefly covered how Devbox differs from traditional development: thanks to the Connect to SSH Host capability, we operate the remote server directly, and on it we installed the software and developed the core code.

Thanks to Devbox's optimization of the underlying containers, we get development-as-deployment: results can be verified directly on the development server as soon as the code is written. And because Devbox can snapshot the entire machine, it solves the problem of keeping the development and production environments consistent. We no longer have to install dependencies and underlying software again in production, so production works out of the box.

May not be reproduced without permission: Chief AI Sharing Circle » WeChat voice messages can be handled like this? Even a beginner can use Devbox to add voice-to-text to a WeChat official account!
