Many people will want to use WeChat's voice input directly, since speaking is always faster than typing.
Unlike the common .mp3 and .wav formats, WeChat voice input uses the .amr format by default.
Below is a webhook that the developer server receives from WeChat when a user sends a voice message to the official account. You can see that the format is .amr.
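As a rough illustration of reading the format out of such a webhook, here is a minimal sketch. It assumes the callback arrives as XML with MsgType, MediaId and Format fields, which is how WeChat official account callbacks are commonly documented; the sample payload, the readField and parseVoiceMessage helpers, and the VoiceMessage type are hypothetical and not taken from the project's code.

```typescript
// Fields of interest from a WeChat voice message callback (assumed XML layout).
interface VoiceMessage {
  msgType: string;
  mediaId: string;
  format: string;
}

// Read one XML field, whether or not its value is wrapped in CDATA.
function readField(xml: string, tag: string): string | undefined {
  const pattern = new RegExp(
    `<${tag}>\\s*(?:<!\\[CDATA\\[([\\s\\S]*?)\\]\\]>|([^<]*))\\s*</${tag}>`
  );
  const match = pattern.exec(xml);
  if (!match) return undefined;
  const value = match[1] !== undefined ? match[1] : match[2];
  return (value || "").trim();
}

// Return the voice fields, or null when the payload is not a voice message.
function parseVoiceMessage(xml: string): VoiceMessage | null {
  const msgType = readField(xml, "MsgType");
  const mediaId = readField(xml, "MediaId");
  const format = readField(xml, "Format");
  if (msgType !== "voice" || !mediaId || !format) return null;
  return { msgType, mediaId, format };
}

// Hypothetical sample payload, shaped like a voice callback.
const samplePayload =
  "<xml><MsgType><![CDATA[voice]]></MsgType>" +
  "<MediaId><![CDATA[media_123]]></MediaId>" +
  "<Format><![CDATA[amr]]></Format></xml>";

const voice = parseVoiceMessage(samplePayload);
console.log(voice ? voice.format : "not a voice message");
```

A regular expression is enough for a flat payload like this one; a real service would more likely use a proper XML parser.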
Many STT (Speech to Text) services only support .mp3 and .wav, which gives rise to a requirement: how do we convert a voice message from the .amr format to the .mp3 format?
Solution
At first I wanted to use Laf for this, but I later found that Laf is positioned as a function-as-a-service solution and does not support using a file system module such as fs to manipulate files on the server.
Then I saw a solution on GitHub[2]: start an express service, use fluent-ffmpeg to convert the .amr file to .mp3, and store the resulting file temporarily on the server for the caller to use.
This solution assumes that FFmpeg is already installed on the server; otherwise fluent-ffmpeg will not work.
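To make the FFmpeg dependency concrete, here is a minimal sketch of the conversion expressed as a plain ffmpeg command line. This is not the repository's code, which goes through fluent-ffmpeg; buildFfmpegArgs and tempMp3Path are hypothetical helper names, and the temporary directory and id are placeholders.

```typescript
// Build the argument list for: ffmpeg -y -i <input> <output>
// -y overwrites an existing output file, -i names the input file.
// ffmpeg picks the MP3 output format from the ".mp3" extension.
function buildFfmpegArgs(inputPath: string, outputPath: string): string[] {
  if (!outputPath.toLowerCase().endsWith(".mp3")) {
    throw new Error("output path must end with .mp3");
  }
  return ["-y", "-i", inputPath, outputPath];
}

// Derive "<dir>/<id>.mp3" for the temporarily stored result.
// The id is restricted to safe characters so it cannot escape the directory.
function tempMp3Path(dir: string, id: string): string {
  if (!/^[A-Za-z0-9_-]+$/.test(id)) {
    throw new Error("id may only contain letters, digits, '_' and '-'");
  }
  return `${dir.replace(/\/+$/, "")}/${id}.mp3`;
}

// Placeholder paths for illustration only.
const outputFile = tempMp3Path("/tmp/", "1700000000000");
const ffmpegArgs = buildFfmpegArgs("/tmp/1700000000000.amr", outputFile);
console.log(["ffmpeg", ...ffmpegArgs].join(" "));
```

In a Node service, this list could be handed to spawn("ffmpeg", ffmpegArgs) from node:child_process. Either way the ffmpeg binary has to exist on the machine, which is exactly the prerequisite described above.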
"This is not a simple function-as-a-service" I thought to myself. I'm a back-end and Ops guy myself, and I used to be all about Serverless, focusing on front-end interactions with users. Now this is a bit of a challenge for me.
However, I remembered Devbox, which Sealos launched a while back. Its publicity seemed to be about making up for exactly the places where function-as-a-service offerings like Laf fall short.
"Couldn't I deploy the service described above on Devbox?" And so the rework (and the stumbling into pitfalls) began.
Devbox usage experience
For an explanation of what Devbox is, see the introduction: Sealos Devbox Released: Deploying a Cloud Native Development Environment
Here I'd like to talk specifically about its development experience, because it greatly affects how you understand Devbox, and it is where I took some detours. That is why I'm putting these usage tips, or conclusions, up front.
On Devbox, the entire development process happens directly on the remote server.
I was very uncomfortable with this at first, but it works just like the Connect to SSH Host function of our local IDE. Let's take a look at Claude's note on this:
Devbox lets you skip entering the server IP or domain name, the SSH port number, and the server username and password (or SSH key).
The whole process is just one button press, shown below.
I'm using Windsurf. After the first click, Windsurf tried to connect to the remote server. Any change you make in the IDE after that is actually an operation on the remote server, including running pnpm i or any other command.
More importantly, when you finally click release version in the Devbox web UI, everything you have done (again, including the commands you ran) is packaged into a Docker image, which is equivalent to a snapshot of the current state of the virtual machine.
So all the dependencies and underlying software we installed in the Devbox development environment (such as FFmpeg) are carried over by the release. We don't have to repeat any of those steps in the production environment: it works out of the box, with all the prerequisites already installed.
Once you understand this, you finally understand why Devbox's publicity has always said that we no longer have to worry about dependency problems and version conflicts in production: the production environment is a complete mirror of the development environment!
Hands-on implementation
With the above understanding of Devbox, the hands-on part is relatively easy.
1. Select express on Devbox
2. Open with IDE
As mentioned above, use VS Code / Cursor / Windsurf to connect to the server for development.
After connecting to the server, select all files (Cmd + A) and delete the default template files entirely.
3. Download the code
We just chose the express template, so git, node and pnpm are all installed by default.
Now let's open a terminal and use git to download the code.
git clone https://github.com/yenche123/liubai.git
Here's a tip: if you type commands in the IDE while connected to the SSH host, you will notice a slight lag. That's because you are really operating a terminal on the remote server, so of course there is some latency between you and it.
The previous development experience was to develop locally, then package the code and upload it to the server. With Devbox, everything is done on the server: development is deployment.
4. Install FFmpeg
Continue by typing the following commands in the terminal to install FFmpeg.
sudo apt update && sudo apt upgrade   # press Y to continue
sudo apt install ffmpeg               # press Y to continue
ffmpeg -version                       # verify if installed successfully
The last line allows you to verify that the installation was successful.
Again, once this installation is complete there is no need to repeat it in the production environment, since the software will be included directly in the image. Pretty amazing, isn't it?
5. Writing entrypoint.sh
In the root directory, at the same level as the liubai/ folder (shown above), create an entrypoint.sh file with the following content.
#!/bin/bash
cd /home/Devbox/project/liubai/liubai-backends/liubai-ffmpeg
pnpm dev
This file tells the server in the production environment how to start our service after the machine boots.
Here we tell the machine to first change into the target folder liubai-ffmpeg and then run the pnpm dev command to start the express service.
6. Make entrypoint.sh executable
Still in the root directory, run the following command to give entrypoint.sh executable permission.
chmod +x entrypoint.sh
7. Installation of dependencies
Let's go into the liubai-ffmpeg directory and install the required dependencies:
cd /home/Devbox/project/liubai/liubai-backends/liubai-ffmpeg
pnpm i
8. Starting services on the development environment
Back in the root directory, let's simulate how the service will be started:
cd /home/Devbox/project
bash entrypoint.sh
Seeing the output shown above means that we have started the amr-to-mp3 service on the development machine!
Let's go back to the Devbox web UI and copy the public address.
Then paste it into your browser's address bar and append /hello. If you see the screen shown below, the service has started successfully.
Now append /new?url=your amr file&id=current timestamp in milliseconds to the public address instead, and you have an amr-to-mp3 service!
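Because the amr file address is itself a URL, it is safer to percent-encode it before putting it into the query string. The following is a minimal sketch of assembling the request address. buildConvertUrl is a hypothetical helper and the host names are placeholders; only the /new path and the url and id parameters come from the text above, and the response format of the service is not shown here.

```typescript
// Assemble "<public address>/new?url=<encoded amr url>&id=<timestamp in ms>".
function buildConvertUrl(publicAddress: string, amrUrl: string, id: number): string {
  // Drop trailing slashes so the path is not doubled ("//new").
  const base = publicAddress.replace(/\/+$/, "");
  // encodeURIComponent escapes ":", "/", "?", "&" and spaces inside the amr URL.
  return `${base}/new?url=${encodeURIComponent(amrUrl)}&id=${id}`;
}

// Placeholder addresses for illustration only.
const requestUrl = buildConvertUrl(
  "https://devbox.example.com/",
  "https://files.example.com/voice.amr?token=abc",
  Date.now()
);
console.log(requestUrl);
```

express decodes query parameters when it parses the request, so the service should see the original amr URL again on its side.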
9. Deployment to production environment
We click release version in the Devbox web UI.
It is normal for your IDE to disconnect briefly during the release process.
After the release completes, we click the go-live (deploy) button and simply finish with the minimum configuration.
After waiting a few minutes, you get another publicly accessible link, which is the production amr-to-mp3 service!
git commit on Devbox
On Devbox, we may want to make a git commit and push it after development, which requires the remote server to have push access to the remote repository.
Here's what I encountered, using GitHub as an example.
Run git push origin <your branch name> in the terminal.
When you do, the terminal opens a GitHub page in your browser and asks you to enter the authorization code shown in the IDE, as shown in the next two images.
However, this authorization may fail. In that case a notification pops up in the lower right corner of the IDE that lets you authorize with a personal access token instead, as shown below:
After authorization is complete, run git push again and it should work.
Try it now
The service described above is already deployed for "White Note". Follow the "White Note" WeChat official account and send it a voice message, and it will call the amr-to-mp3 service described above.
Among the large-model vendors currently known as the "7 Tigers", MiniMax is the one that lets developers pass mp3 audio as base64 directly in messages. Now that you've seen this, why not go try it out and use multimodal natively in WeChat!
To sum up, we used Devbox to build an amr-to-mp3 service.
The heart of this article is the "Devbox usage experience" section, where we briefly introduced how Devbox differs from traditional development: thanks to the Connect to SSH Host capability, we operate directly on the remote server, which is where we installed the software and developed the core code.
Through Devbox's optimization of the underlying container, we get develop-as-you-deploy: we can validate the results directly on the development server as soon as development is done. Meanwhile, Devbox's ability to snapshot the entire virtual machine solves the consistency problem between the development environment and the production environment: we don't have to install dependencies and underlying software again in production, so the production environment works out of the box.