
Aana SDK: An Open Source Tool for Easy Deployment of Multimodal AI Models

General Introduction

Aana SDK is an open source framework developed by Mobius Labs, named after the Malayalam word ആന (elephant). It helps developers quickly deploy and manage multimodal AI models that process text, images, audio, and video. Built on the Ray distributed computing framework, the Aana SDK is designed for reliability, scalability, and efficiency. Developers can use it to build applications that run anywhere from a single machine to a cluster, such as video transcription, image captioning, or smart chat tools.


Feature List

  • Supports multimodal data: can process text, images, audio and video simultaneously.
  • Model Deployment and Scaling: Machine learning models can be deployed on a single machine or on a cluster.
  • Auto-generated APIs: Automatically create and validate APIs based on defined endpoints.
  • Real-time streaming output: supports streaming results for real-time applications and large language models.
  • Predefined data types: Built-in support for common data types such as image, video, etc.
  • Background task queue: endpoint tasks run automatically in the background without additional configuration.
  • Integration of multiple models: Whisper, vLLM, Hugging Face Transformers, etc. are supported.
  • Documentation Auto-Generation: Automatically generate application documentation based on endpoints.
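
To make the "auto-generated APIs" idea concrete, here is a minimal sketch of how a request payload could be validated against an endpoint function's signature. It is not Aana's actual implementation; the helper validate_request and the sample endpoint transcribe_video (with its language parameter) are hypothetical names used only for illustration, and only the standard library is used.

```python
import inspect
from typing import get_type_hints


def validate_request(endpoint, payload: dict) -> dict:
    """Check a payload against the endpoint's parameters (hypothetical helper)."""
    hints = get_type_hints(endpoint)
    validated = {}
    for name, param in inspect.signature(endpoint).parameters.items():
        if name not in payload:
            # A parameter without a default value is treated as required.
            if param.default is inspect.Parameter.empty:
                raise ValueError(f"missing required field: {name}")
            validated[name] = param.default
            continue
        expected = hints.get(name)
        value = payload[name]
        # Only plain classes (str, int, ...) are checked in this sketch.
        if isinstance(expected, type) and not isinstance(value, expected):
            raise TypeError(f"field {name!r} must be {expected.__name__}")
        validated[name] = value
    return validated


def transcribe_video(url: str, language: str = "en") -> dict:
    """Stand-in endpoint; a real one would call a model."""
    return {"url": url, "language": language}
```

Calling validate_request(transcribe_video, {"url": "https://example.com/v.mp4"}) fills in the default for language, while an empty payload raises ValueError.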

 

Usage Guide

Installation process

There are two ways to install Aana SDK: from PyPI or from GitHub. The steps are as follows:

  1. Preparing the environment
  2. Installation via PyPI
    • Run the following command to install the core dependencies:
      pip install aana
      
    • For full functionality, install all additional dependencies:
      pip install aana[all]
      
    • Other extras include vllm (language models), asr (speech recognition), and transformers (Hugging Face Transformers models); install them as required.
  3. Installation via GitHub
    • Clone the repository:
      git clone https://github.com/mobiusml/aana_sdk.git
      cd aana_sdk
      
    • Install using Poetry (Poetry >= 2.0 recommended, see https://python-poetry.org/docs/#installation):
      poetry install --extras all
      
    • Development environments can add test dependencies:
      poetry install --extras all --with dev,tests
      
  4. Verify Installation
    • Run python -c "import aana; print(aana.__version__)"; if a version number is printed, the installation succeeded.
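
As an alternative check that does not import the package itself, the installed version can be read from the package metadata. This is a small sketch using only the standard library; the helper name installed_version is hypothetical, and the distribution name aana is taken from the pip command above.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("aana")
if version is None:
    print("aana is not installed in this environment")
else:
    print(f"aana {version} is installed")
```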

How to use

At the heart of the Aana SDK are Deployments and Endpoints. Deployments load the model and Endpoints define the functionality. The following is an example of video transcription:

  1. Creating a new application
  2. Configuration Deployment
    • In app.py, load the Whisper model:
      from aana.sdk import AanaSDK
      from aana.deployments.whisper_deployment import WhisperDeployment, WhisperConfig, WhisperModelSize, WhisperComputeType
      app = AanaSDK(name="video_app")
      app.register_deployment(
          "whisper",
          WhisperDeployment.options(
              num_replicas=1,
              ray_actor_options={"num_gpus": 0.25},  # remove this line if no GPU is available
              user_config=WhisperConfig(
                  model_size=WhisperModelSize.MEDIUM,
                  compute_type=WhisperComputeType.FLOAT16,
              ).model_dump(mode="json"),
          ),
      )
      
  3. Defining Endpoints
    • Add a transcription endpoint:
      from aana.core.models.video import VideoInput
      @app.aana_endpoint(name="transcribe_video")
      async def transcribe_video(self, video: VideoInput):
          audio = await self.download(video.url)  # download the video and extract the audio
          transcription = await self.whisper.transcribe(audio)  # transcribe the audio
          return {"transcription": transcription}
      
  4. Running the application
    • Run in the terminal:
      python app.py serve
      
    • Or use the Aana CLI:
      aana deploy app:app --host 127.0.0.1 --port 8000
      
    • When the application starts, the default address is http://127.0.0.1:8000.
  5. Testing the endpoint
    • Send a request using cURL:
      curl -X POST http://127.0.0.1:8000/transcribe_video -F body='{"url":"https://www.youtube.com/watch?v=VhJFyyukAzA"}'
      
    • Or test the endpoint in the Swagger UI (http://127.0.0.1:8000/docs).
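
The cURL request above can also be built from Python. The sketch below uses only the standard library and mirrors the cURL command: a multipart form with a single field named body that carries the JSON payload. The function name build_transcribe_request is hypothetical; the path /transcribe_video and the field name body are taken from the cURL example, and the request is only constructed here, not sent.

```python
import json
import urllib.request
import uuid


def build_transcribe_request(base_url: str, video_url: str) -> urllib.request.Request:
    """Build a multipart POST equivalent to: curl -F body='{"url": ...}'."""
    boundary = uuid.uuid4().hex
    payload = json.dumps({"url": video_url})
    # One form field called "body", as in the cURL example.
    body = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="body"\r\n'
        "\r\n"
        f"{payload}\r\n"
        f"--{boundary}--\r\n"
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/transcribe_video",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


request = build_transcribe_request(
    "http://127.0.0.1:8000", "https://www.youtube.com/watch?v=VhJFyyukAzA"
)
print(request.get_method(), request.full_url)
```

With the application running, passing the request to urllib.request.urlopen(request) sends it and returns the server's response.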

Featured Functions

  • Multimodal processing
    In addition to speech transcription, image models (e.g. Blip2) can be integrated to generate descriptions:

    captions = await self.blip2.generate_captions(video.frames)
  • Streaming output
    Results can be returned in real time as they are generated, for example:

    @app.aana_endpoint(name="chat", streaming=True)
    async def chat(self, question: str):
        async for chunk in self.llm.generate_stream(question):
            yield chunk
    
  • Cluster Extension
    To deploy on a Ray cluster, modify the app.connect() call to specify the cluster address.
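
The streaming pattern shown above relies on Python async generators. The following self-contained sketch shows the mechanism without Aana or a real model: fake_generate_stream is a hypothetical stand-in for an LLM that yields text chunks, chat forwards each chunk as soon as it arrives, and collect plays the role of a client reading the stream.

```python
import asyncio
from typing import AsyncIterator


async def fake_generate_stream(question: str) -> AsyncIterator[str]:
    """Hypothetical stand-in for a model that produces output piece by piece."""
    for word in f"You asked: {question}".split():
        await asyncio.sleep(0)  # hand control back to the event loop between chunks
        yield word + " "


async def chat(question: str) -> AsyncIterator[str]:
    """Forward each chunk immediately instead of waiting for the full answer."""
    async for chunk in fake_generate_stream(question):
        yield chunk


async def collect(question: str) -> str:
    """Consume the stream the way a client would, joining chunks in order."""
    parts = []
    async for chunk in chat(question):
        parts.append(chunk)
    return "".join(parts)


print(asyncio.run(collect("hello world")))
```

Because chat yields inside the loop, a consumer can start displaying output after the first chunk rather than after the whole response is ready.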

Additional tools

  • Ray Dashboard: After the application starts, visit http://127.0.0.1:8265 to view cluster status and logs.
  • Docker Deployment: See https://mobiusml.github.io/aana_sdk/pages/docker/.

 

Application Scenarios

  1. Video Content Organization
    Generate subtitles and summaries for instructional videos for easy archiving and searching.
  2. Intelligent Q&A System
    The user uploads a video and then asks a question, and the system answers based on the audio and video content.
  3. Enterprise Data Analytics
    Extract key information from meeting recordings and videos to generate reports.

 

FAQ

  1. Is a GPU required?
    No. Aana SDK can also run on a CPU, but a GPU (40 GB of video memory recommended) improves efficiency significantly.
  2. How do I handle installation errors?
    Check that the Python version and dependencies match, and add the --log-level DEBUG option to view detailed logs.
  3. What language models are supported?
    vLLM, Whisper, and others are built in, and more Hugging Face models can be integrated through Transformers.
May not be reproduced without permission: Chief AI Sharing Circle » Aana SDK: An Open Source Tool for Easy Deployment of Multimodal AI Models