MegaPairs: BGE's New Multimodal Vector Embedding Model

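General Introduction

MegaPairs is an open-source project from the VectorSpaceLab team, hosted on GitHub, that uses large-scale data synthesis to build multimodal embedding models for image-text retrieval tasks. The project comprises a dataset of more than 26 million heterogeneous KNN triplets and the BGE-VL series of models trained on it: BGE-VL-CLIP (base and large versions) and BGE-VL-MLLM (S1 and S2 versions). Among them, BGE-VL-MLLM-S1 improves performance on the CIRCO zero-shot image retrieval benchmark (mAP@5) by 8.1% and also performs strongly on the MMEB multimodal embedding benchmark. The code and models are open-sourced on GitHub and Hugging Face. The dataset is planned for a later release under the MIT license, with the underlying data sourced from Recap-Datacomp (CC BY 4.0 license).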

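Feature List

Large-scale dataset generation: provides more than 26 million heterogeneous KNN triplets for training multimodal embedding models.
BGE-VL-CLIP embedding models: available in base and large versions; they generate embeddings for images and text and support efficient retrieval.
BGE-VL-MLLM embedding models: available in S1 and S2 versions; they generate high-performance multimodal embeddings that support zero-shot retrieval.
Zero-shot retrieval support: generates embeddings and performs image-text retrieval tasks without additional training.
Open-source, extensible models: pre-trained models are provided on Hugging Face for download, use, and fine-tuning.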

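Usage Guide

MegaPairs distributes its code and models through GitHub and Hugging Face so that users can quickly generate multimodal embeddings and complete retrieval tasks. Below is a detailed usage guide based on the official instructions for BGE-VL-MLLM-S1 (Hugging Face).

Obtaining and Installing

1. Visit the GitHub repository: open https://github.com/VectorSpaceLab/MegaPairs and review the project details.
2. Clone the repository: run the following commands in a terminal to download the code: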

git clone https://github.com/VectorSpaceLab/MegaPairs.git
cd MegaPairs

3. Install dependencies: create a virtual environment with Python 3.10 and install the required libraries:

python -m venv venv
source venv/bin/activate  # Linux/Mac
venv\Scripts\activate     # Windows
pip install torch transformers==4.41.2 sentencepiece

The Hugging Face instructions call for transformers==4.41.2 and sentencepiece.
4. Download the model: get BGE-VL-MLLM-S1 from Hugging Face in either of two ways:

Visit https://huggingface.co/BAAI/BGE-VL-MLLM-S1
Download it automatically through a Python script (see below).
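Before loading the model, it can help to confirm that the versions pinned in the install command are the ones actually present in the active environment. The following is a minimal sketch using only the standard library; the helper names `check_requirement` and `check_environment` are illustrative and not part of MegaPairs.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the install command; None means "any version is accepted".
REQUIRED = {"torch": None, "transformers": "4.41.2", "sentencepiece": None}


def check_requirement(package, pinned=None):
    """Return 'ok', 'missing', or a mismatch message for one package."""
    try:
        found = version(package)
    except PackageNotFoundError:
        return "missing"
    if pinned is not None and found != pinned:
        return f"mismatch (found {found}, expected {pinned})"
    return "ok"


def check_environment(required=REQUIRED):
    """Map each required package name to its status string."""
    return {name: check_requirement(name, pin) for name, pin in required.items()}


if __name__ == "__main__":
    for name, status in check_environment().items():
        print(f"{name}: {status}")
```

Running the script inside the activated virtual environment prints one status line per package; anything other than "ok" suggests re-running the pip install step.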

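Using the Main Features

1. Using the dataset

The MegaPairs dataset, which contains 26 million triplets for training multimodal embedding models, has not yet been fully released; it is planned to be made available through Hugging Face.

How to obtain it: watch for official updates, then download the data and use it for model training or validation.
Data format: triplets (query image, text description, target image) that support embedding generation and retrieval.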
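Because the dataset has not been released, its on-disk schema is not known. As a rough illustration of the triplet structure described above, the sketch below assumes a hypothetical JSON Lines layout with the keys `query_image`, `text`, and `target_image`; these field names and the `Triplet` class are assumptions, not the official format.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    """One (query image, text description, target image) example."""
    query_image: str   # path to the query image (hypothetical field name)
    text: str          # text relating the query image to the target
    target_image: str  # path to the target image (hypothetical field name)


def parse_triplets(lines):
    """Parse JSON Lines strings into Triplet records, skipping blank lines."""
    triplets = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        triplets.append(
            Triplet(record["query_image"], record["text"], record["target_image"])
        )
    return triplets


# Made-up sample records, for illustration only.
sample = [
    '{"query_image": "q1.png", "text": "same scene at night", "target_image": "t1.png"}',
    "",
    '{"query_image": "q2.png", "text": "add snow", "target_image": "t2.png"}',
]
triplets = parse_triplets(sample)
print(len(triplets), triplets[0].text)
```

Once the real dataset is published, only the key names inside `parse_triplets` should need to change to match the released schema.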

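2. Generating multimodal embeddings (BGE-VL-MLLM-S1)

BGE-VL-MLLM-S1 is the core embedding model: it generates embeddings for images and text and uses them to perform retrieval. The following code follows the official instructions:

Loading the model: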

import torch
from transformers import AutoModel, AutoProcessor
model_name = "BAAI/BGE-VL-MLLM-S1"
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)
model.eval()
model.cuda()  # move the model to the GPU for acceleration

Generating embeddings and retrieving:

from PIL import Image

# Prepare the inputs: one query image, an edit instruction, and the candidate images
query_image = Image.open("./cir_query.png").convert("RGB")
query_text = "Make the background dark, as if the camera has taken the photo at night"
candidate_images = [
    Image.open("./cir_candi_1.png").convert("RGB"),
    Image.open("./cir_candi_2.png").convert("RGB"),
]

# Process the query (image + text + task instruction)
query_inputs = processor(
    text=query_text,
    images=query_image,
    task_instruction="Retrieve the target image that best meets the combined criteria by using both the provided image and the image retrieval instructions: ",
    return_tensors="pt",
    q_or_c="q",
)
query_inputs = {k: v.cuda() for k, v in query_inputs.items()}

# Process the candidates (images only)
candidate_inputs = processor(
    images=candidate_images,
    return_tensors="pt",
    q_or_c="c",
)
candidate_inputs = {k: v.cuda() for k, v in candidate_inputs.items()}

# Generate the embeddings and compute similarity
with torch.no_grad():
    # Use the last token's hidden state from the final layer as the embedding
    query_embs = model(**query_inputs, output_hidden_states=True).hidden_states[-1][:, -1, :]
    candi_embs = model(**candidate_inputs, output_hidden_states=True).hidden_states[-1][:, -1, :]

    # L2-normalize so that the dot product equals cosine similarity
    query_embs = torch.nn.functional.normalize(query_embs, dim=-1)
    candi_embs = torch.nn.functional.normalize(candi_embs, dim=-1)
    scores = torch.matmul(query_embs, candi_embs.T)

print(scores)  # similarity scores, one row per query and one column per candidate

Interpreting the results: scores holds the similarity between the query embedding and each candidate embedding; the higher the score, the more likely the match.
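The normalization followed by a matrix product in the code above amounts to computing cosine similarity, and a retrieval result is simply the candidates sorted by that score. The sketch below shows both ideas in plain Python so it runs without a GPU or model download; `cosine_similarity` and `rank_candidates` are illustrative helpers, and the example scores are made up rather than real model output.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def rank_candidates(score_row, candidate_names, top_k=None):
    """Return (name, score) pairs sorted from best to worst match."""
    if len(score_row) != len(candidate_names):
        raise ValueError("need exactly one score per candidate")
    ranked = sorted(
        zip(candidate_names, score_row), key=lambda pair: pair[1], reverse=True
    )
    return ranked if top_k is None else ranked[:top_k]


# Illustrative values only; with the model, one row would come from scores[0].tolist().
example_row = [0.21, 0.47]
names = ["./cir_candi_1.png", "./cir_candi_2.png"]
for name, score in rank_candidates(example_row, names):
    print(f"{score:.2f}  {name}")
```

Passing `top_k=1` returns only the best match, which is the usual way to read off a single retrieved image.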

3. Generating embeddings with BGE-VL-CLIP

BGE-VL-CLIP (base/large) can also generate multimodal embeddings:

Load and run:

import torch
from transformers import AutoModel

model_name = "BAAI/BGE-VL-base"
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)
model.set_processor(model_name)
model.eval()

with torch.no_grad():
    # Encode the query (image + text) and the candidate images
    query = model.encode(images="./cir_query.png", text="Make the background dark")
    candidates = model.encode(images=["./cir_candi_1.png", "./cir_candi_2.png"])
    # Similarity between the query and every candidate
    scores = query @ candidates.T

print(scores)

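4. Fine-tuning the model

Users can fine-tune the models on a dataset:

Data preparation: prepare image-text pairs or triplets.
Fine-tuning process: the fine-tuning code has not yet been released; in the meantime, the transformers Trainer API can serve as a reference.
Validation: test the results on the CIRCO or MMEB benchmarks.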
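To make the mAP@5 metric mentioned for CIRCO concrete, here is a small sketch of mean average precision at k. It assumes the common definition in which each query's precision sum is divided by min(k, number of ground-truth items), and that each ranked list contains no duplicates; the official benchmark evaluation code remains the reference for the exact protocol. The function names and sample IDs are illustrative.

```python
def average_precision_at_k(ranked_ids, relevant_ids, k=5):
    """AP@k for one query: precision summed at each hit, over min(k, #relevant)."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_ids[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / min(k, len(relevant))


def mean_average_precision_at_k(all_ranked, all_relevant, k=5):
    """Mean of AP@k over all queries; 0.0 when there are no queries."""
    aps = [average_precision_at_k(r, g, k) for r, g in zip(all_ranked, all_relevant)]
    return sum(aps) / len(aps) if aps else 0.0


# Made-up retrieval results for two queries.
ranked = [["a", "b", "c", "d", "e"], ["x", "y", "z"]]
relevant = [{"a", "c"}, {"z"}]
print(round(mean_average_precision_at_k(ranked, relevant, k=5), 4))
```

For the first query the hits at ranks 1 and 3 give (1/1 + 2/3) / 2, and for the second the single hit at rank 3 gives 1/3, so the mean over both queries is about 0.5833.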