GLM-4V Series
The GLM-4V series includes three models for different application scenarios.
- GLM-4V-Plus: Offers strong multimodal comprehension; it can process up to 5 images at once and supports video content comprehension, making it suitable for complex multimedia analysis scenarios.
- GLM-4V: Focuses on image content understanding; it can parse up to 5 images at once and is suited to scenarios that require deep image understanding.
- GLM-4V-Flash (Free): Focuses on efficient single-image understanding, for scenarios that need fast image parsing such as real-time image analysis or batch image processing.
Synchronous Call
Interface Request
| Type | Description |
|---|---|
| Transport | HTTPS |
| Request URL | https://open.bigmodel.cn/api/paas/v4/chat/completions |
| Invocation method | Synchronous call that waits for the model to finish executing and returns the final result, or SSE call |
| Character encoding | UTF-8 |
| Request format | JSON |
| Response format | JSON or standard stream event |
| HTTP method | POST |
| Development language | Any language capable of making HTTP requests |
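For reference, here is a minimal sketch of calling this endpoint directly over HTTPS with Python's requests library. The Bearer-style Authorization header and the example payload are assumptions for illustration; consult the platform's authentication documentation for the exact scheme.

```python
import requests

API_KEY = "YOUR API KEY"  # assumption: the API key is sent as a Bearer token
URL = "https://open.bigmodel.cn/api/paas/v4/chat/completions"

payload = {
    "model": "glm-4v",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://xxx/xx.jpg"}},
                {"type": "text", "text": "解释一下图中的现象"}
            ]
        }
    ]
}

# POST the UTF-8 encoded JSON body and read the JSON response
resp = requests.post(URL, json=payload, headers={"Authorization": f"Bearer {API_KEY}"})
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```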
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | String | Yes | Code of the model to call. Model codes: glm-4v-plus, glm-4v, glm-4v-flash (free). |
| messages | List | Yes | The list of current dialog messages passed to the model as the prompt, supplied as a JSON array. For example, a video comprehension message: `{ "role": "user", "content": [ { "type": "video_url", "video_url": { "url": "https://xxx/xx.mp4" } }, { "type": "text", "text": "请仔细描述这个视频" } ] }` An image comprehension message: `{ "role": "user", "content": [ { "type": "image_url", "image_url": { "url": "https://xxx/xx.jpg" } }, { "type": "text", "text": "解释一下图中的现象" } ] }` Possible message types are User message and Assistant message; see the message field descriptions below. |
| request_id | String | No | Unique identifier passed by the caller to distinguish each request; if not provided, the platform generates one by default. |
| do_sample | Boolean | No | When do_sample is true, the sampling strategy is enabled; when false, the sampling parameters temperature and top_p have no effect. |
| stream | Boolean | No | Set to false or omit this parameter for synchronous calls. When set to true, the model returns the generated content incrementally via a standard Event Stream, one block at a time, and sends a `data: [DONE]` message at the end of the stream. |
| temperature | Float | No | Sampling temperature, controlling the randomness of the output; must be positive. Range: [0.0, 1.0], default 0.8. Larger values make the output more random and creative; smaller values make it more stable and deterministic. Adjust according to your application scenario, and tune either top_p or temperature, not both at the same time. |
| top_p | Float | No | Nucleus sampling, an alternative to temperature sampling. Range: [0.0, 1.0], default 0.6. The model considers only the tokens within the top_p probability mass; for example, 0.1 means the decoder considers only tokens in the top 10% of the probability mass. Adjust according to your application scenario, and tune either top_p or temperature, not both at the same time. |
| max_tokens | Integer | No | Maximum number of output tokens; default 1024, maximum 1024. |
| user_id | String | No | Unique ID of the end user, used to help the platform intervene in end-user violations, generation of illegal or undesirable content, and other abuse. Length: 6 to 128 characters. |
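To illustrate how the optional parameters above can be combined in one call, here is a minimal sketch using the zhipuai SDK; it assumes the SDK forwards these keyword arguments as the request-body fields of the same name, and the values shown are arbitrary examples rather than recommendations.

```python
from zhipuai import ZhipuAI

client = ZhipuAI(api_key="YOUR API KEY")
response = client.chat.completions.create(
    model="glm-4v",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://xxx/xx.jpg"}},
                {"type": "text", "text": "解释一下图中的现象"}
            ]
        }
    ],
    request_id="my-request-0001",  # optional caller-supplied unique identifier
    do_sample=True,                # enable sampling so temperature takes effect
    temperature=0.8,               # tune temperature or top_p, not both
    max_tokens=1024,               # upper bound on output tokens
    user_id="end-user-123456",     # 6-128 character end-user ID
)
print(response.choices[0].message.content)
```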
Message Format
The model accepts two message types, User message and Assistant message; their formats differ as described below.
User message
| Parameter | Type | Required | Description |
|---|---|---|---|
| role | String | Yes | Role of the message; should be user. |
| content | List | Yes | Message content. |
| type | String | Yes | Content type: text for text, image_url for an image, video_url for a video. Video and image types cannot be used in the same request. |
| text | String | Yes | Required when type is text; the text content. |
| image_url | Object | Yes | Required when type is image_url. |
| url | String | Yes | Image URL or Base64 encoding. Each image must be smaller than 5 MB and no larger than 6000×6000 pixels; jpg, png, and jpeg formats are supported. Note: GLM-4V-Flash does not support Base64 encoding. |
| video_url | Object | Yes | Required when type is video_url. Only glm-4v-plus supports video input (video comprehension), and the video_url content block must be placed first. |
| url | String | Yes | Video URL. Video size is limited to 20 MB and duration to 30 s; only mp4 is supported. |
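Since each image must stay under 5 MB and 6000×6000 pixels, a client may want to validate a file before encoding it. The sketch below is illustrative only; the helper name and the use of the Pillow library are not part of the API.

```python
import base64
import os
from PIL import Image  # Pillow, used only to read pixel dimensions and format

def encode_image_if_valid(path: str) -> str:
    """Return the Base64 encoding of an image, or raise if it exceeds the documented limits."""
    if os.path.getsize(path) >= 5 * 1024 * 1024:
        raise ValueError("image must be smaller than 5 MB")
    with Image.open(path) as img:
        if img.width > 6000 or img.height > 6000:
            raise ValueError("image must be at most 6000x6000 pixels")
        if img.format not in ("JPEG", "PNG"):
            raise ValueError("only jpg, png and jpeg formats are supported")
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")
```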
Assistant message
| Parameter | Type | Required | Description |
|---|---|---|---|
| role | String | Yes | Role of the message; should be assistant. |
| content | String | Yes | Message content. |
Response Parameters
| Parameter | Type | Description |
|---|---|---|
| id | String | Task ID. |
| created | Long | Request creation time, a Unix timestamp in seconds. |
| model | String | Model name. |
| choices | List | Model output for the current dialog. |
| index | Integer | Result index. |
| finish_reason | String | Reason the model stopped generating. stop: natural end of generation or a stop word was triggered. length: the token length limit was reached. sensitive: the content was intercepted by the security audit interface. network_error: model inference failed. |
| message | Object | Text message returned by the model. |
| role | String | Role of the current dialog message, currently always assistant (the model). |
| content | List | Content of the current dialog message. |
| usage | Object | Token usage statistics for this model call, returned at the end of the dialog. |
| prompt_tokens | Integer | Number of input tokens. |
| completion_tokens | Integer | Number of tokens generated by the model. |
| total_tokens | Integer | Total number of tokens. |
| content_filter | List | Returned content security information. |
| role | String | Stage at which the security policy took effect: role = assistant for model output, role = user for user input, role = history for historical context. |
| level | Integer | Severity level 0-3, where 0 is the most severe and 3 the least severe. |
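A minimal sketch of reading these fields from the object returned by the zhipuai SDK; attribute names mirror the JSON fields above, and content_filter is only present when the platform returns security information.

```python
# `response` is the object returned by client.chat.completions.create(...)
choice = response.choices[0]
print(choice.finish_reason)      # "stop", "length", "sensitive" or "network_error"
print(choice.message.role)       # "assistant"
print(choice.message.content)    # the generated text

usage = response.usage           # token statistics for this call
print(usage.prompt_tokens, usage.completion_tokens, usage.total_tokens)
```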
Request Example
Upload Video URL
# Video comprehension example: upload a video URL
from zhipuai import ZhipuAI

client = ZhipuAI(api_key="YOUR API KEY")  # Enter your own API Key
response = client.chat.completions.create(
    model="glm-4v-plus",  # Enter the model name to call
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "video_url",
                    "video_url": {
                        "url": "https://sfile.chatglm.cn/testpath/video/xxxxx.mp4"
                    }
                },
                {
                    "type": "text",
                    "text": "请仔细描述这个视频"
                }
            ]
        }
    ]
)
print(response.choices[0].message)
Upload Video Base64
import base64
from zhipuai import ZhipuAI

video_path = "/Users/YourComputer/xxxx.mp4"
with open(video_path, 'rb') as video_file:
    video_base = base64.b64encode(video_file.read()).decode('utf-8')

client = ZhipuAI(api_key="YOUR API KEY")  # Enter your own API Key
response = client.chat.completions.create(
    model="glm-4v-plus",  # Enter the model name to call
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "video_url",
                    "video_url": {
                        "url": video_base
                    }
                },
                {
                    "type": "text",
                    "text": "请仔细描述这个视频"
                }
            ]
        }
    ]
)
print(response.choices[0].message)
Upload Image URL
from zhipuai import ZhipuAI

client = ZhipuAI(api_key="")  # Enter your own API Key
response = client.chat.completions.create(
    model="glm-4v",  # Enter the model name to call
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "图里有什么"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://img1.baidu.com/it/u=1369931113,3388870256&fm=253&app=138&size=w931&n=0&f=JPEG&fmt=auto?sec=1703696400&t=f3028c7a1dca43a080aeb8239f09cc2f"
                    }
                }
            ]
        }
    ]
)
print(response.choices[0].message)
Upload Image Base64
import base64
from zhipuai import ZhipuAI

img_path = "/Users/YourComputer/xxxx.jpeg"
with open(img_path, 'rb') as img_file:
    img_base = base64.b64encode(img_file.read()).decode('utf-8')

client = ZhipuAI(api_key="YOUR API KEY")  # Enter your own API Key
response = client.chat.completions.create(
    model="glm-4v-plus",  # Enter the model name to call
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": img_base
                    }
                },
                {
                    "type": "text",
                    "text": "请描述这个图片"
                }
            ]
        }
    ]
)
print(response.choices[0].message)
Multi-Turn Image Dialog
from zhipuai import ZhipuAI

client = ZhipuAI(api_key="YOUR API KEY")  # Enter your own API Key
response = client.chat.completions.create(
    model="glm-4v-plus",  # Enter the model name to call
    messages=[
        {
            "content": [
                {
                    "image_url": {
                        "url": "https://cdn.bigmodel.cn/xxxxxx.png"
                    },
                    "type": "image_url"
                },
                {
                    "text": "图中有什么",
                    "type": "text"
                }
            ],
            "role": "user"
        },
        {
            "content": [
                {
                    "text": "这是一幅描绘自然风景的画,展现了一片宁静的湖泊,湖水呈现出深浅不一的蓝绿色调。湖边长着一棵盛开的樱花树,粉白色的花朵在阳光下显得格外鲜艳。树下有一位身穿蓝色衣服的人坐在船上,似乎正在享受这美好的时光。\n\n背景是连绵起伏的山脉,山体被绿色的植被覆盖,山顶则被云雾缭绕,增添了几分神秘感。远处还可以看到一座小亭子,它坐落在湖边的岩石上,与周围的自然环境和谐相融。\n\n整个画面给人一种宁静、和谐的感觉,仿佛置身于一个世外桃源之中。",
                    "type": "text"
                }
            ],
            "role": "assistant"
        },
        {
            "content": [
                {
                    "image_url": {
                        "url": "https://cdn.bigmodel.cn/xxxxxxx.jpeg"
                    },
                    "type": "image_url"
                },
                {
                    "text": "这个图与上面图有什么不一样",
                    "type": "text"
                }
            ],
            "role": "user"
        },
        {
            "content": [
                {
                    "text": "第一张图和第二张图在画面内容、构图、色彩等方面都有很大的不同,形成鲜明的对比。\n\n1.第一张图是风景画,主要描绘了自然风景,如樱花树、湖泊、山脉等,色彩丰富,充满春天的气息,而第二张图是描绘长城在冬季的风景照片,长城在白雪覆盖的山脉中蜿蜒盘旋,整体色调偏冷,给人一种寒冷、宁静的感觉。\n\n2.构图上,第一张图采用对称式构图,以樱花树为中心,两侧的景物相互呼应,形成平衡和谐的画面效果,第二张图则采用对角线构图,长城作为视觉引导线,引导观众的视线穿越整个画面,增加画面的深度和动态感。\n\n整体来看,这两张图在画面内容、色彩、构图和氛围上都呈现出明显的对比,一张是充满生机和色彩的春日风景画,另一张是宁静而神秘的冬日长城风景照。",
                    "type": "text"
                }
            ],
            "role": "assistant"
        },
        {
            "content": [
                {
                    "image_url": {
                        "url": "https://cdn.bigmodel.cn/xxxxxxx.jpeg"
                    },
                    "type": "image_url"
                },
                {
                    "text": "这个图与上一张图有什么区别",
                    "type": "text"
                }
            ],
            "role": "user"
        }
    ]
)
print(response.choices[0].message)
Response Example
{
  "created": 1703487403,
  "id": "8239375684858666781",
  "model": "glm-4v-plus",
  "request_id": "8239375684858666781",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "图中有一片蓝色的海和蓝天,天空中有白色的云朵。图片的右下角有一个小岛或者岩石,上面长着深绿色的树木。",
        "role": "assistant"
      }
    }
  ],
  "usage": {
    "completion_tokens": 37,
    "prompt_tokens": 1037,
    "total_tokens": 1074
  }
}
Streaming Output
Response Parameters
| Parameter | Type | Description |
|---|---|---|
| id | String | Task ID. |
| created | Long | Request creation time, a Unix timestamp in seconds. |
| choices | List | Model output for the current dialog. |
| index | Integer | Result index. |
| finish_reason | String | Reason the model stopped generating. stop: natural end of generation or a stop word was triggered. length: the token length limit was reached. sensitive: the content was intercepted by the security audit interface. network_error: model inference failed. |
| delta | Object | Incremental text message returned by the model. |
| role | String | Role of the current dialog message, currently always assistant (the model). |
| content | String | Content of the current dialog message. |
| usage | Object | Token usage statistics for this model call. |
| prompt_tokens | Integer | Number of input tokens. |
| completion_tokens | Integer | Number of tokens generated by the model. |
| total_tokens | Integer | Total number of tokens. |
| content_filter | List | Returned content security information. |
| role | String | Stage at which the security policy took effect: role = assistant for model output, role = user for user input, role = history for historical context. |
| level | Integer | Severity level 0-3, where 0 is the most severe and 3 the least severe. |
Request Example
from zhipuai import ZhipuAI

client = ZhipuAI(api_key="")  # Enter your own API Key
response = client.chat.completions.create(
    model="glm-4v-plus",  # Enter the model name to call
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "sfile.chatglm.cn/testpath/xxxx.jpg"
                    }
                },
                {
                    "type": "text",
                    "text": "图里有什么"
                }
            ]
        },
    ],
    stream=True,
)
for chunk in response:
    print(chunk.choices[0].delta)
Response Example
data: {"id":"8305986882425703351","created":1705476637,"model":"glm-4v-plus","choices":[{"index":0,"delta":{"role":"assistant","content":"下"}}]}
data: {"id":"8305986882425703351","created":1705476637,"model":"glm-4v-plus","choices":[{"index":0,"delta":{"role":"assistant","content":"角"}}]}
data: {"id":"8305986882425703351","created":1705476637,"model":"glm-4v-plus","choices":[{"index":0,"delta":{"role":"assistant","content":"有一个"}}]}
... ...
data: {"id":"8305986882425703351","created":1705476637,"model":"glm-4v-plus","choices":[{"index":0,"delta":{"role":"assistant","content":"树木"}}]}
data: {"id":"8305986882425703351","created":1705476637,"model":"glm-4v-plus","choices":[{"index":0,"delta":{"role":"assistant","content":"。"}}]}
data: {"id":"8305986882425703351","created":1705476637,"model":"glm-4v-plus","choices":[{"index":0,"finish_reason":"stop","delta":{"role":"assistant","content":""}}],"usage":{"prompt_tokens":1037,"completion_tokens":37,"total_tokens":1074}}
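A minimal sketch of assembling the streamed chunks above into a complete reply; it assumes the same response iterator as the streaming request example and that the token statistics, when returned, arrive with the final chunk.

```python
full_reply = []
for chunk in response:
    choice = chunk.choices[0]
    if choice.delta.content:              # append each incremental text fragment
        full_reply.append(choice.delta.content)
    if choice.finish_reason:              # the last chunk carries the finish reason
        print("finish_reason:", choice.finish_reason)
    if getattr(chunk, "usage", None):     # token statistics on the final chunk
        print("total_tokens:", chunk.usage.total_tokens)

print("".join(full_reply))
```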