GLM-4V Series
The GLM-4V series contains 3 models for different application scenarios.
- GLM-4V-Plus: Excellent multimodal comprehension; it can process up to 5 images simultaneously and supports video comprehension, making it suitable for complex multimedia analysis scenarios.
- GLM-4V: Focused on image content understanding; it can parse up to 5 images simultaneously and is suitable for scenarios requiring deep image understanding.
- GLM-4V-Flash (Free): Focused on efficient single-image understanding, suitable for fast image parsing scenarios such as real-time image analysis or batch image processing.
Synchronous call

Interface request

| Item | Description |
| --- | --- |
| Transmission method | HTTPS |
| Request address | https://open.bigmodel.cn/api/paas/v4/chat/completions |
| Invocation method | Synchronous call that waits for the model to finish executing and returns the final result, or SSE call |
| Character encoding | UTF-8 |
| Request format | JSON |
| Response format | JSON or standard stream event |
| HTTP request method | POST |
| Development language | Any language capable of issuing HTTP requests |
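Since any language that can issue HTTP requests can call the endpoint, the request can also be assembled without the official SDK. The sketch below builds (but does not send) the POST request with Python's standard library; the Bearer-token `Authorization` header shown here is an assumption of this example, so check the platform's authentication documentation before relying on it.

```python
# Minimal sketch: construct the HTTPS POST request by hand (not sent here).
# The Authorization scheme is assumed, not taken from this document.
import json
import urllib.request

API_URL = "https://open.bigmodel.cn/api/paas/v4/chat/completions"

def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build a urllib Request for the chat/completions endpoint."""
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")  # UTF-8 JSON body
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )

req = build_request("YOUR API KEY", {"model": "glm-4v-flash", "messages": []})
print(req.full_url, req.get_method())
```

Sending it is then a matter of `urllib.request.urlopen(req)` (or any HTTP client of your choice).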
Request parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | String | Yes | Code of the model to call: glm-4v-plus, glm-4v, or glm-4v-flash (free). |
| messages | List | Yes | The list of current dialog messages passed to the model as the prompt, in the form of a JSON array. For example, a video-comprehension message: { "role": "user", "content": [ { "type": "video_url", "video_url": { "url": "https://xxx/xx.mp4" } }, { "type": "text", "text": "Please describe this video carefully" } ] } An image-comprehension message: { "role": "user", "content": [ { "type": "image_url", "image_url": { "url": "https://xxx/xx.jpg" } }, { "type": "text", "text": "Explain the phenomenon in the picture" } ] } Possible message types are User message and Assistant message; see the Messages format section below. |
| request_id | String | No | A unique identifier for the request, supplied by the caller; the caller must ensure uniqueness. If omitted, the platform generates one by default. |
| do_sample | Boolean | No | When true, the sampling strategy is enabled; when false, the sampling parameters temperature and top_p have no effect. |
| stream | Boolean | No | Set to false or omit this parameter for synchronous calls. If set to true, the model returns the generated content chunk by chunk as a standard event stream and ends the stream with a data: [DONE] message. |
| temperature | Float | No | Sampling temperature, which controls the randomness of the output; must be positive. Range: [0.0, 1.0]; default 0.8. Larger values make the output more random and creative; smaller values make it more stable and deterministic. Adjust top_p or temperature for your application scenario, but do not adjust both at the same time. |
| top_p | Float | No | Nucleus sampling, an alternative to sampling with temperature. Range: [0.0, 1.0]; default 0.6. The model considers only the tokens within the top_p probability mass; for example, 0.1 means the decoder considers only tokens in the top 10% of the probability distribution. Adjust top_p or temperature for your application scenario, but do not adjust both at the same time. |
| max_tokens | Integer | No | Maximum number of model output tokens; default 1024, with a maximum of 1024 tokens. |
| user_id | String | No | Unique ID of the end user, used to help the platform intervene in end-user violations such as the generation of illegal or undesirable content, or other abusive behavior. Length: minimum 6 characters, maximum 128 characters. |
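The parameter rules above can be checked client-side before making a call. The sketch below is an illustrative payload builder, not part of the SDK; `make_payload` and its checks are this example's own names, mirroring the documented ranges.

```python
# Sketch: assemble a request payload and sanity-check it against the
# documented parameter rules (illustrative only, not an official validator).
def make_payload(model: str, messages: list, **opts) -> dict:
    """Build a chat/completions payload, enforcing the documented ranges."""
    if "temperature" in opts and "top_p" in opts:
        raise ValueError("adjust top_p or temperature, not both")
    t = opts.get("temperature")
    if t is not None and not 0.0 <= t <= 1.0:
        raise ValueError("temperature must be in [0.0, 1.0]")
    p = opts.get("top_p")
    if p is not None and not 0.0 <= p <= 1.0:
        raise ValueError("top_p must be in [0.0, 1.0]")
    if opts.get("max_tokens", 1024) > 1024:
        raise ValueError("max_tokens may not exceed 1024")
    return {"model": model, "messages": messages, **opts}

payload = make_payload("glm-4v-plus", [], temperature=0.8, max_tokens=512)
```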
Messages format

The model accepts two message types, User message and Assistant message, whose formats differ as follows:

User message

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| role | String | Yes | The role of the message; must be user. |
| content | List | Yes | Message content. |
| type | String | Yes | Content type: text, image_url, or video_url. Video and image types cannot be supplied at the same time. |
| text | String | Yes | The text, supplied when type is text. |
| image_url | Object | Yes | Supplied when type is image_url. |
| url | String | Yes | Image URL or Base64 encoding. Each image must be smaller than 5 MB and at most 6000×6000 pixels; jpg, png, and jpeg formats are supported. Note: GLM-4V-Flash does not support Base64 encoding. |
| video_url | Object | Yes | Supplied when type is video_url. Only glm-4v-plus supports video input, and the video_url item must be the first element of the content list. |
| url | String | Yes | Video URL. Videos up to 20 MB and no longer than 30 s are supported; the supported video format is mp4. |
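The constraints above (image and video parts cannot be mixed, and a video_url part must come first in the content list) can be captured in a small helper. `user_message` below is a hypothetical convenience function of this example, not an SDK API.

```python
# Sketch: compose a User message that respects the documented constraints.
# user_message is a hypothetical helper, not part of the zhipuai SDK.
def user_message(text: str, image_url=None, video_url=None) -> dict:
    """Build a user message with optional image or video content."""
    if image_url and video_url:
        raise ValueError("video and image types cannot be used together")
    content = []
    if video_url:  # the video_url item must come first in the content list
        content.append({"type": "video_url", "video_url": {"url": video_url}})
    if image_url:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    content.append({"type": "text", "text": text})
    return {"role": "user", "content": content}

msg = user_message("Please describe this video carefully",
                   video_url="https://example.com/clip.mp4")
```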
Assistant message

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| role | String | Yes | The role of the message; must be assistant. |
| content | String | Yes | Message content. |
Response parameters

| Parameter | Type | Description |
| --- | --- | --- |
| id | String | Task ID. |
| created | Long | Request creation time, a Unix timestamp in seconds. |
| model | String | Model name. |
| choices | List | Model output for the current dialog. |
| index | Integer | Result index. |
| finish_reason | String | Reason the model stopped generating. stop: natural end of generation or a stop word was triggered; length: the token length limit was reached; sensitive: the content was intercepted by the content-safety system; network_error: model inference failed. |
| message | Object | Text message returned by the model. |
| role | String | Role of the current dialog turn; currently always assistant (the model). |
| content | List | Content of the current dialog turn. |
| usage | Object | Token usage statistics for this call, returned at the end of the session. |
| prompt_tokens | Integer | Number of input tokens. |
| completion_tokens | Integer | Number of tokens output by the model. |
| total_tokens | Integer | Total number of tokens. |
| content_filter | List | Content-safety information. |
| role | String | Stage at which the safety check took effect: role = assistant for model generation, role = user for user input, role = history for historical context. |
| level | Integer | Severity level 0-3, where 0 is the most severe and 3 the least severe. |
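The fields above can be read out of a parsed response as follows. `extract_answer` and the sample dict are illustrative only, assuming a response shaped like the documented parameters.

```python
# Sketch: pull the generated text out of a parsed response dict and
# react to finish_reason. The sample response below is fabricated.
def extract_answer(resp: dict) -> str:
    """Return the first choice's content, failing on safety interception."""
    choice = resp["choices"][0]
    if choice["finish_reason"] == "sensitive":
        raise RuntimeError("content intercepted by the safety system")
    return choice["message"]["content"]

sample = {
    "choices": [
        {
            "finish_reason": "stop",
            "message": {"role": "assistant",
                        "content": "A blue sea under a clear sky."},
        }
    ],
    "usage": {"prompt_tokens": 1037, "completion_tokens": 37,
              "total_tokens": 1074},
}
answer = extract_answer(sample)
```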
Example of a request

Upload video URL

```python
# Video comprehension example: upload a video URL
from zhipuai import ZhipuAI

client = ZhipuAI(api_key="YOUR API KEY")  # Fill in your own API key
response = client.chat.completions.create(
    model="glm-4v-plus",  # Fill in the name of the model to call
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "video_url",
                    "video_url": {
                        "url": "https://sfile.chatglm.cn/testpath/video/xxxxx.mp4"
                    }
                },
                {
                    "type": "text",
                    "text": "Please describe this video carefully"
                }
            ]
        }
    ]
)
print(response.choices[0].message)
```
Upload video Base64

```python
import base64
from zhipuai import ZhipuAI

video_path = "/Users/YourComputer/xxxx.mp4"
with open(video_path, 'rb') as video_file:
    video_base = base64.b64encode(video_file.read()).decode('utf-8')

client = ZhipuAI(api_key="YOUR API KEY")  # Fill in your own API key
response = client.chat.completions.create(
    model="glm-4v-plus",  # Fill in the name of the model to call
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "video_url",
                    "video_url": {
                        "url": video_base
                    }
                },
                {
                    "type": "text",
                    "text": "Please describe this video carefully"
                }
            ]
        }
    ]
)
print(response.choices[0].message)
```
Upload image URL

```python
from zhipuai import ZhipuAI

client = ZhipuAI(api_key="")  # Fill in your own API key
response = client.chat.completions.create(
    model="glm-4v",  # Fill in the name of the model to call
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What's in the picture"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://img1.baidu.com/it/u=1369931113,3388870256&fm=253&app=138&size=w931&n=0&f=JPEG&fmt=auto?sec=1703696400&t=f3028c7a1dca43a080aeb8239f09cc2f"
                    }
                }
            ]
        }
    ]
)
print(response.choices[0].message)
```
Upload image Base64

```python
import base64
from zhipuai import ZhipuAI

img_path = "/Users/YourComputer/xxxx.jpeg"
with open(img_path, 'rb') as img_file:
    img_base = base64.b64encode(img_file.read()).decode('utf-8')

client = ZhipuAI(api_key="YOUR API KEY")  # Fill in your own API key
response = client.chat.completions.create(
    model="glm-4v-plus",  # Fill in the name of the model to call
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": img_base
                    }
                },
                {
                    "type": "text",
                    "text": "Please describe this image"
                }
            ]
        }
    ]
)
print(response.choices[0].message)
```
Multi-round image dialog

```python
from zhipuai import ZhipuAI

client = ZhipuAI(api_key="YOUR API KEY")  # Fill in your own API key
response = client.chat.completions.create(
    model="glm-4v-plus",  # Fill in the name of the model to call
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://cdn.bigmodel.cn/xxxxxx.png"
                    }
                },
                {
                    "type": "text",
                    "text": "What's in the picture"
                }
            ]
        },
        {
            "role": "assistant",
            "content": [
                {
                    "type": "text",
                    "text": "This is a painting depicting a natural landscape, showing a peaceful lake with water in varying shades of blue-green hues. A blooming cherry blossom tree grows near the lake, its pinkish-white blossoms looking vibrant in the sunlight. Under the tree there is a man dressed in blue sitting on a boat, seemingly enjoying the good time. In the background are rolling hills covered in green vegetation, while the tops of the hills are surrounded by clouds, adding a sense of mystery. A small pavilion can also be seen in the distance, which sits on a rocky outcrop near the lake, harmonizing with the surrounding natural environment.\n\nThe whole picture gives a feeling of tranquility and harmony, as if one is in a paradise."
                }
            ]
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://cdn.bigmodel.cn/xxxxxxx.jpeg"
                    }
                },
                {
                    "type": "text",
                    "text": "How is this image different from the above image?"
                }
            ]
        },
        {
            "role": "assistant",
            "content": [
                {
                    "type": "text",
                    "text": "The first picture and the second picture are very different in terms of picture content, composition and color, forming a sharp contrast.\n\n1. The first picture is a landscape painting, mainly depicting natural scenery such as cherry blossom trees, lakes and mountains; it is rich in color and full of spring. The second picture is a landscape photo depicting the Great Wall in winter, with the Great Wall meandering through the snow-covered mountains; the overall color tone is cold, giving a feeling of coldness and serenity.\n\n2. Compositionally, the first picture adopts a symmetrical composition, with the cherry blossom tree as the center and the scenery on both sides echoing each other to form a balanced and harmonious picture, while the second picture adopts a diagonal composition, with the Great Wall serving as a visual guide line to lead the viewer's eyes across the whole picture, increasing its depth and sense of motion.\n\nOverall, these two pictures show a clear contrast in content, color, composition and atmosphere: one is a spring landscape full of vitality and color, the other a serene and mysterious winter photo of the Great Wall."
                }
            ]
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://cdn.bigmodel.cn/xxxxxxx.jpeg"
                    }
                },
                {
                    "type": "text",
                    "text": "What is the difference between this image and the previous one?"
                }
            ]
        }
    ]
)
print(response.choices[0].message)
```
Response example

```json
{
    "created": 1703487403,
    "id": "8239375684858666781",
    "model": "glm-4v-plus",
    "request_id": "8239375684858666781",
    "choices": [
        {
            "finish_reason": "stop",
            "message": {
                "content": "The picture has a blue sea and blue sky with white clouds in the sky. In the bottom right corner of the picture there is an island or rock with dark green trees growing on it.",
                "role": "assistant"
            }
        }
    ],
    "usage": {
        "prompt_tokens": 1037,
        "completion_tokens": 37,
        "total_tokens": 1074
    }
}
```
Streaming output

Response parameters

| Parameter | Type | Description |
| --- | --- | --- |
| id | String | Task ID. |
| created | Long | Request creation time, a Unix timestamp in seconds. |
| choices | List | Model output for the current dialog. |
| index | Integer | Result index. |
| finish_reason | String | Reason the model stopped generating. stop: natural end of generation or a stop word was triggered; length: the token length limit was reached; sensitive: the content was intercepted by the content-safety system; network_error: model inference failed. |
| delta | Object | Incremental text message returned by the model. |
| role | String | Role of the current dialog turn; currently always assistant (the model). |
| content | String | Content of the current dialog turn. |
| usage | Object | Token usage statistics for this call. |
| prompt_tokens | Integer | Number of input tokens. |
| completion_tokens | Integer | Number of tokens output by the model. |
| total_tokens | Integer | Total number of tokens. |
| content_filter | List | Content-safety information. |
| role | String | Stage at which the safety check took effect: role = assistant for model generation, role = user for user input, role = history for historical context. |
| level | Integer | Severity level 0-3, where 0 is the most severe and 3 the least severe. |
Example of a request

```python
from zhipuai import ZhipuAI

client = ZhipuAI(api_key="")  # Fill in your own API key
response = client.chat.completions.create(
    model="glm-4v-plus",  # Fill in the name of the model to call
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "sfile.chatglm.cn/testpath/xxxx.jpg"
                    }
                },
                {
                    "type": "text",
                    "text": "What's in the picture"
                }
            ]
        }
    ],
    stream=True,
)
for chunk in response:
    print(chunk.choices[0].delta)
```
Response example

```
data: {"id":"8305986882425703351","created":1705476637,"model":"glm-4v-plus","choices":[{"index":0,"delta":{"role":"assistant","content":"next"}}]}
data: {"id":"8305986882425703351","created":1705476637,"model":"glm-4v-plus","choices":[{"index":0,"delta":{"role":"assistant","content":"corner"}}]}
data: {"id":"8305986882425703351","created":1705476637,"model":"glm-4v-plus","choices":[{"index":0,"delta":{"role":"assistant","content":"There is one"}}]}
...
data: {"id":"8305986882425703351","created":1705476637,"model":"glm-4v-plus","choices":[{"index":0,"delta":{"role":"assistant","content":"trees"}}]}
data: {"id":"8305986882425703351","created":1705476637,"model":"glm-4v-plus","choices":[{"index":0,"delta":{"role":"assistant","content":"."}}]}
data: {"id":"8305986882425703351","created":1705476637,"model":"glm-4v-plus","choices":[{"index":0,"finish_reason":"stop","delta":{"role":"assistant","content":""}}],"usage":{"prompt_tokens":1037,"completion_tokens":37,"total_tokens":1074}}
```
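When consuming the raw event stream without the SDK, each event is a `data:` line carrying a JSON chunk, and the stream ends with `data: [DONE]`. A minimal parsing sketch, assuming the stream has already been split into text lines:

```python
# Sketch: decode a raw SSE stream of "data: ..." lines into JSON chunks,
# stopping at the terminal "data: [DONE]" marker.
import json

def iter_chunks(lines):
    """Yield each parsed JSON chunk from an iterable of SSE lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end of stream
        yield json.loads(payload)

# Fabricated sample lines, for illustration only
sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"Hi"}}]}',
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"!"}}]}',
    "data: [DONE]",
]
text = "".join(c["choices"][0]["delta"]["content"] for c in iter_chunks(sample))
```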