
Zhipu AI releases the free vision model GLM-4V-Flash: concurrency limit raised to 10, now enough for production environments

GLM-4V Series

The GLM-4V series contains 3 models for different application scenarios.

  • GLM-4V-Plus: Offers strong multimodal understanding, can process up to 5 images at a time, and supports video understanding; suited to complex multimedia analysis scenarios.
  • GLM-4V: Focuses on image understanding and can parse up to 5 images at a time; suited to scenarios that require in-depth image understanding.
  • GLM-4V-Flash (Free): Focuses on efficient single-image understanding; suited to scenarios that need fast image parsing, such as real-time image analysis or batch image processing.

Synchronous call

Interface request

Item | Description
Transport method | HTTPS
Request address | https://open.bigmodel.cn/api/paas/v4/chat/completions
Invocation method | Synchronous call: waits for the model to finish and returns the final result; SSE calls are also supported
Character encoding | UTF-8
Request format | JSON
Response format | JSON or standard stream event
Request type | POST
Development language | Any language that can issue HTTP requests
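
For reference, the call can also be made without the official SDK by POSTing the JSON body directly to the request address above. The sketch below is illustrative only; it assumes the API key can be sent as a Bearer token in the Authorization header, so confirm the authentication scheme against the platform documentation before relying on it.

# Minimal sketch of a direct HTTPS call to the endpoint in the table above.
# Assumes Bearer-token authentication; consult the platform's authentication
# docs if your account requires a signed token instead.
import requests

API_KEY = "YOUR API KEY"  # fill in your own API key
URL = "https://open.bigmodel.cn/api/paas/v4/chat/completions"

payload = {
    "model": "glm-4v-flash",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://xxx/xx.jpg"}},
                {"type": "text", "text": "Explain the phenomenon in the picture"},
            ],
        }
    ],
}

resp = requests.post(
    URL,
    json=payload,  # the request body is JSON, as the table specifies
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"])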

Request Parameters

Parameter | Type | Required | Description
model | String | Yes | Code of the model to call: glm-4v-plus, glm-4v, or glm-4v-flash (free)
messages | List | Yes | The list of messages of the current conversation, passed to the model as the prompt in the form of a JSON array. Example of a video-understanding message: { "role": "user", "content": [ { "type": "video_url", "video_url": { "url": "https://xxx/xx.mp4" } }, { "type": "text", "text": "Please describe this video carefully" } ] } Example of an image-understanding message: { "role": "user", "content": [ { "type": "image_url", "image_url": { "url": "https://xxx/xx.jpg" } }, { "type": "text", "text": "Explain the phenomenon in the picture" } ] } Possible message types are User message and Assistant message; see the Messages format section below.
request_id | String | No | Passed by the caller and must be unique; used as a unique identifier to distinguish each request. If not provided, the platform generates one by default.
do_sample | Boolean | No | When true, the sampling strategy is enabled; when false, the sampling parameters temperature and top_p have no effect.
stream | Boolean | No | For synchronous calls, set this to false or omit it. If set to true, the model returns the generated content chunk by chunk as a standard event stream and sends a data: [DONE] message at the end of the stream.
temperature | Float | No | Sampling temperature, which controls the randomness of the output. Must be positive, in the range [0.0, 1.0]; default 0.8. Larger values make the output more random and creative; smaller values make it more stable and deterministic. Adjust top_p or temperature to suit your scenario, but not both at the same time.
top_p | Float | No | Nucleus sampling, an alternative to temperature sampling. Range [0.0, 1.0]; default 0.6. The model considers only the tokens within the top_p probability mass; for example, 0.1 means only tokens in the top 10% of probability are considered. Adjust top_p or temperature to suit your scenario, but not both at the same time.
max_tokens | Integer | No | Maximum number of output tokens; the default and the maximum are both 1024.
user_id | String | No | Unique ID of the end user, used to help the platform intervene in end-user violations, generation of illegal or harmful content, or other abuse. Length: 6 to 128 characters.
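
As a small illustration of how the optional parameters above are passed, here is a sketch using the zhipuai SDK from the request examples later in this article; it assumes the SDK forwards these keyword arguments to the API unchanged, and the values are placeholders rather than recommendations.

# Sketch: passing optional request parameters from the table above through
# the zhipuai SDK. Values are illustrative placeholders only.
from zhipuai import ZhipuAI

client = ZhipuAI(api_key="YOUR API KEY")  # fill in your own API key
response = client.chat.completions.create(
    model="glm-4v-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://xxx/xx.jpg"}},
                {"type": "text", "text": "What's in the picture"},
            ],
        }
    ],
    request_id="my-unique-request-id-000001",  # optional; must be unique per request
    do_sample=True,    # enable sampling so temperature / top_p take effect
    temperature=0.8,   # adjust temperature OR top_p, not both
    max_tokens=1024,   # default and maximum are both 1024
)
print(response.choices[0].message)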

Messages format

The model accepts two message types, User message and Assistant message; their formats differ by type and are described below.

User message

Parameter | Type | Required | Description
role | String | Yes | The role of the message; should be user
content | List | Yes | Message content
type | String | Yes | Content type: text for text, image_url for an image, video_url for a video. Video and image content cannot be passed in the same request.
text | String | Yes | Required when type is text
image_url | Object | Yes | Required when type is image_url
url | String | Yes | Image URL or base64-encoded image. Each image must be smaller than 5 MB and no larger than 6000x6000 pixels; jpg, png and jpeg formats are supported. Note: GLM-4V-Flash does not support base64 encoding. (A client-side check sketch follows this table.)
video_url | Object | Yes | Required when type is video_url. Only glm-4v-plus supports video input, and the video_url content block must come first.
url | String | Yes | Video URL. Videos up to 20 MB and no longer than 30 s are supported; mp4 is the only supported format.
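
The size and format limits listed above can be checked on the client before a request is sent. The helper below is a hypothetical sketch, not part of the official SDK; it only verifies the file extension and the 5 MB size limit, and checking the 6000x6000 pixel limit would additionally require an image library such as Pillow.

# Hypothetical pre-flight check for the image limits in the table above:
# smaller than 5 MB per image, jpg/png/jpeg formats only. The pixel limit
# is not checked here. Remember that GLM-4V-Flash accepts URLs only, not base64.
import base64
import os

MAX_IMAGE_BYTES = 5 * 1024 * 1024
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png"}

def image_to_base64(path: str) -> str:
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported image format: {ext}")
    if os.path.getsize(path) > MAX_IMAGE_BYTES:
        raise ValueError("Image exceeds the 5 MB upload limit")
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")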

Assistant message

Parameter | Type | Required | Description
role | String | Yes | The role of the message; should be assistant
content | String | Yes | Message content

Response parameters

Parameter | Type | Description
id | String | Task ID
created | Long | Request creation time, a Unix timestamp in seconds
model | String | Model name
choices | List | Model output for the current conversation
index | Integer | Result index
finish_reason | String | Reason the model stopped generating. stop: generation ended naturally or a stop word was triggered. length: the token limit was reached. sensitive: the content was intercepted by the content-safety system. network_error: a model inference error occurred.
message | Object | Text message returned by the model
role | String | Role of the current conversation turn; currently always assistant (the model)
content | List | Content of the current conversation turn
usage | Object | Token counts for this model call, returned at the end of the conversation
prompt_tokens | Integer | Number of input tokens
completion_tokens | Integer | Number of tokens generated by the model
total_tokens | Integer | Total number of tokens
content_filter | List | Information about content safety
role | String | Where the safety check applied: role = assistant for model output, role = user for user input, role = history for the conversation history
level | Integer | Severity level 0-3; level 0 is the most severe, level 3 the least severe
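
As a quick sketch of consuming these fields with the zhipuai SDK, the hypothetical helper below reads the attributes named in the table above; the warnings are illustrative and the function is not part of the SDK.

# Sketch: inspecting the response fields described in the table above.
def report(response) -> None:
    """Print key fields of a non-streaming response.

    `response` is the object returned by client.chat.completions.create(...),
    as in the request examples below.
    """
    choice = response.choices[0]
    if choice.finish_reason == "length":
        print("Warning: output was truncated at the max_tokens limit")
    elif choice.finish_reason == "sensitive":
        print("Warning: output was intercepted by the content-safety check")
    print("Reply:", choice.message.content)
    usage = response.usage
    print("Tokens:", usage.prompt_tokens, usage.completion_tokens, usage.total_tokens)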

Example of a request

Upload Video URL

# Video understanding example: upload a video URL
from zhipuai import ZhipuAI

client = ZhipuAI(api_key="YOUR API KEY")  # Fill in your own API key
response = client.chat.completions.create(
    model="glm-4v-plus",  # Fill in the name of the model to call
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "video_url",
                    "video_url": {
                        "url": "https://sfile.chatglm.cn/testpath/video/xxxxx.mp4"
                    }
                },
                {
                    "type": "text",
                    "text": "Please describe this video carefully"
                }
            ]
        }
    ]
)
print(response.choices[0].message)

Upload video Base64

# Video understanding example: upload a base64-encoded video
import base64
from zhipuai import ZhipuAI

video_path = "/Users/YourComputer/xxxx.mp4"
with open(video_path, 'rb') as video_file:
    video_base = base64.b64encode(video_file.read()).decode('utf-8')

client = ZhipuAI(api_key="YOUR API KEY")  # Fill in your own API key
response = client.chat.completions.create(
    model="glm-4v-plus",  # Fill in the name of the model to call
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "video_url",
                    "video_url": {
                        "url": video_base
                    }
                },
                {
                    "type": "text",
                    "text": "Please describe this video carefully"
                }
            ]
        }
    ]
)
print(response.choices[0].message)

Upload Image URL

from zhipuai import ZhipuAI

client = ZhipuAI(api_key="")  # Fill in your own API key
response = client.chat.completions.create(
    model="glm-4v",  # Fill in the name of the model to call
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What's in the picture"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://img1.baidu.com/it/u=1369931113,3388870256&fm=253&app=138&size=w931&n=0&f=JPEG&fmt=auto?sec=1703696400&t=f3028c7a1dca43a080aeb8239f09cc2f"
                    }
                }
            ]
        }
    ]
)
print(response.choices[0].message)

Upload image Base64

import base64
from zhipuai import ZhipuAI

img_path = "/Users/YourComputer/xxxx.jpeg"
with open(img_path, 'rb') as img_file:
    img_base = base64.b64encode(img_file.read()).decode('utf-8')

client = ZhipuAI(api_key="YOUR API KEY")  # Fill in your own API key
response = client.chat.completions.create(
    model="glm-4v-plus",  # Fill in the name of the model to call
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": img_base
                    }
                },
                {
                    "type": "text",
                    "text": "Please describe this image"
                }
            ]
        }
    ]
)
print(response.choices[0].message)

Multi-turn image dialog

from zhipuai import ZhipuAI

client = ZhipuAI(api_key="YOUR API KEY")  # Fill in your own API key
response = client.chat.completions.create(
    model="glm-4v-plus",  # Fill in the name of the model to call
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://cdn.bigmodel.cn/xxxxxx.png"
                    }
                },
                {
                    "type": "text",
                    "text": "What's in the picture"
                }
            ]
        },
        {
            "role": "assistant",
            "content": [
                {
                    "type": "text",
                    "text": "This is a painting depicting a natural landscape, showing a peaceful lake with water in varying shades of blue-green hues. A blooming cherry blossom tree grows near the lake, its pinkish-white blossoms looking vibrant in the sunlight. Under the tree there is a man dressed in blue sitting on a boat, seemingly enjoying the good time.\n\nIn the background are rolling hills covered in green vegetation, while the tops of the hills are surrounded by clouds, adding a sense of mystery. A small pavilion can also be seen in the distance, sitting on a rocky outcrop near the lake and harmonizing with the surrounding natural environment.\n\nThe whole picture gives a feeling of tranquility and harmony, as if one were in a paradise."
                }
            ]
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://cdn.bigmodel.cn/xxxxxxx.jpeg"
                    }
                },
                {
                    "type": "text",
                    "text": "How is this image different from the above image?"
                }
            ]
        },
        {
            "role": "assistant",
            "content": [
                {
                    "type": "text",
                    "text": "The first picture and the second picture are very different in content, composition and color, forming a sharp contrast.\n\n1. The first picture is a landscape painting, mainly depicting natural scenery such as cherry blossom trees, lakes and mountains; it is rich in color and full of spring. The second picture is a landscape photo depicting the Great Wall in winter, with the Great Wall meandering through snow-covered mountains; the overall tone is cold, giving a feeling of coldness and serenity.\n\n2. Compositionally, the first picture adopts a symmetrical composition, with the cherry blossom tree as the center and the scenery on both sides echoing each other to form a balanced and harmonious effect, while the second picture adopts a diagonal composition, with the Great Wall serving as a visual guide line that leads the viewer's eyes across the whole picture, increasing its depth and sense of motion.\n\nOverall, these two pictures show a clear contrast in content, color, composition and atmosphere: one is a spring landscape full of vitality and color, the other is a serene and mysterious winter photo of the Great Wall."
                }
            ]
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://cdn.bigmodel.cn/xxxxxxx.jpeg"
                    }
                },
                {
                    "type": "text",
                    "text": "What is the difference between this image and the previous one?"
                }
            ]
        }
    ]
)
print(response.choices[0].message)
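
To continue a multi-turn dialog like the one above, the model's previous replies are sent back as Assistant messages. The sketch below assumes the message list from the example above has been stored in a variable named messages; it simply appends the latest reply and a hypothetical follow-up question before the next call, with the follow-up text purely illustrative.

# Sketch: carrying the model's reply into the next turn of the dialog above.
# `client` and `response` come from the preceding example; `messages` is the
# same message list, kept in a variable so it can be extended.
reply_text = response.choices[0].message.content

messages.append({
    "role": "assistant",
    "content": [{"type": "text", "text": reply_text}],
})
messages.append({
    "role": "user",
    "content": [{"type": "text", "text": "Summarize the differences in one sentence."}],
})

next_response = client.chat.completions.create(model="glm-4v-plus", messages=messages)
print(next_response.choices[0].message)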

Response Example

{
    "created": 1703487403,
    "id": "8239375684858666781",
    "model": "glm-4v-plus",
    "request_id": "8239375684858666781",
    "choices": [
        {
            "finish_reason": "stop",
            "message": {
                "content": "The picture has a blue sea and blue sky with white clouds in the sky. In the bottom right corner of the picture there is an island or rock with dark green trees growing on it.",
                "role": "assistant"
            }
        }
    ],
    "usage": {
        "prompt_tokens": 1037,
        "completion_tokens": 37,
        "total_tokens": 1074
    }
}

Streaming output

Response parameters

Parameter | Type | Description
id | String | Task ID
created | Long | Request creation time, a Unix timestamp in seconds
choices | List | Model output for the current conversation
index | Integer | Result index
finish_reason | String | Reason the model stopped generating. stop: generation ended naturally or a stop word was triggered. length: the token limit was reached. sensitive: the content was intercepted by the content-safety system. network_error: a model inference error occurred.
delta | Object | Incremental text message returned by the model
role | String | Role of the current conversation turn; currently always assistant (the model)
content | String | Content of the current conversation turn
usage | Object | Token counts for this model call
prompt_tokens | Integer | Number of input tokens
completion_tokens | Integer | Number of tokens generated by the model
total_tokens | Integer | Total number of tokens
content_filter | List | Information about content safety
role | String | Where the safety check applied: role = assistant for model output, role = user for user input, role = history for the conversation history
level | Integer | Severity level 0-3; level 0 is the most severe, level 3 the least severe

Example of a request

from zhipuai import ZhipuAI

client = ZhipuAI(api_key="")  # Please fill in your own API key
response = client.chat.completions.create(
    model="glm-4v-plus",  # Fill in the name of the model to call
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://sfile.chatglm.cn/testpath/xxxx.jpg"
                    }
                },
                {
                    "type": "text",
                    "text": "What's in the picture"
                }
            ]
        }
    ],
    stream=True,
)
for chunk in response:
    print(chunk.choices[0].delta)

Response Example

data: {"id": "8305986882425703351", "created":1705476637, "model": "glm-4v-plus", "choices":[{"index":0, "delta":{"role": "assistant", " content": "next"}}]}
data: {"id": "8305986882425703351", "created":1705476637, "model": "glm-4v-plus", "choices":[{"index":0, "delta":{"role": "assistant"," content": "corner"}}]}
data: {"id": "8305986882425703351", "created":1705476637, "model": "glm-4v-plus", "choices":[{"index":0, "delta":{"role": "assistant"," content": "There is one"}}]}
... ...
data: {"id": "8305986882425703351", "created":1705476637, "model": "glm-4v-plus", "choices":[{"index":0, "delta":{"role": "assistant", " content": "trees"}}]}
data: {"id": "8305986882425703351", "created":1705476637, "model": "glm-4v-plus", "choices":[{"index":0, "delta":{"role": "assistant"," content":"."}}]}
data: {"id": "8305986882425703351", "created":1705476637, "model": "glm-4v-plus", "choices":[{"index":0, "finish_reason": "stop", "delta":{ "role": "assistant", "content":""}}], "usage":{"prompt_tokens":1037, "completion_tokens":37, "total_tokens":1074}}