Zhipu AI releases the free vision model GLM-4V-Flash; with the concurrency limit raised to 10, it is now enough for production environments

GLM-4V Series

The GLM-4V series contains three models for different application scenarios.

  • GLM-4V-Plus: Excellent multimodal comprehension; it can process up to 5 images at once and also understands video content, making it suitable for complex multimedia analysis.
  • GLM-4V: Focused on image understanding; it can parse up to 5 images at once and is suited to scenarios that require deep image comprehension.
  • GLM-4V-Flash (free): Focused on efficient single-image understanding, for scenarios that need fast image parsing such as real-time image analysis or batch image processing. A minimal call example is shown below.
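
Since GLM-4V-Flash is the free model discussed here, a minimal single-image call through the official zhipuai Python SDK might look like the sketch below (the image URL and prompt are placeholders; note that GLM-4V-Flash accepts image URLs only, not Base64):

from zhipuai import ZhipuAI

client = ZhipuAI(api_key="YOUR API KEY")  # Fill in your own API key
response = client.chat.completions.create(
    model="glm-4v-flash",  # the free single-image model
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/test.jpg"}},  # placeholder URL
                {"type": "text", "text": "图里有什么"}  # "What is in the picture?"
            ]
        }
    ]
)
print(response.choices[0].message.content)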

Synchronous call

Interface request

| Property | Description |
| --- | --- |
| Transport protocol | HTTPS |
| Request URL | https://open.bigmodel.cn/api/paas/v4/chat/completions |
| Invocation method | Synchronous call that waits for the model to finish and returns the final result, or SSE (streaming) call |
| Character encoding | UTF-8 |
| Request format | JSON |
| Response format | JSON or standard stream event |
| HTTP method | POST |
| Development language | Any language that can issue HTTP requests |
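
To illustrate the table above, here is a sketch of calling the endpoint directly over HTTPS with the requests library; it assumes the Bearer-token Authorization header used by the open.bigmodel.cn platform and a placeholder image URL, so verify the authentication details against the official platform documentation:

import requests

API_KEY = "YOUR API KEY"  # fill in your own API key
URL = "https://open.bigmodel.cn/api/paas/v4/chat/completions"

payload = {
    "model": "glm-4v-flash",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/test.jpg"}},  # placeholder
                {"type": "text", "text": "图里有什么"}  # "What is in the picture?"
            ]
        }
    ]
}

# POST a UTF-8 encoded JSON body; the API responds with JSON.
resp = requests.post(URL, json=payload,
                     headers={"Authorization": f"Bearer {API_KEY}"},  # assumed auth scheme
                     timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])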

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | String | Yes | Model code to call: glm-4v-plus, glm-4v, or glm-4v-flash (free). |
| messages | List | Yes | The list of messages in the current conversation, passed to the model as the prompt in the form of a JSON array. Example video comprehension message: { "role": "user", "content": [ { "type": "video_url", "video_url": { "url": "https://xxx/xx.mp4" } }, { "type": "text", "text": "请仔细描述这个视频" } ] }. Example image comprehension message: { "role": "user", "content": [ { "type": "image_url", "image_url": { "url": "https://xxx/xx.jpg" } }, { "type": "text", "text": "解释一下图中的现象" } ] }. Possible message types are User message and Assistant message; see the message field descriptions below. |
| request_id | String | No | Passed by the caller and must be unique; used to identify each request. If not provided, the platform generates one by default. |
| do_sample | Boolean | No | When true, sampling is enabled; when false, the sampling parameters temperature and top_p have no effect. |
| stream | Boolean | No | Set to false or omit for synchronous calls. When true, the model streams the generated content block by block as standard server-sent events and ends the stream with a data: [DONE] message. |
| temperature | Float | No | Sampling temperature, controls output randomness; must be positive. Range: [0.0, 1.0], default 0.8. Higher values give more random, creative output; lower values give more stable, deterministic output. Tune either top_p or temperature for your scenario, but not both at the same time. |
| top_p | Float | No | Nucleus sampling, an alternative to temperature. Range: [0.0, 1.0], default 0.6. The model only considers tokens within the top_p probability mass; for example, 0.1 means only the tokens in the top 10% of probability mass are considered. Tune either top_p or temperature for your scenario, but not both at the same time. |
| max_tokens | Integer | No | Maximum number of output tokens; default 1024, maximum 1024. |
| user_id | String | No | Unique ID of the end user, used to help the platform intervene in end-user violations, generation of illegal or harmful content, and other abuse. Length: 6 to 128 characters. |
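
The sketch below shows how these optional parameters can be passed through the zhipuai SDK; the parameter values and the image URL are illustrative assumptions, not recommendations:

from zhipuai import ZhipuAI

client = ZhipuAI(api_key="YOUR API KEY")  # fill in your own API key
response = client.chat.completions.create(
    model="glm-4v-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/test.jpg"}},  # placeholder
                {"type": "text", "text": "图里有什么"}  # "What is in the picture?"
            ]
        }
    ],
    request_id="demo-request-0001",  # optional unique ID supplied by the caller
    do_sample=True,                  # enable sampling so temperature takes effect
    temperature=0.8,                 # tune either temperature or top_p, not both
    max_tokens=1024,                 # default and maximum output length
    user_id="end-user-123456",       # 6-128 character end-user ID
)
print(response.choices[0].message.content)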

Messages format

The model accepts two message types, User message and Assistant message, and the format differs between them. The details are as follows:

User message

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| role | String | Yes | The role of the message; must be user. |
| content | List | Yes | Message content. |
| type | String | Yes | Content type: text for text, image_url for an image, video_url for a video. Video and image content cannot be provided at the same time. |
| text | String | Yes | The text content; provide it when type is text. |
| image_url | Object | Yes | The image object; provide it when type is image_url. |
| url | String | Yes | Image URL or Base64 encoding. Each image must be smaller than 5 MB and no larger than 6000×6000 pixels; jpg, png, and jpeg formats are supported. Note: GLM-4V-Flash does not support Base64 encoding. |
| video_url | Object | Yes | The video object; provide it when type is video_url. Only glm-4v-plus supports video input, and the video_url block must come first in the content list. |
| url | String | Yes | Video URL. Videos are limited to 20 MB in size and 30 s in duration; only mp4 is supported. |
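
To make these rules concrete, the hypothetical helper below builds user messages that satisfy them (one media item per message, the video_url block placed first, no mixing of image and video); it is a sketch, not part of the official SDK:

from typing import Optional

def build_user_message(text: str,
                       image_url: Optional[str] = None,
                       video_url: Optional[str] = None) -> dict:
    """Build a user message dict in the format expected by the GLM-4V API."""
    if image_url and video_url:
        # Video and image content cannot be mixed in the same request.
        raise ValueError("Provide either an image_url or a video_url, not both")
    content = []
    if video_url:
        # Only glm-4v-plus supports video, and the video_url block must come first.
        content.append({"type": "video_url", "video_url": {"url": video_url}})
    elif image_url:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    content.append({"type": "text", "text": text})
    return {"role": "user", "content": content}

# Example usage (placeholder URL):
message = build_user_message("图里有什么", image_url="https://example.com/test.jpg")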

Assistant message

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| role | String | Yes | The role of the message; must be assistant. |
| content | String | Yes | Message content. |

Response parameters

| Parameter | Type | Description |
| --- | --- | --- |
| id | String | Task ID. |
| created | Long | Request creation time, a Unix timestamp in seconds. |
| model | String | Model name. |
| choices | List | The model output for the current conversation. |
| index | Integer | Result index. |
| finish_reason | String | Why the model stopped generating: stop means generation ended naturally or hit a stop word, length means the token limit was reached, sensitive means the content was intercepted by the content-safety system, network_error means model inference failed. |
| message | Object | The text message returned by the model. |
| role | String | The role of the current message, currently always assistant (the model). |
| content | List | The content of the current message. |
| usage | Object | Token usage statistics for this call, returned at the end of the conversation. |
| prompt_tokens | Integer | Number of input tokens. |
| completion_tokens | Integer | Number of tokens generated by the model. |
| total_tokens | Integer | Total number of tokens. |
| content_filter | List | Content-safety information. |
| role | String | Where the safety check took effect: assistant for model output, user for user input, history for the conversation history. |
| level | Integer | Severity level, 0-3; level 0 is the most severe, 3 the least severe. |
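
Putting these fields together, the sketch below inspects a response object returned by the zhipuai SDK; the attribute names follow the structure documented above, and the content_filter access is guarded because the field may be absent:

# 'response' is the object returned by client.chat.completions.create(...)
choice = response.choices[0]

print("finish_reason:", choice.finish_reason)   # stop / length / sensitive / network_error
print("reply:", choice.message.content)         # the model's answer text

# Token accounting for quota or cost monitoring
print("prompt tokens:", response.usage.prompt_tokens)
print("completion tokens:", response.usage.completion_tokens)
print("total tokens:", response.usage.total_tokens)

# Content-safety information, if present on the response
content_filter = getattr(response, "content_filter", None)
if content_filter:
    for item in content_filter:
        print("filter role:", item.role, "severity level:", item.level)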

Example of a request

Upload Video URL

# Video comprehension example: upload a video URL
from zhipuai import ZhipuAI

client = ZhipuAI(api_key="YOUR API KEY")  # Fill in your own API key
response = client.chat.completions.create(
    model="glm-4v-plus",  # The model to call
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "video_url",
                    "video_url": {
                        "url": "https://sfile.chatglm.cn/testpath/video/xxxxx.mp4"
                    }
                },
                {
                    "type": "text",
                    "text": "请仔细描述这个视频"  # "Please describe this video in detail"
                }
            ]
        }
    ]
)
print(response.choices[0].message)

Upload video Base64

import base64
from zhipuai import ZhipuAI

video_path = "/Users/YourComputer/xxxx.mp4"
with open(video_path, 'rb') as video_file:
    # Base64-encode the local video file (20 MB / 30 s limits apply)
    video_base = base64.b64encode(video_file.read()).decode('utf-8')

client = ZhipuAI(api_key="YOUR API KEY")  # Fill in your own API key
response = client.chat.completions.create(
    model="glm-4v-plus",  # The model to call
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "video_url",
                    "video_url": {
                        "url": video_base
                    }
                },
                {
                    "type": "text",
                    "text": "请仔细描述这个视频"  # "Please describe this video in detail"
                }
            ]
        }
    ]
)
print(response.choices[0].message)

Upload Image URL

from zhipuai import ZhipuAI

client = ZhipuAI(api_key="")  # Fill in your own API key
response = client.chat.completions.create(
    model="glm-4v",  # The model to call
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "图里有什么"  # "What is in the picture?"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://img1.baidu.com/it/u=1369931113,3388870256&fm=253&app=138&size=w931&n=0&f=JPEG&fmt=auto?sec=1703696400&t=f3028c7a1dca43a080aeb8239f09cc2f"
                    }
                }
            ]
        }
    ]
)
print(response.choices[0].message)

Upload image Base64

import base64
from zhipuai import ZhipuAI

img_path = "/Users/YourComputer/xxxx.jpeg"
with open(img_path, 'rb') as img_file:
    # Base64-encode the local image (< 5 MB, <= 6000x6000 pixels; not supported by GLM-4V-Flash)
    img_base = base64.b64encode(img_file.read()).decode('utf-8')

client = ZhipuAI(api_key="YOUR API KEY")  # Fill in your own API key
response = client.chat.completions.create(
    model="glm-4v-plus",  # The model to call
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": img_base
                    }
                },
                {
                    "type": "text",
                    "text": "请描述这个图片"  # "Please describe this picture"
                }
            ]
        }
    ]
)
print(response.choices[0].message)
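
Because the API rejects images of 5 MB or more, or larger than 6000×6000 pixels, it can help to validate files before encoding them. The sketch below uses Pillow, an assumed extra dependency that is not required by the zhipuai SDK, to check both limits:

import os
from PIL import Image  # assumed extra dependency: pip install Pillow

MAX_BYTES = 5 * 1024 * 1024  # images must be smaller than 5 MB
MAX_SIDE = 6000              # and no larger than 6000x6000 pixels

def check_image(path: str) -> None:
    """Raise ValueError if the image exceeds the documented upload limits."""
    if os.path.getsize(path) >= MAX_BYTES:
        raise ValueError(f"{path} is 5 MB or larger")
    with Image.open(path) as im:
        width, height = im.size
        if width > MAX_SIDE or height > MAX_SIDE:
            raise ValueError(f"{path} exceeds {MAX_SIDE}x{MAX_SIDE} pixels")
        if im.format not in ("JPEG", "PNG"):
            raise ValueError(f"{path} must be jpg/jpeg/png, got {im.format}")

check_image("/Users/YourComputer/xxxx.jpeg")  # raises if the file breaks a limit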

Multi-turn image dialog

from zhipuai import ZhipuAI

client = ZhipuAI(api_key="YOUR API KEY")  # Fill in your own API key
response = client.chat.completions.create(
    model="glm-4v-plus",  # The model to call
    messages=[
        {
            "content": [
                {
                    "image_url": {
                        "url": "https://cdn.bigmodel.cn/xxxxxx.png"
                    },
                    "type": "image_url"
                },
                {
                    "text": "图中有什么",  # "What is in the picture?"
                    "type": "text"
                }
            ],
            "role": "user"
        },
        {
            # Previous assistant turn: a Chinese description of the first image
            "content": [
                {
                    "text": "这是一幅描绘自然风景的画,展现了一片宁静的湖泊,湖水呈现出深浅不一的蓝绿色调。湖边长着一棵盛开的樱花树,粉白色的花朵在阳光下显得格外鲜艳。树下有一位身穿蓝色衣服的人坐在船上,似乎正在享受这美好的时光。\n\n背景是连绵起伏的山脉,山体被绿色的植被覆盖,山顶则被云雾缭绕,增添了几分神秘感。远处还可以看到一座小亭子,它坐落在湖边的岩石上,与周围的自然环境和谐相融。\n\n整个画面给人一种宁静、和谐的感觉,仿佛置身于一个世外桃源之中。",
                    "type": "text"
                }
            ],
            "role": "assistant"
        },
        {
            "content": [
                {
                    "image_url": {
                        "url": "https://cdn.bigmodel.cn/xxxxxxx.jpeg"
                    },
                    "type": "image_url"
                },
                {
                    "text": "这个图与上面图有什么不一样",  # "How does this picture differ from the one above?"
                    "type": "text"
                }
            ],
            "role": "user"
        },
        {
            # Previous assistant turn: a Chinese comparison of the two images
            "content": [
                {
                    "text": "第一张图和第二张图在画面内容、构图、色彩等方面都有很大的不同,形成鲜明的对比。\n\n1.第一张图是风景画,主要描绘了自然风景,如樱花树、湖泊、山脉等,色彩丰富,充满春天的气息,而第二张图是描绘长城在冬季的风景照片,长城在白雪覆盖的山脉中蜿蜒盘旋,整体色调偏冷,给人一种寒冷、宁静的感觉。\n\n2.构图上,第一张图采用对称式构图,以樱花树为中心,两侧的景物相互呼应,形成平衡和谐的画面效果,第二张图则采用对角线构图,长城作为视觉引导线,引导观众的视线穿越整个画面,增加画面的深度和动态感。\n\n整体来看,这两张图在画面内容、色彩、构图和氛围上都呈现出明显的对比,一张是充满生机和色彩的春日风景画,另一张是宁静而神秘的冬日长城风景照。",
                    "type": "text"
                }
            ],
            "role": "assistant"
        },
        {
            "content": [
                {
                    "image_url": {
                        "url": "https://cdn.bigmodel.cn/xxxxxxx.jpeg"
                    },
                    "type": "image_url"
                },
                {
                    "text": "这个图与上一张图有什么区别",  # "What is the difference between this picture and the previous one?"
                    "type": "text"
                }
            ],
            "role": "user"
        }
    ]
)
print(response.choices[0].message)

Response Example

{
    "created": 1703487403,
    "id": "8239375684858666781",
    "model": "glm-4v-plus",
    "request_id": "8239375684858666781",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {
                "content": "图中有一片蓝色的海和蓝天,天空中有白色的云朵。图片的右下角有一个小岛或者岩石,上面长着深绿色的树木。",
                "role": "assistant"
            }
        }
    ],
    "usage": {
        "completion_tokens": 37,
        "prompt_tokens": 1037,
        "total_tokens": 1074
    }
}

Streaming output

Response parameters

| Parameter | Type | Description |
| --- | --- | --- |
| id | String | Task ID. |
| created | Long | Request creation time, a Unix timestamp in seconds. |
| choices | List | The model output for the current conversation. |
| index | Integer | Result index. |
| finish_reason | String | Why the model stopped generating: stop means generation ended naturally or hit a stop word, length means the token limit was reached, sensitive means the content was intercepted by the content-safety system, network_error means model inference failed. |
| delta | Object | The incremental text message returned by the model. |
| role | String | The role of the current message, currently always assistant (the model). |
| content | String | The content of the current message. |
| usage | Object | Token usage statistics for this call. |
| prompt_tokens | Integer | Number of input tokens. |
| completion_tokens | Integer | Number of tokens generated by the model. |
| total_tokens | Integer | Total number of tokens. |
| content_filter | List | Content-safety information. |
| role | String | Where the safety check took effect: assistant for model output, user for user input, history for the conversation history. |
| level | Integer | Severity level, 0-3; level 0 is the most severe, 3 the least severe. |

Example of a request

from zhipuai import ZhipuAI

client = ZhipuAI(api_key="")  # Fill in your own API key
response = client.chat.completions.create(
    model="glm-4v-plus",  # The model to call
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "sfile.chatglm.cn/testpath/xxxx.jpg"
                    }
                },
                {
                    "type": "text",
                    "text": "图里有什么"  # "What is in the picture?"
                }
            ]
        },
    ],
    stream=True,
)
for chunk in response:
    print(chunk.choices[0].delta)
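
In practice you usually want the whole reply rather than the raw delta objects, so here is a hedged alternative to the simple print loop above: it accumulates the streamed content and reads finish_reason and usage from the final chunk (guards are included because some chunks may omit these fields):

# Alternative to the print loop above; 'response' is the stream returned with stream=True.
full_reply = []
for chunk in response:
    if not chunk.choices:
        continue
    choice = chunk.choices[0]
    if choice.delta and choice.delta.content:
        full_reply.append(choice.delta.content)
    if choice.finish_reason:              # the last content chunk carries the finish reason
        print("finish_reason:", choice.finish_reason)
    if getattr(chunk, "usage", None):     # token statistics arrive with the final chunk
        print("total tokens:", chunk.usage.total_tokens)

print("".join(full_reply))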

Response Example

data: {"id":"8305986882425703351","created":1705476637,"model":"glm-4v-plus","choices":[{"index":0,"delta":{"role":"assistant","content":"下"}}]}
data: {"id":"8305986882425703351","created":1705476637,"model":"glm-4v-plus","choices":[{"index":0,"delta":{"role":"assistant","content":"角"}}]}
data: {"id":"8305986882425703351","created":1705476637,"model":"glm-4v-plus","choices":[{"index":0,"delta":{"role":"assistant","content":"有一个"}}]}
... ...
data: {"id":"8305986882425703351","created":1705476637,"model":"glm-4v-plus","choices":[{"index":0,"delta":{"role":"assistant","content":"树木"}}]}
data: {"id":"8305986882425703351","created":1705476637,"model":"glm-4v-plus","choices":[{"index":0,"delta":{"role":"assistant","content":"。"}}]}
data: {"id":"8305986882425703351","created":1705476637,"model":"glm-4v-plus","choices":[{"index":0,"finish_reason":"stop","delta":{"role":"assistant","content":""}}],"usage":{"prompt_tokens":1037,"completion_tokens":37,"total_tokens":1074}}
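
Callers that do not use the Python SDK can consume this event stream directly over HTTP. The sketch below uses the requests library, assumes the same Bearer-token Authorization header as earlier, and stops on the data: [DONE] terminator described above:

import json
import requests

API_KEY = "YOUR API KEY"  # fill in your own API key
URL = "https://open.bigmodel.cn/api/paas/v4/chat/completions"

payload = {
    "model": "glm-4v-flash",
    "stream": True,
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/test.jpg"}},  # placeholder
                {"type": "text", "text": "图里有什么"}  # "What is in the picture?"
            ]
        }
    ],
}

with requests.post(URL, json=payload,
                   headers={"Authorization": f"Bearer {API_KEY}"},  # assumed auth scheme
                   stream=True, timeout=60) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # end-of-stream marker
            break
        event = json.loads(data)
        delta = event["choices"][0]["delta"]
        print(delta.get("content", ""), end="", flush=True)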