Summary
Ollama provides a powerful REST API that lets developers interact with large language models easily. Through the Ollama API, users can send requests and receive model-generated responses for tasks such as natural language processing and text generation. This article describes the basic operations of answer completion and dialog completion in detail, and also explains common operations such as creating, copying, and deleting models.
Endpoints
Answer Completion \ Dialog Completion \ Create Model \ Copy Model \ Delete Model \ List Running Models \ List Local Models \ Show Model Information \ Pull Models \ Push Models \ Generate Embeddings
I. Answer Completion
POST /api/generate
Generates a response to a given prompt using the specified model. This is a streaming endpoint, so there will be a series of responses. The final response object will include statistics and other data from the request.
Parameters
- model: (Required) Model name
- prompt: The prompt to generate a response for
- suffix: Text that follows the model response
- images: (Optional) A list of base64-encoded images (for multimodal models such as llava)
Advanced parameters (optional):
- format: The format in which to return the response. The only value currently accepted is json
- options: Other model parameters, such as temperature, seed, etc.
- system: System message
- template: The prompt template to use
- context: The context parameter returned by a previous request to /generate, which can be used to keep a short conversational memory
- stream: If set to false, the response is returned as a single response object rather than a stream of objects
- raw: If set to true, no formatting is applied to the prompt. You can use the raw parameter when you specify a fully templated prompt in your API request
- keep_alive: Controls how long the model stays loaded in memory after the request (default: 5m)
Example request (streaming)
curl http://localhost:11434/api/generate -d '{
"model": "llama3.1",
"prompt": "为什么草是绿的?"
}'
Tip.
If you use the curl command on Windows, download curl for Windows and extract the archive. Then locate the bin subdirectory in the extracted directory and add its path to your environment variables.
Use the following command in a Command Prompt window (not PowerShell) to check that it was added successfully.
curl --help
If curl prints its help text, it was added successfully.
Tip.
In a Windows Command Prompt window, escape the double quotes inside the JSON body when sending requests with curl. An example command is shown below.
curl http://localhost:11434/api/generate -d "{\"model\": \"llama3.1\", \"prompt\": \"为什么草是绿的\"}"
A stream of JSON objects in the output indicates that the request succeeded.
Example Response
The return is a stream of JSON objects:
{
"model":"llama3.1",
"created_at":"2024-08-08T02:54:08.184732629Z",
"response":"植物",
"done":false
}
The final response in the stream also includes additional data about the generation:
- context: an encoding of the conversation used in this response, which can be sent in the next request to keep conversational memory
- total_duration: time spent generating the response (in nanoseconds)
- load_duration: time taken to load the model (in nanoseconds)
- prompt_eval_count: number of tokens in the prompt
- prompt_eval_duration: time taken to evaluate the prompt (in nanoseconds)
- eval_count: number of tokens in the response
- eval_duration: time taken to generate the response (in nanoseconds)
- response: empty if the response was streamed; otherwise, this contains the full response
To calculate the response generation rate in tokens per second (token/s), compute eval_count / eval_duration * 10^9.
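The rate formula above can be written as a small helper. This is a minimal sketch; the function name tokens_per_second is made up for illustration, and the sample values are the eval_count and eval_duration from the final response shown in this section.

```python
def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Return the generation rate in tokens per second.

    eval_duration is reported in nanoseconds, hence the factor of 10^9.
    """
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return eval_count / eval_duration_ns * 1e9


# Sample values: eval_count and eval_duration from the final response in this section.
rate = tokens_per_second(118, 2656329000)
print(f"{rate:.2f} tokens/s")
```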
Final Response:
{
"model":"llama3.1",
"created_at":"2024-08-08T02:54:10.819603411Z",
"response":"",
"done":true,
"done_reason":"stop",
"context":[1,2,3],
"total_duration":8655401792,
"load_duration":5924129727,
"prompt_eval_count":17,
"prompt_eval_duration":29196000,
"eval_count":118,
"eval_duration":2656329000
}
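Because the endpoint streams its reply, a client has to join the response fragments itself and keep the final object for its statistics. The sketch below shows one way to do that. It assumes the stream arrives as one JSON object per line; the helper name collect_stream and the two sample lines are hypothetical and only modelled on the examples above.

```python
import json
from typing import Iterable


def collect_stream(lines: Iterable[str]) -> tuple[str, dict]:
    """Join the streamed `response` fragments; return the text and the final object."""
    parts = []
    final = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between objects
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            final = chunk  # the last object carries the statistics
            break
    return "".join(parts), final


# Hypothetical stream, shortened from the example objects above.
sample = [
    '{"model":"llama3.1","response":"植物","done":false}',
    '{"model":"llama3.1","response":"","done":true,"eval_count":118,"eval_duration":2656329000}',
]
text, stats = collect_stream(sample)
print(text, stats["eval_count"])
```

In a real client the lines would come from the HTTP response body rather than from a list.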
Advanced usage
Non-streaming output
Set stream to false to receive the whole response at once.
Example Request
curl http://localhost:11434/api/generate -d '{
"model": "llama3.1",
"prompt": "为什么草是绿的?",
"stream": false
}'
Example Response
{
"model":"llama3.1",
"created_at":"2024-08-08T07:13:34.418567351Z",
"response":"答案:叶子含有大量的叶绿素。",
"done":true,
"done_reason":"stop",
"context":[1,2,3],
"total_duration":2902435095,
"load_duration":2605831520,
"prompt_eval_count":17,
"prompt_eval_duration":29322000,
"eval_count":13,
"eval_duration":266499000
}
JSON mode
When format is set to json, the output will be in JSON format. Note, however, that the prompt must also instruct the model to respond in JSON; otherwise the model may generate a large amount of whitespace.
Example Request
curl http://localhost:11434/api/generate -d '{
"model": "llama3.1",
"prompt": "为什么草是绿的?以JSON格式输出答案",
"format": "json",
"stream": false
}'
Example Response
{
"model":"llama3.1",
"created_at":"2024-08-08T07:21:24.950883454Z",
"response":"{\n \"颜色原因\": \"叶子中含有光合作用所需的叶绿素\",\n \"作用\": \"进行光合作用吸收太阳能\"\n}",
"done":true,
"done_reason":"stop",
"context":[1,2,3],
"total_duration":3492279981,
"load_duration":2610591203,
"prompt_eval_count":22,
"prompt_eval_duration":28804000,
"eval_count":40,
"eval_duration":851206000
}
The value of response will be a string containing JSON similar to the following:
{
"颜色原因": "叶子中含有光合作用所需的叶绿素",
"作用": "进行光合作用吸收太阳能"
}
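Since response is a string that contains JSON, a client needs a second decoding step to get at the fields. The following is a minimal sketch of that step; parse_json_reply is a hypothetical helper name, and api_reply is an abbreviated copy of the example response above with only the fields used here.

```python
import json

# Abbreviated copy of the example response above.
api_reply = {
    "response": '{\n "颜色原因": "叶子中含有光合作用所需的叶绿素",\n "作用": "进行光合作用吸收太阳能"\n}',
    "done": True,
}


def parse_json_reply(reply: dict) -> dict:
    """Decode the JSON document carried inside the `response` string."""
    try:
        return json.loads(reply["response"])
    except json.JSONDecodeError as exc:
        # Defensive check in case the output is truncated or malformed.
        raise ValueError(f"model output is not valid JSON: {exc}") from exc


answer = parse_json_reply(api_reply)
for key, value in answer.items():
    print(key, "->", value)
```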
Input containing images
To submit images to a multimodal model (e.g. llava or bakllava), provide a list of base64-encoded images in the images field:
Example Request
curl http://localhost:11434/api/generate -d '{
"model": "llava",
"prompt":"描述这张图片",
"stream": false,
"images": ["iVBORw0KGgoAAAANSUhEUgAAAG0AAABmCAYAAADBPx+VAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAA3VSURBVHgB7Z27r0zdG8fX743i1bi1ikMoFMQloXRpKFFIqI7LH4BEQ+NWIkjQuSWCRIEoULk0gsK1kCBI0IhrQVT7tz/7zZo888yz1r7MnDl7z5xvsjkzs2fP3uu71nNfa7lkAsm7d++Sffv2JbNmzUqcc8m0adOSzZs3Z+/XES4ZckAWJEGWPiCxjsQNLWmQsWjRIpMseaxcuTKpG/7HP27I8P79e7dq1ars/yL4/v27S0ejqwv+cUOGEGGpKHR37tzJCEpHV9tnT58+dXXCJDdECBE2Ojrqjh071hpNECjx4cMHVycM1Uhbv359B2F79+51586daxN/+pyRkRFXKyRDAqxEp4yMlDDzXG1NPnnyJKkThoK0VFd1ELZu3TrzXKxKfW7dMBQ6bcuWLW2v0VlHjx41z717927ba22U9APcw7Nnz1oGEPeL3m3p2mTAYYnFmMOMXybPPXv2bNIPpFZr1NHn4HMw0KRBjg9NuRw95s8PEcz/6DZELQd/09C9QGq5RsmSRybqkwHGjh07OsJSsYYm3ijPpyHzoiacg35MLdDSIS/O1yM778jOTwYUkKNHWUzUWaOsylE00MyI0fcnOwIdjvtNdW/HZwNLGg+sR1kMepSNJXmIwxBZiG8tDTpEZzKg0GItNsosY8USkxDhD0Rinuiko2gfL/RbiD2LZAjU9zKQJj8RDR0vJBR1/Phx9+PHj9Z7REF4nTZkxzX4LCXHrV271qXkBAPGfP/atWvu/PnzHe4C97F48eIsRLZ9+3a3f/9+87dwP1JxaF7/3r17ba+5l4EcaVo0lj3SBq5kGTJSQmLWMjgYNei2GPT1MuMqGTDEFHzeQSP2wi/jGnkmPJ/nhccs44jvDAxpVcxnq0F6eT8h4ni/iIWpR5lPyA6ETkNXoSukvpJAD3AsXLiwpZs49+fPn5ke4j10TqYvegSfn0OnafC+Tv9ooA/JPkgQysqQNBzagXY55nO/oa1F7qvIPWkRL12WRpMWUvpVDYmxAPehxWSe8ZEXL20sadYIozfmNch4QJPAfeJgW3rNsnzphBKNJM2KKODo1rVOMRYik5ETy3ix4qWNI81qAAirizgMIc+yhTytx0JWZuNI03qsrgWlGtwjoS9XwgUhWGyhUaRZZQNNIEwCiXD16tXcAHUs79co0vSD8rrJCIW98pzvxpAWyyo3HYwqS0+H0BjStClcZJT5coMm6D2LOF8TolGJtK9fvyZpyiC5ePFi9nc/oJU4eiEP0jVoAnHa9wyJycITMP78+eMeP37sXrx44d6+fdt6f82aNdkx1pg9e3Zb5W+RSRE+n+VjksQWifvVaTKFhn5O8my63K8Qabdv33b379/PiAP//vuvW7BggZszZ072/+TJk91YgkafPn166zXB1rQHFvouAWHq9z3SEevSUerqCn2/dDCeta2jxYbr69evk4MHDyY7d+7MjhMnTiTPnz9Pfv/+nfQT2ggpO2dMF8cghuoM7Ygj5iWCqRlGFml0QC/ftGmTmzt3rmsaKDsgBSPh0/8yPeLLBihLkOKJc0jp8H8vUzcxIA1k6QJ/c78tWEyj5P3o4u9+jywNPdJi5rAH9x0KHcl4Hg570eQp3+vHXGyrmEeigzQsQsjavXt38ujRo44LQuDDhw+TW7duRS1HGgMxhNXHgflaNTOsHyKvHK5Ijo2jbFjJBQK9YwFd6RVMzfgRBmEfP37suBBm/p49e1qjEP2mwTViNRo0VJWH1deMXcNK08uUjVUu7s/zRaL+oLNxz1bpANco4npUgX4G2eFbpDFyQoQxojBCpEGSytmOH8qrH5Q9vuzD6ofQylkCUmh8DBAr+q8JCyVNtWQIidKQE9wNtLSQnS4jDSsx
NHogzFuQBw4cyM61UKVsjfr3ooBkPSqqQHesUPWVtzi9/vQi1T+rJj7WiTz4Pt/l3LxUkr5P2VYZaZ4URpsE+st/dujQoaBBYokbrz/8TJNQYLSonrPS9kUaSkPeZyj1AWSj+d+VBoy1pIWVNed8P0Ll/ee5HdGRhrHhR5GGN0r4LGZBaj8oFDJitBTJzIZgFcmU0Y8ytWMZMzJOaXUSrUs5RxKnrxmbb5YXO9VGUhtpXldhEUogFr3IzIsvlpmdosVcGVGXFWp2oU9kLFL3dEkSz6NHEY1sjSRdIuDFWEhd8KxFqsRi1uM/nz9/zpxnwlESONdg6dKlbsaMGS4EHFHtjFIDHwKOo46l4TxSuxgDzi+rE2jg+BaFruOX4HXa0Nnf1lwAPufZeF8/r6zD97WK2qFnGjBxTw5qNGPxT+5T/r7/7RawFC3j4vTp09koCxkeHjqbHJqArmH5UrFKKksnxrK7FuRIs8STfBZv+luugXZ2pR/pP9Ois4z+TiMzUUkUjD0iEi1fzX8GmXyuxUBRcaUfykV0YZnlJGKQpOiGB76x5GeWkWWJc3mOrK6S7xdND+W5N6XyaRgtWJFe13GkaZnKOsYqGdOVVVbGupsyA/l7emTLHi7vwTdirNEt0qxnzAvBFcnQF16xh/TMpUuXHDowhlA9vQVraQhkudRdzOnK+04ZSP3DUhVSP61YsaLtd/ks7ZgtPcXqPqEafHkdqa84X6aCeL7YWlv6edGFHb+ZFICPlljHhg0bKuk0CSvVznWsotRu433alNdFrqG45ejoaPCaUkWERpLXjzFL2Rpllp7PJU2a/v7Ab8N05/9t27Z16KUqoFGsxnI9EosS2niSYg9SpU6B4JgTrvVW1flt1sT+0ADIJU2maXzcUTraGCRaL1Wp9rUMk16PMom8QhruxzvZIegJjFU7LLCePfS8uaQdPny4jTTL0dbee5mYokQsXTIWNY46kuMbnt8Kmec+LGWtOVIl9cT1rCB0V8WqkjAsRwta93TbwNYoGKsUSChN44lgBNCoHLHzquYKrU6qZ8lolCIN0Rh6cP0Q3U6I6IXILYOQI513hJaSKAorFpuHXJNfVlpRtmYBk1Su1obZr5dnKAO+L10Hrj3WZW+E3qh6IszE37F6EB+68mGpvKm4eb9bFrlzrok7fvr0Kfv727dvWRmdVTJHw0qiiCUSZ6wCK+7XL/AcsgNyL74DQQ730sv78Su7+t/A36MdY0sW5o40ahslXr58aZ5HtZB8GH64m9EmMZ7FpYw4T6QnrZfgenrhFxaSiSGXtPnz57e9TkNZLvTjeqhr734CNtrK41L40sUQckmj1lGKQ0rC37x544r8eNXRpnVE3ZZY7zXo8NomiO0ZUCj2uHz58rbXoZ6gc0uA+F6ZeKS/jhRDUq8MKrTho9fEkihMmhxtBI1DxKFY9XLpVcSkfoi8JGnToZO5sU5aiDQIW716ddt7ZLYtMQlhECdBGXZZMWldY5BHm5xgAroWj4C0hbYkSc/jBmggIrXJWlZM6pSETsEPGqZOndr2uuuR5rF169a2HoHPdurUKZM4CO1WTPqaDaAd+GFGKdIQkxAn9RuEWcTRyN2KSUgiSgF5aWzPTeA/lN5rZubMmR2bE4SIC4nJoltgAV/dVefZm72AtctUCJU2CMJ327hxY9t7EHbkyJFseq+EJSY16RPo3Dkq1kkr7+q0bNmyDuLQcZBEPYmHVdOBiJyIlrRDq41YPWfXOxUysi5fvtyaj+2BpcnsUV/oSoEMOk2CQGlr4ckhBwaetBhjCwH0ZHtJROPJkyc7UjcYLDjmrH7ADTEBXFfOYmB0k9oYBOjJ8b4aOYSe7QkKcYhFlq3QYLQhSidNmtS2RATwy8YOM3EQJsUjKiaWZ+vZToUQgzhkHXudb/PW5YMHD9yZM2faPsMwoc7RciYJXbGuBqJ1UIGKKLv915jsvgtJxCZDubdXr165mzdvtr1Hz5LONA8jrUwKPqsm
VesKa49S3Q4WxmRPUEYdTjgiUcfUwLx589ySJUva3oMkP6IYddq6HMS4o55xBJBUeRjzfa4Zdeg56QZ43LhxoyPo7Lf1kNt7oO8wWAbNwaYjIv5lhyS7kRf96dvm5Jah8vfvX3flyhX35cuX6HfzFHOToS1H4BenCaHvO8pr8iDuwoUL7tevX+b5ZdbBair0xkFIlFDlW4ZknEClsp/TzXyAKVOmmHWFVSbDNw1l1+4f90U6IY/q4V27dpnE9bJ+v87QEydjqx/UamVVPRG+mwkNTYN+9tjkwzEx+atCm/X9WvWtDtAb68Wy9LXa1UmvCDDIpPkyOQ5ZwSzJ4jMrvFcr0rSjOUh+GcT4LSg5ugkW1Io0/SCDQBojh0hPlaJdah+tkVYrnTZowP8iq1F1TgMBBauufyB33x1v+NWFYmT5KmppgHC+NkAgbmRkpD3yn9QIseXymoTQFGQmIOKTxiZIWpvAatenVqRVXf2nTrAWMsPnKrMZHz6bJq5jvce6QK8J1cQNgKxlJapMPdZSR64/UivS9NztpkVEdKcrs5alhhWP9NeqlfWopzhZScI6QxseegZRGeg5a8C3Re1Mfl1ScP36ddcUaMuv24iOJtz7sbUjTS4qBvKmstYJoUauiuD3k5qhyr7QdUHMeCgLa1Ear9NquemdXgmum4fvJ6w1lqsuDhNrg1qSpleJK7K3TF0Q2jSd94uSZ60kK1e3qyVpQK6PVWXp2/FC3mp6jBhKKOiY2h3gtUV64TWM6wDETRPLDfSakXmH3w8g9Jlug8ZtTt4kVF0kLUYYmCCtD/DrQ5YhMGbA9L3ucdjh0y8kOHW5gU/VEEmJTcL4Pz/f7mgoAbYkAAAAAElFTkSuQmCC"
]
}'
Example Response
{
"model":"llava",
"created_at":"2024-08-08T07:33:55.481713465Z",
"response":" The image shows a cartoon of an animated character that resembles a cute pig with large eyes and a smiling face. It appears to be in motion, indicated by the lines extending from its arms and tail, giving it a dynamic feel as if it is waving or dancing. The style of the image is playful and simplistic, typical of line art or stickers. The character's design has been stylized with exaggerated features such as large ears and a smiling expression, which adds to its charm. ",
"done":true,
"done_reason":"stop",
"context":[1,2,3],
"total_duration":2960501550,
"load_duration":4566012,
"prompt_eval_count":1,
"prompt_eval_duration":758437000,
"eval_count":108,
"eval_duration":2148818000
}
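The base64 string in the request above is normally produced by a program rather than typed by hand. The sketch below shows one way to build such a request body. The helper name build_image_request is hypothetical, and the four stand-in bytes replace a real image file purely for illustration.

```python
import base64
import json


def build_image_request(model: str, prompt: str, image_bytes: bytes) -> str:
    """Return a JSON body for /api/generate with one base64-encoded image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    body = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "images": [encoded],
    }
    # ensure_ascii=False keeps non-ASCII prompts readable in the output.
    return json.dumps(body, ensure_ascii=False)


# Stand-in bytes for illustration; in practice read them from an image file,
# e.g. open("picture.png", "rb").read() (hypothetical file name).
payload = build_image_request("llava", "描述这张图片", b"\x89PNG")
print(payload)
```

The resulting string can be passed as the request body, for example with curl -d.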
Reproducible output
Set seed to a fixed value to get reproducible output:
Example Request
curl http://localhost:11434/api/generate -d '{
"model": "llama3.1",
"prompt": "为什么草是绿的?",
"stream": false,
"options": {
"seed": 1001
}
}'
Example Response
{
"model":"llama3.1",
"created_at":"2024-08-08T07:42:28.397780058Z",
"response":"答案:因为叶子中含有大量的氯离子。",
"done":true,
"done_reason":"stop",
"context":[1,2,3],
"total_duration":404791556,
"load_duration":18317351,
"prompt_eval_count":17,
"prompt_eval_duration":22453000,
"eval_count":16,
"eval_duration":321267000
}
II. Dialog Completion
POST /api/chat
Generates the next message in a chat using the specified model. This is also a streaming endpoint, so there will be a series of responses. Streaming can be disabled by setting "stream" to false. The final response object will include statistics and additional data from the request.
Parameters
- model: (Required) Model name
- messages: The chat messages, which can be used to keep a memory of the chat
- tools: Tools for the model to use, if the model supports them. Requires stream to be set to false
The message object has the following fields:
- role: The role of the message, one of system, user, assistant, or tool
- content: The content of the message
- images (optional): A list of images to include in the message (for multimodal models such as llava)
- tool_calls (optional): A list of tools the model wants to use
Advanced parameters (optional):
- format: The format in which to return the response. The only value currently accepted is json
- options: Other model parameters, such as temperature, seed, etc.
- stream: If false, the response is returned as a single response object rather than a stream of objects
- keep_alive: Controls how long the model stays loaded in memory after the request (default: 5m)
Example request (streaming)
curl http://localhost:11434/api/chat -d '{
"model": "llama3.1",
"messages": [
{
"role": "user",
"content": "为什么草是绿的?"
}
]
}'
Example Response
Returns a stream of JSON objects:
{
"model":"llama3.1",
"created_at":"2024-08-08T03:54:36.933701041Z",
"message":{
"role":"assistant",
"content":"因为"
},
"done":false
}
Final Response:
{
"model":"llama3.1",
"created_at":"2024-08-08T03:54:37.187621765Z",
"message":{
"role":"assistant",
"content":""
},
"done_reason":"stop",
"done":true,
"total_duration":5730533217,
"load_duration":5370535786,
"prompt_eval_count":17,
"prompt_eval_duration":29621000,
"eval_count":13,
"eval_duration":273810000
}
Advanced usage
The parameters for non-streaming output, JSON mode, multimodal input, and reproducible output are the same as for the Answer Completion API.
With history
Send a chat message together with the conversation history. The same method can be used for multi-turn conversations or chain-of-thought prompting.
Example Request
curl http://localhost:11434/api/chat -d '{
"model": "llama3.1",
"messages": [
{
"role": "user",
"content": "为什么草是绿色的?"
},
{
"role": "assistant",
"content": "因为草里面含有叶绿素。"
},
{
"role": "user",
"content": "为什么叶绿素让草看起来是绿色的?"
}
],
"stream": false
}'
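The request above carries the whole conversation, so a client has to append each assistant reply and the next question to the messages list before sending again. The sketch below illustrates that bookkeeping. The helper name add_turn is hypothetical, and the reply dict is an abbreviated stand-in in the shape of a non-streaming /api/chat response.

```python
import json


def add_turn(history: list[dict], reply: dict, next_question: str) -> list[dict]:
    """Return a new message list with the assistant reply and the next user turn."""
    return history + [
        {"role": "assistant", "content": reply["message"]["content"]},
        {"role": "user", "content": next_question},
    ]


history = [{"role": "user", "content": "为什么草是绿色的?"}]
# Abbreviated stand-in for a non-streaming /api/chat response.
reply = {"message": {"role": "assistant", "content": "因为草里面含有叶绿素。"}, "done": True}

messages = add_turn(history, reply, "为什么叶绿素让草看起来是绿色的?")
body = json.dumps(
    {"model": "llama3.1", "messages": messages, "stream": False},
    ensure_ascii=False,
)
print(body)
```

Returning a new list instead of mutating history keeps earlier snapshots of the conversation intact, which is a design choice of this sketch rather than a requirement of the API.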
Example Response
{
"model":"llama3.1",
"created_at":"2024-08-08T07:53:28.849517802Z",
"message":{
"role":"assistant",
"content":"这是一个更复杂的问题!\n\n叶绿素是一种称为黄素的色素,这些色素可以吸收光能。在日光下,绿色草叶中的叶绿素会吸收蓝光和红光,但反射出黄色和绿色的光,所以我们看到草看起来是绿色的。\n\n简单来说,叶绿素让草看起来是绿色的,因为它反射了我们的眼睛可以看到的绿光,而不反射我们看到的其他颜色。"
},
"done_reason":"stop",
"done":true,
"total_duration":5065572138,
"load_duration":2613559070,
"prompt_eval_count":48,
"prompt_eval_duration":37825000,
"eval_count":106,
"eval_duration":2266694000
}
III. Create Model
POST /api/create
It is recommended to set modelfile to the contents of the Modelfile rather than just setting path. This is required for remote model creation. Remote model creation must also use Create a Blob to explicitly create any file blobs referenced by fields such as FROM and ADAPTER, and set the field's value to the path indicated in the response.
Parameters
- name: Name of the model to create
- modelfile (optional): Contents of the Modelfile
- stream (optional): If false, the response is returned as a single response object rather than a stream of objects
- path (optional): Path to the Modelfile
Example Request
curl http://localhost:11434/api/create -d '{
"name": "mario",
"modelfile": "FROM llama3\nSYSTEM You are mario from Super Mario Bros."
}'
Example Response
A stream of JSON objects. Note that the final JSON object shows "status": "success", indicating that the model was created successfully.
{"status":"reading model metadata"}
{"status":"creating system layer"}
{"status":"using already created layer sha256:22f7f8ef5f4c791c1b03d7eb414399294764d7cc82c7e94aa81a1feb80a983a2"}
{"status":"using already created layer sha256:8c17c2ebb0ea011be9981cc3922db8ca8fa61e828c5d3f44cb6ae342bf80460b"}
{"status":"using already created layer sha256:7c23fb36d80141c4ab8cdbb61ee4790102ebd2bf7aeff414453177d4f2110e5d"}
{"status":"using already created layer sha256:2e0493f67d0c8c9c68a8aeacdf6a38a2151cb3c4c1d42accf296e19810527988"}
{"status":"using already created layer sha256:2759286baa875dc22de5394b4a925701b1896a7e3f8e53275c36f75a877a82c9"}
{"status":"writing layer sha256:df30045fe90f0d750db82a058109cecd6d4de9c90a3d75b19c09e5f64580bb42"}
{"status":"writing layer sha256:f18a68eb09bf925bb1b669490407c1b1251c5db98dc4d3d81f3088498ea55690"}
{"status":"writing manifest"}
{"status":"success"}
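A script that creates models can check the last status object to decide whether creation worked. The following is a minimal sketch, assuming the stream is delivered as one JSON object per line as shown above; creation_succeeded is a hypothetical helper name and the sample stream is a shortened copy of the example.

```python
import json


def creation_succeeded(lines: list[str]) -> bool:
    """Return True if the last status object in the stream reports success."""
    statuses = [json.loads(line).get("status") for line in lines if line.strip()]
    return bool(statuses) and statuses[-1] == "success"


# Shortened copy of the example stream above.
stream = [
    '{"status":"reading model metadata"}',
    '{"status":"creating system layer"}',
    '{"status":"writing manifest"}',
    '{"status":"success"}',
]
print(creation_succeeded(stream))
```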
Check if the Blob exists
HEAD /api/blobs/:digest
Ensures that the file blob used for a FROM or ADAPTER field exists on the server. This checks your Ollama server, not Ollama.ai.
Query parameters
- digest: SHA256 digest of the blob
Example Request
curl -I http://localhost:11434/api/blobs/sha256:29fdb92e57cf0827ded04ae6461b5931d01fa595843f55d36f5b275a52087dd2
Example Response
Returns "200 OK" if the blob exists, or "404 Not Found" if it does not.
Creating a Blob
POST /api/blobs/:digest
Creates a blob from a file on the server. Returns the server file path.
Query parameters
- digest: Expected SHA256 digest of the file
Example Request
curl -T model.bin -X POST http://localhost:11434/api/blobs/sha256:29fdb92e57cf0827ded04ae6461b5931d01fa595843f55d36f5b275a52087dd2
Example Response
Returns 201 Created if the blob was successfully created, or 400 Bad Request if the digest used was not as expected.
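The digest in the blob URLs above is the SHA256 hash of the file content, written in the form sha256:<hex>. The sketch below computes it for a binary stream so that large model files need not be loaded into memory at once. The helper names stream_digest and blob_url are hypothetical, and the in-memory bytes stand in for a real file such as model.bin.

```python
import hashlib
import io
from typing import BinaryIO


def stream_digest(fileobj: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks and return 'sha256:<hex>'."""
    hasher = hashlib.sha256()
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        hasher.update(chunk)
    return "sha256:" + hasher.hexdigest()


def blob_url(digest: str, base: str = "http://localhost:11434") -> str:
    """Build the /api/blobs/:digest URL used by the HEAD and POST requests."""
    return f"{base}/api/blobs/{digest}"


# Stand-in content; in practice use open("model.bin", "rb").
digest = stream_digest(io.BytesIO(b"example model bytes"))
print(blob_url(digest))
```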
IV. Copy Model
POST /api/copy
Copies a model, creating a duplicate of an existing model under another name.
Example Request
curl http://localhost:11434/api/copy -d '{
"source": "llama3.1",
"destination": "llama3-backup"
}'
Example Response
Returns "200 OK" if successful, or "404 Not Found" if the source model does not exist.
V. Delete Model
DELETE /api/delete
Deletes a model and its data.
Parameters
- name: Name of the model to delete
Example Request
curl -X DELETE http://localhost:11434/api/delete -d '{
"name": "llama3.1"
}'
Example Response
Returns "200 OK" if successful or "404 Not Found" if the model to be deleted does not exist.
VI. List Running Models
GET /api/ps
Lists the models currently loaded into memory.
Example Request
curl http://localhost:11434/api/ps
Example Response
{
"models":[
{
"name":"llama3.1:latest",
"model":"llama3.1:latest",
"size":6654289920,
"digest":"75382d0899dfaaa6ce331cf680b72bd6812c7f05e5158c5f2f43c6383e21d734",
"details":{
"parent_model":"",
"format":"gguf",
"family":"llama",
"families":["llama"],
"parameter_size":"8.0B",
"quantization_level":"Q4_0"
},
"expires_at":"2024-08-08T14:06:52.883023476+08:00",
"size_vram":6654289920
}
]
}
VII. List Local Models
GET /api/tags
Lists locally available models.
Example Request
curl http://localhost:11434/api/tags
Example Response
{
"models":[
{
"name":"llama3.1:latest",
"model":"llama3.1:latest",
"modified_at":"2024-08-07T17:54:22.533937636+08:00",
"size":4661230977,
"digest":"75382d0899dfaaa6ce331cf680b72bd6812c7f05e5158c5f2f43c6383e21d734",
"details":{
"parent_model":"",
"format":"gguf",
"family":"llama",
"families":["llama"],
"parameter_size":"8.0B",
"quantization_level":"Q4_0"
}
}
]
}
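A response like the one above is easier to scan as one line per model. The following is a minimal sketch under the assumption that size is given in bytes; summarize_models is a hypothetical helper name and tags_json is an abbreviated copy of the example response.

```python
import json

# Abbreviated copy of the example response above.
tags_json = (
    '{"models":[{"name":"llama3.1:latest","size":4661230977,'
    '"details":{"parameter_size":"8.0B","quantization_level":"Q4_0"}}]}'
)


def summarize_models(raw: str) -> list[str]:
    """Return one summary line per model: name, size, parameters, quantization."""
    rows = []
    for model in json.loads(raw)["models"]:
        size_gb = model["size"] / 1e9  # assumes bytes; decimal gigabytes
        details = model.get("details", {})
        rows.append(
            f'{model["name"]}: {size_gb:.2f} GB, '
            f'{details.get("parameter_size", "?")}, '
            f'{details.get("quantization_level", "?")}'
        )
    return rows


for row in summarize_models(tags_json):
    print(row)
```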
VIII. Show Model Information
POST /api/show
Displays information about a model, including details, modelfile, template, parameters, license, and system prompt.
Parameters
- name: Name of the model to show
- verbose (optional): If set to true, returns full data for verbose response fields
Example Request
curl http://localhost:11434/api/show -d '{
"name": "llama3.1"
}'
Example Response
{
"license":"...",
"modelfile":"...",
"parameters":"...",
"template":"...",
"details":{
"parent_model":"",
"format":"gguf",
"family":"llama",
"families":["llama"],
"parameter_size":"8.0B",
"quantization_level":"Q4_0"
},
"model_info":{
"general.architecture":"llama",
"general.basename":"Meta-Llama-3.1",
"general.file_type":2,
"general.finetune":"Instruct",
"general.languages":["en","de","fr","it","pt","hi","es","th"],
"general.license":"llama3.1",
"general.parameter_count":8030261312,
"general.quantization_version":2,
"general.size_label":"8B",
"general.tags":["facebook","meta","pytorch","llama","llama-3","text-generation"],
"general.type":"model",
"llama.attention.head_count":32,
"llama.attention.head_count_kv":8,
"llama.attention.layer_norm_rms_epsilon":0.00001,
"llama.block_count":32,
"llama.context_length":131072,
"llama.embedding_length":4096,
"llama.feed_forward_length":14336,
"llama.rope.dimension_count":128,
"llama.rope.freq_base":500000,
"llama.vocab_size":128256,
"tokenizer.ggml.bos_token_id":128000,
"tokenizer.ggml.eos_token_id":128009,
"tokenizer.ggml.merges":null,
"tokenizer.ggml.model":"gpt2",
"tokenizer.ggml.pre":"llama-bpe",
"tokenizer.ggml.token_type":null,
"tokenizer.ggml.tokens":null
},
"modified_at":"2024-08-07T17:54:22.533937636+08:00"
}
IX. Pull Models
POST /api/pull
Downloads a model from the ollama library. An interrupted pull resumes from where it stopped, and multiple calls share the same download progress.
Parameters
- name: Name of the model to pull
- insecure (optional): Allows insecure connections to the library. Use this option only when pulling from your own library during development
- stream (optional): If false, the response is returned as a single response object rather than a stream of objects
Example Request
curl http://localhost:11434/api/pull -d '{
"name": "llama3.1"
}'
Example Response
If stream is not specified or is set to true, a stream of JSON objects is returned.
The first object is the manifest:
{
"status": "pulling manifest"
}
Then there is a series of download responses. Until a download completes, a response may not include the completed key. The number of files to download depends on the number of layers specified in the manifest.
{
"status": "downloading digestname",
"digest": "digestname",
"total": 2142590208,
"completed": 241970
}
After all the files have been downloaded, the final responses are:
{
"status": "verifying sha256 digest"
}
{
"status": "writing manifest"
}
{
"status": "removing any unused layers"
}
{
"status": "success"
}
If stream is set to false, the response is a single JSON object:
{
"status": "success"
}
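The total and completed fields in the download responses can be turned into a progress readout. The sketch below formats a single streamed object and falls back to the bare status when total is absent. The helper name progress_line is hypothetical, and the sample objects are copied from the examples above.

```python
import json


def progress_line(raw: str) -> str:
    """Format one streamed /api/pull object as a readable progress line."""
    obj = json.loads(raw)
    total = obj.get("total")
    completed = obj.get("completed", 0)  # the key may be missing
    if not total:
        return obj["status"]  # status-only objects such as "pulling manifest"
    return f'{obj["status"]}: {completed / total:.2%}'


samples = [
    '{"status": "pulling manifest"}',
    '{"status": "downloading digestname", "digest": "digestname", '
    '"total": 2142590208, "completed": 241970}',
    '{"status": "success"}',
]
for raw in samples:
    print(progress_line(raw))
```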
X. Push Models
POST /api/push
Uploads a model to the model library. You need to register an account at ollama.ai and add a public key first.
Parameters
- name: Name of the model to push, in the format <namespace>/<model>:<tag>
- insecure (optional): Allows insecure connections to the library. Use this option only when pushing to your own library during development
- stream (optional): If false, the response is returned as a single response object rather than a stream of objects
Example Request
curl http://localhost:11434/api/push -d '{
"name": "mattw/pygmalion:latest"
}'
Example Response
If stream is not specified or is set to true, a stream of JSON objects is returned:
{ "status": "retrieving manifest" }
Then there is a series of upload responses:
{
"status": "starting upload",
"digest": "sha256:bc07c81de745696fdf5afca05e065818a8149fb0c77266fb584d9b2cba3711ab",
"total": 1928429856
}
Finally, when the upload is complete:
{"status":"pushing manifest"}
{"status":"success"}
If stream is set to false, the response is a single JSON object:
{ "status": "success" }
XI. Generate Embeddings
POST /api/embed
Generates embeddings from a model.
Parameters
- model: Name of the model to generate embeddings from
- input: The text or list of texts to generate embeddings for
Advanced parameters:
- truncate: Truncates the end of each input to fit within the context length. If set to false and the context length is exceeded, an error is returned. The default value is true
- options: Other model parameters, such as temperature, seed, etc.
- keep_alive: Controls how long the model stays loaded in memory after the request (default: 5m)
Example Request
curl http://localhost:11434/api/embed -d '{
"model": "llama3.1",
"input": "为什么草是绿的?"
}'
Example Response
{
"model":"llama3.1",
"embeddings":[[
-0.008059342,-0.013182715,0.019781841,0.012018124,-0.024847334,
-0.0031902494,-0.02714767,0.015282277,0.060032737,...
]],
"total_duration":3041671009,
"load_duration":2864335471,
"prompt_eval_count":7
}
Example request (multiple inputs)
curl http://localhost:11434/api/embed -d '{
"model": "llama3.1",
"input": ["为什么草是绿的?","为什么天是蓝的?"]
}'
Example Response
{
"model":"llama3.1",
"embeddings":[[
-0.008471201,-0.013031566,0.019300476,0.011618419,-0.025197424,
-0.0024164673,-0.02669075,0.015766116,0.059984162,...
],[
-0.012765694,-0.012822924,0.015915949,0.006415892,-0.02327763,
0.004859615,-0.017922137,0.019488193,0.05638235,...
]],
"total_duration":195481419,
"load_duration":1318886,
"prompt_eval_count":14
}
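With multiple inputs, the embeddings field holds one vector per input. The sketch below pairs each input text with its vector, assuming the vectors come back in the same order as the inputs. The helper name pair_embeddings is hypothetical, and the vectors are shortened to the first three values of the example above.

```python
def pair_embeddings(inputs: list[str], reply: dict) -> dict[str, list[float]]:
    """Map each input text to its embedding, assuming the order is preserved."""
    embeddings = reply["embeddings"]
    if len(embeddings) != len(inputs):
        raise ValueError(
            f"expected {len(inputs)} embeddings, got {len(embeddings)}"
        )
    return dict(zip(inputs, embeddings))


inputs = ["为什么草是绿的?", "为什么天是蓝的?"]
# Vectors truncated to three values for illustration.
reply = {
    "model": "llama3.1",
    "embeddings": [
        [-0.008471201, -0.013031566, 0.019300476],
        [-0.012765694, -0.012822924, 0.015915949],
    ],
}

vectors = pair_embeddings(inputs, reply)
for text, vector in vectors.items():
    print(text, len(vector))
```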
Error handling
The Ollama API returns an appropriate error code and message when an error occurs. Common errors include:
- 400 Bad Request: the request is malformed.
- 404 Not Found: the requested resource does not exist.
- 500 Internal Server Error: the server encountered an internal error.
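A client can translate these status codes into readable errors. The following is a minimal sketch using only the Python standard library. The names explain_status and post_json are hypothetical, and post_json is meant for requests that return a single JSON object (for example with stream set to false), not for streamed replies.

```python
import json
import urllib.error
import urllib.request

# Messages for the common status codes listed above.
STATUS_MESSAGES = {
    400: "Bad Request: the request is malformed",
    404: "Not Found: the requested resource does not exist",
    500: "Internal Server Error: the server encountered an internal error",
}


def explain_status(code: int) -> str:
    """Return a readable message for an HTTP status code."""
    return STATUS_MESSAGES.get(code, f"unexpected HTTP status {code}")


def post_json(path: str, body: dict, base: str = "http://localhost:11434") -> dict:
    """POST a JSON body and return the decoded reply, or raise RuntimeError."""
    request = urllib.request.Request(
        base + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=60) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as exc:  # must be caught before URLError
        raise RuntimeError(explain_status(exc.code)) from exc
    except urllib.error.URLError as exc:
        raise RuntimeError(f"could not reach the server: {exc.reason}") from exc


print(explain_status(404))
```

A call such as post_json("/api/show", {"name": "llama3.1"}) would then either return the decoded object or raise an error carrying one of the messages above.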