Ollama API User's Guide

AI hands-on tutorials, published 5 months ago by AI Sharing Circle


Summary

Ollama provides a powerful REST API that lets developers interact with large language models easily. Through the Ollama API, users send requests and receive model-generated responses for tasks such as natural language processing and text generation. This article covers the basic operations of answer completion and dialogue completion in detail, and also explains common operations such as creating, copying, and deleting models.

Endpoints

Answer Completion \ Dialogue Completion \ Create Model \ Copy Model \ Delete Model \ List Running Models \ List Local Models \ Show Model Information \ Pull Models \ Push Models \ Generate Embeddings

I. Answer Completion

POST /api/generate

Generates a response to a given prompt using the specified model. This is a streaming endpoint, so there will be a series of responses. The final response object will include statistics and other data from the request.

Parameters

model: (Required) The model name
prompt: The prompt to generate a response for
suffix: The text that comes after the model response
images: (Optional) A list of base64-encoded images (for multimodal models such as llava)

Advanced Parameters (optional):

format: The format to return the response in. Currently the only accepted value is json
options: Additional model parameters, such as temperature and seed
system: The system message
template: The prompt template to use
context: The context parameter returned from a previous request to /generate; it can be used to keep a short conversational memory
stream: If set to false, the response is returned as a single response object rather than a stream of objects
raw: If set to true, no formatting is applied to the prompt. You can use the raw parameter when the prompt you send already contains a fully templated prompt
keep_alive: Controls how long the model stays loaded in memory after the request (default: 5m)

Example request (streaming)

curl http://localhost:11434/api/generate -d '{
"model": "llama3.1",
"prompt": "Why is grass green?"
}'

Tip.

To use the curl command on Windows, download curl for Windows and extract the archive. Then locate the bin subfolder in the extracted directory and add its path to your environment variables.

Run the following command in a Command Prompt window (not PowerShell) to check that curl was added successfully.

curl --help

If the curl help text is displayed, curl was added successfully.

Tip.

In a Windows Command Prompt window, escape the double quotes inside the JSON body when sending requests with curl. An example command:

curl http://localhost:11434/api/generate -d "{\"model\": \"llama3.1\", \"prompt\": \"Why is grass green?\"}"

If the request succeeds, a stream of JSON objects like the one in the example response is returned.

Example Response

The return is a stream of JSON objects:

{
"model":"llama3.1",
"created_at":"2024-08-08T02:54:08.184732629Z",
"response":"Plants",
"done":false
}
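
Because the stream arrives as one JSON object per line, the full text has to be reassembled on the client side. The following is a minimal Python sketch of that step, assuming the lines have already been read from the HTTP response. The function name collect_response and the sample lines are illustrative and not part of the Ollama API.

```python
import json


def collect_response(lines):
    """Join the "response" fragments of a /api/generate stream.

    `lines` is any iterable of JSON strings, one object per line.
    Returns the concatenated text and the final object (done == true),
    or None for the final object if the stream ended early.
    """
    parts = []
    final = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            final = chunk
            break
    return "".join(parts), final


# Hypothetical sample stream, shaped like the objects in this guide.
sample = [
    '{"model":"llama3.1","response":"Plants","done":false}',
    '{"model":"llama3.1","response":" are green","done":false}',
    '{"model":"llama3.1","response":"","done":true,"done_reason":"stop","eval_count":3}',
]
text, final = collect_response(sample)
print(text)
```

The same loop works on a live connection if each line of the response body is passed in as it arrives.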

The final response in the stream also includes additional data about the generation:

context: an encoding of the conversation used in this response, which can be sent in the next request to keep conversational memory
total_duration: time spent generating the response (in nanoseconds)
load_duration: time taken to load the model (in nanoseconds)
prompt_eval_count: number of tokens in the prompt
prompt_eval_duration: time taken to evaluate the prompt (in nanoseconds)
eval_count: number of tokens in the response
eval_duration: time taken to generate the response (in nanoseconds)
response: empty if the response was streamed; otherwise it contains the full response

To calculate the response generation rate in tokens per second (token/s), compute eval_count / eval_duration * 10^9.

Final Response:

{
"model":"llama3.1",
"created_at":"2024-08-08T02:54:10.819603411Z",
"response":"",
"done":true,
"done_reason":"stop",
"context":[1,2,3],
"total_duration":8655401792,
"load_duration":5924129727,
"prompt_eval_count":17,
"prompt_eval_duration":29196000,
"eval_count":118,
"eval_duration":2656329000
}
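
As a worked example of the rate formula eval_count / eval_duration * 10^9, here is a small Python sketch applied to the final response shown above. The helper name tokens_per_second is illustrative only.

```python
def tokens_per_second(final_response):
    """Generation rate in tokens/s: eval_count / eval_duration * 10^9."""
    duration_ns = final_response["eval_duration"]
    if duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return final_response["eval_count"] / duration_ns * 1e9


# Values copied from the final response shown above.
final_response = {"eval_count": 118, "eval_duration": 2656329000}
rate = tokens_per_second(final_response)
print(f"{rate:.2f} tokens/s")
```

For these numbers the rate comes out at roughly 44 tokens per second.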

Advanced usage

Non-streaming output

Set stream to false to receive the whole response at once.

Example Request

curl http://localhost:11434/api/generate -d '{
"model": "llama3.1",
"prompt": "Why is grass green?",
"stream": false
}'

Example Response

{
"model":"llama3.1",
"created_at":"2024-08-08T07:13:34.418567351Z",
"response":"Answer: Leaves contain a large amount of chlorophyll.",
"done":true,
"done_reason":"stop",
"context":[1,2,3],
"total_duration":2902435095,
"load_duration":2605831520,
"prompt_eval_count":17,
"prompt_eval_duration":29322000,
"eval_count":13,
"eval_duration":266499000
}

JSON mode

When format is set to json, the output is returned in JSON format. Note, however, that the prompt must also instruct the model to respond in JSON; otherwise the model may generate a large amount of whitespace.

Example Request

curl http://localhost:11434/api/generate -d '{
"model": "llama3.1",
"prompt": "Why is grass green? Output the answer in JSON format",
"format": "json",
"stream": false
}'

Example Response

{
"model":"llama3.1",
"created_at":"2024-08-08T07:21:24.950883454Z",
"response":"{\n  \"reason for color\": \"Leaves contain the chlorophyll needed for photosynthesis\",\n  \"function\": \"Carries out photosynthesis to absorb solar energy\"\n}",
"done":true,
"done_reason":"stop",
"context":[1,2,3],
"total_duration":3492279981,
"load_duration":2610591203,
"prompt_eval_count":22,
"prompt_eval_duration":28804000,
"eval_count":40,
"eval_duration":851206000
}

response will be a string containing JSON similar to the following:

{
"reason for color": "Leaves contain the chlorophyll needed for photosynthesis",
"function": "Carries out photosynthesis to absorb solar energy"
}
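
Since the response field is a string that itself contains JSON, the client has to decode it a second time. The following Python sketch shows one way to do this defensively. The helper name parse_json_response and the sample payload are assumptions made for illustration.

```python
import json


def parse_json_response(api_response):
    """Decode the JSON document carried in the "response" string.

    Returns a Python object, or None if the model did not produce valid JSON.
    """
    try:
        return json.loads(api_response["response"])
    except (KeyError, TypeError, json.JSONDecodeError):
        return None


# Hypothetical API response, shaped like the example above.
api_response = {
    "model": "llama3.1",
    "response": "{\n  \"reason\": \"chlorophyll\",\n  \"role\": \"photosynthesis\"\n}",
    "done": True,
}
answer = parse_json_response(api_response)
print(answer["reason"])
```

Returning None instead of raising keeps the caller in control when a model ignores the JSON instruction in the prompt.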

Input contains images

To submit an image to a multimodal model (such as llava or bakllava), provide a list of base64-encoded images in the images parameter:

Example Request

curl http://localhost:11434/api/generate -d '{
"model": "llava",
"prompt":"Describe this image",
"stream": false,
"images": ["iVBORw0KGgoAAAANSUhEUgAAAG0AAABmCAYAAADBPx+VAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAA3VSURBVHgB7Z27r0zdG8fX743i1bi1ikMoFMQloXRpKFFIqI7LH4BEQ+NWIkjQuSWCRIEoULk0gsK1kCBI0IhrQVT7tz/7zZo888yz1r7MnDl7z5xvsjkzs2fP3uu71nNfa7lkAsm7d++Sffv2JbNmzUqcc8m0adOSzZs3Z+/XES4ZckAWJEGWPiCxjsQNLWmQsWjRIpMseaxcuTKpG/7HP27I8P79e7dq1ars/yL4/v27S0ejqwv+cUOGEGGpKHR37tzJCEpHV9tnT58+dXXCJDdECBE2Ojrqjh071hpNECjx4cMHVycM1Uhbv359B2F79+51586daxN/+pyRkRFXKyRDAqxEp4yMlDDzXG1NPnnyJKkThoK0VFd1ELZu3TrzXKxKfW7dMBQ6bcuWLW2v0VlHjx41z717927ba22U9APcw7Nnz1oGEPeL3m3p2mTAYYnFmMOMXybPPXv2bNIPpFZr1NHn4HMw0KRBjg9NuRw95s8PEcz/6DZELQd/09C9QGq5RsmSRybqkwHGjh07OsJSsYYm3ijPpyHzoiacg35MLdDSIS/O1yM778jOTwYUkKNHWUzUWaOsylE00MyI0fcnOwIdjvtNdW/HZwNLGg+sR1kMepSNJXmIwxBZiG8tDTpEZzKg0GItNsosY8USkxDhD0Rinuiko2gfL/RbiD2LZAjU9zKQJj8RDR0vJBR1/Phx9+PHj9Z7REF4nTZkxzX4LCXHrV271qXkBAPGfP/atWvu/PnzHe4C97F48eIsRLZ9+3a3f/9+87dwP1JxaF7/3r17ba+5l4EcaVo0lj3SBq5kGTJSQmLWMjgYNei2GPT1MuMqGTDEFHzeQSP2wi/jGnkmPJ/nhccs44jvDAxpVcxnq0F6eT8h4ni/iIWpR5lPyA6ETkNXoSukvpJAD3AsXLiwpZs49+fPn5ke4j10TqYvegSfn0OnafC+Tv9ooA/JPkgQysqQNBzagXY55nO/oa1F7qvIPWkRL12WRpMWUvpVDYmxAPehxWSe8ZEXL20sadYIozfmNch4QJPAfeJgW3rNsnzphBKNJM2KKODo1rVOMRYik5ETy3ix4qWNI81qAAirizgMIc+yhTytx0JWZuNI03qsrgWlGtwjoS9XwgUhWGyhUaRZZQNNIEwCiXD16tXcAHUs79co0vSD8rrJCIW98pzvxpAWyyo3HYwqS0+H0BjStClcZJT5coMm6D2LOF8TolGJtK9fvyZpyiC5ePFi9nc/oJU4eiEP0jVoAnHa9wyJycITMP78+eMeP37sXrx44d6+fdt6f82aNdkx1pg9e3Zb5W+RSRE+n+VjksQWifvVaTKFhn5O8my63K8Qabdv33b379/PiAP//vuvW7BggZszZ072/+TJk91YgkafPn166zXB1rQHFvouAWHq9z3SEevSUerqCn2/dDCeta2jxYbr69evk4MHDyY7d+7MjhMnTiTPnz9Pfv/+nfQT2ggpO2dMF8cghuoM7Ygj5iWCqRlGFml0QC/ftGmTmzt3rmsaKDsgBSPh0/8yPeLLBihLkOKJc0jp8H8vUzcxIA1k6QJ/c78tWEyj5P3o4u9+jywNPdJi5rAH9x0KHcl4Hg570eQp3+vHXGyrmEeigzQsQsjavXt38ujRo44LQuDDhw+TW7duRS1HGgMxhNXHgflaNTOsHyKvHK5Ijo2jbFjJBQK9YwFd6RVMzfgRBmEfP37suBBm/p49e1qjEP2mwTViNRo0VJWH1deMXcNK08uUjVUu7s/zRaL+oLNxz1bpANco4npUgX4G2eFbpDFyQoQxojBCpEGSytmOH8qrH5Q9vuzD6ofQylkCUmh8DBAr+q8JCyVNtWQIidKQE9wNtLSQnS4jDSsx
NHogzFuQBw4cyM61UKVsjfr3ooBkPSqqQHesUPWVtzi9/vQi1T+rJj7WiTz4Pt/l3LxUkr5P2VYZaZ4URpsE+st/dujQoaBBYokbrz/8TJNQYLSonrPS9kUaSkPeZyj1AWSj+d+VBoy1pIWVNed8P0Ll/ee5HdGRhrHhR5GGN0r4LGZBaj8oFDJitBTJzIZgFcmU0Y8ytWMZMzJOaXUSrUs5RxKnrxmbb5YXO9VGUhtpXldhEUogFr3IzIsvlpmdosVcGVGXFWp2oU9kLFL3dEkSz6NHEY1sjSRdIuDFWEhd8KxFqsRi1uM/nz9/zpxnwlESONdg6dKlbsaMGS4EHFHtjFIDHwKOo46l4TxSuxgDzi+rE2jg+BaFruOX4HXa0Nnf1lwAPufZeF8/r6zD97WK2qFnGjBxTw5qNGPxT+5T/r7/7RawFC3j4vTp09koCxkeHjqbHJqArmH5UrFKKksnxrK7FuRIs8STfBZv+luugXZ2pR/pP9Ois4z+TiMzUUkUjD0iEi1fzX8GmXyuxUBRcaUfykV0YZnlJGKQpOiGB76x5GeWkWWJc3mOrK6S7xdND+W5N6XyaRgtWJFe13GkaZnKOsYqGdOVVVbGupsyA/l7emTLHi7vwTdirNEt0qxnzAvBFcnQF16xh/TMpUuXHDowhlA9vQVraQhkudRdzOnK+04ZSP3DUhVSP61YsaLtd/ks7ZgtPcXqPqEafHkdqa84X6aCeL7YWlv6edGFHb+ZFICPlljHhg0bKuk0CSvVznWsotRu433alNdFrqG45ejoaPCaUkWERpLXjzFL2Rpllp7PJU2a/v7Ab8N05/9t27Z16KUqoFGsxnI9EosS2niSYg9SpU6B4JgTrvVW1flt1sT+0ADIJU2maXzcUTraGCRaL1Wp9rUMk16PMom8QhruxzvZIegJjFU7LLCePfS8uaQdPny4jTTL0dbee5mYokQsXTIWNY46kuMbnt8Kmec+LGWtOVIl9cT1rCB0V8WqkjAsRwta93TbwNYoGKsUSChN44lgBNCoHLHzquYKrU6qZ8lolCIN0Rh6cP0Q3U6I6IXILYOQI513hJaSKAorFpuHXJNfVlpRtmYBk1Su1obZr5dnKAO+L10Hrj3WZW+E3qh6IszE37F6EB+68mGpvKm4eb9bFrlzrok7fvr0Kfv727dvWRmdVTJHw0qiiCUSZ6wCK+7XL/AcsgNyL74DQQ730sv78Su7+t/A36MdY0sW5o40ahslXr58aZ5HtZB8GH64m9EmMZ7FpYw4T6QnrZfgenrhFxaSiSGXtPnz57e9TkNZLvTjeqhr734CNtrK41L40sUQckmj1lGKQ0rC37x544r8eNXRpnVE3ZZY7zXo8NomiO0ZUCj2uHz58rbXoZ6gc0uA+F6ZeKS/jhRDUq8MKrTho9fEkihMmhxtBI1DxKFY9XLpVcSkfoi8JGnToZO5sU5aiDQIW716ddt7ZLYtMQlhECdBGXZZMWldY5BHm5xgAroWj4C0hbYkSc/jBmggIrXJWlZM6pSETsEPGqZOndr2uuuR5rF169a2HoHPdurUKZM4CO1WTPqaDaAd+GFGKdIQkxAn9RuEWcTRyN2KSUgiSgF5aWzPTeA/lN5rZubMmR2bE4SIC4nJoltgAV/dVefZm72AtctUCJU2CMJ327hxY9t7EHbkyJFseq+EJSY16RPo3Dkq1kkr7+q0bNmyDuLQcZBEPYmHVdOBiJyIlrRDq41YPWfXOxUysi5fvtyaj+2BpcnsUV/oSoEMOk2CQGlr4ckhBwaetBhjCwH0ZHtJROPJkyc7UjcYLDjmrH7ADTEBXFfOYmB0k9oYBOjJ8b4aOYSe7QkKcYhFlq3QYLQhSidNmtS2RATwy8YOM3EQJsUjKiaWZ+vZToUQgzhkHXudb/PW5YMHD9yZM2faPsMwoc7RciYJXbGuBqJ1UIGKKLv915jsvgtJxCZDubdXr165mzdvtr1Hz5LONA8jrUwKPqsm
VesKa49S3Q4WxmRPUEYdTjgiUcfUwLx589ySJUva3oMkP6IYddq6HMS4o55xBJBUeRjzfa4Zdeg56QZ43LhxoyPo7Lf1kNt7oO8wWAbNwaYjIv5lhyS7kRf96dvm5Jah8vfvX3flyhX35cuX6HfzFHOToS1H4BenCaHvO8pr8iDuwoUL7tevX+b5ZdbBair0xkFIlFDlW4ZknEClsp/TzXyAKVOmmHWFVSbDNw1l1+4f90U6IY/q4V27dpnE9bJ+v87QEydjqx/UamVVPRG+mwkNTYN+9tjkwzEx+atCm/X9WvWtDtAb68Wy9LXa1UmvCDDIpPkyOQ5ZwSzJ4jMrvFcr0rSjOUh+GcT4LSg5ugkW1Io0/SCDQBojh0hPlaJdah+tkVYrnTZowP8iq1F1TgMBBauufyB33x1v+NWFYmT5KmppgHC+NkAgbmRkpD3yn9QIseXymoTQFGQmIOKTxiZIWpvAatenVqRVXf2nTrAWMsPnKrMZHz6bJq5jvce6QK8J1cQNgKxlJapMPdZSR64/UivS9NztpkVEdKcrs5alhhWP9NeqlfWopzhZScI6QxseegZRGeg5a8C3Re1Mfl1ScP36ddcUaMuv24iOJtz7sbUjTS4qBvKmstYJoUauiuD3k5qhyr7QdUHMeCgLa1Ear9NquemdXgmum4fvJ6w1lqsuDhNrg1qSpleJK7K3TF0Q2jSd94uSZ60kK1e3qyVpQK6PVWXp2/FC3mp6jBhKKOiY2h3gtUV64TWM6wDETRPLDfSakXmH3w8g9Jlug8ZtTt4kVF0kLUYYmCCtD/DrQ5YhMGbA9L3ucdjh0y8kOHW5gU/VEEmJTcL4Pz/f7mgoAbYkAAAAAElFTkSuQmCC"
]
}'
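
A base64 string like the one in the request above is normally produced from the raw image bytes rather than typed by hand. Below is a minimal Python sketch of building such a request body. The function name build_image_request is hypothetical, and the eight-byte PNG signature stands in for a real image file.

```python
import base64
import json


def build_image_request(model, prompt, image_bytes):
    """Build a /api/generate request body with one base64-encoded image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "images": [encoded],
    }


# In practice the bytes would come from a file, e.g. open(path, "rb").read().
# These eight bytes are the PNG file signature, used here as stand-in data.
png_signature = b"\x89PNG\r\n\x1a\n"
body = build_image_request("llava", "Describe this image", png_signature)
print(json.dumps(body))
```

The encoded value starts with iVBORw0KGgo, the same prefix as the image string in the example request, because both begin with the PNG signature.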

Example Response

{
"model":"llava",
"created_at":"2024-08-08T07:33:55.481713465Z",
"response":" The image shows a cartoon of an animated character that resembles a cute pig with large eyes and a smiling face. It appears to be in motion, indicated by the lines extending from its arms and tail, giving it a dynamic feel as if it is waving or dancing. The style of the image is playful and simplistic, typical of line art or stickers. The character's design has been stylized with exaggerated features such as large ears and a smiling expression, which adds to its charm. ",
"done":true,
"done_reason":"stop",
"context":[1,2,3],
"total_duration":2960501550,
"load_duration":4566012,
"prompt_eval_count":1,
"prompt_eval_duration":758437000,
"eval_count":108,
"eval_duration":2148818000
}

Reproducible output

Set seed to a fixed value to get reproducible output:

Example Request

curl http://localhost:11434/api/generate -d '{
"model": "llama3.1",
"prompt": "Why is grass green?",
"stream": false,
"options": {
"seed": 1001
}
}'

Example Response

{
"model":"llama3.1",
"created_at":"2024-08-08T07:42:28.397780058Z",
"response":"Answer: Because leaves contain a large amount of chloride ions.",
"done":true,
"done_reason":"stop",
"context":[1,2,3],
"total_duration":404791556,
"load_duration":18317351,
"prompt_eval_count":17,
"prompt_eval_duration":22453000,
"eval_count":16,
"eval_duration":321267000
}

II. Dialogue Completion

POST /api/chat

Generates the next message in a chat using the specified model. This is also a streaming endpoint, so there will be a series of responses. Streaming can be disabled by setting "stream" to false. The final response object will include statistics and additional data from the request.

Parameters

model: (Required) The model name
messages: The messages of the chat, which can be used to keep a chat memory
tools: Tools for the model to use, if the model supports them. Requires stream to be set to false

The message object has the following fields:

role: The role of the message: system, user, assistant, or tool
content: The content of the message
images (optional): A list of images to include in the message (for multimodal models such as llava)
tool_calls (optional): A list of tools the model wants to use

Advanced Parameters (optional):

format: The format to return the response in. Currently the only accepted value is json
options: Additional model parameters, such as temperature and seed
stream: If set to false, the response is returned as a single response object rather than a stream of objects
keep_alive: Controls how long the model stays loaded in memory after the request (default: 5m)

Example request (streaming)

curl http://localhost:11434/api/chat -d '{
"model": "llama3.1",
"messages": [
{
"role": "user",
"content": "Why is grass green?"
}
]
}'

Example Response

Returns a stream of JSON objects:

{
"model":"llama3.1",
"created_at":"2024-08-08T03:54:36.933701041Z",
"message":{
"role":"assistant",
"content":"Because"
},
"done":false
}

Final Response:

{
"model":"llama3.1",
"created_at":"2024-08-08T03:54:37.187621765Z",
"message":{
"role":"assistant",
"content":""
},
"done_reason":"stop",
"done":true,
"total_duration":5730533217,
"load_duration":5370535786,
"prompt_eval_count":17,
"prompt_eval_duration":29621000,
"eval_count":13,
"eval_duration":273810000
}

Advanced usage

The parameters for non-streaming output, JSON mode, multimodal input, and reproducible output are the same as for the Answer Completion API.

With history

Send a chat message together with the conversation history. The same method can be used to start multi-turn conversations or chain-of-thought prompting.

Example Request

curl http://localhost:11434/api/chat -d '{
"model": "llama3.1",
"messages": [
{
"role": "user",
"content": "Why is grass green?"
},
{
"role": "assistant",
"content": "Because grass contains chlorophyll."
},
{
"role": "user",
"content": "Why does chlorophyll make grass look green?"
}
],
"stream": false
}'
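
A history like the one in the request above can be maintained in code by appending each turn to a list and sending the whole list with every request. The following Python sketch illustrates the idea. The helper names add_turn and build_chat_request are assumptions for this example, not part of the Ollama API.

```python
def add_turn(messages, role, content):
    """Return a new message list with one more chat message appended."""
    if role not in ("system", "user", "assistant", "tool"):
        raise ValueError(f"unknown role: {role}")
    return messages + [{"role": role, "content": content}]


def build_chat_request(model, messages, stream=False):
    """Build a /api/chat request body that carries the whole history."""
    return {"model": model, "messages": messages, "stream": stream}


history = []
history = add_turn(history, "user", "Why is grass green?")
# The assistant reply would normally be taken from the "message" field
# of the previous /api/chat response.
history = add_turn(history, "assistant", "Because grass contains chlorophyll.")
history = add_turn(history, "user", "Why does chlorophyll make grass look green?")
request_body = build_chat_request("llama3.1", history)
print(len(request_body["messages"]))
```

Because the server keeps no chat state between calls in this scheme, the client decides how much history to resend.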

Example Response

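{
"model":"llama3.1",
"created_at":"2024-08-08T07:53:28.849517802Z",
"message":{
"role":"assistant",
"content":"This is a more complex question!\n\nChlorophyll is a pigment known as a flavin, and these pigments can absorb light energy. In daylight, the chlorophyll in green grass blades absorbs blue and red light but reflects yellow and green light, so grass looks green to us.\n\nPut simply, chlorophyll makes grass look green because it reflects the green light our eyes can see and does not reflect the other colors we see."
},
"done_reason":"stop",
"done":true,
"total_duration":5065572138,
"load_duration":2613559070,
"prompt_eval_count":48,
"prompt_eval_duration":37825000,
"eval_count":106,
"eval_duration":2266694000
}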

III. Creating models

POST /api/create

Creates a model from a Modelfile. It is recommended to set modelfile to the contents of the Modelfile rather than just setting path; this is required for remote model creation. Remote model creation must also use the Creating a Blob endpoint to explicitly create any file blobs referenced by fields such as FROM and ADAPTER, and set the field's value to the path indicated in the response.

Parameters

name: Name of the model to be created
modelfile (optional): Contents of the Modelfile
stream (optional): If set to false, the response is returned as a single response object rather than a stream of objects
path (optional): Path to the Modelfile

Example Request

curl http://localhost:11434/api/create -d '{
"name": "mario",
"modelfile": "FROM llama3\nSYSTEM You are mario from Super Mario Bros."
}'

Example Response

A stream of JSON objects. Note that the final JSON object shows "status": "success", indicating that the model was created successfully.

{"status":"reading model metadata"}
{"status":"creating system layer"}
{"status":"using already created layer sha256:22f7f8ef5f4c791c1b03d7eb414399294764d7cc82c7e94aa81a1feb80a983a2"}
{"status":"using already created layer sha256:8c17c2ebb0ea011be9981cc3922db8ca8fa61e828c5d3f44cb6ae342bf80460b"}
{"status":"using already created layer sha256:7c23fb36d80141c4ab8cdbb61ee4790102ebd2bf7aeff414453177d4f2110e5d"}
{"status":"using already created layer sha256:2e0493f67d0c8c9c68a8aeacdf6a38a2151cb3c4c1d42accf296e19810527988"}
{"status":"using already created layer sha256:2759286baa875dc22de5394b4a925701b1896a7e3f8e53275c36f75a877a82c9"}
{"status":"writing layer sha256:df30045fe90f0d750db82a058109cecd6d4de9c90a3d75b19c09e5f64580bb42"}
{"status":"writing layer sha256:f18a68eb09bf925bb1b669490407c1b1251c5db98dc4d3d81f3088498ea55690"}
{"status":"writing manifest"}
{"status":"success"}
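
A client can confirm that creation finished by checking the last status object in the stream. The following Python sketch does exactly that on an abbreviated copy of the example response. The function name creation_succeeded is illustrative.

```python
import json


def creation_succeeded(status_lines):
    """Return True if the last status object in the stream is "success"."""
    last_status = None
    for line in status_lines:
        line = line.strip()
        if line:
            last_status = json.loads(line).get("status")
    return last_status == "success"


# Abbreviated from the example response above.
stream = [
    '{"status":"reading model metadata"}',
    '{"status":"creating system layer"}',
    '{"status":"writing manifest"}',
    '{"status":"success"}',
]
print(creation_succeeded(stream))
```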

Check if the Blob exists

HEAD /api/blobs/:digest

Make sure that the file blob for the FROM or ADAPTER field exists on the server. This is checking your Ollama server and not Ollama.ai.

Query parameters

digest: SHA256 digest of blob

Example Request

curl -I http://localhost:11434/api/blobs/sha256:29fdb92e57cf0827ded04ae6461b5931d01fa595843f55d36f5b275a52087dd2

Example Response

Returns "200 OK" if the blob exists, or "404 Not Found" if it does not.

Creating a Blob

POST /api/blobs/:digest

Creates a blob from a file on the server. Returns the server file path.

Query parameters

digest: The expected SHA256 digest of the file

Example Request

curl -T model.bin -X POST http://localhost:11434/api/blobs/sha256:29fdb92e57cf0827ded04ae6461b5931d01fa595843f55d36f5b275a52087dd2

Example Response

Returns 201 Created if the blob was successfully created, or 400 Bad Request if the digest used was not as expected.
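
The digest in the blob URL has to match the SHA256 hash of the uploaded file. Below is a minimal Python sketch for computing it in chunks so that large model files do not need to fit in memory. The function name blob_path is hypothetical, and an in-memory buffer stands in for a real file opened in binary mode.

```python
import hashlib
import io


def blob_path(fileobj, chunk_size=1 << 20):
    """Return the /api/blobs path for the content of a binary file object."""
    hasher = hashlib.sha256()
    # Read fixed-size chunks until read() returns empty bytes.
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        hasher.update(chunk)
    return f"/api/blobs/sha256:{hasher.hexdigest()}"


# Stand-in content; a real call would pass open("model.bin", "rb").
path = blob_path(io.BytesIO(b"hello"))
print(path)
```

The resulting path can be appended to the server address for both the HEAD check and the POST upload.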

IV. Copying models

POST /api/copy

Copies a model, creating a duplicate of an existing model under another name.

Example Request

curl http://localhost:11434/api/copy -d '{
"source": "llama3.1",
"destination": "llama3-backup"
}'

Example Response

Returns "200 OK" if successful, or "404 Not Found" if the source model does not exist.

V. Deleting models

DELETE /api/delete

Delete the model and its data.

Parameters

name: Name of the model to be deleted

Example Request

curl -X DELETE http://localhost:11434/api/delete -d '{
"name": "llama3.1"
}'

Example Response

Returns "200 OK" if successful or "404 Not Found" if the model to be deleted does not exist.

VI. Listing running models

GET /api/ps

Lists the models currently loaded into memory.

Example Request

curl http://localhost:11434/api/ps

Example Response

{
"models":[
{
"name":"llama3.1:latest",
"model":"llama3.1:latest",
"size":6654289920,
"digest":"75382d0899dfaaa6ce331cf680b72bd6812c7f05e5158c5f2f43c6383e21d734",
"details":{
"parent_model":"",
"format":"gguf",
"family":"llama",
"families":["llama"],
"parameter_size":"8.0B",
"quantization_level":"Q4_0"
},
"expires_at":"2024-08-08T14:06:52.883023476+08:00",
"size_vram":6654289920
}
]
}

VII. Listing local models

GET /api/tags

Lists locally available models.

Example Request

curl http://localhost:11434/api/tags

Example Response

{
"models":[
{
"name":"llama3.1:latest",
"model":"llama3.1:latest",
"modified_at":"2024-08-07T17:54:22.533937636+08:00",
"size":4661230977,
"digest":"75382d0899dfaaa6ce331cf680b72bd6812c7f05e5158c5f2f43c6383e21d734",
"details":{
"parent_model":"",
"format":"gguf",
"family":"llama",
"families":["llama"],
"parameter_size":"8.0B",
"quantization_level":"Q4_0"
}
}
]
}
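
The size field is given in bytes, so a short summary is often easier to read. The following Python sketch extracts model names and sizes in gigabytes from a response like the one above. The helper name summarize_models is illustrative, and the sample keeps only the two fields it uses.

```python
def summarize_models(tags_response):
    """Return (name, size in GB) pairs from a /api/tags response."""
    return [
        (model["name"], round(model["size"] / 1e9, 2))
        for model in tags_response.get("models", [])
    ]


# Reduced copy of the example response above.
tags_response = {
    "models": [
        {"name": "llama3.1:latest", "size": 4661230977},
    ]
}
summary = summarize_models(tags_response)
for name, size_gb in summary:
    print(f"{name}: {size_gb} GB")
```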

VIII. Showing model information

POST /api/show

Shows information about a model, including details, Modelfile, template, parameters, license, and system prompt.

Parameters

name: Name of the model to show
verbose (optional): If set to true, returns full data for verbose response fields

Example Request

curl http://localhost:11434/api/show -d '{
"name": "llama3.1"
}'

Example Response

{
"license":"...",
"modelfile":"...",
"parameters":"...",
"template":"...",
"details":{
"parent_model":"",
"format":"gguf",
"family":"llama",
"families":["llama"],
"parameter_size":"8.0B",
"quantization_level":"Q4_0"
},
"model_info":{
"general.architecture":"llama",
"general.basename":"Meta-Llama-3.1",
"general.file_type":2,
"general.finetune":"Instruct",
"general.languages":["en","de","fr","it","pt","hi","es","th"],
"general.license":"llama3.1",
"general.parameter_count":8030261312,
"general.quantization_version":2,
"general.size_label":"8B",
"general.tags":["facebook","meta","pytorch","llama","llama-3","text-generation"],
"general.type":"model",
"llama.attention.head_count":32,
"llama.attention.head_count_kv":8,
"llama.attention.layer_norm_rms_epsilon":0.00001,
"llama.block_count":32,
"llama.context_length":131072,
"llama.embedding_length":4096,
"llama.feed_forward_length":14336,
"llama.rope.dimension_count":128,
"llama.rope.freq_base":500000,
"llama.vocab_size":128256,
"tokenizer.ggml.bos_token_id":128000,
"tokenizer.ggml.eos_token_id":128009,
"tokenizer.ggml.merges":null,
"tokenizer.ggml.model":"gpt2",
"tokenizer.ggml.pre":"llama-bpe",
"tokenizer.ggml.token_type":null,
"tokenizer.ggml.tokens":null
},
"modified_at":"2024-08-07T17:54:22.533937636+08:00"
}

IX. Pulling models

POST /api/pull

Download a model from the ollama library. An interrupted pull resumes from where it left off, and multiple calls share the same download progress.

Parameters

name: Name of the model to pull
insecure (optional): Allows insecure connections to the library. It is recommended to use this option only when pulling from your own library during development.
stream (optional): If set to false, the response is returned as a single response object rather than a stream of objects.

Example Request

curl http://localhost:11434/api/pull -d '{
"name": "llama3.1"
}'

Example Response

If stream is not specified or is set to true, a stream of JSON objects is returned.

The first object is the manifest:

{
"status": "pulling manifest"
}

Then there is a series of download responses. Until a download completes, the completed key may not be included. The number of files to be downloaded depends on the number of layers specified in the manifest.

{
"status": "downloading digestname",
"digest": "digestname",
"total": 2142590208,
"completed": 241970
}

After all the files have been downloaded, the final responses are:

{
"status": "verifying sha256 digest"
}
{
"status": "writing manifest"
}
{
"status": "removing any unused layers"
}
{
"status": "success"
}

If stream is set to false, the response is a single JSON object:

{
"status": "success"
}
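
The total and completed fields in the download responses can be turned into a progress percentage. Here is a minimal Python sketch, using the values from the download response shown earlier in this section. The function name download_progress is an assumption for illustration.

```python
def download_progress(status):
    """Percentage downloaded, or None when the object has no progress data."""
    total = status.get("total")
    completed = status.get("completed")
    if not total or completed is None:
        # Objects such as "pulling manifest" carry no byte counts, and
        # "completed" may be absent until the download is under way.
        return None
    return completed / total * 100


# Values copied from the download response shown in this section.
status = {
    "status": "downloading digestname",
    "digest": "digestname",
    "total": 2142590208,
    "completed": 241970,
}
progress = download_progress(status)
print(f"{progress:.4f}%")
```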

X. Pushing models

POST /api/push

Upload a model to the model library. You need to register on ollama.ai and add a public key first.

Parameters

name: Name of the model to push, in the format <namespace>/<model>:<tag>
insecure (optional): Allows insecure connections to the library. Use this option only when pushing to your own library during development.
stream (optional): If set to false, the response is returned as a single response object rather than a stream of objects.

Example Request

curl http://localhost:11434/api/push -d '{
"name": "mattw/pygmalion:latest"
}'

Example Response

If stream is not specified or is set to true, a stream of JSON objects is returned:

{ "status": "retrieving manifest" }

Then there is a series of upload responses:

{
"status": "starting upload",
"digest": "sha256:bc07c81de745696fdf5afca05e065818a8149fb0c77266fb584d9b2cba3711ab",
"total": 1928429856
}

Finally, when the upload is complete:

{"status":"pushing manifest"}
{"status":"success"}

If stream is set to false, the response is a single JSON object:

{ "status": "success" }

XI. Generating embeddings

POST /api/embed

Generates embeddings from a model.

Parameters

model: Name of the model to generate embeddings from
input: The text or list of texts to generate embeddings for

Advanced Parameters:

truncate: Truncates the end of each input to fit within the context length. If set to false, an error is returned when the input exceeds the context length. Defaults to true
options: Additional model parameters, such as temperature and seed
keep_alive: Controls how long the model stays loaded in memory after the request (default: 5m)

Example Request

curl http://localhost:11434/api/embed -d '{
"model": "llama3.1",
"input": "Why is grass green?"
}'

Example Response

{
"model":"llama3.1",
"embeddings":[[
-0.008059342,-0.013182715,0.019781841,0.012018124,-0.024847334,
-0.0031902494,-0.02714767,0.015282277,0.060032737,...
]],
"total_duration":3041671009,
"load_duration":2864335471,
"prompt_eval_count":7
}

Example request (multiple inputs)

curl http://localhost:11434/api/embed -d '{
"model": "llama3.1",
"input": ["Why is grass green?","Why is the sky blue?"]
}'

Example Response

{
"model":"llama3.1",
"embeddings":[[
-0.008471201,-0.013031566,0.019300476,0.011618419,-0.025197424,
-0.0024164673,-0.02669075,0.015766116,0.059984162,...
],[
-0.012765694,-0.012822924,0.015915949,0.006415892,-0.02327763,
0.004859615,-0.017922137,0.019488193,0.05638235,...
]],
"total_duration":195481419,
"load_duration":1318886,
"prompt_eval_count":14
}
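
A common use of multiple embeddings is to compare two inputs by cosine similarity. The following Python sketch shows the calculation on short made-up vectors; real embeddings from the API have many more dimensions. The function name cosine_similarity and the sample values are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # similarity is undefined for a zero vector; report 0
    return dot / (norm_a * norm_b)


# Made-up response with the same shape as the "embeddings" field above.
response = {"embeddings": [[1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]}
first, second = response["embeddings"]
similarity = cosine_similarity(first, second)
print(round(similarity, 3))
```

Values close to 1 indicate inputs the model places near each other, and values near 0 indicate unrelated inputs.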

Error handling

The Ollama API returns an appropriate error code and message when an error occurs. Common errors include:

400 Bad Request: The request is malformed.
404 Not Found: The requested resource does not exist.
500 Internal Server Error: internal server error.
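
On the client side, these status codes can be caught and reported instead of crashing the program. Below is a minimal Python sketch using only the standard library. The helper names build_request and post_json are hypothetical, the server address is the default one used in the examples in this guide, and the sketch assumes a non-streaming endpoint that returns a single JSON object.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "http://localhost:11434"  # address used in this guide's examples


def build_request(path, payload):
    """Create a POST request with a JSON body for the given API path."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path, payload):
    """Send the request and return (status_code, parsed_body_or_error_text).

    Connection failures (for example, no server running) are not handled
    here and raise urllib.error.URLError.
    """
    try:
        with urllib.request.urlopen(build_request(path, payload)) as resp:
            return resp.status, json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # 400, 404, 500 and other error statuses end up here.
        return err.code, err.read().decode("utf-8", errors="replace")


# Building the request needs no running server; sending it does.
req = build_request("/api/show", {"name": "llama3.1"})
print(req.get_method(), req.full_url)
```

Calling post_json("/api/show", {"name": "llama3.1"}) against a running server would then return the status code together with either the parsed response or the error text.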


The article is copyrighted and should not be reproduced without permission.
