Summary
Ollama provides a powerful REST API that lets developers interact with large language models with ease. Through the Ollama API, users can send requests and receive model-generated responses for tasks such as natural language processing and text generation. This article walks through the two core operations, answer completion and dialogue completion, in detail, and also covers common operations such as creating, copying, and deleting models.
Contents
Answer Completion \ Dialogue Completion \ Create Model \ Copy Model \ Delete Model \ List Running Models \ List Local Models \ Show Model Information \ Pull Models \ Push Models \ Generate Embeddings
I. Answer Completion
POST /api/generate
Generates a response to a given prompt using the specified model. This is a streaming endpoint, so there will be a series of responses. The final response object will include statistics and other data from the request.
Parameters
- model: (required) the model name
- prompt: the prompt to generate a response for
- suffix: the text that comes after the model response
- images: (optional) a list of base64-encoded images (for multimodal models such as llava)
Advanced Parameters (optional):
- format: the format to return the response in. The only value currently accepted is json
- options: other model parameters, such as temperature and seed
- system: a system message
- template: the prompt template to use
- context: the context parameter returned from a previous /generate request; it can be used to keep a short conversational memory
- stream: if false, the response will be returned as a single response object rather than a stream of objects
- raw: if true, no formatting will be applied to the prompt; you may choose to use the raw parameter if you are specifying a fully templated prompt in your request
- keep_alive: controls how long the model stays loaded in memory following the request (default: 5m)
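For illustration, here is a hedged sketch combining several of the advanced parameters above in one request; the system text, option values, and keep_alive duration are arbitrary choices for demonstration, not defaults:
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Why is the grass green?",
  "system": "You are a concise botany tutor.",
  "options": {
    "temperature": 0.2,
    "seed": 42
  },
  "keep_alive": "10m",
  "stream": false
}'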
Example request (streaming)
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Why is the grass green?"
}'
Tip:
If you are using the curl command on Windows, download curl for Windows, extract the archive, locate the bin subfolder in the extracted directory, and add that folder's path to your environment variables.
Then use the following command in a command-line window (not PowerShell, mind you) to check whether it was added successfully:
curl --help
If the help text is displayed, curl was added successfully.
Tip:
When issuing curl requests in a Windows command-line window, note that double quotes must be escaped. An example command is shown below:
curl http://localhost:11434/api/generate -d "{\"model\": \"llama3.1\", \"prompt\": \"Why is the grass green?\"}"
A streamed response like the one below indicates that the request succeeded.
Example Response
A stream of JSON objects is returned:
{
  "model": "llama3.1",
  "created_at": "2024-08-08T02:54:08.184732629Z",
  "response": "plant",
  "done": false
}
The final response in the stream also includes additional data about the generation:
- context: the conversation encoding used for this response; it can be sent in the next request to keep the conversation memory
- total_duration: time spent generating the response (in nanoseconds)
- load_duration: time spent loading the model (in nanoseconds)
- prompt_eval_count: number of tokens in the prompt
- prompt_eval_duration: time spent evaluating the prompt (in nanoseconds)
- eval_count: number of tokens in the response
- eval_duration: time spent generating the response (in nanoseconds)
- response: empty if the response was streamed; if not, this contains the full response
To calculate the response generation rate (tokens generated per second, token/s), compute eval_count / eval_duration * 10^9.
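As a quick check, this rate can be computed directly from a non-streaming response. A minimal sketch, assuming the jq tool is installed:
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Why is the grass green?",
  "stream": false
}' | jq '.eval_count / .eval_duration * 1e9'
For the final response shown below, this works out to 118 / 2656329000 * 10^9 ≈ 44.4 token/s.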
Final Response:
{
  "model": "llama3.1",
  "created_at": "2024-08-08T02:54:10.819603411Z",
  "response": "",
  "done": true,
  "done_reason": "stop",
  "context": [1, 2, 3],
  "total_duration": 8655401792,
  "load_duration": 5924129727,
  "prompt_eval_count": 17,
  "prompt_eval_duration": 29196000,
  "eval_count": 118,
  "eval_duration": 2656329000
}
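To keep a short conversational memory, the context array from a final response like this can be sent back with the next request. A minimal sketch (the [1, 2, 3] placeholder stands in for a real context array returned by the server):
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Tell me more about that.",
  "context": [1, 2, 3],
  "stream": false
}'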
Advanced Usage
Non-streaming output
Set stream to false to receive the complete response at once instead of a stream of objects.
Example Request
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Why is the grass green?",
  "stream": false
}'
Example Response
{
  "model": "llama3.1",
  "created_at": "2024-08-08T07:13:34.418567351Z",
  "response": "Answer: leaves contain a lot of chlorophyll.",
  "done": true,
  "done_reason": "stop",
  "context": [1, 2, 3],
  "total_duration": 2902435095,
  "load_duration": 2605831520,
  "prompt_eval_count": 17,
  "prompt_eval_duration": 29322000,
  "eval_count": 13,
  "eval_duration": 266499000
}
JSON mode
Set format to json and the output will be returned as JSON. Note, however, that the prompt should instruct the model to respond in JSON; otherwise the model may generate large amounts of whitespace.
Example Request
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Why is the grass green? Output the answer in JSON format",
  "format": "json",
  "stream": false
}'
Example Response
{
  "model": "llama3.1",
  "created_at": "2024-08-08T07:21:24.950883454Z",
  "response": "{\n  \"Cause of color\": \"Leaves contain chlorophyll needed for photosynthesis\",\n  \"Role\": \"To photosynthesize and absorb solar energy\"\n}",
  "done": true,
  "done_reason": "stop",
  "context": [1, 2, 3],
  "total_duration": 3492279981,
  "prompt_eval_duration": 28804000,
  "eval_count": 40,
  "eval_duration": 851206000
}
The response field will be a string containing JSON similar to the following:
{
  "Cause of color": "Leaves contain chlorophyll, which is needed for photosynthesis.",
  "Role": "Performs photosynthesis to absorb solar energy."
}
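Since the response field carries its JSON as a string, it needs a second parse. A minimal sketch using jq (assuming it is installed) to extract and re-parse the field:
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Why is the grass green? Output the answer in JSON format",
  "format": "json",
  "stream": false
}' | jq '.response | fromjson'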
Input with images
To submit images to a multimodal model (such as llava or bakllava), provide a list of base64-encoded images in the images field:
Example Request
curl http://localhost:11434/api/generate -d '{
  "model": "llava",
  "prompt": "Describe this image",
  "stream": false,
  "images": ["<base64-encoded image data>"]
}'
Example Response
{
  "model": "llava",
  "created_at": "2024-08-08T07:33:55.481713465Z",
  "response": " The image shows a cartoon of an animated character that resembles a cute pig with large eyes and a smiling face. It appears to be in motion, indicated by the lines extending from its arms and tail, giving it a dynamic feel as if it is waving or dancing. The style of the image is playful and simplistic, typical of line art or stickers. The character's design has been stylized with exaggerated features such as large ears and a smiling expression, which adds to its charm.",
  "done": true,
  "done_reason": "stop",
  "context": [1, 2, 3],
  "total_duration": 2960501550,
  "prompt_eval_duration": 758437000,
  "eval_count": 108,
  "eval_duration": 2148818000
}
Reproducible output
Set seed to a fixed value to get reproducible output:
Example Request
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Why is the grass green?",
  "stream": false,
  "options": {
    "seed": 1001
  }
}'
Example Response
{
  "model": "llama3.1",
  "created_at": "2024-08-08T07:42:28.397780058Z",
  "response": "Answer: because leaves contain a lot of chloride ions.",
  "done": true,
  "done_reason": "stop",
  "context": [1, 2, 3],
  "total_duration": 404791556,
  "load_duration": 18317351,
  "prompt_eval_count": 17,
  "prompt_eval_duration": 22453000,
  "eval_count": 16,
  "eval_duration": 321267000
}
II. Dialogue Completion
POST /api/chat
Generates the next message in a chat using the specified model. This is also a streaming endpoint, so there will be a series of responses. Streaming can be disabled by setting "stream" to false. The final response object includes statistics and additional data about the request.
Parameters
- model: (required) the model name
- messages: the messages of the chat; these can be used to keep a chat memory
- tools: tools for the model to use, if supported; requires stream to be set to false (a hedged sketch of the format follows the parameter lists below)

The message object has the following fields:
- role: the role of the message, which can be system, user, assistant, or tool
- content: the content of the message
- images (optional): a list of images to include in the message (for multimodal models such as llava)
- tool_calls (optional): a list of tools the model wants to use
Advanced Parameters (optional):
- format: the format to return the response in. The only value currently accepted is json
- options: other model parameters, such as temperature and seed
- stream: if false, the response will be returned as a single response object rather than a stream of objects
- keep_alive: controls how long the model stays loaded in memory following the request (default: 5m)
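The section gives no tools example, so here is a hedged sketch of the tool-calling format; the get_current_weather function and its schema are hypothetical, invented for illustration:
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.1",
  "messages": [
    { "role": "user", "content": "What is the weather in Toronto?" }
  ],
  "stream": false,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": { "type": "string", "description": "The city to get the weather for" }
          },
          "required": ["location"]
        }
      }
    }
  ]
}'
If the model decides to call the tool, the reply's message object carries a tool_calls list rather than plain content.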
Example request (streaming)
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.1",
  "messages": [
    {
      "role": "user",
      "content": "Why is the grass green?"
    }
  ]
}'
Example Response
Returns a stream of JSON objects:
{
  "model": "llama3.1",
  "created_at": "2024-08-08T03:54:36.933701041Z",
  "message": {
    "role": "assistant",
    "content": "because"
  },
  "done": false
}
Final Response:
{
  "model": "llama3.1",
  "created_at": "2024-08-08T03:54:37.187621765Z",
  "message": {
    "role": "assistant",
    "content": ""
  },
  "done_reason": "stop",
  "done": true,
  "total_duration": 5730533217,
  "load_duration": 5370535786,
  "prompt_eval_count": 17,
  "prompt_eval_duration": 29621000,
  "eval_count": 13,
  "eval_duration": 273810000
}
Advanced Usage
Non-streaming output, JSON mode, multimodal input, and reproducible output are parameterized the same way as in the answer completion API above.
With history
Send chat messages with conversation history. Multi-turn conversations and chain-of-thought prompting can both be started with this same approach.
Example Request
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.1",
  "messages": [
    {
      "role": "user",
      "content": "Why is the grass green?"
    },
    {
      "role": "assistant",
      "content": "Because grass contains chlorophyll."
    },
    {
      "role": "user",
      "content": "Why does chlorophyll make grass look green?"
    }
  ],
  "stream": false
}'
Example Response
{
  "model": "llama3.1",
  "created_at": "2024-08-08T07:53:28.849517802Z",
  "message": {
    "role": "assistant",
    "content": "That's a more complex question! \n\nChlorophylls are pigments called xanthophylls, and these pigments absorb light energy. In daylight, the chlorophyll in green grass leaves absorbs blue and red light but reflects yellow and green light, so the grass looks green to us. \n\nSimply put, chlorophyll makes grass look green because it reflects the green light that our eyes can see while not reflecting the other colors."
  },
  "done_reason": "stop",
  "done": true,
  "total_duration": 5065572138,
  "load_duration": 2613559070,
  "prompt_eval_count": 48,
  "prompt_eval_duration": 37825000,
  "eval_count": 106,
  "eval_duration": 2266694000
}
III. Creating models
POST /api/create
It is recommended to set modelfile to the contents of the Modelfile rather than just setting the path; this is required for remote model creation. Remote model creation must also create any file blobs explicitly using Create a Blob, and set the fields (such as FROM and ADAPTER) to the paths indicated in the response.
Parameters
- name: the name of the model to create
- modelfile (optional): the contents of the Modelfile
- stream (optional): if false, the response will be returned as a single response object rather than a stream of objects
- path (optional): the path to the Modelfile
Example Request
curl http://localhost:11434/api/create -d '{
  "name": "mario",
  "modelfile": "FROM llama3\nSYSTEM You are mario from Super Mario Bros."
}'
Example Response
A stream of JSON objects is returned. Note that the final JSON object shows "status": "success", indicating that the model was created successfully.
{"status": "Reading model metadata"}
{"status": "creating system layer"}
{"status": "using already created layer sha256:22f7f8ef5f4c791c1b03d7eb414399294764d7cc82c7e94aa81a1feb80a983a2"}
{"status": "using already created layer sha256:8c17c2ebb0ea011be9981cc3922db8ca8fa61e828c5d3f44cb6ae342bf80460b"}
{"status": "using already created layer sha256:7c23fb36d80141c4ab8cdbb61ee4790102ebd2bf7aeff414453177d4f2110e5d"}
{"status": "using already created layer sha256:2e0493f67d0c8c9c68a8aeacdf6a38a2151cb3c4c1d42accf296e19810527988"}
{"status": "using already created layer sha256:2759286baa875dc22de5394b4a925701b1896a7e3f8e53275c36f75a877a82c9"}
{"status": "writing layer sha256:df30045fe90f0d750db82a058109cecd6d4de9c90a3d75b19c09e5f64580bb42"}
{"status": "writing layer sha256:f18a68eb09bf925bb1b669490407c1b1251c5db98dc4d3d81f3088498ea55690"}
{"status": "Writing manifest"}
{"status": "success"}
Check if the Blob exists
HEAD /api/blobs/:digest
Ensures that the file blob used for a FROM or ADAPTER field exists on the server. This checks your Ollama server, not ollama.ai.
Query parameters
- digest: the SHA256 digest of the blob
Example Request
curl -I http://localhost:11434/api/blobs/sha256:29fdb92e57cf0827ded04ae6461b5931d01fa595843f55d36f5b275a52087dd2
Example Response
Returns "200 OK" if the blob exists, or "404 Not Found" if it does not.
Creating a Blob
POST /api/blobs/:digest
Creates a blob from a file on the server. Returns the server file path.
Query parameters
- digest: the expected SHA256 digest of the file
Example Request
curl -T model.bin -X POST http://localhost:11434/api/blobs/sha256:29fdb92e57cf0827ded04ae6461b5931d01fa595843f55d36f5b275a52087dd2
Example Response
Returns 201 Created if the blob was successfully created, or 400 Bad Request if the digest used was not as expected.
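The digest in the URL must match the file being uploaded. One way to compute it locally before making the request, assuming a Unix-like shell with sha256sum available:
sha256sum model.bin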
IV. Copying models
POST /api/copy
Copy a model, duplicating an existing model under a new name.
Example Request
curl http://localhost:11434/api/copy -d '{
  "source": "llama3.1",
  "destination": "llama3-backup"
}'
Example Response
Returns "200 OK" if successful, or "404 Not Found" if the source model does not exist.
V. Deleting models
DELETE /api/delete
Delete the model and its data.
Parameters
- name: the name of the model to delete
Example Request
curl -X DELETE http://localhost:11434/api/delete -d '{
"name": "llama3.1"
}'
Example Response
Returns "200 OK" if successful or "404 Not Found" if the model to be deleted does not exist.
VI. Listing running models
GET /api/ps
Lists the models currently loaded into memory.
Example Request
curl http://localhost:11434/api/ps
Example Response
{
  "models": [
    {
      "name": "llama3.1:latest",
      "model": "llama3.1:latest",
      "size": 6654289920,
      "digest": "75382d0899dfaaa6ce331cf680b72bd6812c7f05e5158c5f2f43c6383e21d734",
      "details": {
        "parent_model": "",
        "format": "gguf",
        "family": "llama",
        "families": ["llama"],
        "parameter_size": "8.0B",
        "quantization_level": "Q4_0"
      },
      "expires_at": "2024-08-08T14:06:52.883023476+08:00",
      "size_vram": 6654289920
    }
  ]
}
VII. Listing local models
GET /api/tags
Lists locally available models.
Example Request
curl http://localhost:11434/api/tags
Example Response
{
  "models": [
    {
      "name": "llama3.1:latest",
      "model": "llama3.1:latest",
      "modified_at": "2024-08-07T17:54:22.533937636+08:00",
      "size": 4661230977,
      "digest": "75382d0899dfaaa6ce331cf680b72bd6812c7f05e5158c5f2f43c6383e21d734",
      "details": {
        "parent_model": "",
        "format": "gguf",
        "family": "llama",
        "families": ["llama"],
        "parameter_size": "8.0B",
        "quantization_level": "Q4_0"
      }
    }
  ]
}
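As a small convenience, the local model names can be extracted from this response with jq (assuming it is installed):
curl -s http://localhost:11434/api/tags | jq -r '.models[].name'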
VIII. Showing model information
POST /api/show
Displays information about the model, including details, modelfile, templates, parameters, licenses, and system hints.
Parameters
- name: the name of the model to show
- verbose (optional): if set to true, returns full data for the verbose response fields
Example Request
curl http://localhost:11434/api/show -d '{
"name": "llama3.1"
}'
Example Response
{
  "license": "...",
  "modelfile": "...",
  "parameters": "...",
  "template": "...",
  "details": {
    "parent_model": "",
    "format": "gguf",
    "family": "llama",
    "families": ["llama"],
    "parameter_size": "8.0B",
    "quantization_level": "Q4_0"
  },
  "model_info": {
    "general.architecture": "llama",
    "general.basename": "Meta-Llama-3.1",
    "general.file_type": 2,
    "general.languages": ["en", "de", "fr", "it", "pt", "hi", "es", "th"],
    "general.license": "llama3.1",
    "general.parameter_count": 8030261312,
    "general.quantization_version": 2,
    "general.size_label": "8B",
    "general.tags": ["facebook", "meta", "pytorch", "llama", "llama-3", "text-generation"],
    "general.type": "model",
    "llama.attention.head_count": 32,
    "llama.attention.head_count_kv": 8,
    "llama.attention.layer_norm_rms_epsilon": 0.00001,
    "llama.block_count": 32,
    "llama.context_length": 131072,
    "llama.embedding_length": 4096,
    "llama.feed_forward_length": 14336,
    "llama.rope.dimension_count": 128,
    "llama.rope.freq_base": 500000,
    "llama.vocab_size": 128256,
    "tokenizer.ggml.bos_token_id": 128000,
    "tokenizer.ggml.eos_token_id": 128009,
    "tokenizer.ggml.merges": null,
    "tokenizer.ggml.model": "gpt2",
    "tokenizer.ggml.pre": "llama-bpe",
    "tokenizer.ggml.token_type": null,
    "tokenizer.ggml.tokens": null
  },
  "modified_at": "2024-08-07T17:54:22.533937636+08:00"
}
IX. Pulling models
POST /api/pull
Download a model from the ollama library. An interrupted pull resumes the download from where it left off, and multiple calls share the same download progress.
Parameters
- name: the name of the model to pull
- insecure (optional): allow insecure connections to the library; use this option only when pulling from your own library during development
- stream (optional): if false, the response will be returned as a single response object rather than a stream of objects
Example Request
curl http://localhost:11434/api/pull -d '{
"name": "llama3.1"
}'
Example Response
If stream is not specified or is set to true, a stream of JSON objects is returned.
The first object is the manifest:
{
"status": "pulling manifest"
}
Then comes a series of download responses. Until a download is complete, the completed key may not be included. The number of files to download depends on the number of layers specified in the manifest.
{
  "status": "downloading digestname",
  "digest": "digestname",
  "total": 2142590208,
  "completed": 241970
}
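As a rough sketch (assuming jq is installed), the streamed objects can be turned into a percentage progress readout; objects without both fields are skipped:
curl -s http://localhost:11434/api/pull -d '{"name": "llama3.1"}' \
  | jq -r 'select(.completed and .total) | "\(.completed * 100 / .total | floor)% of \(.digest)"'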
The final response after all the files have been downloaded is:
{
"status": "verifying sha256 digest"
}
{
"status": "writing manifest"
}
{
"status": "removing any unused layers"
}
{
"status": "success"
}
If stream is set to false, the response is a single JSON object:
{
"status": "success"
}
X. Pushing models
POST /api/push
Upload a model to a model library. You need to register for ollama.ai and add a public key first.
Parameters
- name: the name of the model to push, in the form <namespace>/<model>:<tag>
- insecure (optional): allow insecure connections to the library; use this option only when pushing to your own library during development
- stream (optional): if false, the response will be returned as a single response object rather than a stream of objects
Example Request
curl http://localhost:11434/api/push -d '{
"name": "mattw/pygmalion:latest"
}'
Example Response
If stream is not specified or is set to true, a stream of JSON objects is returned:
{ "status": "retrieving manifest" }
Then there is a series of upload responses:
{
  "status": "starting upload",
  "digest": "sha256:bc07c81de745696fdf5afca05e065818a8149fb0c77266fb584d9b2cba3711ab",
  "total": 1928429856
}
Finally, when the upload is complete:
{"status": "pushing manifest"}
{"status": "success"}
If stream is set to false, the response is a single JSON object:
{ "status": "success" }
XI. Generating embeddings
POST /api/embed
Generate embeddings from a model.
Parameters
- model: the name of the model to generate embeddings from
- input: the text, or list of texts, to generate embeddings for
Advanced Parameters:
- truncate: truncates the end of each input to fit within the context length; returns an error if false and the context length is exceeded (default: true)
- options: other model parameters, such as temperature and seed
- keep_alive: controls how long the model stays loaded in memory following the request (default: 5m)
Example Request
curl http://localhost:11434/api/embed -d '{
  "model": "llama3.1",
  "input": "Why is the grass green?"
}'
Example Response
{
  "model": "llama3.1",
  "embeddings": [[
    -0.008059342, -0.013182715, 0.019781841, 0.012018124, -0.024847334,
    -0.0031902494, -0.02714767, 0.015282277, 0.060032737, ...
  ]],
  "total_duration": 3041671009,
  "prompt_eval_count": 7
}
Example request (multiple inputs)
curl http://localhost:11434/api/embed -d '{
"model": "llama3.1",
"input": ["Why is the grass green?" , "Why is the sky blue?"]
}'
Example Response
{
  "model": "llama3.1",
  "embeddings": [[
    -0.008471201, -0.013031566, 0.019300476, 0.011618419, -0.025197424,
    -0.0024164673, -0.02669075, 0.015766116, 0.059984162, ...
  ], [
    -0.012765694, -0.012822924, 0.015915949, 0.006415892, -0.02327763,
    0.004859615, -0.017922137, 0.019488193, 0.05638235, ...
  ]],
  "total_duration": 195481419,
  "load_duration": 1318886,
  "prompt_eval_count": 14
}
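The two vectors in one response can be compared directly, for example by cosine similarity. A minimal sketch using only jq, assuming it is installed and that the full (untruncated) vectors are returned:
curl -s http://localhost:11434/api/embed -d '{
  "model": "llama3.1",
  "input": ["Why is the grass green?", "Why is the sky blue?"]
}' | jq '.embeddings as [$a, $b]
  | ([$a, $b] | transpose | map(.[0] * .[1]) | add) /
    (($a | map(. * .) | add | sqrt) * ($b | map(. * .) | add | sqrt))'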
Error Handling
The Ollama API returns appropriate error codes and messages when an error occurs. Common errors include:
- 400 Bad Request: request format error.
- 404 Not Found: The requested resource does not exist.
- 500 Internal Server Error: internal server error.
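When scripting against the API, it can help to inspect the status code alone. A small sketch using curl's write-out option (the model name below is deliberately nonexistent, so a 404 is expected):
curl -s -o /dev/null -w "%{http_code}\n" \
  http://localhost:11434/api/show -d '{"name": "no-such-model"}'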