In this article, we'll briefly explain how to use the Ollama API in Python. Whether you want to have a simple chat conversation, work with large amounts of data using streaming responses, or create, copy, and delete models locally, this article can guide you. In addition, we show how to use custom clients and asynchronous programming to optimize your application's performance. Whether you're new to Ollama or an experienced developer, this article can help you use the Ollama API more efficiently in Python.
This tutorial also provides a Jupyter Notebook example for hands-on practice.
Environment Preparation
Before you start using Python to interact with the Ollama API, make sure your development environment meets the following conditions:
- Python: Install Python 3.8 or later.
- pip: Make sure you have pip, the Python package management tool, installed.
- ollama Library: Used to make it easier to interact with the Ollama API. The installation command is as follows:
pip install ollama
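After installation, a quick sanity check confirms that both the library and a locally running Ollama server are reachable (a minimal sketch; it assumes the Ollama server is already running on the default port):
import ollama

# Lists the locally available models; raises an error if the server is not reachable
print(ollama.list())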
Usage
from ollama import chat
from ollama import ChatResponse
response: ChatResponse = chat(model='llama3.1', messages=[
    {
        'role': 'user',
        'content': '为什么天空是蓝色的?',
    },
])
# The response can be read either with dict-style access or via attributes
print(response['message']['content'])
print(response.message.content)
Streaming Response
Response streaming can be enabled by setting stream=True, which makes the function call return a Python generator where each part is an object in the stream.
from ollama import chat
stream = chat(
    model='llama3.1',
    messages=[{'role': 'user', 'content': '为什么天空是蓝色的?'}],
    stream=True,
)
for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)
Structured Output
- Normal Output (Unstructured Output)
  - Generates natural language text directly.
  - Suitable for human reading, but not easy for programs to parse or automate.
  - Example:
    这是一只黑色的小猫,它正在草地上玩耍。
- Structured Output
  - Returns data in JSON, YAML, XML, or another format, making it easier for machines to parse and use.
  - Ideal for APIs, automated workflows, and data storage.
  - Example:
    { "description": "这是一只黑色的小猫", "activity": "正在草地上玩耍" }
Advantages of Structured Output
(1) Ease of handling
- Machines can easily extract specific fields such as description or activity, with no need for NLP to parse free-form text.
(2) Improved controllability
- Structured formats allow developers to precisely control model output and avoid lengthy or unpredictable answers.
- For example, when AI generates code:
{ "language": "Python", "code": "print('Hello, World!')" }
(3) Easy to store and analyze
- Structured data is better suited to being stored in a database for easy querying and analysis.
- Example:
{ "date": "2025-01-20", "summary": "今天的销售额增长了10%。" }
The following example asks the model for JSON output and validates the result with a Pydantic model:
from pydantic import BaseModel, Field
from ollama import chat
import json

class CountryInfo(BaseModel):
    capital: str = Field(..., alias="首都")
    number: str = Field(..., alias="人口")
    area: str = Field(..., alias="占地面积")

response = chat(
    model='llama3.1',
    messages=[{
        'role': 'user',
        'content': "请介绍美国的首都、人口、占地面积信息,并以 JSON 格式返回。"
    }],
    format="json",
    options={'temperature': 0},
)

response_content = response["message"]["content"]
if not response_content:
    raise ValueError("Ollama returned empty JSON")

json_response = json.loads(response_content)
print(json_response)

country_info = CountryInfo.model_validate(json_response)
print(country_info)
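Newer Ollama releases also support constrained structured output: instead of format="json", you can pass a JSON schema, for example one generated from a Pydantic model. The sketch below assumes a version of Ollama and ollama-python with structured-output support; the field names are illustrative.
from pydantic import BaseModel
from ollama import chat

class Country(BaseModel):
    capital: str
    population: str
    area: str

response = chat(
    model='llama3.1',
    messages=[{'role': 'user', 'content': '请介绍美国的首都、人口、占地面积信息。'}],
    format=Country.model_json_schema(),  # constrain the reply to this schema
    options={'temperature': 0},
)
print(Country.model_validate_json(response.message.content))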
API
The Ollama Python library provides a rich set of interfaces that simplify interaction with Ollama. These interfaces are designed to be intuitive and easy to integrate, and are intended to help developers invoke and manage models more easily. For a more detailed look at the underlying implementation and complete API endpoint information, we recommend the Ollama API User's Guide.
Chat
ollama.chat(model='llama3.1', messages=[{'role': 'user', 'content': '为什么天空是蓝色的?'}])
Generate
ollama.generate(model='llama3.1', prompt='为什么天空是蓝色的?')
List of Local Models
ollama.list()
Displaying model information
ollama.show('llama3.1')
Creating Models
modelfile='''
FROM llama3.1
SYSTEM 你是超级马里奥兄弟中的马里奥。
'''
ollama.create(model='example', modelfile=modelfile)
Copy Model
ollama.copy('llama3.1', 'user/llama3.1')
Delete Model
ollama.delete('llama3.1')
Pull Model
ollama.pull('llama3.1')
Push Model
ollama.push('user/llama3.1')
Generate Embedding
ollama.embeddings(model='llama3.1', prompt='天空是蓝色的因为瑞利散射')
# Batch-generate embeddings
ollama.embed(model='llama3.1', input=['天空是蓝色的', '草是绿色的'])
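The batch call returns one vector per input string; a common use is comparing texts by cosine similarity. Below is a minimal sketch reusing the two sentences from the example above (the model name is the same illustrative one used throughout):
import math
from ollama import embed

response = embed(model='llama3.1', input=['天空是蓝色的', '草是绿色的'])
vec_a, vec_b = response['embeddings']

# Cosine similarity computed by hand to avoid extra dependencies
dot = sum(a * b for a, b in zip(vec_a, vec_b))
norm_a = math.sqrt(sum(a * a for a in vec_a))
norm_b = math.sqrt(sum(b * b for b in vec_b))
print('cosine similarity:', dot / (norm_a * norm_b))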
List Running Models
ollama.ps()
Custom Clients
A custom client can be created by instantiating Client or AsyncClient from the ollama package.
Custom clients can be configured with the following fields:
- host: the Ollama host to connect to
- timeout: the request timeout
For all other keyword arguments, see httpx.Client.
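For example (a minimal sketch; the 30-second timeout is just an illustrative value):
from ollama import Client

# Default local Ollama address; timeout value chosen only for illustration
client = Client(host='http://localhost:11434', timeout=30)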
Synchronous Client
Using the synchronous client (Client) means that when you call the client.chat() method, the program waits for the request to complete and returns the result before continuing to execute subsequent code. This approach is more intuitive and simple, and is suitable for applications with a fairly linear flow that do not need to handle a large number of concurrent tasks.
from ollama import Client
client = Client(
    host='http://localhost:11434',
    headers={'x-some-header': 'some-value'}
)
response = client.chat(model='llama3.1', messages=[
    {
        'role': 'user',
        'content': '为什么天空是蓝色的?',
    },
])
print(response)
Asynchronous Client
The following code uses the asynchronous client (AsyncClient) and defines an asynchronous function chat(). The await keyword pauses execution of that function until the AsyncClient().chat() request completes, without blocking other operations in the meantime. This is useful for handling I/O operations (such as network requests) efficiently, or for applications that want to perform multiple tasks simultaneously. Finally, asyncio.run(chat()) is used to run the asynchronous function.
import asyncio
from ollama import AsyncClient
import nest_asyncio

# nest_asyncio lets asyncio.run() work inside environments that already run an
# event loop (such as Jupyter); it is not needed in a plain Python script.
nest_asyncio.apply()

async def chat():
    message = {'role': 'user', 'content': '为什么天空是蓝色的?'}
    response = await AsyncClient().chat(model='llama3.1', messages=[message])
    print(response)

asyncio.run(chat())
Setting stream=True modifies the function to return a Python asynchronous generator:
import asyncio
from ollama import AsyncClient
import nest_asyncio
nest_asyncio.apply()
async def chat():
    message = {'role': 'user', 'content': '为什么天空是蓝色的?'}
    async for part in await AsyncClient().chat(model='llama3.1', messages=[message], stream=True):
        print(part['message']['content'], end='', flush=True)

asyncio.run(chat())
Synchronous vs. Asynchronous Clients: Runtime Comparison for Different Numbers of Calls
The following code repeats the test process test_num times with both the synchronous and asynchronous clients, comparing the total time and the time per call. You can change the following parameters for your own tests:
- test_messages: test data
- test_num: number of tests
- model_name: test model
import time
import asyncio
from ollama import Client, AsyncClient
import nest_asyncio

# Apply nest_asyncio to support async operations inside Jupyter
nest_asyncio.apply()

# Initialize the clients
client = Client(host='http://localhost:11434')
async_client = AsyncClient(host='http://localhost:11434')

# Synchronous request handler
def request_example(client, model_name, messages):
    start_time = time.time()
    try:
        # Synchronous request
        response = client.chat(model=model_name, messages=messages)
    except Exception as e:
        print(f"Synchronous request failed: {e}")
        response = None
    end_time = time.time()
    duration = end_time - start_time
    print(f"Synchronous request time: {duration}")
    return response, duration

# Asynchronous request handler
async def async_request_example(client, model_name, messages):
    start_time = time.time()
    try:
        # Asynchronous request
        response = await client.chat(model=model_name, messages=messages)
    except Exception as e:
        print(f"Asynchronous request failed: {e}")
        response = None
    end_time = time.time()
    duration = end_time - start_time
    print(f"Asynchronous request time: {duration}")
    return response, duration

# Asynchronous test runner
async def async_client_test(test_num, model_name, messages):
    tasks = [asyncio.create_task(async_request_example(async_client, model_name, messages))
             for _ in range(test_num)]
    results = await asyncio.gather(*tasks)
    return results

# Run the synchronous test
def sync_test(model_name, messages, test_num):
    total_time = 0
    for i in range(test_num):
        _, duration = request_example(client, model_name, messages)
        total_time += duration
    return total_time / test_num

# Run the asynchronous test
async def async_test(model_name, messages, test_num):
    start_time = time.time()
    await async_client_test(test_num, model_name, messages)
    end_time = time.time()
    return (end_time - start_time) / test_num

# Prepare the test data
test_messages = [{'role': 'user', 'content': '为什么天空是蓝色的?'}]
test_num = 10
model_name = 'llama3.1'

# Run the synchronous test and print the result
print("Running synchronous test")
sync_avg_time = sync_test(model_name, test_messages, test_num)
print(f"Average time for synchronous test: {sync_avg_time:.2f} s")

# Run the asynchronous test and print the result
print("Running asynchronous test")
async_avg_time = asyncio.run(async_test(model_name, test_messages, test_num))
print(f"Average time for asynchronous test: {async_avg_time:.2f} s")
Errors
An error is raised if the request returns an error status or an error is detected while streaming.
import ollama
model = 'does-not-yet-exist'
try:
    ollama.chat(model)
except ollama.ResponseError as e:
    print('Error:', e.error)
    if e.status_code == 404:
        ollama.pull(model)
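Errors raised while streaming can be caught the same way; a minimal sketch, reusing the llama3.1 model from the earlier examples:
import ollama

try:
    stream = ollama.chat(
        model='llama3.1',
        messages=[{'role': 'user', 'content': '为什么天空是蓝色的?'}],
        stream=True,
    )
    for chunk in stream:
        print(chunk['message']['content'], end='', flush=True)
except ollama.ResponseError as e:
    print('Streaming error:', e.error)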
Refer to the documentation: Ollama Python