Unexpectedly, AI has set off a half-changing sky in the programming field. From v0, bolt.new to all kinds of programming tools Cursor, Windsurf combined with Agant, AI Coding already has a huge potential of idea MVP. From the traditional AI-assisted coding, to today's direct project generation behind, in the end how to realize?
This article will look at open source products Cline Starting from there, we'll take a peek at some of the implementation ideas of AI Coding products at this stage. At the same time, you can understand the deeper principles and make better use of AI editors.
The final implementation may not be consistent across AI Coding editors. Also, this article will not go into the implementation details of Tool Use.
Cline I made a sketch of the overall process:
At its core, Cline relies on system prompts and the command following capabilities of the Big Language Model. At the start of a programming task, system prompts, user-defined prompts, user input, and information about the project's environment (which files, open tabs, etc.) are collected and submitted to the LLM, which outputs solutions and actions in accordance with the directives, and to the LLM, which outputs solutions and actions in accordance with the directives, and to the LLM, which outputs solutions and actions in accordance with the directives. ,
Cline calls the written Tool Use capability to perform the processing and passes the results to the LLM for processing. Cline makes multiple calls to LLM to accomplish a single task.
system cue
Cline's system prompts are v0-like prompts, written in Markdown and XML. The LLM Tool Use rules and usage examples are defined in detail:
# Tool Use Formatting
Tool use is formatted using XML-style tags. The tool name is enclosed in opening and closing tags, and each parameter is similarly enclosed within its own The tool name is enclosed in opening and closing tags, and each parameter is similarly enclosed within its own set of tags.
<tool_name
value1
value2</parameter2_name
...
</tool_name
For example.
src/main.js
</read_file
Always adhere to this format for the tool use to ensure proper parsing and execution.
# Tools
## execute_command
## write_to_file
...
## Example 4: Requesting to use an MCP tool
<use_mcp_tool
weather-server</server_name
get_forecast</tool_name
{
"city": "San Francisco",
"days": 5
}
The MCP server is also injected into the system prompt word.
MCP SERVERS
The Model Context Protocol (MCP) enables communication between the system and locally running MCP servers that provide additional tools and resources to extend your capabilities .
# Connected MCP Servers
....
User commands can also be passed through the .clinerules
injected into the system's system cue word.
From this we can venture to assume that Cursor and WindSurf Injecting .cursorrules is similar
It can be seen that Cline is at its core dependent on the LLM's ability to follow instructions, so the model's temperature is set to 0
The
const stream = await this.client.chat.completions.create({
model: this.options.openAiModelId ?? "",
messages: openAiMessages, // set to 0, // set to 0, // set to 0, // set to 0, // set to 0.
temperature: 0, // set to 0
stream: true, // set to 0
stream_options: { include_usage: true }, })
})
First input
Multiple inputs exist for the user, respectively:
- Directly typed copy with
embody
- pass (a bill or inspection etc)
@
Input file directory, file and url
In Cline.@
There's not a lot of tech involved, for file directories, list the file directory structure; for files, read the contents of the file; and for url, read the contents directly from puppeteer. Then we take the content and user input and output it to LLM.
A sample input is as follows:
Implement a 3D wraparound effect of the solar system 'app/page.tsx' (see below for file content) 'https://stackoverflow.com/questions/23673275/orbital-mechanics-for-a- solar-system-visualization-in-three-js-x-y-z-for-planet' (see below for site content)
</task
import Image from "next/image";
export default function Home() {...}
</file_content
<site_content
...
</site_content
The user input also contains a type of information about the project environment, such as the list of files in the current working directory, the tabs that vscode opens, and so on.
A simple task gives the following input to the LLM:
Implement a 3D wrap around effect for the solar system
</task
<environment_details
# VSCode Visible Files
(No visible files)
# VSCode Open Tabs
(No open tabs)
# Current Time
1/23/2025, 4:41:03 PM (Asia/Shanghai, UTC+8:00)
# Current Working Directory (/Users/nazha/nazha-all/cline-dev/nazha) Files
.gitignore
eslint.config.mjs
next.config.ts
package-lock.json
package.json
postcss.config.mjs
README.md
tailwind.config.ts
tsconfig.json
app/
app/favicon.ico
app/globals.css
app/layout.tsx
app/page.tsx
public/
public/file.svg
public/globe.svg
public/next.svg
public/vercel.svg
public/window.svg
# Current Mode
ACT MODE
</environment_details
As you can see here, other AI Coding editors (such as Cursor)it's possibleEmbedding the codebase, but cline is more brute-force and straightforward.
First round returns #
The LLM is returned as required by the command (temperature is set to 0) and typically contains and operation in two parts. For example:
Analyze the task requirements.
1. need to create a 3D solar system surround effect
2. the current project is a Next.js project (according to next.config.ts and app directory structure)
3. need to add 3D functionality, can use Three.js library
4. need to create a new component to display the 3D effect
Next steps:
1. install the Three.js dependency
2. Create a new component to display the 3D solar system
3. Introduce the component in the page
</thinking
<execute_command
npm install three @types/three
false</requires_approval
</execute_command
In this example, Cline invokes various types of system operations by parsing the instructions output by LLM, including but not limited to:
- execute a command
- Read and write files
- Search Content
- MCP Operation
At the same time, Cline collects information about the operational status of various operations.
Second input
Next, Cline outputs the user's behavior, the output state and the result of the previous operation, including the previous system prompt and user input, to the LLM again, requesting the LLM to give guidance on the next step to be taken. And so on.
role: system
content: [... omitted]
role: user
content: [
{
type: 'text', text: '\nRealizing a 3D solar system wrap-around effect\n</task
text: '\nImplement a 3D surround effect for the solar system\n';
}, {
{
type: 'text', {
text:"... "
}
]
role: 'assistant',, text: "..." } ]
content: '\nAnalysis of task requirements:\n1. need to create a 3D solar system surround effect\n2. current project is a Next.js project (judging by next.config.ts and app directory structure)\n3. need to add 3D functionality which can be used with Three.js library\n4. need to create a new component which can be used with the next.config.ts and app directory structure. n4. Need to create a new component to show the 3D effect\n\nNext Steps:\n1. Install the Three.js dependency\n2. Create a new component to show the 3D solar system\n3. Introduce the component in the page\n\n\n\nnpm install three @types/three\nfalse\n'
role: 'user',
content: [
{
type: 'text',
text: ' [execute_command for 'npm install three @types/three'] Result:'
}, {
{
type: 'text', {
text: 'Command executed.\nOutput:\n⠙⠹⠸⠼⠴⠦ ⠧⠇⠏⠋⠙⠹⠸⠼⠴⠦ ⠧⠇⠏⠋⠙⠹⠸⠼⠴⠦ ⠧⠇⠏⠋⠙⠹⠸⠼⠴⠦ ⠧⠇⠏⠋⠙⠹⠸⠼⠴⠦ ⠧⠇⠏⠋⠙⠹⠸⠼⠴⠦ looking for funding\n⠏ run `npm fund` for details\n⠏\nfound 0 vulnerabilities\n⠏'.
}, {
{
type: 'text', {
content: '\n# VSCode Visible Files\n(No visible files)\n\n# VSCode Open Tabs\n(No open tabs)\n\n# Current Time\n1/23/2025, 10:01:33 PM (Asia/Shanghai, UTC+8:00)\n\n# Current Mode\nACT MODE\n'
}]
As you can see, processing a single task requires looping back and forth to call LLM multiple times until the task is finished. The other thing is that Cline basically just shoves everything into the LLM, one task at a time. Token The usage is very high.
Another problem is that it's easy to trigger LLM's context window limit, and Cline's strategy for dealing with this is to violently truncate it.
This is presumably how other AI Coding editors handle it as well. When I was using windsurf before, I was curious as to why it wasn't limited by the LLM context window. However, the previous answer is often repeated in later Q&As.