This article is part of the series "Understanding and Deploying Intelligent AI":
- Intelligent Body AI Series 1: Comparison between Devin and Agent Cursor
- Intelligent Body AI Series 2: From Thinker to Doer - Paradigm Revolution and Technical Architecture for Intelligent Body AI
- Smart Bodies AI Series 3: Turn $20 into $500 - in one hour! Cursor change into Devin (This article)
- Intelligent Body AI Series 4: Using Cursor as a Common Portal for AI
In a previous post, we discussed Devin, an intelligent body AI capable of fully automated programming. With Cursor and Windsurf Compared to other intelligent body AI tools such as Devin, it has some core strengths in process planning, self-evolution, extended tool usage, and fully automated operations. This makes Devin seem like a next-generation tool, differentiating it from existing intelligent body AI tools.
However, after using it for a while, my "builder's mentality" was rekindled and I was prompted to modify Windsurf and Cursor to implement Devin 90%'s features. I've also open sourced these modifications so that you can convert Cursor or Windsurf to Devin in just a minute, and this article focuses on the specifics of how these modifications were made, and this example demonstrates how efficient building and scaling can be in the age of intelligent AI. To simplify our discussion, we'll use Cursor to refer to such tools, and finally, we'll discuss what minor tweaks need to be made if you want to use Windsurf.
artifact | process planning | self-evolution | Tool Extension | automated implementation | prices |
---|---|---|---|---|---|
Devin | Yes (automatic, complete) | Yes (self-study) | enough | be in favor of | $500/month |
Cursor (before modification) | finite | clogged | Limited toolset | Manual Confirmation | $20/month |
Cursor (modified) | Approach Devin. | be | Close to Devin, scalable | Confirmation or workaround still needed | $20/month |
Windsurf (modified) | Approach Devin. | Yes, but indirectly | Close to Devin, scalable | Support for full automation in Docker containers | $15/month |
Process planning and self-evolution
As mentioned in the previous post, an interesting aspect of Devin is that it behaves more like an organized intern. It knows to first create a plan and then keep updating the progress of the plan as it executes. This makes it easier for us as AI managers to keep track of the AI's current progress while preventing it from deviating from the original plan, leading to deeper thinking and quality of task completion.
While this feature may seem impressive, it is actually very easy to implement using Cursor.
For Cursor, there is a file in the root of the opened folder named .cursorrules
This is a special file. What's special about it is that it allows you to modify the prompts that the Cursor sends to the back-end big language model. in other words, everything in this file becomes a prompt that is sent to the back-end AI (e.g., GPT or Claude) is part of the prompt. This gives us a lot of customization flexibility.
For example, we could put the contents of the plan in this file so that every time we interact with the Cursor, it receives the latest version of the plan. We could also give more detailed instructions in this file, such as having it think and plan at the beginning of the task, and updating the plan after each step. Since Cursor can modify files using agents, and the .cursorrules
It's a file in itself, which creates a closed loop. It automatically reads the contents of the file every time to find out what the latest updates are, and after thinking about it, writes the updated progress and next steps to this file, making sure we always get the latest updates.
A similar approach can be used to achieve a self-evolving function. In the .cursorrules
file, we add some prompts to make the Cursor reflect on its mistakes when corrected by the user, and consider whether there are any reusable experiences that need to be documented. If so, it will update the .cursorrules
relevant part of the document. In this way, it accumulates project-specific knowledge.
A typical example is that many of the current big language models are unaware of the GPT-4o model due to relatively early knowledge deadlines. If you ask them to invoke GPT-4o, they will delete the 'o', thinking it's a typo. But if you correct them, "This model actually exists, you just don't know about it," they will document the lesson learned in the .cursorrules
and not make the same mistakes again, thus learning and improving. However, this still depends on the prompt being effective - sometimes it may miss points and not always record knowledge that we think we should be aware of. In this case, we can also use natural language to prompt it, telling it directly to take note of the point. This more direct approach also allows the AI to gain experience and grow.
Therefore, by using only the .cursorrules
file and a few prompt tricks, we can add Devin's impressive process planning and self-evolution capabilities to existing AI programming tools for intelligentsia.
If Windsurf is used, there is one difference: probably for security reasons, it does not allow the AI to directly modify the .windsurfrules
file. Therefore, we need to split it into two parts, using another file such as the scratchpad.md
The In .windsurfrules
In the document, we mentioned that before each thought process, you should check Scratchpad and update the plan there. This indirect approach may not be as effective as directly placing it in the .cursorrules
in that it still requires the AI to call the agent and think based on the feedback, but it works in practice.
Extended Tool Usage
One of the main advantages of Devin over Cursor is its ability to use more tools. For example, it can call the browser to search, browse the web, and even use its own brain to analyze content using big language modeling intelligence. While Cursor doesn't support this by default, the good news is that since we can use the .cursorrules
The direct control of the Cursor's prompt and the fact that it has command execution capabilities creates another closed loop. We can prepare a pre-written program, such as a Python library or a command-line tool, and then use the .cursorrules
in which they are introduced so that it can learn instantly and naturally understand how to use these tools to accomplish its tasks.
In fact, the tools themselves can be written in a minute or two using Cursor. For example, for the web browsing functionality, I provide a reference implementation in the open source project. There are some technical decisions to be aware of, such as using a browser automation tool like playwright instead of Python's request library for JavaScript-heavy websites. Furthermore, in order to better communicate with the Big Language Model and make it easier for it to understand and crawl subsequent content, we don't simply use beautiful soup to extract the textual content of a web page. Instead, we converted it to markdown format according to certain rules, thus preserving more detailed basic information, such as class names and hyperlinks, to support the Big Language Model in writing subsequent crawlers at a more basic level.
Similarly, for search tools, there is a small caveat: both Bing and Google have API searches that are far inferior in quality to their client-side searches, largely due to the history of different teams dealing with APIs and web interfaces. However, DuckDuckGo does not have this problem, so our reference implementation uses DuckDuckGo's free API.
An in-depth analysis of Cursor's use of its own brainpower is relatively more complex. On the one hand, Cursor does have some degree of this capability - in both of these tools, when we print the content of a web page to stdout, it becomes part of the prompt that Cursor sends to the Big Language Model, allowing it to intelligently analyze that text content. From another perspective, however, Devin has a unique ability to batch process relatively large amounts of text using the Big Language Model in a way that Cursor cannot. So in order to give it this ability, we implemented an additional tool - very simple, just pre-set our API key in the system, and then have the tool call GPT or Claude or our local big-language modeling API to enable Cursor to batch process text using the big-language model. In my reference implementation, I'm using my own local vllm cluster, but it's very simple to modify - just remove the base_url line.
However, even with these modifications, we are still unable to implement two tools due to the limitations of the Cursor:
- Devin seems to have image understanding, which is why it can perform front-end interactions and tests, but due to Cursor limitations we can't pass images as input to the back-end AI - this would require changes to its implementation.
- Devin mysteriously doesn't get flagged as a bot by anti-crawler algorithms during the data collection process, but our web retrieval tool often encounters CAPTCHA or is blocked. This may be fixable and I'm still exploring it, but it's definitely one of Devin's unique strengths.
Fully automated execution
The last interesting feature is fully automated execution. Since Devin runs in a fully virtualized cloud environment, we can safely have it execute all sorts of commands without worrying about big language model attacks or running dangerous commands by mistake. Even if you delete an entire system, just start a new container and everything is back to normal. However, Cursor running on a localhost system is a serious security risk. That's why, in Cursor's agent mode, we need to manually confirm each command before executing it. This is acceptable for relatively simple tasks, but now that we have sophisticated process planning and self-evolving capabilities, Cursor can also handle long-term complex tasks, making this method of interaction seem ill-suited to Cursor's capabilities.
To address this, I haven't found a solution based on Cursor (update: on December 17, 2024, Cursor added this feature as well, called Yolo Mode, but it still doesn't support development in Docker), but Windsurf has already taken this into account, and I think from its design you can see that from the very beginning it was aims to create a Devin-like product form, with the current code editor being an intermediate form. More specifically, Windsurf has the ability to connect directly to a Docker container and run in it, or if we have a configuration file, it can help you start a new Docker container, do some initialization, and map local folders over. So all the commands it executes, except for changes to the local folder, are executed in the Docker container and have no impact on the host system, thus greatly improving security.
On top of that, it introduces a blacklist/whitelist mechanism that automatically rejects commands on the blacklist and allows commands on the whitelist. For commands that are neither on the list nor on the blacklist, the big language model intelligently determines if there is a risk to the host system - for example, if it wants to delete a file in a folder, it will ask the user for confirmation, but commands like pip install
Such generic commands will be allowed directly. Note that this feature seems to be enabled only when running in Docker containers. If we run the commands on the host system, the experience is still similar to Cursor and requires frequent confirmations. In addition, automatic command execution needs to be enabled in the configuration.
summarize
Thus, we can see that while Devin's product form and design concepts are indeed very advanced, the gap between it and existing AI tools for intelligentsia is not as large as we might think from a technical barriers perspective. Using popular tools such as Cursor and Windsurf, which cost $15-$20 per month, we can implement Devin 90%'s functionality in less than an hour and use it to accomplish complex tasks that were previously impossible. For example, I assigned Cursor the task of analyzing the returns of popular tech stocks over the past 5 years for an in-depth data analysis, and it provided a very detailed and comprehensive report. Additionally, I asked Windsurf to grab the publish times of the top 100 posts on my blog and visualize them in the form of a GitHub Contribution Chart, which it does completely automatically. These types of tasks can't be done with traditional Cursor and Windsurf - only Devin can do them, but with these simple modifications, we can achieve the results of a $500 per month tool with a $20 per month tool. I even did a more in-depth experiment: as a developer completely unfamiliar with front-end development, I spent an hour and a half creating a job board, both front-end and back-end. That kind of efficiency is very close to, if not better than, Devin.
Finally, all the files mentioned in this article can be downloaded from the Devin Cursor Rules Download - Simply copy the content into your current project folder and use it.