DeepSeek Database Leak: The Security Hazards Behind China's AI Rise


In recent years, China has made world-renowned achievements in artificial intelligence, producing a number of companies like DeepSeek. However, while pursuing technological breakthroughs, security issues should not be ignored. The DeepSeek database leak has once again sounded the alarm, reminding us that we must strike a balance between technological development and security in order to avoid repeating the same mistakes.

A leak does not mean that user data has been used maliciously. This test only revealed a security issue, and the vulnerability was closed promptly after it was discovered, so there is no need to panic. PS: In practice, every piece of your data is already transparent. Looking at this vulnerability, it is reasonable to guess what such access could be used for, so why worry about privacy at all?

Wiz Research Uncovers Exposed DeepSeek Database Leaking Sensitive Information, Including Chat Logs

A publicly accessible database belonging to DeepSeek allowed complete control over database operations, including the ability to access internal data. The exposure included over a million lines of log streams containing highly sensitive information.

Wiz Research discovered a publicly accessible ClickHouse database belonging to DeepSeek that allowed complete control over database operations, including the ability to access internal data. The exposure consisted of over a million lines of log streams containing chat logs, keys, backend details, and other highly sensitive information. The Wiz Research team immediately and responsibly disclosed the issue to DeepSeek, which quickly took steps to protect the exposed data.

In this blog post, we will detail our findings and consider their wider implications for the industry as a whole.

Executive summary

DeepSeek is a Chinese AI startup recognized for its groundbreaking AI models, particularly the DeepSeek-R1 reasoning model, which has recently received a lot of media attention. The model rivals leading AI systems such as OpenAI's o1 in performance and stands out for its cost-effectiveness and efficiency.

With DeepSeek making waves in the AI space, the Wiz Research team set out to assess its external security posture and identify any potential vulnerabilities.

Within minutes, we discovered a publicly accessible ClickHouse database associated with DeepSeek that was completely open and unauthenticated, exposing sensitive data. It is hosted at oauth2callback.deepseek.com:9000 and dev.deepseek.com:9000.

The database contains a large amount of chat logs, back-end data, and sensitive information, including log streams, API keys, and operation details.

More critically, the exposure allowed full control of the database and potential privilege escalation within the DeepSeek environment, without any authentication or defense mechanism facing the outside world.

Exposure walkthrough

Our reconnaissance began by evaluating DeepSeek's publicly accessible domains. By mapping the external attack surface with straightforward reconnaissance techniques (passive and active discovery of subdomains), we identified approximately 30 Internet-facing subdomains. Most of them appeared benign, hosting elements such as chatbot interfaces, status pages, and API documentation - none of which initially indicated a high-risk exposure.
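The write-up does not say which tools were used for subdomain discovery. As a rough illustration of the active half of that step, the Python sketch below checks which candidate hostnames under a domain you own actually resolve. The function name `resolving_subdomains`, the label list, and the injectable `resolve` parameter are hypothetical and chosen for illustration only.

```python
import socket
from typing import Callable, Iterable


def resolving_subdomains(
    domain: str,
    labels: Iterable[str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> dict[str, str]:
    """Map each candidate hostname that resolves to its IPv4 address.

    `labels` is a hypothetical wordlist such as ["dev", "status", "api"].
    `resolve` can be swapped out so the logic is testable without DNS.
    """
    found: dict[str, str] = {}
    for label in labels:
        hostname = f"{label}.{domain}"
        try:
            found[hostname] = resolve(hostname)
        except OSError:
            # socket.gaierror (name does not resolve) is a subclass of OSError.
            continue
    return found
```

Calling `resolving_subdomains("example.com", ["dev", "status", "api"])` against a domain you are authorized to assess returns only the names that resolve. Passive sources, such as certificate transparency logs, would complement this kind of active check.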

However, when we expanded our search beyond the standard HTTP ports (80/443), we detected two unusual open ports (8123 and 9000) associated with the following hosts:

http://oauth2callback.deepseek.com:8123
http://dev.deepseek.com:8123
http://oauth2callback.deepseek.com:9000
http://dev.deepseek.com:9000
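Looking beyond ports 80 and 443 can be reproduced defensively against your own hosts. The following is a minimal sketch, not the method Wiz used: it attempts a plain TCP connection to the two ports named above, which are ClickHouse's default HTTP (8123) and native-protocol (9000) ports. The helper names `is_port_open` and `exposed_ports` are hypothetical.

```python
import socket

# Ports named in the article: ClickHouse defaults for HTTP (8123)
# and the native protocol (9000).
CLICKHOUSE_PORTS = (8123, 9000)


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def exposed_ports(host: str, ports=CLICKHOUSE_PORTS, timeout: float = 2.0) -> list[int]:
    """Return the subset of `ports` that accept TCP connections on `host`."""
    return [port for port in ports if is_port_open(host, port, timeout)]
```

Running `exposed_ports` from outside your network against each Internet-facing host you operate shows whether a database port is reachable at all. An open port alone does not prove that the service is unauthenticated.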

Upon further investigation, these ports led to a publicly exposed ClickHouse database that was accessible without any authentication at all, an immediate cause for alarm.

ClickHouse is an open source columnar database management system designed for fast analytical queries on large data sets. It was developed by Yandex and is widely used for real-time data processing, log storage, and big data analytics, which makes an exposure of this kind a highly valuable and sensitive discovery.

By leveraging ClickHouse's HTTP interface, we accessed the /play path, which allowed arbitrary SQL queries to be executed directly from the browser. Running a simple SHOW TABLES; query returned a complete list of accessible datasets.
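Besides the /play page, ClickHouse's HTTP interface accepts a SQL statement in the `query` URL parameter. A team that wants to confirm its own instance does not answer anonymous requests could use a sketch like the one below. It sends a harmless `SELECT 1` and reports whether the server replied without credentials. The function names and the base URL used in the usage note are hypothetical, and the check should only be run against servers you are authorized to test.

```python
import http.client
import urllib.request
from urllib.parse import quote, urlencode


def build_query_url(base_url: str, sql: str) -> str:
    """Build a ClickHouse HTTP-interface URL carrying `sql` in the `query` parameter."""
    params = urlencode({"query": sql}, quote_via=quote)  # spaces become %20
    return f"{base_url.rstrip('/')}/?{params}"


def allows_anonymous_queries(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if the server answers `SELECT 1` with no credentials supplied."""
    url = build_query_url(base_url, "SELECT 1")
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and resp.read().strip() == b"1"
    except (OSError, http.client.HTTPException):
        # Connection failures and HTTP error statuses (for example a rejected
        # login) both land here, so the result is False.
        return False
```

For example, `allows_anonymous_queries("http://clickhouse.example.internal:8123")` returning True means the instance executes queries for anyone who can reach it, which is the condition described in this article.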

[Figure: table output from the ClickHouse web UI]

One table that stood out was log_stream, which contained a large number of logs with highly sensitive data.

The log_stream table contained over 1 million log entries, with columns that were particularly revealing:

timestamp - log dates starting from January 6, 2025
span_name - references to various internal DeepSeek API endpoints
string.values - plain-text logs, including chat logs, API keys, backend details, and operational metadata
_service - indicates which DeepSeek service generated the logs
_source - exposes the origin of log requests, containing chat logs, API keys, directory structures, and chatbot metadata logs

This level of access posed a serious risk to DeepSeek's own security and that of its end users. Not only could an attacker retrieve sensitive logs and actual plain-text chat messages, but, depending on the ClickHouse configuration, they could also potentially use queries such as SELECT * FROM file('filename') to extract plain-text passwords, local files, and proprietary information directly from the server.

(Note: to maintain ethical research practices, we did not execute intrusive queries beyond enumeration.)

Key takeaways

The rapid adoption of AI services without appropriate security measures is inherently risky. This exposure underscores the fact that the immediate security risks of AI applications stem from the infrastructure and tools that support them.

While much of the attention around AI security has focused on future threats, the real danger often comes from basic risks, such as accidental external exposure of a database. These risks are fundamental to security and should remain a top priority for security teams.

As organizations race to adopt AI tools and services from a growing number of startups and providers, it's important to remember that by doing so, we're entrusting sensitive data to these companies. The rapid pace of adoption often leads to a neglect of security, but protecting customer data must remain a top priority. Security teams must work closely with AI engineers to ensure visibility into the architecture, tools and models used so that we can protect data and prevent exposure.

Conclusion

The world has never seen a technology adopted at such a pace as AI. Many AI companies have rapidly evolved into critical infrastructure providers without the security frameworks that typically accompany such widespread adoption. As AI becomes more deeply integrated into enterprises globally, the industry must recognize the risks of handling sensitive data and enforce security practices comparable to those required by public cloud providers and major infrastructure providers.