General Introduction
pure.md is a tool for AI agents and developers that quickly converts web content and files to Markdown. It bypasses anti-crawler restrictions through proxy services, extracts the core content of a web page, and outputs clean Markdown. It handles dynamic web pages, PDF files, and social media content alike. The tool runs on Cloudflare and AWS and provides a REST API. Its main strength is ease of use: it can significantly reduce the time spent extracting and organizing content, which makes it especially suitable for scenarios that need real-time data or structured output.
Function List
- Fast Markdown conversion: Extracts the content of a web page or document into clean Markdown.
- Bypassing anti-crawler detection: Simulates real user behavior and rotates IP addresses to access restricted websites.
- JavaScript rendering: Fully renders dynamic content for single-page applications (SPAs).
- PDF and file conversion: Converts PDF, Excel, and other files to Markdown.
- Search engine crawling: Crawls search results and consolidates them into Markdown.
- Data extraction: Extracts JSON or a summary via POST request; supports natural-language instructions.
- Social media support: Extracts data from LinkedIn, Twitter, etc. (some features under development).
Usage Guide
pure.md requires no complicated installation and can be used directly through the website or the API. Below are step-by-step instructions and feature descriptions to help you get started quickly.
Basic usage
- Visit the official website
Open https://pure.md/ in your browser. There is no software to download; everything runs online.
- Enter the target link
Prefix the link with https://pure.md/. For example, https://example.com becomes https://pure.md/https://example.com.
- Get Markdown
After submitting, pure.md returns the extracted content, which is output in Markdown format by default. You can copy the results or download the file.
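The prefixing step above can be sketched in a few lines of Python. This is a minimal illustration, not an official client: the helper names `to_pure_md_url` and `fetch_markdown` are hypothetical, and the sketch assumes that a plain GET to the prefixed URL returns the Markdown as UTF-8 text.

```python
import urllib.request

PURE_MD_PREFIX = "https://pure.md/"


def to_pure_md_url(target_url: str) -> str:
    """Return the pure.md URL for a target page by prefixing it."""
    if not target_url.startswith(("http://", "https://")):
        raise ValueError("target_url must start with http:// or https://")
    return PURE_MD_PREFIX + target_url


def fetch_markdown(target_url: str, timeout: float = 30.0) -> str:
    """Fetch a page through pure.md and return the response body as text.

    Assumption: a plain GET returns the Markdown as UTF-8 text.
    """
    url = to_pure_md_url(target_url)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only builds the URL; call fetch_markdown() to perform a real request.
    print(to_pure_md_url("https://example.com"))
```

Calling `fetch_markdown("https://example.com")` would perform the actual request; the 30-second default timeout is an arbitrary choice for this sketch.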
Featured Functions: Operating Procedures
1. Quick conversion to Markdown
- Procedure:
  - Enter the target web page, e.g. https://pure.md/https://wikipedia.org
  - When you click submit, pure.md removes ads and extraneous elements and generates a Markdown file containing the title, body, and metadata.
- Effect:
The output is only 28K characters, which is more concise than similar tools (e.g. r.jina.ai's 143K) and well suited to AI processing. Reference: Reader API: Web page content extraction tool, HTML to Markdown format conversion
2. Bypassing anti-crawler detection
- Procedure:
  - Enter a link to a restricted web page, such as https://pure.md/https://science.org/article
  - pure.md uses data center proxies, residential proxies, or historical data (Common Crawl, Wayback Machine) to obtain the content.
  - If the page requires a login, you can add a cookie to the request header (see the documentation at https://pure.md/docs).
- Effect:
Successfully extracts content and converts it to Markdown, bypassing restrictions such as "Verify you're human".
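As a rough sketch of the cookie option, the snippet below builds a request that carries a cookie without sending it. It assumes the cookie travels in the standard HTTP `Cookie` header; the exact header pure.md expects should be confirmed at https://pure.md/docs. The function name `build_cookie_request` and the cookie value are placeholders.

```python
import urllib.request


def build_cookie_request(target_url: str, cookie: str) -> urllib.request.Request:
    """Build (but do not send) a GET request to pure.md that carries a cookie."""
    request = urllib.request.Request("https://pure.md/" + target_url)
    # Assumption: the cookie is passed in the standard Cookie header.
    request.add_header("Cookie", cookie)
    return request


# Placeholder cookie value, for illustration only.
request = build_cookie_request("https://science.org/article", "session=PLACEHOLDER")
print(request.full_url)
print(request.get_header("Cookie"))
```

Passing the resulting object to `urllib.request.urlopen` would send it.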
3. JavaScript Rendering Support
- Procedure:
  - Enter a link to a dynamic web page, such as https://pure.md/https://react-app.com
  - pure.md renders the DOM in the background to generate the full content.
  - The result is returned as Markdown.
- Effect:
Dynamic data (such as comments or forms) in single-page applications is extracted in full, rather than returning only an empty HTML shell.
4. PDF and document conversion
- Procedure:
  - Enter the PDF link, e.g. https://pure.md/https://example.com/file.pdf
  - After submission, pure.md parses the file and converts it to Markdown.
  - Excel files are also supported and are converted to Markdown tables.
- Effect:
Document content is organized into clear Markdown with hierarchical headings and paragraphs.
5. Search engine crawling
- Procedure:
  - Enter a search URL, such as https://pure.md/https://google.com/search?q=AI
  - pure.md crawls the search results and consolidates them into a Markdown string.
- Effect:
Recent events or knowledge are organized quickly, which suits real-time updates of AI data.
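Search terms containing spaces or special characters need to be URL-encoded before they go into the link. The sketch below shows one way to build such a URL with the standard library; the helper name `search_url` is hypothetical, and the URL shape simply follows the example above.

```python
from urllib.parse import urlencode

SEARCH_BASE = "https://pure.md/https://google.com/search?"


def search_url(query: str) -> str:
    """Build a pure.md search URL, URL-encoding the query string."""
    return SEARCH_BASE + urlencode({"q": query})


print(search_url("AI"))
print(search_url("AI news today"))
```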
6. Data extraction (POST requests)
- Procedure:
  - Send a POST request, e.g. POST https://pure.md/https://reuters.com
  Example request body:
  { "prompt": "List today's top 5 headlines", "model": "meta/llama-3.1-8b", "schema": {"type": "object", "properties": {"headlines": {"type": "array", "items": {"type": "string"}}}, "required": ["headlines"]} }
  - The response is returned as JSON or Markdown.
- Effect:
Structured data is extracted according to natural-language instructions, which suits complex tasks.
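The POST request can be assembled in Python roughly as follows. The fields `prompt`, `model`, and `schema` come from the example body; the `Content-Type: application/json` header and the helper name `build_extraction_request` are assumptions of this sketch. The request is only built here, not sent.

```python
import json
import urllib.request


def build_extraction_request(target_url, prompt, schema, model="meta/llama-3.1-8b"):
    """Build (but do not send) a POST request asking pure.md for structured data."""
    body = {"prompt": prompt, "model": model, "schema": schema}
    return urllib.request.Request(
        "https://pure.md/" + target_url,
        data=json.dumps(body).encode("utf-8"),
        # Assumption: the API expects a JSON request body.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# JSON Schema describing the expected result: an object with a list of strings.
headline_schema = {
    "type": "object",
    "properties": {"headlines": {"type": "array", "items": {"type": "string"}}},
    "required": ["headlines"],
}

request = build_extraction_request(
    "https://reuters.com", "List today's top 5 headlines", headline_schema
)
print(request.get_method(), request.full_url)
```

To send it, pass `request` to `urllib.request.urlopen` and decode the response; whether the result comes back as JSON or Markdown depends on the service.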
7. Social media support (under development)
- Procedure:
  - Enter a LinkedIn or Twitter link, such as https://pure.md/https://twitter.com/user/tweet
  - pure.md extracts the content through a data provider.
- Effect:
Posts or profiles are output as Markdown, with support for more platforms planned.
Pricing & Accounts
- Registration: Visit https://pure.md/login to get $1 of free credit.
- Pricing:
  - Starter: 60 requests per minute, $0.001 per extraction, $0.005 per search.
  - Growth: $19/month, 600 requests per minute, with $20 free credit.
  - Business: $99/month, 3000 requests per minute, with $100 free credit.
- Payment: Handled via Stripe; you can cancel at any time.
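To get a feel for these numbers, the sketch below estimates usage cost and request pacing. It uses only the Starter per-request prices listed above and assumes billing is a simple per-request sum; whether the same unit prices apply to the Growth and Business plans is not stated, so the function names and the calculation are illustrative only.

```python
from decimal import Decimal

EXTRACTION_PRICE = Decimal("0.001")  # USD per extraction (Starter)
SEARCH_PRICE = Decimal("0.005")      # USD per search (Starter)


def estimate_cost(extractions: int, searches: int = 0) -> Decimal:
    """Estimated cost in USD, assuming a simple per-request sum."""
    return extractions * EXTRACTION_PRICE + searches * SEARCH_PRICE


def extractions_covered(credit: Decimal) -> int:
    """Number of extractions a given credit pays for at the Starter price."""
    return int(credit // EXTRACTION_PRICE)


def min_interval_seconds(requests_per_minute: int) -> float:
    """Average spacing between requests that stays within a per-minute limit."""
    return 60 / requests_per_minute


print(estimate_cost(1000, 100))             # 1000 extractions plus 100 searches
print(extractions_covered(Decimal("1")))    # extractions covered by the $1 credit
print(min_interval_seconds(60))             # pacing for the 60 requests/minute limit
```

Under these assumptions, the $1 sign-up credit covers 1000 extractions, and a client on the 60 requests per minute limit should average no more than one request per second.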
Notes
- The free version has strict limits; a subscription is recommended to unlock full functionality.
- Large pages or files take longer to process, usually 5-30 seconds.
- Social media features are not yet fully live.
With these steps, you can extract content and convert it to Markdown with pure.md simply and efficiently.
Application Scenarios
- AI data acquisition
AI developers need web data to train models. pure.md quickly extracts it and converts it to Markdown, reducing preprocessing.
- Research and study
Students convert PDFs or web pages to Markdown to organize notes or cite sources more easily.
- News monitoring
Enterprises track live news. pure.md crawls search results and outputs Markdown to keep information up to date.
QA
- Do I need a credit card to register?
No. Sign up and get $1 of free credit.
- What file types are supported?
HTML, PDF, and Excel are currently supported, and images can be converted to descriptions via AI.
- Can I access content that requires a login?
Yes, but you need to provide a cookie; see the documentation.