
DeepSeek R1 Enterprise Local Deployment Complete Manual

I. Introduction

DeepSeek R1 is a high-performance general-purpose large language model that supports complex reasoning, multimodal processing, and technical document generation. This manual provides a complete local deployment guide for technical teams, covering hardware configurations, domestic chip adaptations, quantization schemes, heterogeneous solutions, cloud alternatives, and deployment methods for the full 671B MoE model.

II. Core configuration requirements for local deployment

1. Table of model parameters and hardware correspondence

| Model parameters (B) | Windows configuration requirements | Mac configuration requirements | Applicable scenarios |
|---|---|---|---|
| 1.5B | RAM: 4GB; GPU: integrated graphics / modern CPU; storage: 5GB | Memory: 8GB (M1/M2/M3); storage: 5GB | Simple text generation, basic code completion |
| 7B | RAM: 8-10GB; GPU: GTX 1660 (4-bit quantized); storage: 8GB | Memory: 16GB (M2 Pro/M3); storage: 8GB | Medium-complexity Q&A, code debugging |
| 8B | RAM: 16GB; GPU: RTX 4080 (16GB VRAM); storage: 10GB | Memory: 32GB (M3 Max); storage: 10GB | Medium-complexity reasoning, document generation |
| 14B | RAM: 24GB; GPU: RTX 3090 (24GB VRAM) | Memory: 32GB (M3 Max); storage: 20GB | Complex reasoning, technical documentation generation |
| 32B | Enterprise deployment (multiple GPUs in parallel) | Not supported at this time | Scientific computing, large-scale data processing |
| 70B | Enterprise deployment (multiple GPUs in parallel) | Not supported at this time | Large-scale reasoning, ultra-complex tasks |
| 671B | Enterprise deployment (multiple GPUs in parallel) | Not supported at this time | Ultra-large-scale research and high-performance computing |
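For a quick sanity check of the smaller tiers before committing hardware, the distilled models can be pulled from the public Ollama library with one command each. This is a minimal sketch, assuming Ollama is already installed (see Section V) and that the deepseek-r1 tags on ollama.com remain current:

# Pull and chat with a distilled tier matching your hardware (default tags are 4-bit quantized)
ollama run deepseek-r1:1.5b   # integrated graphics / CPU-class machines
ollama run deepseek-r1:7b     # mid-range GPU (8GB-class VRAM)
ollama run deepseek-r1:14b    # 24GB-class GPU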

2. Analysis of computing power requirements

| Model version | Parameters (B) | Compute precision | Model size | VRAM requirement (GB) | Reference GPU configuration |
|---|---|---|---|---|---|
| DeepSeek-R1 | 671B | FP8 | ~1,342GB | ≥1,342GB | Multi-GPU configuration (e.g., NVIDIA A100 80GB x16) |
| DeepSeek-R1-Distill-Llama-70B | 70B | BF16 | 43GB | ~32.7GB | Multi-GPU configuration (e.g., NVIDIA A100 80GB x2) |
| DeepSeek-R1-Distill-Qwen-32B | 32B | BF16 | 20GB | ~14.9GB | Multi-GPU configuration (e.g., NVIDIA RTX 4090 x4) |
| DeepSeek-R1-Distill-Qwen-14B | 14B | BF16 | 9GB | ~6.5GB | NVIDIA RTX 3080 10GB or higher |
| DeepSeek-R1-Distill-Llama-8B | 8B | BF16 | 4.9GB | ~3.7GB | NVIDIA RTX 3070 8GB or higher |
| DeepSeek-R1-Distill-Qwen-7B | 7B | BF16 | 4.7GB | ~3.3GB | NVIDIA RTX 3070 8GB or higher |
| DeepSeek-R1-Distill-Qwen-1.5B | 1.5B | BF16 | 1.1GB | ~0.7GB | NVIDIA RTX 3060 12GB or higher |


Additional Notes:

  1. VRAM requirements: the values listed in the table are minimums; reserve an additional 20%-30% of VRAM in real deployments to absorb peaks during model loading and inference (see the estimate sketch after this list).
  2. Multi-GPU configuration: for large models (32B and up), use multiple GPUs in parallel to improve computational efficiency and stability.
  3. Compute precision: FP8 and BF16 are the current mainstream high-efficiency precisions; they preserve model performance while reducing VRAM usage.
  4. Applicable scenarios: models of different parameter scales suit tasks of different complexity; choose the model version that matches your actual needs.
  5. Enterprise deployment: for very large models such as 671B, deploy a professional-grade GPU cluster (such as NVIDIA A100) to meet high-performance computing requirements.
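The headroom rule above can be turned into a quick back-of-the-envelope estimate: parameters x bytes per parameter x a 1.2-1.3 headroom factor. This is an illustrative rule of thumb, not an official DeepSeek formula, and it ignores context-length-dependent KV-cache growth:

# Rough VRAM estimate for a 7B model at 4-bit (~0.5 bytes/param) with 25% headroom
awk 'BEGIN { params=7; bytes=0.5; headroom=1.25; printf "~%.1f GB VRAM\n", params*bytes*headroom }'
# prints: ~4.4 GB VRAM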

III. Domestic Chip and Hardware Adaptation Solutions

1. Domestic ecosystem partnership updates

| Company | Adaptation | Performance benchmark (vs. NVIDIA) |
|---|---|---|
| Huawei Ascend | Ascend 910B natively supports the full R1 family, with end-to-end inference optimization | n/a |
| MetaX GPU (沐曦) | MXN series supports BF16 inference for the 70B model; VRAM utilization improved by 30% | Comparable to RTX 3090 |
| Hygon DCU | Adapted to the V3/R1 models; performance benchmarked against NVIDIA A100 | Comparable to A100 (BF16) |

2. Recommended domestic hardware configurations

| Model size | Recommended solution | Applicable scenario |
|---|---|---|
| 1.5B | Taichu T100 accelerator card | Prototype validation for individual developers |
| 14B | Kunlunxin K200 cluster | Enterprise-grade complex-task reasoning |
| 32B | Biren computing platform + Ascend 910B cluster | Scientific computing and multimodal processing |

IV. Cloud deployment alternatives

1. Recommended domestic cloud service providers

| Platform | Core advantages | Applicable scenarios |
|---|---|---|
| SiliconFlow (硅基流动) | Officially recommended API, low latency, multimodal model support | Enterprise-grade high-concurrency inference |
| Tencent Cloud | One-click deployment plus limited-time free trial, with VPC privatization support | Getting small and medium models online quickly |
| PPIO (派欧云) | About 1/20 the price of OpenAI; 50 million free tokens on registration | Low-cost trials and testing |

2. International access channels (require a VPN or an overseas enterprise network)

  • NVIDIA NIM: Enterprise GPU Cluster Deployment (link)
  • Groq: ultra-low latency reasoning (link)
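Most of these channels expose OpenAI-compatible REST endpoints, so a single curl smoke test works across providers. The endpoint URL, key variable, and model name below are placeholders rather than any specific vendor's values; substitute your provider's documented equivalents:

curl https://api.example.com/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek-r1", "messages": [{"role": "user", "content": "Hello"}]}'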

V. Ollama+Unsloth deployment

1. Quantification scheme and model selection

quantized version file size Minimum RAM + VRM Requirements Applicable Scenarios
DeepSeek-R1-UD-IQ1_M 158GB ≥200GB Consumer-grade hardware (e.g., Mac Studio)
DeepSeek-R1-Q4_K_M 404 GB ≥500GB High Performance Servers/Cloud GPUs

Download links:

  • HuggingFace Model Library
  • Unsloth AI Official Description

2. Hardware configuration recommendations

| Hardware type | Recommended configuration | Performance (short text generation) |
|---|---|---|
| Consumer-grade device | Mac Studio (192GB unified memory) | 10+ tokens/s |
| High-performance server | 4x RTX 4090 (96GB VRAM + 384GB RAM) | 7-8 tokens/s (mixed CPU/GPU inference) |

3. Deployment steps (Linux example)

1. Install dependency tools:

# Install llama.cpp (used to merge the sharded GGUF files)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
brew install llama.cpp

2. Download and merge the model shards:

llama-gguf-split --merge DeepSeek-R1-UD-IQ1_M-00001-of-00004.gguf DeepSeek-R1-UD-IQ1_M.gguf
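If the shards are not yet on disk, one way to fetch them is the Hugging Face CLI. The repository and folder names below follow Unsloth's published GGUF layout and should be double-checked against the model page linked above:

# Download all four IQ1_M shards from the Unsloth GGUF repository
pip install -U huggingface_hub
huggingface-cli download unsloth/DeepSeek-R1-GGUF \
  --include "DeepSeek-R1-UD-IQ1_M/*" --local-dir ./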

3. Install Ollama:

curl -fsSL https://ollama.com/install.sh | sh

4. Create the Modelfile:

FROM /path/to/DeepSeek-R1-UD-IQ1_M.gguf
PARAMETER num_gpu 28  # offload 7 layers per RTX 4090 (4 cards total)
PARAMETER num_ctx 2048
PARAMETER temperature 0.6
TEMPLATE "<｜User｜>{{ .Prompt }}<｜Assistant｜>"

5. Create and run the model:

ollama create DeepSeek-R1-UD-IQ1_M -f DeepSeekQ1_Modelfile
ollama run DeepSeek-R1-UD-IQ1_M --verbose

4. Performance tuning and testing

  • Low GPU utilization: upgrade to high-bandwidth memory (e.g., DDR5 5600+).
  • Expand swap space (persistence note after the commands):
sudo fallocate -l 100G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
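The swap file above lasts only until reboot. Making it permanent via /etc/fstab is standard Linux practice, an optional step beyond the original tutorial:

# Optional: persist the swap file across reboots
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab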

Full-strength 671B deployment commands

  • vLLM:
vllm serve deepseek-ai/DeepSeek-R1 --tensor-parallel-size 2 --max-model-len 32768 --enforce-eager
  • SGLang:
python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-R1 --trust-remote-code --tp 2
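Both servers expose an OpenAI-compatible HTTP API (vLLM defaults to port 8000, SGLang to 30000). A quick smoke test against the vLLM default, assuming the model has finished loading:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek-ai/DeepSeek-R1", "messages": [{"role": "user", "content": "Hello"}]}'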

VI. Notes and Risks

1. Cost alerts:

  • 70B model: requires three or more large-VRAM GPUs (e.g., RTX A6000 48GB); not feasible for single-card users.
  • 671B model: requires an 8x H100 cluster; realistic only for supercomputing-center-class deployments.

2. Alternatives:

  • Individual users are advised to use cloud APIs (e.g., SiliconFlow), which are maintenance-free and compliant.

3. Domestic hardware compatibility:

  • Requires customized framework builds (e.g., Ascend CANN, MetaX MXMLLM).

VII. Appendix: Technical support and resources

  • Huawei Ascend: Ascend cloud services
  • MetaX GPU: free API trial
  • Li Xihan's blog: full deployment tutorial

VIII. Heterogeneous GPU solution: GPUStack

GPUStack Open Source Project

https://github.com/gpustack/gpustack/
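A minimal install-and-start sketch for a single Linux node; the script URL follows the GPUStack README at the time of writing, so verify it against the repository above before piping it to a shell:

# Install GPUStack and start it as a service (verify the script URL first)
curl -sfL https://get.gpustack.ai | sh -s -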

Model resource measurement tool

  • GGUF Parser (https://github.com/gpustack/gguf-parser-go) can be used to estimate VRAM requirements by hand, as sketched below.
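For example, pointing the parser at a local GGUF file prints estimated memory usage for a given context size. The flags below are assumptions based on the project README; confirm them with gguf-parser --help:

# Estimate resource usage for a local GGUF file at 32K context (flags assumed; check --help)
gguf-parser --path ./DeepSeek-R1-UD-IQ1_M.gguf --ctx-size 32768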

GPUStack reference: DeepSeek full-platform private deployment

| Model (quantization) | Context size | VRAM requirement | Recommended GPUs |
|---|---|---|---|
| R1-Distill-Qwen-1.5B (Q4_K_M) | 32K | 2.86 GiB | RTX 4060 8GB, MacBook Pro M4 Max 36G |
| R1-Distill-Qwen-1.5B (Q8_0) | 32K | 3.47 GiB | RTX 4060 8GB, MacBook Pro M4 Max 36G |
| R1-Distill-Qwen-1.5B (FP16) | 32K | 4.82 GiB | RTX 4060 8GB, MacBook Pro M4 Max 36G |
| R1-Distill-Qwen-7B (Q4_K_M) | 32K | 7.90 GiB | RTX 4070 12GB, MacBook Pro M4 Max 36G |
| R1-Distill-Qwen-7B (Q8_0) | 32K | 10.83 GiB | RTX 4080 16GB, MacBook Pro M4 Max 36G |
| R1-Distill-Qwen-7B (FP16) | 32K | 17.01 GiB | RTX 4090 24GB, MacBook Pro M4 Max 36G |
| R1-Distill-Llama-8B (Q4_K_M) | 32K | 10.64 GiB | RTX 4080 16GB, MacBook Pro M4 Max 36G |
| R1-Distill-Llama-8B (Q8_0) | 32K | 13.77 GiB | RTX 4080 16GB, MacBook Pro M4 Max 36G |
| R1-Distill-Llama-8B (FP16) | 32K | 20.32 GiB | RTX 4090 24GB, MacBook Pro M4 Max 36G |
| R1-Distill-Qwen-14B (Q4_K_M) | 32K | 16.80 GiB | RTX 4090 24GB, MacBook Pro M4 Max 36G |
| R1-Distill-Qwen-14B (Q8_0) | 32K | 22.69 GiB | RTX 4090 24GB, MacBook Pro M4 Max 36G |
| R1-Distill-Qwen-14B (FP16) | 32K | 34.91 GiB | RTX 4090 24GB x2, MacBook Pro M4 Max 48G |
| R1-Distill-Qwen-32B (Q4_K_M) | 32K | 28.92 GiB | RTX 4080 16GB x2, MacBook Pro M4 Max 48G |
| R1-Distill-Qwen-32B (Q8_0) | 32K | 42.50 GiB | RTX 4090 24GB x3, MacBook Pro M4 Max 64G |
| R1-Distill-Qwen-32B (FP16) | 32K | 70.43 GiB | RTX 4090 24GB x4, MacBook Pro M4 Max 128G |
| R1-Distill-Llama-70B (Q4_K_M) | 32K | 53.41 GiB | RTX 4090 24GB x5, A100 80GB x1, MacBook Pro M4 Max 128G |
| R1-Distill-Llama-70B (Q8_0) | 32K | 83.15 GiB | RTX 4090 24GB x5, MacBook Pro M4 Max 128G |
| R1-Distill-Llama-70B (FP16) | 32K | 143.83 GiB | A100 80GB x2, Mac Studio M2 Ultra 192G |
| R1-671B (UD-IQ1_S) | 32K | 225.27 GiB | A100 80GB x4, Mac Studio M2 Ultra 192G |
| R1-671B (UD-IQ1_M) | 32K | 251.99 GiB | A100 80GB x4, Mac Studio M2 Ultra 192G x2 |
| R1-671B (UD-IQ2_XXS) | 32K | 277.36 GiB | A100 80GB x5, Mac Studio M2 Ultra 192G x2 |
| R1-671B (UD-Q2_K_XL) | 32K | 305.71 GiB | A100 80GB x5, Mac Studio M2 Ultra 192G x2 |
| R1-671B (Q2_K_XS) | 32K | 300.73 GiB | A100 80GB x5, Mac Studio M2 Ultra 192G x2 |
| R1-671B (Q2_K/Q2_K_L) | 32K | 322.14 GiB | A100 80GB x6, Mac Studio M2 Ultra 192G x2 |
| R1-671B (Q3_K_M) | 32K | 392.06 GiB | A100 80GB x7 |
| R1-671B (Q4_K_M) | 32K | 471.33 GiB | A100 80GB x8 |
| R1-671B (Q5_K_M) | 32K | 537.31 GiB | A100 80GB x9 |
| R1-671B (Q6_K) | 32K | 607.42 GiB | A100 80GB x11 |
| R1-671B (Q8_0) | 32K | 758.54 GiB | A100 80GB x13 |
| R1-671B (FP8) | 32K | 805.2 GiB | H200 141GB x8 |

Concluding remarks

Local deployment of DeepSeek R1 demands substantial hardware investment and technical expertise. Individual users should proceed with caution, and enterprise users should fully assess requirements and costs; domestic-hardware adaptation and cloud services can markedly reduce risk and improve efficiency. Technology has no limits; rational planning cuts costs and raises efficiency!

Appendix: Global enterprise and personal access channels

  1. Metaso AI Search (秘塔)
  2. 360 Nano AI Search
  3. SiliconFlow (硅基流动)
  4. ByteDance Volcano Engine
  5. Baidu Cloud Qianfan (千帆)
  6. NVIDIA NIM
  7. Groq
  8. Fireworks
  9. Chutes
  10. GitHub
  11. POE
  12. Cursor
  13. Monica
  14. Lambda
  15. Cerebras
  16. Perplexity
  17. Alibaba Cloud Bailian (百炼)
Some of these channels require a VPN or overseas enterprise network access.

Appendix: Chip and enterprise support tables

Table 1: Cloud Vendors Supporting DeepSeek-R1

| Date | Name/website | Announcement |
|---|---|---|
| January 28 | Infinigence AI (无问芯穹) | Infini-AI heterogeneous cloud: a great combination of domestic models and heterogeneous clouds |
| January 28 | PPIO (派欧云) | DeepSeek-R1 goes live on PPIO computing cloud! |
| February 1 | SiliconFlow x Huawei | First release! SiliconFlow x Huawei Cloud jointly launch DeepSeek R1 & V3 inference service on Ascend Cloud! |
| February 2 | ZStack (云轴科技) | ZStack supports DeepSeek V3/R1/Janus Pro, with private deployment on multiple domestic CPUs/GPUs! |
| February 3 | Baidu AI Cloud Qianfan | Baidu AI Cloud Qianfan fully supports DeepSeek-R1/V3 calls at ultra-low prices |
| February 3 | Supercomputing Internet | Supercomputing Internet launches the DeepSeek model series with superintelligent fused computing support |
| February 4 | Huawei (Ascend Community) | The new DeepSeek models are officially launched on the Ascend Community |
| February 4 | Luchen Technology x Huawei Ascend | Luchen x Huawei Ascend jointly launch DeepSeek R1 series inference APIs and cloud image services based on domestic compute |
| February 4 | QingCloud (基石智算 CoresHub) | Free for a limited time, one-click deployment! CoresHub officially launches the DeepSeek-R1 model series |
| February 4 | Iluvatar CoreX (天数智芯) | One-day adaptation! DeepSeek R1 model service launched with Gitee AI |
| February 4 | Moore Threads | A tribute to DeepSeek: lighting a fire for China's AI ecosystem with domestic GPUs |
| February 4 | Hygon Information | DeepSeek V3 and R1 complete Hygon DCU adaptation and go live |
| February 5 | MetaX (沐曦) | Full-strength DeepSeek-V3 goes live for a premiere experience on domestic MetaX GPUs |
| February 5 | Hygon Information | Hygon DCU successfully adapts the DeepSeek-Janus-Pro multimodal models |
| February 5 | JD Cloud | One-click deployment! JD Cloud fully launches DeepSeek-R1/V3 |
| February 5 | Biren Technology (壁仞科技) | DeepSeek R1 launches on Biren's domestic AI computing platform; the full model lineup empowers developers one-stop |
| February 5 | Unicom Cloud (China Unicom) | "Nezha stirs the sea"! Unicom Cloud shelves the DeepSeek-R1 model series! |
| February 5 | Mobile Cloud (China Mobile) | Full version, all sizes, full functionality! Mobile Cloud fully launches DeepSeek |
| February 5 | UXTECH | UXTECH adapts DeepSeek's full model range on domestic chips |
| February 5 | Taichu (太初) | 2-hour adaptation of the DeepSeek-R1 series on the Taichu T100 accelerator card, one-click experience, free API service |
| February 5 | Intellifusion (云天励飞) | DeepEdge10 completes adaptation of the DeepSeek-R1 model series |
| February 6 | Tianyi Cloud (China Telecom) | New breakthrough for the domestic AI ecosystem! "Xirang" + DeepSeek, a blockbuster! |
| February 6 | Enflame (燧原科技) | Enflame deploys full DeepSeek inference services in intelligent computing centers across the country |
| February 6 | Kunlunxin (昆仑芯) | Domestic accelerator cards fully adapted for DeepSeek training and inference, with excellent performance and one-click deployment! |
| February 7 | Inspur Cloud | Inspur Cloud first to release a 671B DeepSeek large-model all-in-one solution |
| February 7 | Beijing Supercomputing | Beijing Supercomputing x DeepSeek: twin engines ignite, driving a storm of hundred-billion-scale AI innovation |
| February 8 | China Electronics Cloud | China Electronics Cloud launches the full DeepSeek-R1/V3 models, opening a new chapter in private deployment |
| February 8 | Kingsoft Cloud | Kingsoft Cloud supports DeepSeek-R1/V3 |
| February 8 | SenseTime (大装置 SenseCore) | SenseCore shelves the DeepSeek model series with a limited-time experience and upgraded services! |

Table 2: Enterprises Supporting DeepSeek-R1

| Date | Name/website | Announcement |
|---|---|---|
| January 30 | 360 Nano AI Search | Nano AI Search launches the full-strength DeepSeek-R1 large model |
| February 3 | Metaso AI Search (秘塔) | Metaso AI integrates the full-strength DeepSeek R1 inference model |
| February 5 | Xiaoyi Assistant (Huawei) | Huawei's Xiaoyi Assistant integrates DeepSeek, after Huawei Cloud announced the DeepSeek R1/V3 inference service on Ascend Cloud |
| February 5 | Writer's Assistant (China Literature) | An industry first! China Literature deploys DeepSeek; "Writer's Assistant" upgrades three creative-assistance functions |
| February 5 | Wondershare (万兴科技) | Wondershare completes DeepSeek-R1 large-model adaptation and lands it in multiple products |
| February 6 | NetEase Youdao | Embracing DeepSeek as a flagship reasoning model, NetEase Youdao accelerates AI education |
| February 6 | Yunxuetang (云学堂) | Yunxuetang integrates DeepSeek; product AI capabilities comprehensively upgraded |
| February 7 | DingTalk | DingTalk AI Assistant integrates DeepSeek, with support for deep thinking |
| February 7 | SMZDM (什么值得买) | SMZDM products integrate the DeepSeek model |
| February 7 | Tonghuashun (同花顺) | iWencai 2.0 upgrade: injecting "slow thinking" wisdom to build a more rational investment-decision assistant |
| February 8 | Tiangong AI (Kunlun Tech) | Kunlun Tech's Tiangong AI officially launches DeepSeek R1 + connected search |
| February 8 | Xingji Meizu | Flyme AIOS completes DeepSeek-R1 large-model integration! |
| February 8 | HONOR | HONOR integrates DeepSeek |

Table 3: Summary of enterprises supporting DeepSeek-R1

| Name/website | Announcement |
|---|---|
| DeepSeek | DeepSeek-R1 released, performance benchmarked against the official OpenAI o1 |
| Infinigence AI (无问芯穹) | Infini-AI heterogeneous cloud now offers DeepSeek-R1-Distill: a great combination of domestic models and heterogeneous clouds |
| PPIO (派欧云) | DeepSeek-R1 goes live on PPIO computing cloud! |
| SiliconFlow | First release! SiliconFlow x Huawei Cloud jointly launch DeepSeek R1 & V3 inference service on Ascend Cloud! |
| ZStack (云轴科技) | ZStack supports DeepSeek V3/R1/Janus Pro, with private deployment on multiple domestic CPUs/GPUs |
| Baidu AI Cloud Qianfan | Baidu AI Cloud Qianfan fully supports DeepSeek-R1/V3 calls at ultra-low prices |
| Supercomputing Internet | Supercomputing Internet launches the DeepSeek model series, providing superintelligent fused computing support |
| Huawei (Ascend Community) | The new DeepSeek models are officially launched on the Ascend Community! |
| Luchen Technology x Huawei Ascend | Luchen x Huawei Ascend launch DeepSeek R1 series inference APIs and cloud image services based on domestic compute |
| QingCloud (基石智算 CoresHub) | Free for a limited time, one-click deployment! CoresHub launches the DeepSeek-R1 model series |
| JD Cloud | One-click deployment! JD Cloud fully launches DeepSeek-R1/V3 |
| Unicom Cloud (China Unicom) | "Nezha stirs the sea"! Unicom Cloud shelves the DeepSeek-R1 model series! |
| Mobile Cloud (China Mobile) | Full version, all sizes, full functionality! Mobile Cloud fully launches DeepSeek |
| UXTECH | UXTECH adapts the full DeepSeek model range on domestic chips |
| Tianyi Cloud (China Telecom) | New breakthrough for the domestic AI ecosystem! "Xirang" + DeepSeek, a blockbuster! |
| Digital China | 3-minute deployment of the high-performance AI model DeepSeek; Digital China helps enterprises with intelligent transformation |
| Kaipu Cloud (开普云) | Kaipu Cloud's Kaiwu large-model application and edge all-in-one machine fully integrate DeepSeek |
| Kingdee Cloud Cangqiong (金蝶云苍穹) | Kingdee fully integrates the DeepSeek large model to help enterprises accelerate AI applications! |
| Paratera (并行科技) | Server busy? Paratera gives you DeepSeek freedom! |
| Capital Online | Capital Online cloud platform launches the DeepSeek-R1 model family |
| Inspur Cloud | Inspur Cloud first to release a 671B DeepSeek large-model all-in-one solution |
| Beijing Supercomputing | Beijing Supercomputing x DeepSeek: twin engines ignite, driving a storm of hundred-billion-scale AI innovation |
| Unisplendour (紫光) Rhino Enablement | Unisplendour: the Rhino Enablement platform achieves onboarding and listing of the DeepSeek V3/R1 models |
| China Electronics Cloud | China Electronics Cloud launches the full DeepSeek-R1/V3 models, opening a new chapter in private deployment |
| Kingsoft Cloud | Kingsoft Cloud supports DeepSeek-R1/V3 |
| SenseTime (大装置 SenseCore) | SenseCore shelves the DeepSeek model series with a limited-time experience and upgraded services! |
| 360 Nano AI Search | Nano AI Search launches the full-strength DeepSeek-R1 large model |
| Metaso AI Search (秘塔) | Metaso AI integrates the full-strength DeepSeek R1 inference model |
| Xiaoyi Assistant (Huawei) | Huawei's Xiaoyi Assistant integrates DeepSeek, after Huawei Cloud announced the DeepSeek R1/V3 inference service on Ascend Cloud |
| Writer's Assistant (China Literature) | An industry first! China Literature deploys DeepSeek; "Writer's Assistant" upgrades three creative-assistance functions |
| Wondershare (万兴科技) | Wondershare completes DeepSeek-R1 large-model adaptation and lands it in multiple products |
| NetEase Youdao | Embracing DeepSeek as a flagship reasoning model, NetEase Youdao accelerates AI education |
| Yunxuetang (云学堂) | Yunxuetang integrates DeepSeek; product AI capabilities comprehensively upgraded |
| DingTalk | DingTalk AI Assistant integrates DeepSeek, with support for deep thinking |
| SMZDM (什么值得买) | SMZDM products integrate the DeepSeek model |
| Feishu (飞书) | Summary of Feishu x DeepSeek AI capabilities (public version) |
| Tonghuashun (同花顺) | iWencai 2.0 upgrade: injecting "slow thinking" wisdom to build a more rational investment-decision assistant |
| Tiangong AI (Kunlun Tech) | Kunlun Tech's Tiangong AI officially launches DeepSeek R1 + connected search |
| Xingji Meizu | Flyme AIOS completes DeepSeek-R1 large-model integration! |
| HONOR | HONOR integrates DeepSeek |