# Evaluating DeepSeek-R1's Mathematical Capabilities Based on Ascend 800I-A2: 100% Paper Reproduction ### Version of AISBench Evaluation Tool Used for Reproduction The version of the AISBench evaluation tool used for reproduction in this paper is [v3.0-20250331](https://github.com/AISBench/benchmark/releases/tag/v3.0-20250331). ### I. Background and Objectives #### 1.1 Significance of Reproduction ##### 1.1.1 Mathematical Reasoning Advantages of DeepSeek-R1 As a large model specialized in mathematical reasoning tasks, DeepSeek-R1 demonstrates significant performance advantages in solving complex mathematical problems (such as theorem proving and multi-step derivation). Its official evaluations on the AIME2024 (American Invitational Mathematics Examination questions, 30 questions) and MATH-500 (500 international competition-level mathematics questions) datasets verify its leadership in long-sequence logical reasoning and symbolic computation accuracy. ##### 1.1.2 Hardware and Software Support Value of Ascend 800I-A2 Inference Card Based on the Da Vinci architecture, the Ascend 800I-A2 inference card provides high-performance INT8/FP16 mixed-precision computing power (with computing power reaching XX TOPS in typical scenarios) and ultra-low inference latency (microsecond-level response). MindIE (Mind Inference Engine, Ascend Inference Engine) supports key features such as the Transformer inference acceleration library (Ascend Transformer Boost) and PD-separated deployment, enabling efficient support for the intensive computing needs of large models like DeepSeek-R1. ![Image Description](https://foruda.gitee.com/images/1744444409604214296/cec0c242_14098946.png "Screenshot") #### 1.2 Reproduction Objectives Through the AISBench evaluation tool, fully reproduce the official evaluation scores of DeepSeek-R1 on the Ascend A2 inference card. The specific indicators are as follows: - AIME2024 Dataset: - Official Accuracy: Pass@1 79.8 - Ascend A2 Target: Error controlled within 1 question. - MATH-500 Dataset: - Official Accuracy: Pass@1 97.3% - Ascend A2 Target: Error controlled within 0.5%. ![Image Description](https://foruda.gitee.com/images/1744277500333819384/82167a6f_15694876.png "Screenshot") #### 1.3 Technical Value - Verify the compatibility between Ascend hardware and complex mathematical models: Through the reproduction process, demonstrate the technical maturity of the Ascend A2 inference card in scenarios such as symbolic computation and long-sequence parallel processing. - Provide a benchmark case for domestic deployment: Offer a reproducible technical path for domestic AI chips to support world-leading mathematical reasoning models, reducing reliance on the GPU ecosystem. ### II. Environment Preparation for Ascend 800I-A2 Inference Card #### 2.1 Overview of Environment Configuration | Component | Requirement | Remarks | |-------------------------|----------------------------|----------------------------------| | Inference Hardware Model| Atlas 800I A2 | Single card with 64GB VRAM | | MindIE Image Version | MindIE T6 B022 | | | Quantization Method | W8A8 | [MindStudio ModelSlim, Ascend Model Compression Tool](https://gitee.com/ascend/msit/tree/master/msmodelslim) | | Evaluation Tool | AISbench | [AISBench Benchmark Large Model Evaluation Tool](https://github.com/AISBench/benchmark/tree/master) | #### 2.2 Environment Configuration ##### 2.2.1 Download Weights Download BF16 model weights directly from open-source communities such as HuggingFace and ModelScope. | Platform | Link | |--------------|----------------------------------------------------------------------| | HuggingFace | https://huggingface.co/unsloth/DeepSeek-R1-bf16/ | | ModelScope | https://modelscope.cn/models/unsloth/deepseek-R1-bf16/ | ##### 2.2.2 Weight Quantization (BF16 to INT8) [Use ModelSlim to Generate Quantized Weights](https://gitee.com/ascend/msit/tree/br_noncom_MindStudio_8.0.0_POC_20251231/msmodelslim/example/DeepSeek) (Execute in the MindIE container; see Section 2.4 for container creation). ```bash git clone https://gitee.com/ascend/msit.git cd msit/msmodelslim/example/DeepSeek/ python3 quant_deepseek_w8a8.py --model_path {Floating-Point Weight Path} --save_path {W8A8 Quantized Weight Path} --bf16 ``` Modify the permission of the weight path to 750: ```bash chmod -R 750 {/path/to/weights} ``` ##### 2.2.3 Generate Rank Table 1) Check IP addresses of 8 cards: ```bash for i in {0..7};do hccn_tool -i $i -ip -g; done ``` 2) (Optional) If 8-card IP addresses are not configured, customize card IPs as follows (replace 10.20.3.13* with actual IPs): ```bash for i in {0..7}; do hccn_tool -i ${i} -ip -s address 10.20.3.13${i} netmask 255.255.255.0; done ``` 3) Then check if the corresponding cards can ping each other (taking two machines as an example): ```bash hccn_tool -i 0 -ping -g address 10.20.0.20 hccn_tool -i 1 -ping -g address 10.20.0.21 hccn_tool -i 2 -ping -g address 10.20.0.22 hccn_tool -i 3 -ping -g address 10.20.0.23 hccn_tool -i 4 -ping -g address 10.20.0.24 hccn_tool -i 5 -ping -g address 10.20.0.25 hccn_tool -i 6 -ping -g address 10.20.0.26 hccn_tool -i 7 -ping -g address 10.20.0.27 ``` Based on the IP information obtained in Step 1, refer to the following two-machine example. Users should add IPs and complete `device_ip` by themselves, where both `server_id` and `container_ip` are machine IPs: ```json { "server_count": "2", "server_list": [ { "device": [ { "device_id": "0", "device_ip": "...", "rank_id": "0" }, ... { "device_id": "7", "device_ip": "...", "rank_id": "7" } ], "server_id": "...", "container_ip": "..." }, { "device": [ { "device_id": "0", "device_ip": "...", "rank_id": "8" }, ... { "device_id": "7", "device_ip": "...", "rank_id": "15" } ], "server_id": "...", "container_ip": "..." } ], "status": "completed", "version": "1.0" } ``` After configuring `rank_table_file.json`, execute the following command to modify the permission to 640: ```bash chmod -R 640 {path/to/rank_table_file.json} ``` ##### 2.2.4 Image Configuration 1) Download the image [Image Download Instructions](https://www.hiascend.com/developer/ascendhub/detail/af85b724a7e5469ebd7ea13c3439d48f) Permission application is required before downloading the image. Wait patiently for the permission to be approved, then download the corresponding image file according to the guide. 2) Load the image: ```bash docker load < **800I-A2-py311-ubuntu22.04-aarch64.tar.gz ``` After completion, use the `docker images` command to confirm and find the specific image name and tag: ```bash docker images ``` 3) Create and start the container Write the following content into `start-docker.sh`: ```bash IMAGES_ID=$1 NAME=$2 if [ $# -ne 2 ]; then echo "error: need one argument describing your container name." exit 1 fi docker run --name ${NAME} -it -d --net=host --shm-size=500g \ --privileged=true \ -w /home \ --device=/dev/davinci_manager \ --device=/dev/hisi_hdc \ --device=/dev/devmm_svm \ --entrypoint=bash \ -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \ -v /usr/local/dcmi:/usr/local/dcmi \ -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \ -v /etc/ascend_install.info:/etc/ascend_install.info \ -v /usr/local/sbin:/usr/local/sbin \ -v /home:/home \ -v /tmp:/tmp \ -v /dl:/dl \ -v /mnt:/mnt \ -v /data:/data \ -v /usr/share/zoneinfo/Asia/Shanghai:/etc/localtime \ ${IMAGES_ID} ``` Execute the following command to create the container: ```bash bash start-docker.sh imagesID ContainerName ``` Execute the following command to enter the container: ```bash docker exec -it {Container Name} bash ``` ### 2.2.5 Starting the Inference Service Write the following content into set_env.sh: ``` export RANKTABLEFILE="/path/ranktable.json" # Modify the corresponding path export MIES_CONTAINER_IP=xx.xx.xxx.xxx # Modify to the local IP export MINDIE_LLM_LOG_TO_STDOUT=1 export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True export ATB_WORKSPACE_MEM_ALLOC_ALG_TYPE=3 export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1 export OMP_NUM_THREADS=1 export HCCL_DETERMINISTIC=false export HCCL_OP_EXPANSION_MODE="AIV" export NPU_MEMORY_FRACTION=0.96 export HCCL_CONNECT_TIMEOUT=7200 export HCCL_EXEC_TIMEOUT=0 source /usr/local/Ascend/mindie/set_env.sh source /usr/local/Ascend/ascend-toolkit/set_env.sh source /usr/local/Ascend/nnal/atb/set_env.sh source /usr/local/Ascend/atb-models/set_env.sh export ATB_LLM_ENABLE_AUTO_TRANSPOSE=0 export MINDIE_LOG_TO_STDOUT=1 export MASTER_IP=xx.xx.xxx.xxx # Modify to the master node IP for var in $(compgen -e | grep 'STDOUT$'); do export "$var=0" done for var in $(compgen -e | grep 'LOG_TO_FILE$'); do export "$var=0" done export INF_NAN_MODE_ENABLE=0 # Modify permissions find /usr/local/lib/python3.11/site-packages/mindie* -name config.json |xargs chmod -R 640 ``` Execute the following command to load the environment variables: ``` source /path/set_env.sh ``` Modify the service - oriented parameter configuration in the /usr/local/Ascend/mindie/latest/mindie - service/conf/config.json file. The example is as follows: ``` { "Version" : "1.0.0", "LogConfig" : { "logLevel" : "Info", "logFileSize" : 20, "logFileNum" : 20, "logPath" : "logs/mindie - server.log" }, "ServerConfig" : { "ipAddress" : "127.0.0.1", "managementIpAddress" : "127.0.0.2", "port" : 1025, "managementPort" : 1026, "metricsPort" : 1027, "allowAllZeroIpListening" : false, "maxLinkNum" : 1000, "httpsEnabled" : false, "fullTextEnabled" : false, "tlsCaPath" : "security/ca/", "tlsCaFile" : ["ca.pem"], "tlsCert" : "security/certs/server.pem", "tlsPk" : "security/keys/server.key.pem", "tlsPkPwd" : "security/pass/key_pwd.txt", "tlsCrlPath" : "security/certs/", "tlsCrlFiles" : ["server_crl.pem"], "managementTlsCaFile" : ["management_ca.pem"], "managementTlsCert" : "security/certs/management/server.pem", "managementTlsPk" : "security/keys/management/server.key.pem", "managementTlsPkPwd" : "security/pass/management/key_pwd.txt", "managementTlsCrlPath" : "security/management/certs/", "managementTlsCrlFiles" : ["server_crl.pem"], "kmcKsfMaster" : "tools/pmt/master/ksfa", "kmcKsfStandby" : "tools/pmt/standby/ksfb", "inferMode" : "standard", "interCommTLSEnabled" : false, "interCommPort" : 1121, "interCommTlsCaPath" : "security/grpc/ca/", "interCommTlsCaFiles" : ["ca.pem"], "interCommTlsCert" : "security/grpc/certs/server.pem", "interCommPk" : "security/grpc/keys/server.key.pem", "interCommPkPwd" : "security/grpc/pass/key_pwd.txt", "interCommTlsCrlPath" : "security/grpc/certs/", "interCommTlsCrlFiles" : ["server_crl.pem"], "openAiSupport" : "vllm", "tokenTimeout" : 3600, "e2eTimeout" : 3600 }, "BackendConfig" : { "backendName" : "mindieservice_llm_engine", "modelInstanceNumber" : 1, "npuDeviceIds" : [[0,1,2,3,4,5,6,7]], "tokenizerProcessNumber" : 8, "multiNodesInferEnabled" : true, "multiNodesInferPort" : 1120, "interNodeTLSEnabled" : false, "interNodeTlsCaPath" : "security/grpc/ca/", "interNodeTlsCaFiles" : ["ca.pem"], "interNodeTlsCert" : "security/grpc/certs/server.pem", "interNodeTlsPk" : "security/grpc/keys/server.key.pem", "interNodeTlsPkPwd" : "security/grpc/pass/mindie_server_key_pwd.txt", "interNodeTlsCrlPath" : "security/grpc/certs/", "interNodeTlsCrlFiles" : ["server_crl.pem"], "interNodeKmcKsfMaster" : "tools/pmt/master/ksfa", "interNodeKmcKsfStandby" : "tools/pmt/standby/ksfb", "ModelDeployConfig" : { "maxSeqLen" : 32768, "maxInputTokenLen" : 1536, "truncation" : false, "ModelConfig" : [ { "modelInstanceType" : "Standard", "modelName" : "ds_r1", "modelWeightPath" : "/path/weight/", # Modify to the real weight path "worldSize" : 8, "cpuMemSize" : 5, "npuMemSize" : -1, "backendType" : "atb", "trustRemoteCode" : false, "tp":4, "dp":4, "moe_ep":4, "moe_tp":4 } ] }, "ScheduleConfig" : { "templateType" : "Standard", "templateName" : "Standard_LLM", "cacheBlockSize" : 128, "maxPrefillBatchSize" : 1, "maxPrefillTokens" : 4096, "prefillTimeMsPerReq" : 150, "prefillPolicyType" : 0, "decodeTimeMsPerReq" : 50, "decodePolicyType" : 0, "maxBatchSize" : 2048, "maxIterTimes" : 31232, "maxPreemptCount" : 0, "supportSelectBatch" : false, "maxQueueDelayMicroseconds" : 5000 } } } ``` At the same time, in the /usr/local/Ascend/mindie/latest/mindie - service/ path on the dual - machine, execute the following command to start the inference service: ``` ./bin/mindieservice_daemon ``` If the following command is printed, it is considered that the service has been successfully started: "Daemon start success!" ##### 2.2.6 Installation of AISbench Evaluation Tool Outside the image, install the aisbench evaluation tool: ``` conda create --name ais_bench python=3.10 -y conda activate ais_bench git clone https://github.com/AISBench/benchmark.git cd benchmark/ pip3 install -e ./ pip3 install -r requirements/api.txt ``` ### III. Dataset Evaluation Process #### 3.1 Preparing AIME 2024 and MATH - 500 Datasets 1) aime 2024 ``` # Ensure that you are at the outermost path of the source code your/work/dir/benchmark cd ais_bench/datasets mkdir aime cd aime wget http://opencompass.oss - cn - shanghai.aliyuncs.com/datasets/data/aime.zip unzip http://opencompass.oss - cn - shanghai.aliyuncs.com/datasets/data/aime.zip cd ../../../ # Return to the outermost path of the source code your/work/dir/benchmark ``` 2) math - 500 ``` # Ensure that you are at the outermost path of the source code your/work/dir/benchmark cd ais_bench/datasets wget http://opencompass.oss - cn - shanghai.aliyuncs.com/datasets/data/math.zip unzip http://opencompass.oss - cn - shanghai.aliyuncs.com/datasets/data/math.zip cd ../../ # Return to the outermost path of the source code your/work/dir/benchmark ``` #### 3.2 Modifying the Model Inference Back - end Configuration File The inference back - end configuration file contains the request configuration related to accessing MindIE - Service. Execute the following command to edit the corresponding configuration file (.py file): ``` # Ensure that you are at the outermost path of the source code your/work/dir/benchmark vim ais_bench/benchmark/configs/models/vllm_api/vllm_api_general_chat.py ``` The content of the inference back - end configuration file is modified as follows: ``` from ais_bench.benchmark.models import VLLMCustomAPIChat models = [ dict( type=VLLMCustomAPIChat, abbr='vllm - api - general - chat', max_seq_len = 4096, query_per_second = 1, rpm_verbose = False, retry = 2, host_ip = "xxx.xxx.xxx.xx", # Inference service IP, configure according to the actual situation host_port = 1025, # Inference service port, configure according to the actual situation enable_ssl = False, max_out_len = 32768, # Maximum output tokens length is configured to 32K generation_kwargs = dict( # Post - processing parameters refer to Parameters in https://docs.vllm.ai/en/latest/api/inference_params.html#sampling - params. The configuration for this test is as follows temperature = 0, top_p = 0.95, seed = None, ) ) ] ``` #### 3.3 Modifying the AIME 2024 and MATH - 500 Dataset Configuration Files 1) aime 2024 The inference back - end configuration file contains information related to how the dataset is processed. Execute the following command to edit the corresponding configuration file (.py file): ``` # Ensure that you are at the outermost path of the source code your/work/dir/benchmark vim ais_bench/benchmark/configs/datasets/aime2024/aime2024_gen_0_shot_chat_prompt.py ``` The content of the dataset configuration file is modified as follows: ``` from ais_bench.benchmark.openicl.icl_prompt_template import PromptTemplate from ais_bench.benchmark.openicl.icl_retriever import ZeroRetriever from ais_bench.benchmark.openicl.icl_inferencer import GenInferencer from ais_bench.benchmark.datasets import Aime2024Dataset, MATHEvaluator, math_postprocess_v2 aime2024_reader_cfg = dict( input_columns=['question'], output_column='answer' ) aime2024_infer_cfg = dict( prompt_template=dict( type=PromptTemplate, template=dict( round=[ dict( role='HUMAN', prompt='{question}\nPlease reason step by step, and put your final answer within \\boxed{}.' ), ], ), ), retriever=dict(type=ZeroRetriever), inferencer=dict(type=GenInferencer, batch_size=4) # batch_size represents the number of parallel requests sent ) aime2024_eval_cfg = dict( evaluator=dict(type=MATHEvaluator, version='v2'), pred_postprocessor=dict(type=math_postprocess_v2) ) aime2024_datasets = [ dict( abbr='aime2024', type=Aime2024Dataset, path='ais_bench/datasets/aime/aime.jsonl', # Dataset path. When using a relative path, it is relative to the root path of the source code. Absolute paths are also supported reader_cfg=aime2024_reader_cfg, infer_cfg=aime2024_infer_cfg, eval_cfg=aime2024_eval_cfg ) ] ``` 2) math - 500 The inference back - end configuration file contains information related to how the dataset is processed. Execute the following command to edit the corresponding configuration file (.py file): ``` # Ensure that you are at the outermost path of the source code your/work/dir/benchmark vim ais_bench/benchmark/configs/datasets/math/math500_gen_0_shot_cot_chat_prompt.py ``` The content of the dataset configuration file is modified as follows: ``` from ais_bench.benchmark.openicl.icl_prompt_template import PromptTemplate from ais_bench.benchmark.openicl.icl_retriever import ZeroRetriever from ais_bench.benchmark.openicl.icl_inferencer import GenInferencer from ais_bench.benchmark.datasets import CustomDataset from ais_bench.benchmark.openicl.icl_evaluator import MATHEvaluator math_reader_cfg = dict(input_columns=['problem'], output_column='solution') math_infer_cfg = dict( prompt_template=dict( type=PromptTemplate, template=dict( round=[ dict( role='HUMAN', prompt='{problem}\nPlease reason step by step, and put your final answer within \\boxed{}.' ), ], ), ), retriever=dict(type=ZeroRetriever), inferencer=dict(type=GenInferencer, batch_size=16), # batch_size represents the number of parallel requests sent ) math_eval_cfg = dict(evaluator=dict(type=MATHEvaluator)) math_datasets = [ dict( type=CustomDataset, abbr='math_prm800k_500', path='ais_bench/datasets/math', # Dataset path. When using a relative path, it is relative to the root path of the source code. Absolute paths are also supported file_name='test_prm800k_500.jsonl', # Use math500 in math reader_cfg=math_reader_cfg, infer_cfg=math_infer_cfg, eval_cfg=math_eval_cfg, ) ] ``` #### 3.4 Starting the Evaluation Execute the following commands to start the evaluation: 1) aime 2024 ``` ais_bench --models vllm_api_general_chat --datasets aime2024_gen_0_shot_chat_prompt --summarizer example ``` Detailed process logs and result files are saved in the execution command path "outputs/default/". You can execute the following command to view the inference progress: ``` tail -f outputs/default//logs/infer/vllm - api - general - chat/aime2024.out ``` 2) math - 500 ``` ais_bench --models vllm_api_general_chat --datasets math500_gen_0_shot_cot_chat_prompt --summarizer example ``` Detailed process logs and result files are saved in the execution command path "outputs/default/". You can execute the following command to view the inference progress: ``` tail -f outputs/default//logs/infer/vllm - api - general - chat/math_prm800k_500.out ``` Note that if the environment has a proxy configured, you need to execute the following command to make the proxy ineffective for the inference service IP (master node IP): ``` export no_proxy="xxx.xxx.xxx.xx" ``` #### 3.5 Viewing Evaluation Results The evaluation results will be printed directly on the screen. Meanwhile, the detailed structure of result files and process logs in the path "outputs/default/" is as follows, where you can view the detailed inference results and evaluation process: outputs/default/ ├── # One folder for each experiment │ ├── configs # Dumped configuration files for recording. Multiple configurations may be retained if different experiments are re-run in the same experiment folder │ ├── logs # Log files for the inference and evaluation phases │ │ ├── eval # Logs of the evaluation process │ │ └── infer # Logs of the inference process │ ├── predictions # Inference results for each task │ ├── results # Evaluation results for each task │ └── summary # Aggregated evaluation results of a single experiment ├── ... ### IV. Verification of Evaluation Results The evaluation results are shown in the figures below: ![Image Description](https://foruda.gitee.com/images/1744279619126000415/b440c50e_15694876.png "Screenshot") ![Image Description](https://foruda.gitee.com/images/1744279635756636101/b60a95f6_15694876.png "Screenshot") ### V. Reproduction Results Through the in-depth collaboration between the Ascend 800I-A2 inference card and the AISBench evaluation tool, this study has achieved the core goal in reproducing the DeepSeek-R1 mathematical reasoning model. The key achievements are as follows: - AIME2024 Dataset: Achieved Pass@1 80.0 points (official score: 79.8 points), with an error of only 0.2 points (<1 question); - MATH-500 Dataset: Achieved Pass@1 97.6% (official accuracy: 97.3%), with an error of 0.3% (<0.5% target). The accurate reproduction of the paper's indicators proves the technical maturity of the Ascend 800I-A2 inference card in scenarios such as symbolic computation and long-sequence parallel processing. | Dataset | Official Accuracy | Ascend A2 Accuracy | Error | Reproduction Result | |------------|-------------------|--------------------|-------|--------------------| | aime 2024 | 79.8 | 80.00 | 0.2 | pass | | math-500 | 97.3% | 97.6% | 0.3% | pass | Note: The metric "accuracy = 80.00" obtained by AISBench is equivalent to Pass@1 in a single run. The Pass@1 score of DeepSeek R1 in the paper is 79.8 (with temperature = 0.6, top_k = 0.95, and the average value of k runs, where k ranges from 4 to 64). The original paper content is shown in the figure below: ![Image Description](https://foruda.gitee.com/images/1744279685398671021/03056c3c_15694876.png "Screenshot")