Evaluating DeepSeek-R1βs Mathematical Capabilities Based on Ascend 800I-A2: 100% Paper Reproductionο
Version of AISBench Evaluation Tool Used for Reproductionο
The version of the AISBench evaluation tool used for reproduction in this paper is v3.0-20250331.
I. Background and Objectivesο
1.1 Significance of Reproductionο
1.1.1 Mathematical Reasoning Advantages of DeepSeek-R1ο
As a large model specialized in mathematical reasoning tasks, DeepSeek-R1 demonstrates significant performance advantages in solving complex mathematical problems (such as theorem proving and multi-step derivation). Its official evaluations on the AIME2024 (American Invitational Mathematics Examination questions, 30 questions) and MATH-500 (500 international competition-level mathematics questions) datasets verify its leadership in long-sequence logical reasoning and symbolic computation accuracy.
1.1.2 Hardware and Software Support Value of Ascend 800I-A2 Inference Cardο
Based on the Da Vinci architecture, the Ascend 800I-A2 inference card provides high-performance INT8/FP16 mixed-precision computing power (with computing power reaching XX TOPS in typical scenarios) and ultra-low inference latency (microsecond-level response). MindIE (Mind Inference Engine, Ascend Inference Engine) supports key features such as the Transformer inference acceleration library (Ascend Transformer Boost) and PD-separated deployment, enabling efficient support for the intensive computing needs of large models like DeepSeek-R1.

1.2 Reproduction Objectivesο
Through the AISBench evaluation tool, fully reproduce the official evaluation scores of DeepSeek-R1 on the Ascend A2 inference card. The specific indicators are as follows:
AIME2024 Dataset:
Official Accuracy: Pass@1 79.8
Ascend A2 Target: Error controlled within 1 question.
MATH-500 Dataset:
Official Accuracy: Pass@1 97.3%
Ascend A2 Target: Error controlled within 0.5%.

1.3 Technical Valueο
Verify the compatibility between Ascend hardware and complex mathematical models: Through the reproduction process, demonstrate the technical maturity of the Ascend A2 inference card in scenarios such as symbolic computation and long-sequence parallel processing.
Provide a benchmark case for domestic deployment: Offer a reproducible technical path for domestic AI chips to support world-leading mathematical reasoning models, reducing reliance on the GPU ecosystem.
II. Environment Preparation for Ascend 800I-A2 Inference Cardο
2.1 Overview of Environment Configurationο
Component |
Requirement |
Remarks |
|---|---|---|
Inference Hardware Model |
Atlas 800I A2 |
Single card with 64GB VRAM |
MindIE Image Version |
MindIE T6 B022 |
|
Quantization Method |
W8A8 |
|
Evaluation Tool |
AISbench |
2.2 Environment Configurationο
2.2.1 Download Weightsο
Download BF16 model weights directly from open-source communities such as HuggingFace and ModelScope.
Platform |
Link |
|---|---|
HuggingFace |
|
ModelScope |
2.2.2 Weight Quantization (BF16 to INT8)ο
Use ModelSlim to Generate Quantized Weights (Execute in the MindIE container; see Section 2.4 for container creation).
git clone https://gitee.com/ascend/msit.git
cd msit/msmodelslim/example/DeepSeek/
python3 quant_deepseek_w8a8.py --model_path {Floating-Point Weight Path} --save_path {W8A8 Quantized Weight Path} --bf16
Modify the permission of the weight path to 750:
chmod -R 750 {/path/to/weights}
2.2.3 Generate Rank Tableο
Check IP addresses of 8 cards:
for i in {0..7};do hccn_tool -i $i -ip -g; done
(Optional) If 8-card IP addresses are not configured, customize card IPs as follows (replace 10.20.3.13* with actual IPs):
for i in {0..7}; do hccn_tool -i ${i} -ip -s address 10.20.3.13${i} netmask 255.255.255.0; done
Then check if the corresponding cards can ping each other (taking two machines as an example):
hccn_tool -i 0 -ping -g address 10.20.0.20
hccn_tool -i 1 -ping -g address 10.20.0.21
hccn_tool -i 2 -ping -g address 10.20.0.22
hccn_tool -i 3 -ping -g address 10.20.0.23
hccn_tool -i 4 -ping -g address 10.20.0.24
hccn_tool -i 5 -ping -g address 10.20.0.25
hccn_tool -i 6 -ping -g address 10.20.0.26
hccn_tool -i 7 -ping -g address 10.20.0.27
Based on the IP information obtained in Step 1, refer to the following two-machine example. Users should add IPs and complete device_ip by themselves, where both server_id and container_ip are machine IPs:
{
"server_count": "2",
"server_list": [
{
"device": [
{
"device_id": "0",
"device_ip": "...",
"rank_id": "0"
},
...
{
"device_id": "7",
"device_ip": "...",
"rank_id": "7"
}
],
"server_id": "...",
"container_ip": "..."
},
{
"device": [
{
"device_id": "0",
"device_ip": "...",
"rank_id": "8"
},
...
{
"device_id": "7",
"device_ip": "...",
"rank_id": "15"
}
],
"server_id": "...",
"container_ip": "..."
}
],
"status": "completed",
"version": "1.0"
}
After configuring rank_table_file.json, execute the following command to modify the permission to 640:
chmod -R 640 {path/to/rank_table_file.json}
2.2.4 Image Configurationο
Download the image Image Download Instructions Permission application is required before downloading the image. Wait patiently for the permission to be approved, then download the corresponding image file according to the guide.
Load the image:
docker load < **800I-A2-py311-ubuntu22.04-aarch64.tar.gz
After completion, use the docker images command to confirm and find the specific image name and tag:
docker images
Create and start the container Write the following content into
start-docker.sh:
IMAGES_ID=$1
NAME=$2
if [ $# -ne 2 ]; then
echo "error: need one argument describing your container name."
exit 1
fi
docker run --name ${NAME} -it -d --net=host --shm-size=500g \
--privileged=true \
-w /home \
--device=/dev/davinci_manager \
--device=/dev/hisi_hdc \
--device=/dev/devmm_svm \
--entrypoint=bash \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
-v /usr/local/dcmi:/usr/local/dcmi \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /etc/ascend_install.info:/etc/ascend_install.info \
-v /usr/local/sbin:/usr/local/sbin \
-v /home:/home \
-v /tmp:/tmp \
-v /dl:/dl \
-v /mnt:/mnt \
-v /data:/data \
-v /usr/share/zoneinfo/Asia/Shanghai:/etc/localtime \
${IMAGES_ID}
Execute the following command to create the container:
bash start-docker.sh imagesID ContainerName
Execute the following command to enter the container:
docker exec -it {Container Name} bash
2.2.5 Starting the Inference Serviceο
Write the following content into set_env.sh:
export RANKTABLEFILE="/path/ranktable.json" # Modify the corresponding path
export MIES_CONTAINER_IP=xx.xx.xxx.xxx # Modify to the local IP
export MINDIE_LLM_LOG_TO_STDOUT=1
export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
export ATB_WORKSPACE_MEM_ALLOC_ALG_TYPE=3
export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1
export OMP_NUM_THREADS=1
export HCCL_DETERMINISTIC=false
export HCCL_OP_EXPANSION_MODE="AIV"
export NPU_MEMORY_FRACTION=0.96
export HCCL_CONNECT_TIMEOUT=7200
export HCCL_EXEC_TIMEOUT=0
source /usr/local/Ascend/mindie/set_env.sh
source /usr/local/Ascend/ascend-toolkit/set_env.sh
source /usr/local/Ascend/nnal/atb/set_env.sh
source /usr/local/Ascend/atb-models/set_env.sh
export ATB_LLM_ENABLE_AUTO_TRANSPOSE=0
export MINDIE_LOG_TO_STDOUT=1
export MASTER_IP=xx.xx.xxx.xxx # Modify to the master node IP
for var in $(compgen -e | grep 'STDOUT$'); do
export "$var=0"
done
for var in $(compgen -e | grep 'LOG_TO_FILE$'); do
export "$var=0"
done
export INF_NAN_MODE_ENABLE=0
# Modify permissions
find /usr/local/lib/python3.11/site-packages/mindie* -name config.json |xargs chmod -R 640
Execute the following command to load the environment variables:
source /path/set_env.sh
Modify the service - oriented parameter configuration in the /usr/local/Ascend/mindie/latest/mindie - service/conf/config.json file. The example is as follows:
{
"Version" : "1.0.0",
"LogConfig" :
{
"logLevel" : "Info",
"logFileSize" : 20,
"logFileNum" : 20,
"logPath" : "logs/mindie - server.log"
},
"ServerConfig" :
{
"ipAddress" : "127.0.0.1",
"managementIpAddress" : "127.0.0.2",
"port" : 1025,
"managementPort" : 1026,
"metricsPort" : 1027,
"allowAllZeroIpListening" : false,
"maxLinkNum" : 1000,
"httpsEnabled" : false,
"fullTextEnabled" : false,
"tlsCaPath" : "security/ca/",
"tlsCaFile" : ["ca.pem"],
"tlsCert" : "security/certs/server.pem",
"tlsPk" : "security/keys/server.key.pem",
"tlsPkPwd" : "security/pass/key_pwd.txt",
"tlsCrlPath" : "security/certs/",
"tlsCrlFiles" : ["server_crl.pem"],
"managementTlsCaFile" : ["management_ca.pem"],
"managementTlsCert" : "security/certs/management/server.pem",
"managementTlsPk" : "security/keys/management/server.key.pem",
"managementTlsPkPwd" : "security/pass/management/key_pwd.txt",
"managementTlsCrlPath" : "security/management/certs/",
"managementTlsCrlFiles" : ["server_crl.pem"],
"kmcKsfMaster" : "tools/pmt/master/ksfa",
"kmcKsfStandby" : "tools/pmt/standby/ksfb",
"inferMode" : "standard",
"interCommTLSEnabled" : false,
"interCommPort" : 1121,
"interCommTlsCaPath" : "security/grpc/ca/",
"interCommTlsCaFiles" : ["ca.pem"],
"interCommTlsCert" : "security/grpc/certs/server.pem",
"interCommPk" : "security/grpc/keys/server.key.pem",
"interCommPkPwd" : "security/grpc/pass/key_pwd.txt",
"interCommTlsCrlPath" : "security/grpc/certs/",
"interCommTlsCrlFiles" : ["server_crl.pem"],
"openAiSupport" : "vllm",
"tokenTimeout" : 3600,
"e2eTimeout" : 3600
},
"BackendConfig" : {
"backendName" : "mindieservice_llm_engine",
"modelInstanceNumber" : 1,
"npuDeviceIds" : [[0,1,2,3,4,5,6,7]],
"tokenizerProcessNumber" : 8,
"multiNodesInferEnabled" : true,
"multiNodesInferPort" : 1120,
"interNodeTLSEnabled" : false,
"interNodeTlsCaPath" : "security/grpc/ca/",
"interNodeTlsCaFiles" : ["ca.pem"],
"interNodeTlsCert" : "security/grpc/certs/server.pem",
"interNodeTlsPk" : "security/grpc/keys/server.key.pem",
"interNodeTlsPkPwd" : "security/grpc/pass/mindie_server_key_pwd.txt",
"interNodeTlsCrlPath" : "security/grpc/certs/",
"interNodeTlsCrlFiles" : ["server_crl.pem"],
"interNodeKmcKsfMaster" : "tools/pmt/master/ksfa",
"interNodeKmcKsfStandby" : "tools/pmt/standby/ksfb",
"ModelDeployConfig" :
{
"maxSeqLen" : 32768,
"maxInputTokenLen" : 1536,
"truncation" : false,
"ModelConfig" : [
{
"modelInstanceType" : "Standard",
"modelName" : "ds_r1",
"modelWeightPath" : "/path/weight/", # Modify to the real weight path
"worldSize" : 8,
"cpuMemSize" : 5,
"npuMemSize" : -1,
"backendType" : "atb",
"trustRemoteCode" : false,
"tp":4,
"dp":4,
"moe_ep":4,
"moe_tp":4
}
]
},
"ScheduleConfig" :
{
"templateType" : "Standard",
"templateName" : "Standard_LLM",
"cacheBlockSize" : 128,
"maxPrefillBatchSize" : 1,
"maxPrefillTokens" : 4096,
"prefillTimeMsPerReq" : 150,
"prefillPolicyType" : 0,
"decodeTimeMsPerReq" : 50,
"decodePolicyType" : 0,
"maxBatchSize" : 2048,
"maxIterTimes" : 31232,
"maxPreemptCount" : 0,
"supportSelectBatch" : false,
"maxQueueDelayMicroseconds" : 5000
}
}
}
At the same time, in the /usr/local/Ascend/mindie/latest/mindie - service/ path on the dual - machine, execute the following command to start the inference service:
./bin/mindieservice_daemon
If the following command is printed, it is considered that the service has been successfully started:
βDaemon start success!β
2.2.6 Installation of AISbench Evaluation Toolο
Outside the image, install the aisbench evaluation tool:
conda create --name ais_bench python=3.10 -y
conda activate ais_bench
git clone https://github.com/AISBench/benchmark.git
cd benchmark/
pip3 install -e ./
pip3 install -r requirements/api.txt
III. Dataset Evaluation Processο
3.1 Preparing AIME 2024 and MATH - 500 Datasetsο
aime 2024
# Ensure that you are at the outermost path of the source code your/work/dir/benchmark
cd ais_bench/datasets
mkdir aime
cd aime
wget http://opencompass.oss - cn - shanghai.aliyuncs.com/datasets/data/aime.zip
unzip http://opencompass.oss - cn - shanghai.aliyuncs.com/datasets/data/aime.zip
cd ../../../ # Return to the outermost path of the source code your/work/dir/benchmark
math - 500
# Ensure that you are at the outermost path of the source code your/work/dir/benchmark
cd ais_bench/datasets
wget http://opencompass.oss - cn - shanghai.aliyuncs.com/datasets/data/math.zip
unzip http://opencompass.oss - cn - shanghai.aliyuncs.com/datasets/data/math.zip
cd ../../ # Return to the outermost path of the source code your/work/dir/benchmark
3.2 Modifying the Model Inference Back - end Configuration Fileο
The inference back - end configuration file contains the request configuration related to accessing MindIE - Service. Execute the following command to edit the corresponding configuration file (.py file):
# Ensure that you are at the outermost path of the source code your/work/dir/benchmark
vim ais_bench/benchmark/configs/models/vllm_api/vllm_api_general_chat.py
The content of the inference back - end configuration file is modified as follows:
from ais_bench.benchmark.models import VLLMCustomAPIChat
models = [
dict(
type=VLLMCustomAPIChat,
abbr='vllm - api - general - chat',
max_seq_len = 4096,
query_per_second = 1,
rpm_verbose = False,
retry = 2,
host_ip = "xxx.xxx.xxx.xx", # Inference service IP, configure according to the actual situation
host_port = 1025, # Inference service port, configure according to the actual situation
enable_ssl = False,
max_out_len = 32768, # Maximum output tokens length is configured to 32K
generation_kwargs = dict( # Post - processing parameters refer to Parameters in https://docs.vllm.ai/en/latest/api/inference_params.html#sampling - params. The configuration for this test is as follows
temperature = 0,
top_p = 0.95,
seed = None,
)
)
]
3.3 Modifying the AIME 2024 and MATH - 500 Dataset Configuration Filesο
aime 2024 The inference back - end configuration file contains information related to how the dataset is processed. Execute the following command to edit the corresponding configuration file (.py file):
# Ensure that you are at the outermost path of the source code your/work/dir/benchmark
vim ais_bench/benchmark/configs/datasets/aime2024/aime2024_gen_0_shot_chat_prompt.py
The content of the dataset configuration file is modified as follows:
from ais_bench.benchmark.openicl.icl_prompt_template import PromptTemplate
from ais_bench.benchmark.openicl.icl_retriever import ZeroRetriever
from ais_bench.benchmark.openicl.icl_inferencer import GenInferencer
from ais_bench.benchmark.datasets import Aime2024Dataset, MATHEvaluator, math_postprocess_v2
aime2024_reader_cfg = dict(
input_columns=['question'],
output_column='answer'
)
aime2024_infer_cfg = dict(
prompt_template=dict(
type=PromptTemplate,
template=dict(
round=[
dict(
role='HUMAN',
prompt='{question}\nPlease reason step by step, and put your final answer within \\boxed{}.'
),
],
),
),
retriever=dict(type=ZeroRetriever),
inferencer=dict(type=GenInferencer, batch_size=4) # batch_size represents the number of parallel requests sent
)
aime2024_eval_cfg = dict(
evaluator=dict(type=MATHEvaluator, version='v2'), pred_postprocessor=dict(type=math_postprocess_v2)
)
aime2024_datasets = [
dict(
abbr='aime2024',
type=Aime2024Dataset,
path='ais_bench/datasets/aime/aime.jsonl', # Dataset path. When using a relative path, it is relative to the root path of the source code. Absolute paths are also supported
reader_cfg=aime2024_reader_cfg,
infer_cfg=aime2024_infer_cfg,
eval_cfg=aime2024_eval_cfg
)
]
math - 500 The inference back - end configuration file contains information related to how the dataset is processed. Execute the following command to edit the corresponding configuration file (.py file):
# Ensure that you are at the outermost path of the source code your/work/dir/benchmark
vim ais_bench/benchmark/configs/datasets/math/math500_gen_0_shot_cot_chat_prompt.py
The content of the dataset configuration file is modified as follows:
from ais_bench.benchmark.openicl.icl_prompt_template import PromptTemplate
from ais_bench.benchmark.openicl.icl_retriever import ZeroRetriever
from ais_bench.benchmark.openicl.icl_inferencer import GenInferencer
from ais_bench.benchmark.datasets import CustomDataset
from ais_bench.benchmark.openicl.icl_evaluator import MATHEvaluator
math_reader_cfg = dict(input_columns=['problem'], output_column='solution')
math_infer_cfg = dict(
prompt_template=dict(
type=PromptTemplate,
template=dict(
round=[
dict(
role='HUMAN',
prompt='{problem}\nPlease reason step by step, and put your final answer within \\boxed{}.'
),
],
),
),
retriever=dict(type=ZeroRetriever),
inferencer=dict(type=GenInferencer, batch_size=16), # batch_size represents the number of parallel requests sent
)
math_eval_cfg = dict(evaluator=dict(type=MATHEvaluator))
math_datasets = [
dict(
type=CustomDataset,
abbr='math_prm800k_500',
path='ais_bench/datasets/math', # Dataset path. When using a relative path, it is relative to the root path of the source code. Absolute paths are also supported
file_name='test_prm800k_500.jsonl', # Use math500 in math
reader_cfg=math_reader_cfg,
infer_cfg=math_infer_cfg,
eval_cfg=math_eval_cfg,
)
]
3.4 Starting the Evaluationο
Execute the following commands to start the evaluation:
aime 2024
ais_bench --models vllm_api_general_chat --datasets aime2024_gen_0_shot_chat_prompt --summarizer example
Detailed process logs and result files are saved in the execution command path βoutputs/default/
tail -f outputs/default/<timestamp>/logs/infer/vllm - api - general - chat/aime2024.out
math - 500
ais_bench --models vllm_api_general_chat --datasets math500_gen_0_shot_cot_chat_prompt --summarizer example
Detailed process logs and result files are saved in the execution command path βoutputs/default/
tail -f outputs/default/<timestamp>/logs/infer/vllm - api - general - chat/math_prm800k_500.out
Note that if the environment has a proxy configured, you need to execute the following command to make the proxy ineffective for the inference service IP (master node IP):
export no_proxy="xxx.xxx.xxx.xx"
3.5 Viewing Evaluation Resultsο
The evaluation results will be printed directly on the screen. Meanwhile, the detailed structure of result files and process logs in the path βoutputs/default/
outputs/default/
βββ
IV. Verification of Evaluation Resultsο
The evaluation results are shown in the figures below:


V. Reproduction Resultsο
Through the in-depth collaboration between the Ascend 800I-A2 inference card and the AISBench evaluation tool, this study has achieved the core goal in reproducing the DeepSeek-R1 mathematical reasoning model. The key achievements are as follows:
AIME2024 Dataset: Achieved Pass@1 80.0 points (official score: 79.8 points), with an error of only 0.2 points (οΌ1 question);
MATH-500 Dataset: Achieved Pass@1 97.6% (official accuracy: 97.3%), with an error of 0.3% (οΌ0.5% target).
The accurate reproduction of the paperβs indicators proves the technical maturity of the Ascend 800I-A2 inference card in scenarios such as symbolic computation and long-sequence parallel processing.
Dataset |
Official Accuracy |
Ascend A2 Accuracy |
Error |
Reproduction Result |
|---|---|---|---|---|
aime 2024 |
79.8 |
80.00 |
0.2 |
pass |
math-500 |
97.3% |
97.6% |
0.3% |
pass |
Note: The metric βaccuracy = 80.00β obtained by AISBench is equivalent to Pass@1 in a single run. The Pass@1 score of DeepSeek R1 in the paper is 79.8 (with temperature = 0.6, top_k = 0.95, and the average value of k runs, where k ranges from 4 to 64). The original paper content is shown in the figure below:
