User Configuration Parameters

AISBench Benchmark supports customizing the inference mode and evaluation process through two methods: Command Line Interface (CLI) Parameters and Configuration Constant File.

Command Line Parameters

The basic calling format for command line parameters [OPTIONS] is as follows:

ais_bench [OPTIONS]

Parameter Description

Based on the execution scenario, command line parameters are divided into three categories:

  • Common Parameters

  • Accuracy Evaluation Parameters (effective only when --mode is set to all, infer, eval, or viz)

  • Performance Evaluation Parameters (effective only when --mode is set to perf or perf_viz)

Accuracy Evaluation Parameters take effect only when the --mode parameter is specified as "all", "infer", "eval", "viz". Performance Evaluation Parameters take effect only when the --mode parameter is specified as "perf", "perf_viz". Common Parameters are not restricted by the task execution mode and can be specified in all modes.

### Common Parameters

Applicable to all modes and can be used in combination with accuracy or performance parameters.

Parameter

Description

Example

--models

Specifies the name of the model inference backend task (corresponding to a pre-implemented default model configuration file under the path ais_bench/benchmark/configs/models). Multiple task names are supported. For details, refer to 📚 Supported Models

--models vllm_api_general

--datasets

Specifies the name of the dataset task (corresponding to a pre-implemented default dataset configuration file under the path ais_bench/benchmark/configs/datasets). Multiple dataset names are supported. For details, refer to 📚 Supported Dataset Types

--datasets gsm8k_gen

--summarizer

Specifies the name of the result summary task (corresponding to a pre-implemented default configuration file under the path ais_bench/benchmark/configs/summarizers). For details, refer to 📚 Supported Result Summary Tasks

--summarizer medium

--mode or -m

Running mode, optional values: all, infer, eval, viz, perf, perf_viz; default value is all.
For details, refer to 📚 Running Mode Description.

--mode infer
-m all

--reuse or -r

Specifies the timestamp in an existing working directory to continue execution and overwrite original results. Used in conjunction with the --mode parameter, it can resume interrupted inference, or perform accuracy calculation/visualization result printing based on existing inference results. If no parameter is added, the latest timestamp in the --work-dir is automatically selected.

--reuse 20250126_144254
-r 20250126_144254

--work-dir or -w

Specifies the evaluation working directory for saving output results. Default path: outputs/default.

--work-dir /path/to/work
-w /path/to/work

--config-dir

Path to the folder where configuration files for models, datasets, and summarizers are stored. Default path: ais_bench/benchmark/configs.

--config-dir /xxx/xxx

--debug

Enables Debug mode. The mode is enabled if this parameter is configured, and disabled if not; disabled by default. In Debug mode, all logs are printed directly to the terminal. (In Debug mode, the --max-num-workers parameter is forced to 1, tasks are executed serially, and only single-core execution is used, which limits concurrency capabilities.)

--debug

--dry-run

Enables Dry Run mode (prints logs to the screen without actually running tasks). The mode is enabled if this parameter is configured, and disabled if not; disabled by default.

--dry-run

--max-workers-per-gpu

Reserved parameter; not currently supported.

--max-workers-per-gpu 1

--merge-ds

Enables merged inference for datasets of the same type (runs multiple datasets for the same task together).

--merge-ds

--num-prompts

Specifies the number of test cases for the dataset (selected in dataset order). A positive integer must be passed. If the number exceeds the total number of cases in the dataset or no value is specified, the entire dataset is used for testing.

--num-prompts 500

--max-num-workers

Number of parallel tasks, range: [1, number of CPU cores]; default value: 1. Invalid when --debug is specified; all tasks are executed serially.

--max-num-workers 2

--num-warmups

Number of warm-up runs before sending requests. Data is selected in dataset order for testing. When num-warmups exceeds the number of dataset entries, data from the dataset will be sent in a loop. Default value: 1; set to 0 to disable warm-up. If all requests fail during the warmup phase, subsequent inference tasks will not be executed.

--num-warmups 10

### Accuracy Evaluation Parameters

Valid only when the mode is all, infer, eval, or viz.

Parameter

Description

Example

--dump-eval-details

Toggle to dump details of the evaluation process. Enabled if configured, disabled if not; disabled by default.

--dump-eval-details

--dump-extract-rate

Toggle to dump evaluation speed data. Enabled if configured, disabled if not; disabled by default.

--dump-extract-rate

### Performance Evaluation Parameters

Valid only when the mode is perf or perf_viz.

Parameter

Description

Example

--pressure

Switch to enable performance pressure testing mode. Effective only when --mode perf is set. Enabled if this parameter is configured, disabled if not; disabled by default. For details on pressure testing, refer to 📚 Enabling Steady-State Testing with Stress Testing.

--pressure

--pressure-time

Duration of pressure testing. Only takes effect when --pressure mode is specified. Unit: seconds; default value: 15 seconds; value range: [1, 86400] (i.e., 1 second to 24 hours).

--pressure-time 30

Configuration Constant File Parameters

Some global constants are not restricted by task type, and it is recommended to keep their default values. If customization is required, edit the constant file: global_consts.py for configuration.

The currently supported parameter configurations are as follows:

Parameter Name

Description

Value Range / Requirements

WORKERS_NUM

Number of processes used for sending requests. The default value is 0, which means automatic allocation based on the maximum number of concurrent requests configured by the user. (Invalid when the command-line parameter --debug is specified; single-core execution is used for sending requests, which limits concurrency capabilities.)

[0, number of CPU cores]

MAX_CHUNK_SIZE

Maximum cache size for a single chunk returned by the streaming inference model backend. The default value is 65535 bytes (64KB).

(0, 16777216] (Unit: Byte)

REQUEST_TIME_OUT

Timeout period for the client to wait for a response after sending a request. The default value is None, meaning infinite waiting (always waiting for the model to return results).

None or >0 (Unit: seconds)

LOG_LEVEL

Log level, optional values: DEBUG, INFO, WARNING, ERROR, CRITICAL. Default value: INFO.

[DEBUG, INFO, WARNING, ERROR, CRITICAL]