Error Code Description

TMAN-CMD-001

Error Description

This error indicates that a required input parameter is missing when executing a command. When launching the ais_bench evaluation tool via the command line, you must specify the model configuration and dataset configuration.

Examples of valid scenarios:

# When using an open-source dataset, you must specify the model task via `--models` and the dataset task via `--datasets`
ais_bench --models vllm_api_stream_chat --datasets gsm8k_gen
# When using a custom dataset, you must specify the model task via `--models` and the custom dataset path via `--custom_dataset_path`
ais_bench --models vllm_api_stream_chat --custom_dataset_path /path/to/custom/dataset

Solution

Refer to the examples of valid scenarios to supplement the missing parameters.

TMAN-CMD-002

Error Description

This error indicates that the value of a command-line parameter is not within the valid range.

Solution

Search this document for the specific command line that appears in the log, and find the constraints on parameter values specified in the command line description.
For example, if this error occurs when executing ais_bench --models vllm_api_stream_chat --datasets gsm8k_gen --num-prompts -1 --mode perf, search for --num-prompts in the document to find the constraints in the parameter description.

Parameter	Description	Example
`--num-prompts`	Specifies the number of test cases to evaluate in the dataset. A positive integer must be entered. If the value exceeds the number of dataset cases or is not specified, the entire dataset will be evaluated.	`--num-prompts 500`

The parameter description specifies that the value must be a positive integer (greater than 0).

TMAN-CFG-001

Error Description

There is a syntax error in the .py configuration file, causing parsing failure.

Solution

Check the Python syntax errors in the configuration file printed in the log (all configurable files for the ais_bench evaluation tool follow Python syntax), such as missing quotation marks or mismatched parentheses, and correct them.

TMAN-CFG-002

Error Description

A required parameter is missing from the .py configuration file, causing parsing failure. For example, the specific error log is: Config file /path/to/vllm_api_stream_chat.py does not contain 'models' param!, which indicates that the models parameter is missing from the configuration file.

A valid vllm_api_stream_chat.py file contains the models parameter:

# ......
models = [
    dict(
        attr="service",
        type=VLLMCustomAPIChat,
        abbr="vllm-api-stream-chat",
        # ......
    )
]

Solution

In the .py configuration file printed in the error log, add the parameter that the log indicates is missing.

TMAN-CFG-003

Error Description

A parameter in the .py configuration file has an incorrect type, causing parsing failure. For example, the relevant configuration in the vllm_api_stream_chat.py configuration file is:

# ......
models = dict(
    attr="service",
    type=VLLMCustomAPIChat,
    abbr="vllm-api-stream-chat",
    # ......
)

The specific error log is: In config file /path/to/vllm_api_stream_chat.py, 'models' param must be a list!, which indicates that the models parameter in the configuration file has an incorrect type. It should be a list type (but is actually a dictionary type).

Solution

In the .py configuration file printed in the error log, correct the incorrect parameter type to the required type as indicated by the log.

UTILS-MATCH-001

Error Description

The task name specified via --models, --datasets, or --summarizer cannot be matched to a .py configuration file with the same name as the task.

Solution

Check the task name that the log indicates cannot be matched. For example, if xxxx cannot be matched, the following log will be printed:

+------------------------+
| Not matched patterns   |
|------------------------|
| xxxx                   |
+------------------------+

Scenario 1: The configuration file folder path is not specified

First, execute pip3 show ais_bench_benchmark | grep "Location:" to check the installation path of the ais_bench evaluation tool. For example, the following information is obtained after execution:

Location: /usr/local/lib/python3.10/dist-packages

The configuration file path is then /usr/local/lib/python3.10/dist-packages/ais_bench/benchmark/configs. Navigate to this path and perform the following checks:

If the unmatchable task name is specified via --models, check whether there is a .py configuration file with the same name as the task in the models/ path (including subdirectories).
If the unmatchable task name is specified via --datasets, check whether there is a .py configuration file with the same name as the task in the datasets/ path (including subdirectories).
If the unmatchable task name is specified via --summarizer, check whether there is a .py configuration file with the same name as the task in the summarizers/ path (including subdirectories).

Scenario 2: The configuration file folder path is specified

If you specified the configuration file folder path via --config-dir when executing the command, navigate to this path and perform the following checks:

If the unmatchable task name is specified via --models, check whether there is a .py configuration file with the same name as the task in the models/ path (including subdirectories).
If the unmatchable task name is specified via --datasets, check whether there is a .py configuration file with the same name as the task in the datasets/ path (including subdirectories).
If the unmatchable task name is specified via --summarizer, check whether there is a .py configuration file with the same name as the task in the summarizers/ path (including subdirectories).

UTILS-CFG-001

Error Description

When using the randomly synthesized dataset in the tokenid scenario, the model configuration file must specify the tokenizer path.

Solution

Assume the ais_bench evaluation tool command is ais_bench --models vllm_api_stream_chat --datasets synthetic_gen_tokenid --mode perf. Then, all path parameters in the models section of the vllm_api_stream_chat.py configuration file (refer to [Modifying Configuration Files for Corresponding Tasks](…/get_started/quick_start.md#Modifying Configuration Files for Corresponding Tasks) for the configuration file path retrieval method) must be set to the tokenizer path (usually the model weight folder path).

# ......
models = dict(
    attr="service",
    type=VLLMCustomAPIChat,
    abbr="vllm-api-stream-chat",
    path="/path/to/tokenizer", # Enter the tokenizer path
    # ......
)

UTILS-CFG-002

Error Description

Initializing a model instance using parameters in the model configuration file failed due to invalid parameter content.

Solution

Check the log for build failed with the following errors:{error_content}, and correct the parameters in the model configuration file according to the prompts in error_content. For example, if the batch_size parameter value in the model configuration file is 100001, and error_content is "batch_size must be an integer in the range (0, 100000]", this indicates that the batch_size parameter exceeds the valid range (0, 100000]. You need to correct the batch_size parameter value to 100000.

UTILS-CFG-003

Error Description

The value of a parameter in the model configuration file is outside the range limited by the tool.

Solution

Configure the parameter value within the range limited by the tool according to the prompts in the detailed log. For example, if the configuration file content is:

# In vllm_stream_api_chat.py
models = [
    dict(
        attr="service1",
        # ......
    )
]

The detailed error log is:

Model config contain illegal attr, 'attr' in model config is 'service1', only 'local' and 'service' are supported!

This indicates that the value of the attr parameter in the model configuration is 'service1', but the tool only supports the values 'local' and 'service'. You need to set the attr parameter to one of the valid values.

UTILS-CFG-004

Error Description

Some configuration items for model parameters must be consistent across all model configurations and cannot have different values.

Solution

Unify the configuration values according to the prompts in the detailed log. For example, if the configuration file content is:

# In vllm_stream_api_chat.py
models = [
    dict(
        attr="service",
        # ......
    ),
    dict(
        attr="local"
    )
]

The detailed error log is:

Cannot run local and service model together! Please check 'attr' parameter of models

Because the models configuration contains two parameter values: 'service' and 'local', but the tool only supports a unified configuration of one value. Therefore, you need to set the attr parameter in the models configuration to either 'service' or 'local'.

UTILS-CFG-008

Error Description

The loaded multimodal dataset contains invalid content.

Solution

If the error log is Invalid dataset: /path/to/non-mm-dataset , please check whether the dataset is a MM-style dataset!, it means the specified dataset /path/to/non-mm-dataset is not a valid multimodal dataset. Each piece of data in a valid dataset must contain at least a type or path field. If it contains a type field, the value of the type field must be one of ["image", "video", "audio"].
If the error log is Param 'mm_type' does not match the data type of dataset: /path/to/mm-dataset , please check it!, it means the specified dataset /path/to/mm-dataset is a valid multimodal dataset, but the value of the mm_type field in the prompt engineering configuration of the dataset configuration file is invalid. The valid values for the mm_type field must be one of ["image", "video", "audio"].

UTILS-DEPENDENCY-001

Error Description

A required dependency module is missing during execution.

Solution

If the detailed error log is Failed to import required modules. Please install the necessary packages: pip install math_verify, follow the guidance in the detailed log and execute pip install math_verify to install the dependent library.

UTILS-TYPE-001

Error Description

No direct solution is available yet.

Solution

If you need to resolve this issue, please submit an issue and include this error code in the issue description.

UTILS-TYPE-002

Error Description

No direct solution is available yet.

Solution

If you need to resolve this issue, please submit an issue and include this error code in the issue description.

UTILS-TYPE-003

Error Description

No direct solution is available yet.

Solution

If you need to resolve this issue, please submit an issue and include this error code in the issue description.

UTILS-TYPE-004

Error Description

No direct solution is available yet.

Solution

If you need to resolve this issue, please submit an issue and include this error code in the issue description.

UTILS-TYPE-005

Error Description

No direct solution is available yet.

Solution

If you need to resolve this issue, please submit an issue and include this error code in the issue description.

UTILS-TYPE-006

Error Description

No direct solution is available yet.

Solution

If you need to resolve this issue, please submit an issue and include this error code in the issue description.

UTILS-TYPE-007

Error Description

No direct solution is available yet.

Solution

If you need to resolve this issue, please submit an issue and include this error code in the issue description.

UTILS-TYPE-008

Error Description

The command-line parameter value is too large.

Solution

If the error log shows '--max-num-workers' must be <= 8, but got 9 ......, it indicates that the value of the command-line parameter --max-num-workers is 9. However, the tool only supports a maximum of 8 concurrent workers. Therefore, you need to adjust the value of --max-num-workers to be ≤ 8.

UTILS-TYPE-009

Error Description

The command-line parameter value is not an integer type.

Solution

If the error log shows '--max-num-workers' must be an integer, but got '9' ......, it indicates that the value of the command-line parameter --max-num-workers is the string ‘9’. However, the tool only supports integer-type values. Therefore, you need to correct the value of --max-num-workers to an integer type.

UTILS-TYPE-010

Error Description

The command-line parameter value is too small.

Solution

If the error log shows '--max-num-workers' must be >= 1, but got 0 ......, it indicates that the value of the command-line parameter --max-num-workers is 0. However, the tool only supports at least 1 concurrent worker. Therefore, you need to adjust the value of --max-num-workers to be ≥ 1.

UTILS-PARAM-001

Error Description

No direct solution is available yet.

Solution

If you need to resolve this issue, please submit an issue and include this error code in the issue description.

UTILS-PARAM-002

Error Description

In the custom dataset scenario, the request_count parameter in the configuration file *.meta.json is outside the valid range.

Solution

If the error message is Please make sure that the value of parameter 'request_count' can be converted to int(greater than 0)., it means the request_count parameter in *.meta.json needs to be set to > 0.

UTILS-PARAM-003

Error Description

In the custom dataset scenario, the min_value parameter is greater than the max_value parameter in the configuration file *.meta.json.

Solution

If the error message is When the uniform distribution is set, parameter 'min_value' must be less than or equal to parameter 'max_value'., it means the min_value parameter in *.meta.json needs to be set to ≤ the max_value parameter. You need to correct it to min_value ≤ max_value.

UTILS-PARAM-004

Error Description

In the custom dataset scenario, the min_value and max_value parameters in the configuration file *.meta.json are outside the valid range.

Solution

If you need to resolve this issue, please submit an issue and include this error code in the issue description.

UTILS-PARAM-005

Error Description

In the custom dataset scenario, the configuration file *.meta.json lacks required parameters.

Solution

For example, if the error message is When the uniform distribution is set, parameter 'min_value' and 'max_value' must be provided., it means that in the uniform distribution scenario, both the min_value and max_value parameters need to be set in *.meta.json.

UTILS-PARAM-006

Error Description

In the custom dataset scenario, the percentage_distribute parameter in the configuration file *.meta.json is invalid.

Solution

The valid value range for the percentage_distribute parameter is described in the detailed log as follows:

 Ensure the configuration data follows the format [max_tokens, percentage], where:
    - 'max_tokens' must be a positive number (greater than 0).
    - 'percentage' must be a float between 0 and 1 (greater than 0 and inclusive 1).
    - The sum of all 'percentage' values must equal exactly 1.
    Example valid format: [[1000, 0.5],[500,0.5]] or [[2000, 1.0]]
    Example invalid formats: [[0, 0.5]] (max_tokens <= 0), [[1000, 1.5]] (percentage > 1), [[1000, 0.3], [500,0.2]] (sum not 1)

UTILS-PARAM-007

Error Description

In the custom dataset scenario, the value of the method parameter (which defines the data distribution method) in the configuration file *.meta.json is outside the valid range.

Solution

If the error message is Type of data distribution(method): uniform1 not supported, legal methods chosen from ['uniform', 'percentage']., it means the value uniform1 of the method parameter in *.meta.json is outside the valid range. You need to correct the method parameter value to either uniform or percentage.

UTILS-PARAM-008

Error Description

In the custom dataset scenario, the configuration file *.meta.json contains invalid fields.

Solution

If the specific error message is There are illegal keys: xxxxxx,yyyyyy, it means the *.meta.json file contains the two invalid fields xxxxxx and yyyyyy. You need to delete these two fields from *.meta.json.

UTILS-FILE-002

Error Description

The tokenizer path specified by the path parameter in the model configuration file does not exist.

Solution

If the content of the model configuration file is as follows:

# In vllm_stream_api_chat.py
models = [
    dict(
        # ......
        path="/path/to/invalid",
        # ......
    ),
]

And the specific error log is Tokenizer path '/path/to/invalid' does not exist, it indicates that the tokenizer path /path/to/invalid specified by the path parameter in the model configuration file does not exist (an empty path is also considered non-existent). You need to correct it to an existing tokenizer path.

UTILS-FILE-003

Error Description

Failed to load the tokenizer file.

Solution

If the error message is Failed to load tokenizer from /path/to/tokenizer: ExceptionName: XXXXXX, first confirm whether the tokenizer file under the path /path/to/tokenizer is compatible with the transformers version of the current runtime environment. If compatible, perform further troubleshooting based on the specific error information represented by XXXXXX.

UTILS-FILE-004

Error Description

No direct solution is available yet.

Solution

If you need to resolve this issue, please submit an issue and include this error code in the issue description.

PARTI-FILE-001

Error Description

Insufficient permissions for the output path file; the tool cannot write results to it.

Solution

For example, if the error log is:

Current user can't modify /path/to/workspace/outputs/default/20250628_151326/predictions/vllm-api-stream-chat/gsm8k.json, reuse will not enable.

Execute ls -l /path/to/workspace/outputs/default/20250628_151326/predictions/vllm-api-stream-chat/gsm8k.json to check the owner and permissions of this path. If the current user does not have write permission for the file, you need to add write permission for the current user (for example, execute chmod u+w /path/to/workspace/outputs/default/20250628_151326/predictions/vllm-api-stream-chat/gsm8k.json to add write permission for the current user).

CALC-MTRC-001

Error Description

The performance result data is invalid, and metrics cannot be calculated.

Solution

Scenario 1: The original performance result data is empty

If you specified recalculation of performance results via --mode perf_viz when executing the command, and the base output path is outputs/default/20250628_151326 (find Current exp folder: in the console output), check whether all *_details.jsonl files in the performances/ folder under this path are empty. If they are empty, you need to run the evaluation once first to generate performance result data.

Scenario 2: The original performance result data contains no valid values

If you specified recalculation of performance results via --mode perf_viz when executing the command, and the base output path is outputs/default/20250628_151326 (find Current exp folder: in the console output), check whether the *_details.jsonl files in the performances/ folder under this path contain no valid fields (they may have been tampered with). If so, you need to re-run the performance evaluation to generate new data.

CALC-FILE-001

Error Description

Failed to save performance result data to disk.

Solution

If the detailed error log is:

Failed to write request level performance metrics to csv file '{/path/to/workspace/outputs/default/20250628_151326/performances/vllm-api-stream-chat/gsm8k.csv': XXXXXX

Where XXXXXX is the specific reason for the disk-saving failure. For example, Permission denied means the file already exists and the current user does not have write permission. You can either delete the file or add write permission for the current user to the existing file.

CALC-DATA-001

Error Description

No valid performance metric data was obtained for all completed inference requests, and metrics cannot be calculated.

Solution

If the specific log is:

All requests failed, cannot calculate performance results. Please check the error logs from responses!

This indicates that all requests during the inference process failed. You need to further check the logs of failed requests to identify the cause of the failure.

If the command includes --debug, the logs of failed requests will be printed directly to the console, and you can view them in the console records.
If the command does not include --debug, the console records will contain logs similar to [ERROR] [RUNNER-TASK-001]task failed. OpenICLApiInfervllm-api-stream-chat/synthetic failed with code 1, see outputs/default/20251125_160128/logs/infer/vllm-api-stream-chat/synthetic.out. You can view the specific cause of the request failure in outputs/default/20251125_160128/logs/infer/vllm-api-stream-chat/synthetic.out.

CALC-DATA-002

Error Description

When calculating steady-state performance metrics, no requests belonging to the steady state were found among all request information, and steady-state metrics cannot be calculated.

Solution

You can check the concurrency graph of inference requests (reference document: https://ais-bench-benchmark.readthedocs.io/en/latest/base_tutorials/results_intro/performance_visualization.html) to confirm whether the Request Concurrency Count in the concurrency step graph reaches the concurrency number set in the model configuration file (the batch_size parameter) and at least two requests reach the maximum concurrency number.

If the above conditions are not met, you can try the following methods to achieve a steady state:

Scenario A: `Request Concurrency Count` in the concurrency step graph increases continuously and then decreases continuously

Reduce the concurrency number of inference requests (the batch_size parameter in the model configuration file).
Increase the total number of inference requests.

Scenario B: `Request Concurrency Count` in the concurrency step graph increases continuously, fluctuates for a period of time, and then decreases continuously

Reduce the concurrency number of inference requests (the batch_size parameter in the model configuration file).
Increase the frequency of sending inference requests (the request_rate parameter in the model configuration file).

SUMM-TYPE-001

Error Description

The abbr parameter configurations of all dataset tasks are mixed (i.e., use different types).

Solution

For example, if the error log is:

mixed dataset_abbr type is not supported, dataset_abbr type only support (list, tuple) or str.

This indicates that in the datasets configuration, the abbr parameter configurations of all dataset tasks use different types (e.g., list and str). You need to unify the abbr parameter configurations of all dataset tasks to use the same type (e.g., list or str).

SUMM-FILE-001

Error Description

There are no performance data files (*_details.jsonl) in the output working path.

Solution

Confirm whether you incorrectly specified recalculation of performance results via --mode perf_viz when executing the evaluation. If you want to run a complete performance test, specify --mode perf.
Confirm whether the base output path is correct (e.g., outputs/default/20250628_151326; find Current exp folder: in the console output).
Confirm whether there are *_details.jsonl files in the performances/ folder under this path. If not, check other error information in the previous console logs to confirm whether other errors caused the performance data files to not be generated, and perform further troubleshooting based on the guidance of other error logs.

SUMM-MTRC-001

Error Description

The number of valid fields is inconsistent across requests in the detailed performance data.

Solution

Check whether the number of valid fields is consistent across all requests in the *_details.jsonl files under the base output path (e.g., outputs/default/20250628_151326; find Current exp folder: in the console output). If inconsistent, check whether there are other errors in the historical console logs that caused the performance data files to not be generated, and perform further troubleshooting based on the guidance of other error logs.

RUNNER-TASK-001

Error Description

The evaluation task failed to execute.

Solution

For example, if the specific error message is [ERROR] [RUNNER-TASK-001]task failed. OpenICLApiInfervllm-api-stream-chat/synthetic failed with code 1, see outputs/default/20251125_160128/logs/infer/vllm-api-stream-chat/synthetic.out, please view the specific error information in outputs/default/20251125_160128/logs/infer/vllm-api-stream-chat/synthetic.out to identify the cause of the failure.

TINFER-PARAM-001

Error Description

The maximum concurrency value batch_size in the model configuration file is outside the valid range.

Solution

If the error log shows Concurrency must be greater than 0 and <= 100000, but got -1, it means the maximum concurrency of the model is configured as -1. You need to set the batch_size parameter in the model configuration file to an integer greater than 0 and less than or equal to 100000.

Example:

# In vllm_stream_api_chat.py
models = [
    dict(
        attr="service",
        # ......
        batch_size=100,
        # ......
    ),
]

TINFER-PARAM-002

Error Description

The num_return_sequences parameter (number of returned sequences) of the generation_kwargs parameter in the model configuration file is outside the valid range.

Solution

If the error log shows num_return sequences must be a positive integer, but got {0}, it means the number of returned sequences of the model is configured as 0. You need to set the num_return_sequences parameter in the model configuration file to an integer greater than 0.

Example:

# In vllm_stream_api_chat.py
models = [
    dict(
        attr="service",
        # ......
        generation_kwargs=dict(
            num_return_sequences=1,
        ),
        # ......
    ),
]

TINFER-PARAM-004

Error Description

The ramp_up_strategy parameter (ramp-up strategy) of the traffic_cfg parameter in the model configuration file is outside the valid range.

Solution

If the error log shows Invalid ramp_up_strategy: {constant} only support 'linear' and 'exponential', it means the request sending strategy of the model is configured to a value not in ['exponential', 'linear']. You need to set the ramp_up_strategy parameter in the model configuration file to 'exponential' or 'linear'.

Example:

# In vllm_stream_api_chat.py
models = [
    dict(
        attr="service",
        # ......
        traffic_cfg=dict(
            ramp_up_strategy="linear",
        ),
        # ......
    ),
]

TINFER-PARAM-005

Error Description

Excessively high virtual memory usage when the tool runs inference.

Solution

If the specific error log is:

Virtual memory usage too high: 90% > 80% (Total memory: 50 GB "Used: 45 GB, Available: 5 GB, Dataset needed memory size: 3000 MB)

It indicates that the current system memory is 50GB, with 45GB used and 5GB available, while the dataset requires 3000MB of memory, thus triggering this error. Solutions are divided into two cases:

If the total system memory is insufficient, increase the system memory.
If the total system memory is sufficient but the memory required by the dataset is greater than the available memory, clear the occupied memory or cache on the current server.

TINFER-PARAM-006

Error Description

No timestamps found in datasets, but use_timestamp is True! Make sure your dataset contains timestamp field or set use_timestamp to False in model config.

Solution

If the error log is No timestamps found in datasets, but use_timestampis True! Make sure your dataset containstimestampfield or setuse_timestamp to False in model config., it means the dataset configuration file does not contain the timestamp field, but the use_timestamp parameter in the model configuration file is True. You need to set the use_timestamp parameter in the model configuration file to False.

Example:

# In vllm_stream_api_chat.py
models = [
    dict(
        attr="service",
        # ......
        use_timestamp=False,
        # ......
    ),
]

TINFER-IMPL-001

Error Description

When executing a service-oriented inference task, a process fails to start while multiple processes are launched within the inference task.

Solution

If the error log is:

Failed to start worker x: XXXXXX, total workers to launch: 4

Where x is the ID of the failed process, XXXXXX is the specific reason for the failure, and 4 is the total number of processes.

Solutions:

If the number of occurrences of this error log is equal to the total number of processes, it means all processes failed to start. Check the specific failure reason, take corresponding measures, and retry.
If the number of occurrences of this error log is less than the total number of processes, it means some processes failed to start. Partial process startup failures do not affect the execution of the evaluation task but will impact the actual maximum concurrency batch_size. Decide whether to manually interrupt to locate the specific failure reason based on actual circumstances.

TINFER-RUNTIME-001

Error Description

All requests fail during the warm-up phase when evaluating inference serviceization.

Solution

If the error log is Exit task because all warmup requests failed, failed reasons: XXXXXX, locate the problem based on the specific failure reason XXXXXX (error information from the service), take corresponding measures, and retry.

TEVAL-PARAM-001

Error Description

Invalid values for the number of candidate solutions generated by inference n and the number of samples collected from them k.

Solution

If the error log is:

k and n must be greater than 0 and k <= n, but got k: 16, n: 8

It means k is greater than n. You need to configure k to an integer less than or equal to n.

Examples:

If both n and k parameters are configured in the dataset configuration file, set their values to the valid range in the configuration file:

# In aime2024_gen_0_shot_str.py, the k parameter corresponds to `k`
aime2024_datasets = [
    dict(
        abbr='aime2024',
        type=Aime2024Dataset,
        # ......
        k=4,
        n=8,
    )
]

If the n parameter is not configured in the dataset configuration file, the value of the num_return_sequences parameter in the model configuration file will be used as the value of n. You need to configure k in the dataset configuration file to an integer less than or equal to num_return_sequences in the model configuration file.

# In vllm_stream_api_chat.py, the num_return_sequences parameter corresponds to `n`
models = [
    dict(
        attr="service",
        # ......
        generation_kwargs=dict(
            num_return_sequences=8,
        ),
        # ......
    ),
]

# In aime2024_gen_0_shot_str.py, the k parameter corresponds to `k`
aime2024_datasets = [
    dict(
        abbr='aime2024',
        type=Aime2024Dataset,
        # ......
        k=4,
    )
]

ICLI-PARAM-001

Error Description

The type of the retriever parameter for constructing prompt engineering in the dataset configuration file is not a subclass of BaseRetriever or a list of subclasses of BaseRetriever.

Solution

If you want to use a custom retriever class CustomedRetriever, ensure that CustomedRetriever is a subclass of BaseRetriever.
If you want to use multiple custom retriever classes CustomedRetriever1, CustomedRetriever2, configure the retriever parameter in the dataset configuration file as [CustomedRetriever1, CustomedRetriever2], and each class in the list must inherit from BaseRetriever.

ICLI-PARAM-002

Error Description

The value of the infer_mode parameter in the inferencer configuration in the multi-turn dialogue dataset configuration file is outside the valid range.

Solution

Taking the mtbench configuration file as an example, if the configuration of mtbench_gen.py is as follows:

mtbench_infer_cfg = dict(
    # ......
    inferencer=dict(type=MultiTurnGenInferencer, infer_mode="every1")
)

The log error is:

Multiturn dialogue infer model only supports every、last or every_with_gt, but got every1

The correct configuration should set the infer_mode parameter to one of every, last, or every_with_gt.

ICLI-PARAM-003

Error Description

When specifying --mode perf --pressure in the command line for performance stress testing, the batch_size parameter is not specified in the model configuration file.

Solution

Taking the vllm_stream_api_chat.py configuration file as an example:

# In vllm_stream_api_chat.py
models = [
    dict(
        attr="service",
        # ......
        batch_size=16,
        # ......
    ),
]

ICLI-PARAM-004

Error Description

The maximum concurrency value batch_size in the model configuration file is outside the valid range.

Solution

If the error log shows The range of batch_size is [1, 100000], but got -1. Please set it in datasets config, it means the maximum concurrency of the model is configured as -1. You need to set the batch_size parameter in the model configuration file to an integer greater than 0 and less than or equal to 100000.

Example:

# In vllm_stream_api_chat.py
models = [
    dict(
        attr="service",
        # ......
        batch_size=100,
        # ......
    ),
]

ICLI-PARAM-006

Error Description

PPL-type datasets do not support performance testing.

Solution

Check the used dataset configuration file, for example:

# In ARC_c_ppl_0_shot_str.py
ARC_c_infer_cfg = dict(
    # ......
    inferencer=dict(type=PPLInferencer))

The type of inferencer is PPLInferencer. Such dataset configuration files do not support performance testing, so you need to replace them with other dataset configuration files or specify --mode all to execute accuracy evaluation.

ICLI-PARAM-007

Error Description

PPL-type datasets do not support inference using streaming model configurations.

Solution

Check the used dataset configuration file, for example:

# In ARC_c_ppl_0_shot_str.py
ARC_c_infer_cfg = dict(
    # ......
    inferencer=dict(type=PPLInferencer))

The type of inferencer is PPLInferencer. Such dataset configuration files do not support inference using streaming model configurations, so you need to replace them with other dataset configuration files, or specify a non-streaming model configuration file via --models, such as --models vllm_api_general_chat.

ICLI-IMPL-004

Error Description

BFCL datasets do not support performance testing.

Solution

If you want to use the BFCL dataset task for accuracy testing but mistakenly specify --mode perf in the command line (which triggers performance testing), change the command line to --mode all to specify accuracy testing.
If you want to use the BFCL dataset task for performance testing, it is not supported currently.

ICLI-IMPL-006

Error Description

Model tasks with streaming interfaces do not support accuracy evaluation using BFCL datasets.

Solution

Refer to Model Configuration Instructions and select model tasks with text interfaces (e.g., vllm_api_general_chat) for inference.

ICLI-IMPL-008

Error Description

The model backend corresponding to the current model configuration file has not implemented the methods required for PPL inference.

Solution

Refer to the documentation (not yet available) to check which model configurations support PPL inference, such as vllm_api_general_chat.

ICLI-IMPL-010

Error Description

No token IDs in the result of a PPL inference, leading to failure in loss calculation.

Solution

Verify whether the tested inference object (inference service) supports PPL inference and can normally return valid prompt_logprobs required for PPL inference.

ICLI-RUNTIME-001

Error Description

Failed to obtain inference results when accessing the inference service during warm-up.

Solution

If the log shows Get result from cache queue failed: XXXXXX (where XXXXXX is the specific reason for the failure to obtain inference results), take corresponding measures based on the specific reason (e.g., if it is a timeout-related exception, confirm whether the timeout setting of the inference service is reasonable or check if the current configuration can access the inference service normally).

ICLI-FILE-001

Error Description

Failed to write inference result files to disk.

Solution

If the log shows Failed to write results to /path/to/outputs/default/20250628_151326/*/*/*.json: XXXXXX, it means the inference results failed to be written to disk in the accuracy scenario. Troubleshoot and resolve the issue based on the specific saving reason indicated by XXXXXX (e.g., permission issues, insufficient disk space, etc.).
If the log shows Failed to write results to /path/to/outputs/default/20250628_151326/*/*/*.jsonl: XXXXXX, it means the inference results failed to be written to disk in the performance scenario. Troubleshoot and resolve the issue based on the specific saving reason indicated by XXXXXX (e.g., permission issues, insufficient disk space, etc.).

ICLI-FILE-002

Error Description

Failed to save numpy-format data (e.g., ITL data for each request) to the database.

Solution

If the log shows Failed to save numpy array to database: XXXXXX, it means the numpy-format data failed to be saved to the database. Troubleshoot and resolve the issue based on the specific saving reason indicated by XXXXXX (e.g., database connection issues, non-existent database tables, etc.).

ICLE-DATA-002

Error Description

The configured number of candidate solutions generated by inference n is inconsistent with the actual number of returned candidate solutions.

Solution

If --mode all is specified in the command line or --mode is not specified (indicating execution of infer + evaluate), triggering this exception means there is a bug in the tool itself. You can provide feedback in the issue.
If --mode eval is specified in the command line (evaluation based on previous inference results), and the exception error is: Replication length mismatch, len of replications: 4 != n: 8, then set the parameter n in the configuration file corresponding to the dataset task to the number of replications 4:

# In aime2024_gen_0_shot_str.py, the k parameter corresponds to `k`
aime2024_datasets = [
    dict(
        abbr='aime2024',
        type=Aime2024Dataset,
        # ......
        n=4,
    )
]

ICLR-TYPE-001

Error Description

In the dataset configuration file, the type of the prompt template is incorrect. Only str or dict types are supported currently.

Solution

Ensure that the type of the prompt template in the inference configuration of the dataset configuration file is str or dict, for example:

# In aime2024_gen_0_shot_str.py
aime2024_infer_cfg = dict(
    prompt_template=dict(
        type=PromptTemplate,
        template='{question}\nPlease reason step by step, and put your final answer within \\boxed{}.' # str type
    ),
    # ......
)

# In aime2024_gen_0_shot_chat_prompt.py
aime2024_infer_cfg = dict(
    prompt_template=dict(
        type=PromptTemplate,
        template=dict( # dict type
            round=[
                dict(
                    role="HUMAN",
                    prompt="{question}\nPlease reason step by step, and put your final answer within \\boxed{}.",
                ),
            ],
        ),
    ),
    # ......
)

If the type of the value of the template parameter is incorrect, correct it to str or dict type.

ICLR-TYPE-002

Error Description

In the dataset configuration file, when the type of the prompt template is dict, the value type of all key-value pairs in it is incorrect. Currently, the supported value types are only str, list, and dict.

Solution

Ensure that in the dataset configuration file, the value type of all key-value pairs in the prompt template under the inference configuration is str, list, or dict. For example:

# In aime2024_gen_0_shot_chat_prompt.py
aime2024_infer_cfg = dict(
    prompt_template=dict(
        type=PromptTemplate,
        template=dict( # dict type
            round=[
                dict(
                    role="HUMAN", # str type
                    prompt="{question}\nPlease reason step by step, and put your final answer within \\boxed{}.", # str type
                ),
            ],
        ),
    ),
    # ......
)

ICLR-PARAM-001

Error Description

In the dataset configuration file, when the ice_token parameter is configured in the prompt template, the value of the template parameter does not contain the value of the ice_token parameter.

Solution

When the type of the template parameter is str, ensure that the string value of template contains the value of the ice_token parameter. For example:

# In ceval_gen_5_shot_str.py
ceval_infer_cfg = dict(
    ice_template=dict(
        type=PromptTemplate,
        template=f'Below are single-choice questions from the {_ch_name} exam in China. Please select the correct answer.\n</E>{{question}}\nA. {{A}}\nB. {{B}}\nC. {{C}}\nD. {{D}}\nAnswer: {{answer}}', # The string contains '</E>', the value of ice_token
        ice_token='</E>',
    ),
    retriever=dict(type=FixKRetriever, fix_id_list=[0, 1, 2, 3, 4]),
    inferencer=dict(type=GenInferencer),
)

When the type of the template parameter is dict, ensure that the value of at least one key-value pair in the dictionary of the template value contains the value of the ice_token parameter. For example:

# In aime2024_gen_0_shot_chat_prompt.py
cmmlu_infer_cfg = dict(
    # ......
    prompt_template=dict(
        type=PromptTemplate,
        template=dict(
            begin='</E>', # Same as the value of ice_token
            round=[
                dict(role='HUMAN', prompt=prompt_prefix+QUERY_TEMPLATE),
            ],
        ),
        ice_token='</E>',
    ),
    # ......
)

ICLR-PARAM-002

Error Description

The ice_template parameter is not specified when the dataset configuration file needs to construct few-shots based on the training set.

Solution

Take cmmlu_gen_5_shot_cot_chat_prompt.py as an example. This configuration specifies retriever=dict(type=FixKRetriever, fix_id_list=[0, 1, 2, 3, 4]), to construct few-shots, so the ice_template parameter must be specified. You can modify it with reference to the following content:

cmmlu_infer_cfg = dict(
    ice_template=dict( # ice_template must be configured
        type=PromptTemplate,
        template=dict(round=[
            dict(
                role='HUMAN',
                prompt=prompt_prefix+QUERY_TEMPLATE,
            ),
            dict(role='BOT', prompt="{answer}\n",)
        ]),
    ),
    prompt_template=dict(
        type=PromptTemplate,
        template=dict(
            begin='</E>',
            round=[
                dict(role='HUMAN', prompt=prompt_prefix+QUERY_TEMPLATE),
            ],
        ),
        ice_token='</E>',
    ),
    retriever=dict(type=FixKRetriever, fix_id_list=[0, 1, 2, 3, 4]), # 5-shots specified
    inferencer=dict(type=GenInferencer),
)

ICLR-PARAM-003

Error Description

In the multimodal dataset configuration file, the key value of the prompt_mm parameter in the prompt template is not one of [“text”, “image”, “video”, “audio”].

Solution

Take textvqa_gen_base64.py as an example. In this configuration, the key value of the prompt_mm parameter in the prompt template is one of “text”, “image”, “video”, “audio”. You can modify it with reference to the following content:

textvqa_infer_cfg = dict(
    prompt_template=dict(
        type=MMPromptTemplate,
        template=dict(
            round=[
                dict(role="HUMAN", prompt_mm={ # The key value of the prompt_mm parameter is one of "text", "image", "video", "audio"
                    "text": {"type": "text", "text": "{question} Answer the question using a single word or phrase."},
                    "image": {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,{image}"}},
                    "video": {"type": "video_url", "video_url": {"url": "data:video/jpeg;base64,{video}"}},
                    "audio": {"type": "audio_url", "audio_url": {"url": "data:audio/wav;base64,{audio}"}},
                })
            ]
            )
    ),
    retriever=dict(type=ZeroRetriever),
    inferencer=dict(type=GenInferencer)
)

ICLR-PARAM-004

Error Description

The id values in fix_id_list for constructing few-shots in the dataset configuration file exceed the range of selectable ids in the training set.

Solution

If the configuration for constructing few-shots in the dataset configuration file is as follows:

retriever=dict(type=FixKRetriever, fix_id_list=[1,2,5,8]),

The detailed error log is Fix-K retriever index 8 is out of range of [0, 8), indicating that the id value 8 in fix_id_list exceeds the range [0, 8) of selectable ids in the training set and needs to be corrected to a value within this range.

ICLR-IMPL-002

Error Description

The ice_token parameter is not configured in the prompt template of the dataset configuration file.

Solution

If both the prompt_template parameter and the ice_template parameter exist, and the log error is ice_token of prompt_template is not provided, then the ice_token parameter must exist in the prompt_template parameter. For example:

cmmlu_infer_cfg = dict(
    ice_template=dict(
        type=PromptTemplate,
        template=dict(round=[
            dict(
                role='HUMAN',
                prompt=prompt_prefix+QUERY_TEMPLATE,
            ),
            dict(role='BOT', prompt="{answer}\n",)
        ]),
    ),
    prompt_template=dict(
        type=PromptTemplate,
        template=dict(
            begin='</E>',
            round=[
                dict(role='HUMAN', prompt=prompt_prefix+QUERY_TEMPLATE),
            ],
        ),
        ice_token='</E>', # Must be set
    ),
    retriever=dict(type=FixKRetriever, fix_id_list=[0, 1, 2, 3, 4]), # 5-shots specified
    inferencer=dict(type=GenInferencer),
)

If only the ice_template parameter exists, and the log error is ice_token of ice_template is not provided, then the ice_token parameter must exist in the ice_template parameter. For example:

ceval_infer_cfg = dict(
    ice_template=dict(
        type=PromptTemplate,
        template=f'Below are single-choice questions from the {_ch_name} exam in China. Please select the correct answer.\n</E>{{question}}\nA. {{A}}\nB. {{B}}\nC. {{C}}\nD. {{D}}\nAnswer: {{answer}}',
        ice_token='</E>', # Must exist
    ),
    retriever=dict(type=FixKRetriever, fix_id_list=[0, 1, 2, 3, 4]),
    inferencer=dict(type=GenInferencer),
)

ICLR-IMPL-003

Error Description

Necessary template fields are missing in the dataset configuration file.

Solution

If the error log is Leaving prompt as empty is not supported, it means that at least one of the prompt_template parameter and the ice_template parameter must exist in the dataset configuration file. For example:

cmmlu_infer_cfg = dict( # At least one of ice_template and prompt_template must exist
    ice_template=dict(
        type=PromptTemplate,
        template=dict(round=[
            dict(
                role='HUMAN',
                prompt=prompt_prefix+QUERY_TEMPLATE,
            ),
            dict(role='BOT', prompt="{answer}\n",)
        ]),
    ),
    prompt_template=dict(
        type=PromptTemplate,
        template=dict(
            begin='</E>',
            round=[
                dict(role='HUMAN', prompt=prompt_prefix+QUERY_TEMPLATE),
            ],
        ),
        ice_token='</E>', # Must be set
    ),
    retriever=dict(type=FixKRetriever, fix_id_list=[0, 1, 2, 3, 4]), # 5-shots specified
    inferencer=dict(type=GenInferencer),
)

MODEL-IMPL-001

Error Description

When implementing a new class based on the BaseAPIModel class, the parse_text_response method is not implemented, making it impossible to test the inference service through the text interface.

Solution

(For developers) When implementing a subclass based on the BaseAPIModel class, if you want to test the inference service through the text interface, you need to implement the parse_text_response method, which is used to parse the text response returned by the model and convert it into the output format of the model inference service.

MODEL-IMPL-002

Error Description

When implementing a new class based on the BaseAPIModel class, the parse_stream_response method is not implemented, making it impossible to test the inference service through the streaming interface.

Solution

(For developers) When implementing a subclass based on the BaseAPIModel class, if you want to test the inference service through the streaming interface, you need to implement the parse_stream_response method, which is used to parse the streaming response returned by the model and convert it into the output format of the model inference service.

MODEL-PARAM-002

Error Description

In the dataset configuration file, the chat-type prompt template does not contain the role or fallback_role field.

Solution

Refer to the following configuration file content:

cmmlu_infer_cfg = dict(
    ice_template=dict(
        type=PromptTemplate,
        template=dict(round=[
            dict(
                role='HUMAN', # Contains the 'role' field
                prompt=prompt_prefix+QUERY_TEMPLATE,
            ),
            dict(role='BOT', prompt="{answer}\n",)
        ]),
    ),
    prompt_template=dict(
        type=PromptTemplate,
        template=dict(
            begin='</E>',
            round=[
                dict(role='HUMAN', prompt=prompt_prefix+QUERY_TEMPLATE),
            ],
        ),
        ice_token='</E>',
    ),
    retriever=dict(type=FixKRetriever, fix_id_list=[0, 1, 2, 3, 4]), # 5-shots specified
    inferencer=dict(type=GenInferencer),
)

MODEL-PARAM-003

Error Description

In the dataset configuration file, the value of the role parameter in the chat template of prompt engineering is not within the legal range.

Solution

If the chat template-related configuration in the dataset configuration file is as follows:

# Take aime2024_gen_0_shot_chat_prompt.py as an example
aime2024_infer_cfg = dict(
    prompt_template=dict(
        type=PromptTemplate,
        template=dict(
            round=[
                dict(
                    role="HUMAN1",
                    prompt="{question}\nPlease reason step by step, and put your final answer within \\boxed{}.",
                ),
            ],
        ),
    ),
    # ......
)

The error log is Unknown role HUMAN1 in chat template, legal role chosen from ['HUMAN', 'BOT', 'SYSTEM']., indicating that the value of the role parameter in the chat template is HUMAN1, while the legal role values are HUMAN, BOT, and SYSTEM. Therefore, the value of the role parameter needs to be corrected to one of HUMAN, BOT, or SYSTEM.

MODEL-PARAM-004

Error Description

There is no direct solution available yet.

Solution

If you need to resolve this issue, please raise an issue and include this error code in the issue description.

MODEL-PARAM-005

Error Description

There is no direct solution available yet.

Solution

If you need to resolve this issue, please raise an issue and include this error code in the issue description.

MODEL-TYPE-001

Error Description

In the dataset configuration file, a set of strings is not supported in the prompt engineering template.

Solution

If the prompt template-related configuration in the dataset configuration file is as follows:

# Take aime2024_gen_0_shot_chat_prompt.py as an example
aime2024_infer_cfg = dict(
    prompt_template=dict(
        type=PromptTemplate,
        template=dict(
            round=[ # The list contains multiple strings
                "{question}\nPlease reason step by step, and put your final answer within \\boxed{}.",
                "{question}\nPlease reason step by step, and put your final answer within \\boxed{}."
            ],
        ),
    ),
    # ......
)

An error will occur: Mixing str without explicit role is not allowed in API models!. Please modify round to a valid chat template, for example:

round=[
    dict(
        role="HUMAN1",
        prompt="{question}\nPlease reason step by step, and put your final answer within \\boxed{}.",
    ),
],

MODEL-TYPE-002

Error Description

No direct solution is available at this time.

Solution

If you need to resolve this issue, please submit an issue and include this error code in the issue description.

MODEL-TYPE-003

Error Description

No direct solution is available at this time.

Solution

If you need to resolve this issue, please submit an issue and include this error code in the issue description.

MODEL-TYPE-004

Error Description

No direct solution is available at this time.

Solution

If you need to resolve this issue, please submit an issue and include this error code in the issue description.

MODEL-DATA-001

Error Description

The model task failed to retrieve model name information from the tested inference service.

Solution

If the error message is Failed to get service model path from http://url-to-infer-service. Error: XXXXXX, it indicates a failure to access http://url-to-infer-service/v1/models. You need to check if the tested inference service is running properly and if the /v1/models sub-service is enabled. You can also locate the cause of the access failure to http://url-to-infer-service/v1/models based on the specific error XXXXXX. If the URL http://url-to-infer-service/ does not support the v1/models sub-service, you can configure the model name in the model parameter of the model configuration file. For example:

# In vllm_api_stream_chat.py
models = [
    dict(
        # ......
        model="name_of_model",
        # ......
    )
]

MODEL-DATA-002

Error Description

The dataset configuration file lacks required parameters.

Solution

If the error message is Invalid prompt content: without 'prompt' or 'prompt_mm' param!, it means the dataset configuration file does not contain either the prompt or prompt_mm parameter. You need to add one of these two parameters to the dataset configuration file. For example:

# Take aime2024_gen_0_shot_chat_prompt.py as an example
aime2024_infer_cfg = dict(
    prompt_template=dict(
        type=PromptTemplate,
        template=dict(
            round=[
                dict( # Must contain either the 'prompt' or 'prompt_mm' field
                    role="HUMAN",
                    prompt="{question}\nPlease reason step by step, and put your final answer within \\boxed{}.",
                ),
            ],
        ),
    ),
    # ......
)

MODEL-DATA-003

Error Description

Failed to parse the returned result of the request in JSON format.

Solution

If the error message is Unexpected response format. Please check 'error_info' in {dataset_abbr}_failed.jsonl for more information., you need to check the specific error information (content of the error_info field) in the {dataset_abbr}_failed.jsonl file under the current inference task’s output path (e.g., outputs/default/20250628_151326/performances/vllm-api-stream-chat/) and further explore solutions.

MODEL-CFG-001

Error Description

The max_seq_len parameter is not configured in the local model configuration file.

Solution

If the error message is max_seq_len is not provided and cannot be inferred from the model config., it means you need to add the max_seq_len parameter to the local model configuration file. For example:

# In hf_chat_model.py
models = [
    dict(
        attr="local",
        # ......
        max_seq_len=2048,
        # ......
    )
]

MODEL-MOD-001

Error Description

Special dependencies required for model execution are not installed.

Solution

If the error message is fastchat module not found. Please install with\npip install "fschat[model_worker,webui]", it indicates that the fastchat dependency is missing. You can install it by executing pip install "fschat[model_worker,webui]".

DSET-CFG-001

Error Description

The dataset configuration file lacks the path field to specify the dataset path.

Solution

If the error message is The 'path' argument is required to load the dataset., it means the dataset configuration file does not contain the path field. You need to add the path field to the dataset configuration file. For example:

# In aime2024_gen_0_shot_chat_prompt.py
aime2024_datasets = [
    dict(
        abbr='aime2024',
        type=Aime2024Dataset,
        path='ais_bench/datasets/aime/aime.jsonl', # Required field to configure the dataset path
        # ......
    )
]

DSET-FILE-001

Error Description

The dataset file does not exist.

Solution

If the error message is Path is not a directory or Parquet file: /path/to/dataset.jsonl, it means /path/to/dataset.jsonl is not a dataset in the required .parquet format. Please confirm that the dataset format meets expectations.
If the error message is No Parquet file found in /path/to/dataset/., it means no .parquet format dataset is found in the path /path/to/dataset/. Please confirm that the dataset format meets expectations.
If the error message is "Dataset file not found: /path/to/dataset/, it means the dataset path /path/to/dataset/ itself does not exist. Please confirm that the dataset path matches the expected input path.
If the error message is Corpus file not found. Please ensure {DEFAULT_CORPUS_FILE} exists in one of: [...] when using the mooncake_trace dataset, the required corpus file was not found. Place assets/shakespeare.txt under ais_bench/third_party/aiperf/assets/shakespeare.txt (relative to the ais_bench package root), or under one of the paths listed in the error message.

DSET-DATA-002

Error Description

The content structure of the dataset is invalid.

Solution

Please check for format issues in the dataset content based on the detailed error message.

DSET-DATA-005

Error Description

No direct solution is available at this time.

Solution

If you need to resolve this issue, please submit an issue and include this error code in the issue description.

DSET-DATA-006

Error Description

No direct solution is available at this time.

Solution

If you need to resolve this issue, please submit an issue and include this error code in the issue description.

DSET-PARAM-002

Error Description

Invalid values for n (number of candidate solutions generated by inference) and k (number of samples collected from candidates).

Solution

If the error log shows:

Maximum value of `k` 4 must be less than or equal to `n` 8

It means k is greater than n. You need to configure k as an integer less than or equal to n. For example:

If both n and k parameters are configured in the dataset configuration file, set their values within the valid range in the configuration file:

# In aime2024_gen_0_shot_str.py, the k parameter corresponds to `k`
aime2024_datasets = [
    dict(
        abbr='aime2024',
        type=Aime2024Dataset,
        # ......
        k=4,
        n=8,
    )
]

DSET-PARAM-004

Error Description

Invalid parameters in the dataset configuration file.

Solution

Please check for invalid parameter value issues in the dataset configuration file based on the detailed error message. Typical scenarios for mooncake_trace / timestamp-based scheduling:

timestamp: The timestamp field in trace data must be of type float or int and >= 0; otherwise an error is raised with a type or range message.
hash_ids and input_length incompatible: When the error message contains Input length: ..., Hash IDs: ..., Block size: 512 ... Final block size: ... must be > 0 and <= 512, ensure (len(hash_ids)-1)*512+1 <= input_length <= len(hash_ids)*512.
fixed_schedule parameters: When fixed_schedule_end_offset >= 0, fixed_schedule_start_offset must be <= fixed_schedule_end_offset.

DSET-PARAM-005

Error Description

Required parameters are missing during dataset loading or processing.

Solution

Check and supply the missing required parameters according to the detailed error message. For example:

If the error is mean must be provided, when using the mooncake_trace dataset you must provide the mean parameter (via the trace’s input_length field) for prompt generation.
If the error is Either 'input_text' or 'input_length' must be provided, in a single JSONL record of mooncake trace data, you must provide the input_length field when input_text is not provided.

DSET-UNK-001

Error Description

Unknown error of the dataset or a dependent component due to incorrect initialization.

Solution

If the error is “RNG manager not initialized. Call init_rng() first.”, the mooncake_trace prompt generator called derive_rng before init_rng was called. Normal loading via MooncakeTraceDataset.load() calls init_rng(random_seed) automatically; this error usually occurs when calling lower-level APIs or in tests without proper initialization order. Ensure init_rng(seed) is called before any RNG-dependent generation logic.

DSET-DEPENDENCY-002

Error Description

Missing dependencies required for the dataset task evaluation.

Solution

If the error message is:

Please install human_eval use following steps:
git clone git@github.com:open-compass/human-eval.git
cd human-eval && pip install -e .

Execute git clone git@github.com:open-compass/human-eval.git and cd human-eval && pip install -e . in sequence according to the error log content to install the human-eval library.

DSET-MTRC-001

Error Description

No direct solution is available at this time.

Solution

If you need to resolve this issue, please submit an issue and include this error code in the issue description.

DSET-MTRC-003

Error Description

No direct solution is available at this time.

Solution

If you need to resolve this issue, please submit an issue and include this error code in the issue description.

SWEB-DEPENDENCY-001

Error Description

mini-swe-agent dependency is missing when running SWEBench infer, so task initialization fails.

Solution

Install the dependency and retry:

pip install mini-swe-agent

If you use a virtual environment, make sure ais_bench and mini-swe-agent are installed in the same Python environment.

SWEB-DEPENDENCY-002

Error Description

SWE-bench harness dependency is missing when running SWEBench eval.

Solution

Install the harness from the official repository, then retry:

git clone https://github.com/SWE-bench/SWE-bench.git
cd SWE-bench
pip install -e .

SWEB-PARAM-001

Error Description

No valid model config is detected for SWEBench infer (required fields like model/url/api_key are missing or empty).

Solution

Check models[0] in your task config and provide at least:

model, for example hosted_vllm/qwen3
url, for example http://127.0.0.1:2998/v1
api_key, EMPTY is acceptable for local tests

SWEB-PARAM-002

Error Description

Invalid SWEBench dataset name that is not in the supported name set.

Solution

Set dataset name to one of: full, verified, lite, multilingual.

SWEB-DATA-001

Error Description

Prediction input contains instance_id values that do not exist in the current dataset.

Solution

Ensure the prediction file and eval dataset are fully aligned:

Run infer and eval with the same dataset config.
Check whether instance_id entries in predictions were manually changed or mixed from another run.

SWEB-DATA-002

Error Description

Failed to load SWEBench dataset from Hugging Face online source.

Solution

Check network connectivity and Hugging Face access first. If your environment is restricted, download parquet files manually and configure a local path.

SWEB-DATA-003

Error Description

Failed to read or parse local SWEBench parquet files.

Solution

Validate local data integrity and format:

Confirm files are valid parquet files.
Confirm file naming matches the target split (for example test-*.parquet).
Re-download or re-export corrupted files and retry.

SWEB-FILE-001

Error Description

Prediction file is missing during SWEBench eval (*.json or preds.json not found).

Solution

Run infer successfully before eval, and confirm prediction files exist under work_dir/predictions for the target model.

SWEB-FILE-002

Error Description

Local SWEBench dataset path resolution failed (path does not exist or is not accessible).

Solution

Check whether path in config is correct, and verify the current user has read permission for that directory/file.

SWEB-FILE-003

Error Description

No parquet file for the target split is found under the local dataset path.

Solution

Ensure one of the following exists:

<root>/data/<split>-*.parquet
<root>/<split>-*.parquet For single-file use cases, path can point directly to that parquet file.

SWEB-RUNTIME-001

Error Description

Required Docker image for SWEBench is unavailable locally and pulling failed.

Solution

Check Docker daemon status and network, then run docker pull for the image shown in logs. Retry after image is available.

SWEB-RUNTIME-002

Error Description

Runtime error occurs during SWEBench execution (for example harness execution failure or future task exception).

Solution

Use detailed logs to triage:

Check dependency installation, Docker availability, and prediction file format first.
If it persists, keep full logs and open an issue with the error code: https://github.com/AISBench/benchmark/issues.

SWEBP-DEPENDENCY-001

Error Description

mini-swe-agent dependency is missing when running SWE-Bench Pro infer, so task initialization fails.

Solution

Install the adapted dependency from scaleapi’s repository and retry:

git clone https://github.com/scaleapi/mini-swe-agent.git

cd mini-swe-agent/

pip install -e .

If you use a virtual environment, make sure ais_bench and mini-swe-agent are installed in the same Python environment.

SWEBP-PARAM-001

Error Description

No valid model config is detected for SWE-Bench Pro infer (required fields like model/url/api_key are missing or empty).

Solution

Check models[0] in your task config and provide at least:

model, for example qwen3
url, for example http://127.0.0.1:2998/v1
api_key, EMPTY is acceptable for local tests

SWEBP-PARAM-002

Error Description

Invalid SWE-Bench Pro dataset name that is not in the supported name set.

Solution

Set dataset name to one of: full, mini.

SWEBP-DATA-001

Error Description

Failed to load SWE-Bench Pro dataset from Hugging Face online source, or local file not found.

Solution

Check network connectivity and Hugging Face access first. If your environment is restricted, download data manually and configure a local path. The local path must be a valid parquet file path.

SWEBP-FILE-001

Error Description

Prediction file is missing during SWE-Bench Pro eval (*.json or preds.json not found).

Solution

Run infer successfully before eval, and confirm prediction files exist under work_dir/predictions for the target model.

SWEBP-RUNTIME-001

Error Description

Required Docker image for SWE-Bench Pro is unavailable locally and pulling failed.

Solution

Check Docker daemon status and network, then run docker pull for the image shown in logs. Retry after image is available.

SWEBP-RUNTIME-002

Error Description

Runtime error occurs during SWE-Bench Pro execution (for example harness execution failure or future task exception).

Solution

Use detailed logs to triage:

Check dependency installation, Docker availability, and prediction file format first.
If it persists, keep full logs and open an issue with the error code: https://github.com/AISBench/benchmark/issues.