Error Code Description
TMAN-CMD-001
Error Description
This error indicates that a required input parameter is missing when executing a command. When launching the ais_bench evaluation tool via the command line, you must specify the model configuration and dataset configuration.
Examples of valid scenarios:
# When using an open-source dataset, you must specify the model task via `--models` and the dataset task via `--datasets`
ais_bench --models vllm_api_stream_chat --datasets gsm8k_gen
# When using a custom dataset, you must specify the model task via `--models` and the custom dataset path via `--custom_dataset_path`
ais_bench --models vllm_api_stream_chat --custom_dataset_path /path/to/custom/dataset
Solution
Add the missing parameters, following the valid examples above.
TMAN-CMD-002
Error Description
This error indicates that the value of a command-line parameter is not within the valid range.
Solution
Search this document for the specific command line that appears in the log, and find the constraints on parameter values specified in the command line description.
For example, if this error occurs when executing ais_bench --models vllm_api_stream_chat --datasets gsm8k_gen --num-prompts -1 --mode perf, search for --num-prompts in the document to find the constraints in the parameter description.
| Parameter | Description | Example |
|---|---|---|
| --num-prompts | Specifies the number of test cases to evaluate in the dataset. A positive integer must be entered. If the value exceeds the number of dataset cases or is not specified, the entire dataset will be evaluated. | |
The parameter description specifies that the value must be a positive integer (greater than 0).
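The constraint can be illustrated with a small, self-contained sketch. It is not the ais_bench implementation: the parser and the positive_int helper are hypothetical, and only the --num-prompts flag name comes from this document.

```python
import argparse


def positive_int(text: str) -> int:
    """argparse type callback: accept only integers greater than 0."""
    try:
        value = int(text)
    except ValueError:
        raise argparse.ArgumentTypeError(f"{text!r} is not an integer")
    if value <= 0:
        raise argparse.ArgumentTypeError(f"{value} is not a positive integer")
    return value


# Hypothetical stand-in parser, used only to demonstrate the range check.
parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("--num-prompts", type=positive_int, default=None)

args = parser.parse_args(["--num-prompts", "8"])
print(args.num_prompts)
```

With a check of this kind, a value such as -1 is rejected before any evaluation starts, which is the situation this error code reports.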
TMAN-CFG-001
Error Description
The .py configuration file contains a syntax error, so it cannot be parsed.
Solution
Check the log for the Python syntax error reported in the configuration file (all configuration files for the ais_bench evaluation tool follow Python syntax), such as a missing quotation mark or a mismatched parenthesis, and correct it.
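One way to locate such an error before rerunning the tool is to parse the file with Python's standard ast module. The helper name find_syntax_error and the two sample strings below are illustrative only.

```python
import ast


def find_syntax_error(source: str, filename: str = "<config>"):
    """Return (line, message) for the first syntax error, or None if the source parses."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError as exc:
        return exc.lineno, exc.msg
    return None


good = 'models = [dict(attr="service")]\n'
bad = 'models = [dict(attr="service")\n'  # the closing bracket is missing

print(find_syntax_error(good))
print(find_syntax_error(bad))
```

To check a real file, pass its contents, for example find_syntax_error(open(path).read(), path). Running python -m py_compile on the file performs a comparable check from the shell.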
TMAN-CFG-002
Error Description
A required parameter is missing from the .py configuration file, causing parsing failure.
For example, the specific error log is: Config file /path/to/vllm_api_stream_chat.py does not contain 'models' param!, which indicates that the models parameter is missing from the configuration file.
A valid vllm_api_stream_chat.py file contains the models parameter:
# ......
models = [
    dict(
        attr="service",
        type=VLLMCustomAPIChat,
        abbr="vllm-api-stream-chat",
        # ......
    )
]
Solution
In the .py configuration file named in the error log, add the parameter that the log reports as missing.
TMAN-CFG-003
Error Description
A parameter in the .py configuration file has an incorrect type, causing parsing failure.
For example, the relevant configuration in the vllm_api_stream_chat.py configuration file is:
# ......
models = dict(
    attr="service",
    type=VLLMCustomAPIChat,
    abbr="vllm-api-stream-chat",
    # ......
)
The specific error log is: In config file /path/to/vllm_api_stream_chat.py, 'models' param must be a list!, which indicates that the models parameter in the configuration file has an incorrect type. It should be a list (but is actually a dictionary).
Solution
In the .py configuration file named in the error log, change the parameter to the type that the log requires.
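The two checks behind TMAN-CFG-002 and TMAN-CFG-003 can be sketched as follows. This is a hypothetical helper that reuses the message wording quoted above; it is not the tool's actual parsing code, and the configuration contents are simulated with plain strings.

```python
def check_models_param(namespace: dict, config_path: str) -> list:
    """Hypothetical helper: 'models' must exist and must be a list."""
    if "models" not in namespace:
        raise KeyError(f"Config file {config_path} does not contain 'models' param!")
    models = namespace["models"]
    if not isinstance(models, list):
        raise TypeError(f"In config file {config_path}, 'models' param must be a list!")
    return models


# Simulate the contents of a valid and an invalid configuration file.
valid_ns, invalid_ns = {}, {}
exec('models = [dict(attr="service", abbr="vllm-api-stream-chat")]', valid_ns)
exec('models = dict(attr="service", abbr="vllm-api-stream-chat")', invalid_ns)

print(len(check_models_param(valid_ns, "/path/to/vllm_api_stream_chat.py")))
try:
    check_models_param(invalid_ns, "/path/to/vllm_api_stream_chat.py")
except TypeError as exc:
    print(exc)
```

Wrapping the dict in square brackets, as in the valid example, is enough to satisfy the list requirement.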
UTILS-MATCH-001
Error Description
The task name specified via --models, --datasets, or --summarizer cannot be matched to a .py configuration file with the same name as the task.
Solution
Check the task name that the log reports as unmatched. For example, if xxxx cannot be matched, the following log is printed:
+------------------------+
| Not matched patterns |
|------------------------|
| xxxx |
+------------------------+
Scenario 1: The configuration file folder path is not specified
First, execute pip3 show ais_bench_benchmark | grep "Location:" to check the installation path of the ais_bench evaluation tool. For example, the command prints:
Location: /usr/local/lib/python3.10/dist-packages
The configuration file path is then /usr/local/lib/python3.10/dist-packages/ais_bench/benchmark/configs. Navigate to this path and perform the following checks:
- If the unmatched task name is specified via --models, check whether there is a .py configuration file with the same name as the task in the models/ path (including subdirectories).
- If the unmatched task name is specified via --datasets, check whether there is a .py configuration file with the same name as the task in the datasets/ path (including subdirectories).
- If the unmatched task name is specified via --summarizer, check whether there is a .py configuration file with the same name as the task in the summarizers/ path (including subdirectories).
Scenario 2: The configuration file folder path is specified
If you specified the configuration file folder path via --config-dir when executing the command, navigate to this path and perform the following checks:
- If the unmatched task name is specified via --models, check whether there is a .py configuration file with the same name as the task in the models/ path (including subdirectories).
- If the unmatched task name is specified via --datasets, check whether there is a .py configuration file with the same name as the task in the datasets/ path (including subdirectories).
- If the unmatched task name is specified via --summarizer, check whether there is a .py configuration file with the same name as the task in the summarizers/ path (including subdirectories).
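The manual check in both scenarios amounts to a recursive file search. A minimal sketch, assuming the directory layout described above (the function name find_task_configs is illustrative):

```python
from pathlib import Path


def find_task_configs(config_root: str, kind: str, task_name: str) -> list:
    """List every <task_name>.py under <config_root>/<kind>/, including subdirectories."""
    base = Path(config_root) / kind
    if not base.is_dir():
        return []
    return sorted(base.rglob(f"{task_name}.py"))


# Example path from Scenario 1; replace it with your installation or --config-dir path.
root = "/usr/local/lib/python3.10/dist-packages/ais_bench/benchmark/configs"
matches = find_task_configs(root, "models", "vllm_api_stream_chat")
print(matches or "no matching .py configuration file found")
```

Use "models", "datasets", or "summarizers" as the kind argument, depending on which option carried the unmatched task name.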
UTILS-CFG-001
Error Description
When using the randomly synthesized dataset in the tokenid scenario, the model configuration file must specify the tokenizer path.
Solution
Assume the ais_bench evaluation tool command is ais_bench --models vllm_api_stream_chat --datasets synthetic_gen_tokenid --mode perf. Then the path parameter of every entry in models in the vllm_api_stream_chat.py configuration file (refer to [Modifying Configuration Files for Corresponding Tasks](…/get_started/quick_start.md#Modifying Configuration Files for Corresponding Tasks) for how to locate the configuration file) must be set to the tokenizer path (usually the model weight folder path).
# ......
models = [
    dict(
        attr="service",
        type=VLLMCustomAPIChat,
        abbr="vllm-api-stream-chat",
        path="/path/to/tokenizer",  # Enter the tokenizer path
        # ......
    )
]
UTILS-CFG-002
Error Description
Initializing a model instance from the parameters in the model configuration file failed because a parameter value is invalid.
Solution
Look in the log for build failed with the following errors:{error_content}, and correct the parameters in the model configuration file according to error_content.
For example, if the batch_size parameter in the model configuration file is 100001 and error_content is "batch_size must be an integer in the range (0, 100000]", the batch_size value exceeds the valid range (0, 100000]. Change batch_size to a value within that range, such as 100000.
UTILS-CFG-003
Error Description
The value of a parameter in the model configuration file is outside the range the tool allows.
Solution
Set the parameter to a value within the allowed range, following the detailed log. For example, if the configuration file content is:
# In vllm_api_stream_chat.py
models = [
    dict(
        attr="service1",
        # ......
    )
]
The detailed error log is:
Model config contain illegal attr, 'attr' in model config is 'service1', only 'local' and 'service' are supported!
This indicates that the value of the attr parameter in the model configuration is 'service1', but the tool only supports the values 'local' and 'service'. You need to set the attr parameter to one of the valid values.
UTILS-CFG-004
Error Description
Some configuration items for model parameters must be consistent across all model configurations and cannot have different values.
Solution
Unify the configuration values according to the detailed log. For example, if the configuration file content is:
# In vllm_api_stream_chat.py
models = [
    dict(
        attr="service",
        # ......
    ),
    dict(
        attr="local"
    )
]
The detailed error log is:
Cannot run local and service model together! Please check 'attr' parameter of models
The models configuration contains two different attr values, 'service' and 'local', but the tool requires all model configurations to share one value. Set the attr parameter of every entry in models to the same value, either 'service' or 'local'.
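The attr rules from UTILS-CFG-003 and UTILS-CFG-004 can be checked together before launching a run. The sketch below is a hypothetical pre-flight helper, not the tool's own validation; the allowed values are the two named in the error logs above.

```python
ALLOWED_ATTRS = {"local", "service"}  # the two values named in the error logs


def check_model_attrs(models: list) -> str:
    """Return the single shared attr value, or raise if it is invalid or mixed."""
    if not models:
        raise ValueError("models is empty")
    attrs = {model.get("attr") for model in models}
    illegal = attrs - ALLOWED_ATTRS
    if illegal:
        raise ValueError(f"unsupported attr value(s): {sorted(map(str, illegal))}")
    if len(attrs) != 1:
        raise ValueError(f"models mix different attr values: {sorted(attrs)}")
    return attrs.pop()


print(check_model_attrs([dict(attr="service"), dict(attr="service")]))
try:
    check_model_attrs([dict(attr="service"), dict(attr="local")])
except ValueError as exc:
    print(exc)
```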
UTILS-CFG-008
Error Description
The loaded multimodal dataset contains invalid content.
Solution
- If the error log is Invalid dataset: /path/to/non-mm-dataset , please check whether the dataset is a MM-style dataset!, it means the specified dataset /path/to/non-mm-dataset is not a valid multimodal dataset. Each piece of data in a valid dataset must contain at least a type or path field. If it contains a type field, the value of the type field must be one of ["image", "video", "audio"].
- If the error log is Param 'mm_type' does not match the data type of dataset: /path/to/mm-dataset , please check it!, it means the specified dataset /path/to/mm-dataset is a valid multimodal dataset, but the value of the mm_type field in the prompt engineering configuration of the dataset configuration file is invalid. The value of the mm_type field must be one of ["image", "video", "audio"].
UTILS-DEPENDENCY-001
Error Description
A required dependency module is missing during execution.
Solution
If the detailed error log is Failed to import required modules. Please install the necessary packages: pip install math_verify, follow the guidance in the log and execute pip install math_verify to install the missing library.
UTILS-TYPE-001
Error Description
No direct solution is available yet.
Solution
If you need to resolve this issue, please submit an issue and include this error code in the issue description.
UTILS-TYPE-002
Error Description
No direct solution is available yet.
Solution
If you need to resolve this issue, please submit an issue and include this error code in the issue description.
UTILS-TYPE-003
Error Description
No direct solution is available yet.
Solution
If you need to resolve this issue, please submit an issue and include this error code in the issue description.
UTILS-TYPE-004
Error Description
No direct solution is available yet.
Solution
If you need to resolve this issue, please submit an issue and include this error code in the issue description.
UTILS-TYPE-005
Error Description
No direct solution is available yet.
Solution
If you need to resolve this issue, please submit an issue and include this error code in the issue description.
UTILS-TYPE-006
Error Description
No direct solution is available yet.
Solution
If you need to resolve this issue, please submit an issue and include this error code in the issue description.
UTILS-TYPE-007
Error Description
No direct solution is available yet.
Solution
If you need to resolve this issue, please submit an issue and include this error code in the issue description.
UTILS-TYPE-008
Error Description
The command-line parameter value is too large.
Solution
If the error log shows '--max-num-workers' must be <= 8, but got 9 ......, it indicates that the value of the command-line parameter --max-num-workers is 9. However, the tool only supports a maximum of 8 concurrent workers. Therefore, you need to adjust the value of --max-num-workers to be ≤ 8.
UTILS-TYPE-009
Error Description
The command-line parameter value is not an integer type.
Solution
If the error log shows '--max-num-workers' must be an integer, but got '9' ......, it indicates that the value of the command-line parameter --max-num-workers is the string '9'. However, the tool only supports integer-type values. Therefore, you need to correct the value of --max-num-workers to an integer.
UTILS-TYPE-010
Error Description
The command-line parameter value is too small.
Solution
If the error log shows '--max-num-workers' must be >= 1, but got 0 ......, it indicates that the value of the command-line parameter --max-num-workers is 0. However, the tool requires at least 1 concurrent worker. Therefore, you need to adjust the value of --max-num-workers to be ≥ 1.
UTILS-PARAM-001
Error Description
No direct solution is available yet.
Solution
If you need to resolve this issue, please submit an issue and include this error code in the issue description.
UTILS-PARAM-002
Error Description
In the custom dataset scenario, the request_count parameter in the configuration file *.meta.json is outside the valid range.
Solution
If the error message is Please make sure that the value of parameter 'request_count' can be converted to int(greater than 0)., set the request_count parameter in *.meta.json to an integer greater than 0.
UTILS-PARAM-003
Error Description
In the custom dataset scenario, the min_value parameter is greater than the max_value parameter in the configuration file *.meta.json.
Solution
If the error message is When the uniform distribution is set, parameter 'min_value' must be less than or equal to parameter 'max_value'., correct *.meta.json so that min_value ≤ max_value.
UTILS-PARAM-004
Error Description
In the custom dataset scenario, the min_value and max_value parameters in the configuration file *.meta.json are outside the valid range.
Solution
If you need to resolve this issue, please submit an issue and include this error code in the issue description.
UTILS-PARAM-005
Error Description
In the custom dataset scenario, the configuration file *.meta.json lacks required parameters.
Solution
For example, if the error message is When the uniform distribution is set, parameter 'min_value' and 'max_value' must be provided., it means that in the uniform distribution scenario, both the min_value and max_value parameters need to be set in *.meta.json.
UTILS-PARAM-006
Error Description
In the custom dataset scenario, the percentage_distribute parameter in the configuration file *.meta.json is invalid.
Solution
The valid value range for the percentage_distribute parameter is described in the detailed log as follows:
Ensure the configuration data follows the format [max_tokens, percentage], where:
- 'max_tokens' must be a positive number (greater than 0).
- 'percentage' must be a float between 0 and 1 (greater than 0 and inclusive 1).
- The sum of all 'percentage' values must equal exactly 1.
Example valid format: [[1000, 0.5],[500,0.5]] or [[2000, 1.0]]
Example invalid formats: [[0, 0.5]] (max_tokens <= 0), [[1000, 1.5]] (percentage > 1), [[1000, 0.3], [500,0.2]] (sum not 1)
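The rules quoted in the log can be checked locally before running the tool. The following is a minimal sketch, not the tool's own validator: the function name is hypothetical, and the small floating-point tolerance on the sum is an assumption (the log says the sum must equal exactly 1).

```python
import math


def validate_percentage_distribute(pairs) -> None:
    """Raise ValueError if pairs breaks any rule quoted in the log above."""
    if not pairs:
        raise ValueError("at least one [max_tokens, percentage] pair is required")
    total = 0.0
    for pair in pairs:
        if len(pair) != 2:
            raise ValueError(f"{pair} is not a [max_tokens, percentage] pair")
        max_tokens, percentage = pair
        if max_tokens <= 0:
            raise ValueError(f"max_tokens must be greater than 0, got {max_tokens}")
        if not 0 < percentage <= 1:
            raise ValueError(f"percentage must be in (0, 1], got {percentage}")
        total += percentage
    # Assumption: allow a tiny tolerance, since float sums are not always exact.
    if not math.isclose(total, 1.0, rel_tol=1e-9, abs_tol=1e-9):
        raise ValueError(f"percentages must sum to 1, got {total}")


validate_percentage_distribute([[1000, 0.5], [500, 0.5]])  # valid, no exception
try:
    validate_percentage_distribute([[1000, 0.3], [500, 0.2]])
except ValueError as exc:
    print(exc)
```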
UTILS-PARAM-007
Error Description
In the custom dataset scenario, the value of the method parameter (which defines the data distribution method) in the configuration file *.meta.json is outside the valid range.
Solution
If the error message is Type of data distribution(method): uniform1 not supported, legal methods chosen from ['uniform', 'percentage']., it means the value uniform1 of the method parameter in *.meta.json is not supported. You need to correct the method parameter value to either uniform or percentage.
UTILS-PARAM-008
Error Description
In the custom dataset scenario, the configuration file *.meta.json contains invalid fields.
Solution
If the specific error message is There are illegal keys: xxxxxx,yyyyyy, it means the *.meta.json file contains the two invalid fields xxxxxx and yyyyyy. You need to delete these two fields from *.meta.json.
UTILS-FILE-002
Error Description
The tokenizer path specified by the path parameter in the model configuration file does not exist.
Solution
If the content of the model configuration file is as follows:
# In vllm_api_stream_chat.py
models = [
    dict(
        # ......
        path="/path/to/invalid",
        # ......
    ),
]
And the specific error log is Tokenizer path '/path/to/invalid' does not exist, it indicates that the tokenizer path /path/to/invalid specified by the path parameter in the model configuration file does not exist (an empty path is also treated as non-existent). You need to change it to an existing tokenizer path.
UTILS-FILE-003
Error Description
Failed to load the tokenizer file.
Solution
If the error message is Failed to load tokenizer from /path/to/tokenizer: ExceptionName: XXXXXX, first confirm whether the tokenizer file under the path /path/to/tokenizer is compatible with the transformers version of the current runtime environment. If it is compatible, troubleshoot further based on the specific error information represented by XXXXXX.
UTILS-FILE-004
Error Description
No direct solution is available yet.
Solution
If you need to resolve this issue, please submit an issue and include this error code in the issue description.
PARTI-FILE-001
Error Description
Insufficient permissions for the output path file; the tool cannot write results to it.
Solution
For example, if the error log is:
Current user can't modify /path/to/workspace/outputs/default/20250628_151326/predictions/vllm-api-stream-chat/gsm8k.json, reuse will not enable.
Execute ls -l /path/to/workspace/outputs/default/20250628_151326/predictions/vllm-api-stream-chat/gsm8k.json to check the owner and permissions of this path. If the current user does not have write permission for the file, you need to add write permission for the current user (for example, execute chmod u+w /path/to/workspace/outputs/default/20250628_151326/predictions/vllm-api-stream-chat/gsm8k.json to add write permission for the current user).
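The same permission check can be scripted with the standard library. This is a generic sketch rather than the tool's own logic, and the helper name describe_write_access is illustrative.

```python
import os
import stat
import tempfile


def describe_write_access(path: str) -> str:
    """Report whether the current user can write to path, with its mode string."""
    if not os.path.exists(path):
        return "missing"
    mode = stat.filemode(os.stat(path).st_mode)  # same format as the first ls -l column
    state = "writable" if os.access(path, os.W_OK) else "not writable"
    return f"{state} ({mode})"


# Demonstration on a temporary file; pass the path from the error log instead.
with tempfile.NamedTemporaryFile(suffix=".json") as handle:
    print(describe_write_access(handle.name))
```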
CALC-MTRC-001
Error Description
The performance result data is invalid, and metrics cannot be calculated.
Solution
Scenario 1: The original performance result data is empty
If you specified recalculation of performance results via --mode perf_viz when executing the command, and the base output path is outputs/default/20250628_151326 (find Current exp folder: in the console output), check whether all *_details.jsonl files in the performances/ folder under this path are empty. If they are empty, you need to run the evaluation once first to generate performance result data.
Scenario 2: The original performance result data contains no valid values
If you specified recalculation of performance results via --mode perf_viz when executing the command, and the base output path is outputs/default/20250628_151326 (find Current exp folder: in the console output), check whether the *_details.jsonl files in the performances/ folder under this path contain no valid fields (they may have been tampered with). If so, you need to re-run the performance evaluation to generate new data.
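Both scenarios can be screened with a short script. The sketch below only reports whether each file is empty and whether every line parses as JSON; which fields the tool considers valid is not specified here, so that part is left to manual inspection. The function name is hypothetical.

```python
import json
from pathlib import Path


def inspect_details_files(exp_dir: str) -> dict:
    """Map each *_details.jsonl under <exp_dir>/performances to a short status string."""
    report = {}
    base = Path(exp_dir, "performances")
    if not base.is_dir():
        return report
    for path in sorted(base.rglob("*_details.jsonl")):
        lines = [ln for ln in path.read_text(encoding="utf-8").splitlines() if ln.strip()]
        if not lines:
            report[str(path)] = "empty"
            continue
        bad = 0
        for line in lines:
            try:
                json.loads(line)
            except json.JSONDecodeError:
                bad += 1
        report[str(path)] = f"{len(lines)} records, {bad} unparsable"
    return report


# Use the folder printed after "Current exp folder:" in the console output.
print(inspect_details_files("outputs/default/20250628_151326"))
```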
CALC-FILE-001
Error Description
Failed to save performance result data to disk.
Solution
If the detailed error log is:
Failed to write request level performance metrics to csv file '{/path/to/workspace/outputs/default/20250628_151326/performances/vllm-api-stream-chat/gsm8k.csv': XXXXXX
Where XXXXXX is the specific reason for the disk-saving failure. For example, Permission denied means the file already exists and the current user does not have write permission. You can either delete the file or add write permission for the current user to the existing file.
CALC-DATA-001
Error Description
No valid performance metric data was obtained from any completed inference request, so metrics cannot be calculated.
Solution
If the specific log is:
All requests failed, cannot calculate performance results. Please check the error logs from responses!
This indicates that all requests during the inference process failed. You need to further check the logs of failed requests to identify the cause of the failure.
- If the command includes --debug, the logs of failed requests will be printed directly to the console, and you can view them in the console records.
- If the command does not include --debug, the console records will contain logs similar to [ERROR] [RUNNER-TASK-001]task failed. OpenICLApiInfervllm-api-stream-chat/synthetic failed with code 1, see outputs/default/20251125_160128/logs/infer/vllm-api-stream-chat/synthetic.out. You can view the specific cause of the request failure in outputs/default/20251125_160128/logs/infer/vllm-api-stream-chat/synthetic.out.
CALC-DATA-002
Error Description
When calculating steady-state performance metrics, no requests belonging to the steady state were found among all request information, and steady-state metrics cannot be calculated.
Solution
You can check the concurrency graph of inference requests (reference document: https://ais-bench-benchmark.readthedocs.io/en/latest/base_tutorials/results_intro/performance_visualization.html) to confirm whether the Request Concurrency Count in the concurrency step graph reaches the concurrency number set in the model configuration file (the batch_size parameter) and at least two requests reach the maximum concurrency number.
If the above conditions are not met, you can try the following methods to achieve a steady state:
Scenario A: Request Concurrency Count in the concurrency step graph increases continuously and then decreases continuously
- Reduce the concurrency number of inference requests (the batch_size parameter in the model configuration file).
- Increase the total number of inference requests.
Scenario B: Request Concurrency Count in the concurrency step graph increases continuously, fluctuates for a period of time, and then decreases continuously
- Reduce the concurrency number of inference requests (the batch_size parameter in the model configuration file).
- Increase the frequency of sending inference requests (the request_rate parameter in the model configuration file).
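To make the idea of a concurrency step graph concrete, the sketch below computes the concurrency after every request start or end and compares the peak with batch_size. The (start, end) timestamps are invented for illustration; the tool's own visualization remains the authoritative source, and its exact steady-state criterion is not reproduced here.

```python
def concurrency_steps(intervals):
    """Return [(time, concurrency)] after each request start or end event."""
    events = []
    for start, end in intervals:
        events.append((start, 1))   # a request begins
        events.append((end, -1))    # a request finishes
    # Tuples sort by time, then delta, so finishes (-1) precede starts (+1) at equal times.
    events.sort()
    steps, current = [], 0
    for time, delta in events:
        current += delta
        steps.append((time, current))
    return steps


def peak_concurrency(intervals):
    """Highest number of requests in flight at the same time."""
    return max((count for _, count in concurrency_steps(intervals)), default=0)


# Hypothetical (start, end) times in seconds for four requests.
requests = [(0.0, 4.0), (0.5, 4.5), (1.0, 5.0), (4.0, 6.0)]
batch_size = 4
peak = peak_concurrency(requests)
print(f"peak={peak}, batch_size={batch_size}, reached={peak >= batch_size}")
```

In this invented trace the fourth request starts only when the first one finishes, so the peak stays below batch_size. That is the Scenario A pattern, for which the guidance above suggests a smaller batch_size or more requests.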
SUMM-TYPE-001
Error Description
The abbr parameter configurations of the dataset tasks are mixed (i.e., they use different types).
Solution
For example, if the error log is:
mixed dataset_abbr type is not supported, dataset_abbr type only support (list, tuple) or str.
This indicates that in the datasets configuration, the abbr parameters of the dataset tasks use different types (e.g., list and str). You need to make the abbr parameters of all dataset tasks use the same type (e.g., all list or all str).
SUMM-FILE-001
Error Description
There are no performance data files (*_details.jsonl) in the output working path.
Solution
- Confirm whether you mistakenly specified recalculation of performance results via --mode perf_viz when executing the evaluation. If you want to run a complete performance test, specify --mode perf.
- Confirm whether the base output path is correct (e.g., outputs/default/20250628_151326; find Current exp folder: in the console output).
- Confirm whether there are *_details.jsonl files in the performances/ folder under this path. If not, check the earlier console logs for other errors that prevented the performance data files from being generated, and troubleshoot further based on those error logs.
SUMM-MTRC-001
Error Description
The number of valid fields is inconsistent across requests in the detailed performance data.
Solution
Check whether the number of valid fields is consistent across all requests in the *_details.jsonl files under the base output path (e.g., outputs/default/20250628_151326; find Current exp folder: in the console output). If it is inconsistent, check the earlier console logs for other errors that affected the performance data files, and troubleshoot further based on those error logs.
RUNNER-TASK-001
Error Description
The evaluation task failed to execute.
Solution
For example, if the specific error message is [ERROR] [RUNNER-TASK-001]task failed. OpenICLApiInfervllm-api-stream-chat/synthetic failed with code 1, see outputs/default/20251125_160128/logs/infer/vllm-api-stream-chat/synthetic.out, view the specific error information in outputs/default/20251125_160128/logs/infer/vllm-api-stream-chat/synthetic.out to identify the cause of the failure.
TINFER-PARAM-001
Error Description
The maximum concurrency value batch_size in the model configuration file is outside the valid range.
Solution
If the error log shows Concurrency must be greater than 0 and <= 100000, but got -1, it means the maximum concurrency of the model is configured as -1. You need to set the batch_size parameter in the model configuration file to an integer greater than 0 and less than or equal to 100000.
Example:
# In vllm_api_stream_chat.py
models = [
    dict(
        attr="service",
        # ......
        batch_size=100,
        # ......
    ),
]
TINFER-PARAM-002
Error Description
The num_return_sequences parameter (number of returned sequences) of the generation_kwargs parameter in the model configuration file is outside the valid range.
Solution
If the error log shows num_return sequences must be a positive integer, but got {0}, it means the number of returned sequences of the model is configured as 0. You need to set the num_return_sequences parameter in the model configuration file to an integer greater than 0.
Example:
# In vllm_api_stream_chat.py
models = [
    dict(
        attr="service",
        # ......
        generation_kwargs=dict(
            num_return_sequences=1,
        ),
        # ......
    ),
]
TINFER-PARAM-004
Error Description
The ramp_up_strategy parameter (ramp-up strategy) of the traffic_cfg parameter in the model configuration file is outside the valid range.
Solution
If the error log shows Invalid ramp_up_strategy: {constant} only support 'linear' and 'exponential', it means the request sending strategy of the model is configured to a value not in ['exponential', 'linear']. You need to set the ramp_up_strategy parameter in the model configuration file to 'exponential' or 'linear'.
Example:
# In vllm_api_stream_chat.py
models = [
    dict(
        attr="service",
        # ......
        traffic_cfg=dict(
            ramp_up_strategy="linear",
        ),
        # ......
    ),
]
TINFER-PARAM-005
Error Description
Virtual memory usage is too high when the tool runs inference.
Solution
If the specific error log is:
Virtual memory usage too high: 90% > 80% (Total memory: 50 GB "Used: 45 GB, Available: 5 GB, Dataset needed memory size: 3000 MB)
It indicates that the system has 50 GB of memory in total, with 45 GB used (90%, above the 80% limit shown in the log) and 5 GB available, while the dataset requires 3000 MB of memory. Solutions are divided into two cases:
- If the total system memory is insufficient, increase the system memory.
- If the total system memory is sufficient but the memory required by the dataset is greater than the available memory, free the occupied memory or cache on the current server.
TINFER-PARAM-006
Error Description
The use_timestamp parameter in the model configuration file is True, but the dataset contains no timestamp field.
Solution
If the error log is No timestamps found in datasets, but use_timestamp is True! Make sure your dataset contains timestamp field or set use_timestamp to False in model config., it means the dataset does not contain the timestamp field, but the use_timestamp parameter in the model configuration file is True. Either add a timestamp field to the dataset or set the use_timestamp parameter in the model configuration file to False.
Example:
# In vllm_api_stream_chat.py
models = [
    dict(
        attr="service",
        # ......
        use_timestamp=False,
        # ......
    ),
]
TINFER-IMPL-001
Error Description
When a service-based inference task launches multiple processes, one or more of them fail to start.
Solution
If the error log is:
Failed to start worker x: XXXXXX, total workers to launch: 4
Here x is the ID of the failed process, XXXXXX is the specific reason for the failure, and 4 is the total number of processes.
Solutions:
- If this error log appears as many times as the total number of processes, all processes failed to start. Check the specific failure reason, take corresponding measures, and retry.
- If this error log appears fewer times than the total number of processes, only some processes failed to start. Partial startup failures do not stop the evaluation task, but they affect the actual maximum concurrency (batch_size). Decide whether to interrupt the run manually and investigate the failure reason based on your circumstances.
TINFER-RUNTIME-001
Error Description
All requests fail during the warm-up phase when evaluating an inference service.
Solution
If the error log is Exit task because all warmup requests failed, failed reasons: XXXXXX, locate the problem based on the specific failure reason XXXXXX (error information returned by the service), take corresponding measures, and retry.
TEVAL-PARAM-001
Error Description
The values of n (the number of candidate solutions generated by inference) and k (the number of samples drawn from them) are invalid.
Solution
If the error log is:
k and n must be greater than 0 and k <= n, but got k: 16, n: 8
It means k is greater than n. You need to configure k to an integer less than or equal to n.
Examples:
- If both the n and k parameters are configured in the dataset configuration file, set them to valid values there:
# In aime2024_gen_0_shot_str.py
aime2024_datasets = [
    dict(
        abbr='aime2024',
        type=Aime2024Dataset,
        # ......
        k=4,
        n=8,
    )
]
- If the n parameter is not configured in the dataset configuration file, the value of the num_return_sequences parameter in the model configuration file is used as n. You need to configure k in the dataset configuration file to an integer less than or equal to num_return_sequences in the model configuration file.
# In vllm_api_stream_chat.py, the num_return_sequences parameter corresponds to `n`
models = [
    dict(
        attr="service",
        # ......
        generation_kwargs=dict(
            num_return_sequences=8,
        ),
        # ......
    ),
]
# In aime2024_gen_0_shot_str.py
aime2024_datasets = [
    dict(
        abbr='aime2024',
        type=Aime2024Dataset,
        # ......
        k=4,
    )
]
ICLI-PARAM-001
Error Description
The type of the retriever parameter for constructing prompt engineering in the dataset configuration file is not a subclass of BaseRetriever or a list of subclasses of BaseRetriever.
Solution
- If you want to use a custom retriever class CustomedRetriever, ensure that CustomedRetriever is a subclass of BaseRetriever.
- If you want to use multiple custom retriever classes CustomedRetriever1, CustomedRetriever2, configure the retriever parameter in the dataset configuration file as [CustomedRetriever1, CustomedRetriever2], and each class in the list must inherit from BaseRetriever.
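The subclass requirement can be checked with issubclass. In the sketch below every class is a local stand-in defined only for the demonstration (the real BaseRetriever comes from the tool), and is_valid_retriever is a hypothetical helper.

```python
class BaseRetriever:
    """Stand-in for the tool's real base class, defined here so the sketch runs."""


class CustomedRetriever1(BaseRetriever):
    pass


class CustomedRetriever2(BaseRetriever):
    pass


class NotARetriever:
    pass


def is_valid_retriever(retriever) -> bool:
    """Accept one BaseRetriever subclass or a non-empty list of such subclasses."""
    candidates = retriever if isinstance(retriever, list) else [retriever]
    return bool(candidates) and all(
        isinstance(c, type) and issubclass(c, BaseRetriever) for c in candidates
    )


print(is_valid_retriever(CustomedRetriever1))
print(is_valid_retriever([CustomedRetriever1, CustomedRetriever2]))
print(is_valid_retriever([CustomedRetriever1, NotARetriever]))
```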
ICLI-PARAM-002
Error Description
The value of the infer_mode parameter in the inferencer configuration in the multi-turn dialogue dataset configuration file is outside the valid range.
Solution
Taking the mtbench configuration file as an example, if the configuration of mtbench_gen.py is as follows:
mtbench_infer_cfg = dict(
    # ......
    inferencer=dict(type=MultiTurnGenInferencer, infer_mode="every1")
)
The error log is:
Multiturn dialogue infer model only supports every、last or every_with_gt, but got every1
The correct configuration should set the infer_mode parameter to one of every, last, or every_with_gt.
ICLI-PARAM-003
Error Description
When specifying --mode perf --pressure in the command line for performance stress testing, the batch_size parameter is not specified in the model configuration file.
Solution
Specify the batch_size parameter in the model configuration file. Taking the vllm_api_stream_chat.py configuration file as an example:
# In vllm_api_stream_chat.py
models = [
    dict(
        attr="service",
        # ......
        batch_size=16,
        # ......
    ),
]
ICLI-PARAM-004ο
Error Descriptionο
The maximum concurrency value batch_size in the model configuration file is outside the valid range.
Solutionο
If the error log shows The range of batch_size is [1, 100000], but got -1. Please set it in datasets config, it means the maximum concurrency of the model is configured as -1. You need to set the batch_size parameter in the model configuration file to an integer greater than 0 and less than or equal to 100000.
Example:
# In vllm_api_stream_chat.py
models = [
    dict(
        attr="service",
        # ......
        batch_size=100,
        # ......
    ),
]
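To catch this before a run, the range stated in the error log can be checked directly. This is a minimal sketch under the assumption that batch_size is a plain integer; check_batch_size is a hypothetical helper, not part of ais_bench.

```python
BATCH_SIZE_MIN = 1
BATCH_SIZE_MAX = 100000  # upper bound quoted in the error log


def check_batch_size(batch_size: int) -> int:
    """Return batch_size unchanged if it lies in [1, 100000], otherwise raise."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(batch_size, bool) or not isinstance(batch_size, int):
        raise TypeError(f"batch_size must be an int, but got {type(batch_size).__name__}")
    if not BATCH_SIZE_MIN <= batch_size <= BATCH_SIZE_MAX:
        raise ValueError(
            f"The range of batch_size is [{BATCH_SIZE_MIN}, {BATCH_SIZE_MAX}], "
            f"but got {batch_size}"
        )
    return batch_size


check_batch_size(100)  # passes silently
```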
ICLI-PARAM-006ο
Error Descriptionο
PPL-type datasets do not support performance testing.
Solutionο
Check the used dataset configuration file, for example:
# In ARC_c_ppl_0_shot_str.py
ARC_c_infer_cfg = dict(
    # ......
    inferencer=dict(type=PPLInferencer),
)
The type of inferencer is PPLInferencer. Such dataset configuration files do not support performance testing, so you need to replace them with other dataset configuration files or specify --mode all to execute accuracy evaluation.
ICLI-PARAM-007ο
Error Descriptionο
PPL-type datasets do not support inference using streaming model configurations.
Solutionο
Check the used dataset configuration file, for example:
# In ARC_c_ppl_0_shot_str.py
ARC_c_infer_cfg = dict(
    # ......
    inferencer=dict(type=PPLInferencer),
)
The type of inferencer is PPLInferencer. Such dataset configuration files do not support inference using streaming model configurations, so you need to replace them with other dataset configuration files, or specify a non-streaming model configuration file via --models, such as --models vllm_api_general_chat.
ICLI-IMPL-004ο
Error Descriptionο
BFCL datasets do not support performance testing.
Solutionο
If you want to use the BFCL dataset task for accuracy testing but mistakenly specified --mode perf in the command line (which triggers performance testing), change it to --mode all to run accuracy testing.
If you want to use the BFCL dataset task for performance testing, this is not currently supported.
ICLI-IMPL-006ο
Error Descriptionο
Model tasks with streaming interfaces do not support accuracy evaluation using BFCL datasets.
Solutionο
Refer to Model Configuration Instructions and select model tasks with text interfaces (e.g., vllm_api_general_chat) for inference.
ICLI-IMPL-008ο
Error Descriptionο
The model backend corresponding to the current model configuration file has not implemented the methods required for PPL inference.
Solutionο
Refer to the documentation (not yet available) to check which model configurations support PPL inference, such as vllm_api_general_chat.
ICLI-IMPL-010ο
Error Descriptionο
No token IDs in the result of a PPL inference, leading to failure in loss calculation.
Solutionο
Verify whether the tested inference object (inference service) supports PPL inference and can normally return valid prompt_logprobs required for PPL inference.
ICLI-RUNTIME-001ο
Error Descriptionο
Failed to obtain inference results when accessing the inference service during warm-up.
Solutionο
If the log shows Get result from cache queue failed: XXXXXX (where XXXXXX is the specific reason for the failure to obtain inference results), take corresponding measures based on the specific reason (e.g., if it is a timeout-related exception, confirm whether the timeout setting of the inference service is reasonable or check if the current configuration can access the inference service normally).
ICLI-FILE-001ο
Error Descriptionο
Failed to write inference result files to disk.
Solutionο
If the log shows Failed to write results to /path/to/outputs/default/20250628_151326/*/*/*.json: XXXXXX, the inference results could not be written to disk in the accuracy scenario. Troubleshoot and resolve the issue based on the specific reason indicated by XXXXXX (e.g., permission issues or insufficient disk space).
If the log shows Failed to write results to /path/to/outputs/default/20250628_151326/*/*/*.jsonl: XXXXXX, the inference results could not be written to disk in the performance scenario. Troubleshoot and resolve the issue based on the specific reason indicated by XXXXXX (e.g., permission issues or insufficient disk space).
ICLI-FILE-002ο
Error Descriptionο
Failed to save numpy-format data (e.g., ITL data for each request) to the database.
Solutionο
If the log shows Failed to save numpy array to database: XXXXXX, it means the numpy-format data failed to be saved to the database. Troubleshoot and resolve the issue based on the specific saving reason indicated by XXXXXX (e.g., database connection issues, non-existent database tables, etc.).
ICLE-DATA-002ο
Error Descriptionο
The configured number of candidate solutions generated by inference n is inconsistent with the actual number of returned candidate solutions.
Solutionο
If --mode all is specified in the command line or --mode is not specified (indicating execution of infer + evaluate), this exception means there is a bug in the tool itself. Please report it by opening an issue.
If --mode eval is specified in the command line (evaluation based on previous inference results) and the error is Replication length mismatch, len of replications: 4 != n: 8, set the parameter n in the configuration file of the dataset task to the number of replications, 4:
# In aime2024_gen_0_shot_str.py, n must match the number of replications in the existing inference results
aime2024_datasets = [
    dict(
        abbr='aime2024',
        type=Aime2024Dataset,
        # ......
        n=4,
    )
]
ICLR-TYPE-001ο
Error Descriptionο
In the dataset configuration file, the type of the prompt template is incorrect. Only str or dict types are supported currently.
Solutionο
Ensure that the type of the prompt template in the inference configuration of the dataset configuration file is str or dict, for example:
# In aime2024_gen_0_shot_str.py
aime2024_infer_cfg = dict(
prompt_template=dict(
type=PromptTemplate,
template='{question}\nPlease reason step by step, and put your final answer within \\boxed{}.' # str type
),
# ......
)
# In aime2024_gen_0_shot_chat_prompt.py
aime2024_infer_cfg = dict(
prompt_template=dict(
type=PromptTemplate,
template=dict( # dict type
round=[
dict(
role="HUMAN",
prompt="{question}\nPlease reason step by step, and put your final answer within \\boxed{}.",
),
],
),
),
# ......
)
If the type of the value of the template parameter is incorrect, correct it to str or dict type.
ICLR-TYPE-002ο
Error Descriptionο
In the dataset configuration file, when the type of the prompt template is dict, the value type of all key-value pairs in it is incorrect. Currently, the supported value types are only str, list, and dict.
Solutionο
Ensure that in the dataset configuration file, the value type of all key-value pairs in the prompt template under the inference configuration is str, list, or dict. For example:
# In aime2024_gen_0_shot_chat_prompt.py
aime2024_infer_cfg = dict(
prompt_template=dict(
type=PromptTemplate,
template=dict( # dict type
round=[
dict(
role="HUMAN", # str type
prompt="{question}\nPlease reason step by step, and put your final answer within \\boxed{}.", # str type
),
],
),
),
# ......
)
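The two type rules described for ICLR-TYPE-001 and ICLR-TYPE-002 can be expressed as one small check. This is a sketch only: check_template is a hypothetical helper, and it inspects just the top-level values of a dict template, which is an assumption about how deep the tool's own check goes.

```python
def check_template(template) -> None:
    """Raise TypeError if template breaks the str/dict or str/list/dict value rules."""
    # ICLR-TYPE-001: the template itself must be a str or a dict.
    if not isinstance(template, (str, dict)):
        raise TypeError(f"template must be str or dict, but got {type(template).__name__}")
    # ICLR-TYPE-002: every value of a dict template must be a str, list, or dict.
    if isinstance(template, dict):
        for key, value in template.items():
            if not isinstance(value, (str, list, dict)):
                raise TypeError(
                    f"template[{key!r}] must be str, list or dict, "
                    f"but got {type(value).__name__}"
                )


# A chat-style template shaped like the example above passes the check.
check_template({"round": [{"role": "HUMAN", "prompt": "{question}"}]})
```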
ICLR-PARAM-001ο
Error Descriptionο
In the dataset configuration file, when the ice_token parameter is configured in the prompt template, the value of the template parameter does not contain the value of the ice_token parameter.
Solutionο
When the type of the template parameter is str, ensure that the string value of template contains the value of the ice_token parameter. For example:
# In ceval_gen_5_shot_str.py
ceval_infer_cfg = dict(
ice_template=dict(
type=PromptTemplate,
template=f'Below are single-choice questions from the {_ch_name} exam in China. Please select the correct answer.\n</E>{{question}}\nA. {{A}}\nB. {{B}}\nC. {{C}}\nD. {{D}}\nAnswer: {{answer}}', # The string contains '</E>', the value of ice_token
ice_token='</E>',
),
retriever=dict(type=FixKRetriever, fix_id_list=[0, 1, 2, 3, 4]),
inferencer=dict(type=GenInferencer),
)
When the type of the template parameter is dict, ensure that the value of at least one key-value pair in the template dictionary contains the value of the ice_token parameter. For example:
# In cmmlu_gen_5_shot_cot_chat_prompt.py
cmmlu_infer_cfg = dict(
# ......
prompt_template=dict(
type=PromptTemplate,
template=dict(
begin='</E>', # Same as the value of ice_token
round=[
dict(role='HUMAN', prompt=prompt_prefix+QUERY_TEMPLATE),
],
),
ice_token='</E>',
),
# ......
)
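The containment rule for both template types can be sketched as a recursive search. The helper name contains_ice_token is hypothetical, and searching nested lists and dicts is an assumption; the tool itself may only look at top-level values.

```python
def contains_ice_token(template, ice_token: str) -> bool:
    """Return True if ice_token appears in a str template or anywhere inside a dict template."""
    if isinstance(template, str):
        return ice_token in template
    if isinstance(template, dict):
        return any(contains_ice_token(value, ice_token) for value in template.values())
    if isinstance(template, list):
        return any(contains_ice_token(item, ice_token) for item in template)
    return False


str_template = "Please select the correct answer.\n</E>{question}\nAnswer: {answer}"
dict_template = {"begin": "</E>", "round": [{"role": "HUMAN", "prompt": "{question}"}]}

print(contains_ice_token(str_template, "</E>"))
print(contains_ice_token(dict_template, "</E>"))
```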
ICLR-PARAM-002ο
Error Descriptionο
The ice_template parameter is not specified when the dataset configuration file needs to construct few-shots based on the training set.
Solutionο
Take cmmlu_gen_5_shot_cot_chat_prompt.py as an example. This configuration specifies retriever=dict(type=FixKRetriever, fix_id_list=[0, 1, 2, 3, 4]) to construct few-shots, so the ice_template parameter must be specified. You can modify the configuration with reference to the following content:
cmmlu_infer_cfg = dict(
ice_template=dict( # ice_template must be configured
type=PromptTemplate,
template=dict(round=[
dict(
role='HUMAN',
prompt=prompt_prefix+QUERY_TEMPLATE,
),
dict(role='BOT', prompt="{answer}\n",)
]),
),
prompt_template=dict(
type=PromptTemplate,
template=dict(
begin='</E>',
round=[
dict(role='HUMAN', prompt=prompt_prefix+QUERY_TEMPLATE),
],
),
ice_token='</E>',
),
retriever=dict(type=FixKRetriever, fix_id_list=[0, 1, 2, 3, 4]), # 5-shots specified
inferencer=dict(type=GenInferencer),
)
ICLR-PARAM-003ο
Error Descriptionο
In the multimodal dataset configuration file, the key value of the prompt_mm parameter in the prompt template is not one of ["text", "image", "video", "audio"].
Solutionο
Take textvqa_gen_base64.py as an example. In this configuration, the key value of the prompt_mm parameter in the prompt template is one of "text", "image", "video", "audio". You can modify it with reference to the following content:
textvqa_infer_cfg = dict(
prompt_template=dict(
type=MMPromptTemplate,
template=dict(
round=[
dict(role="HUMAN", prompt_mm={ # The key value of the prompt_mm parameter is one of "text", "image", "video", "audio"
"text": {"type": "text", "text": "{question} Answer the question using a single word or phrase."},
"image": {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,{image}"}},
"video": {"type": "video_url", "video_url": {"url": "data:video/jpeg;base64,{video}"}},
"audio": {"type": "audio_url", "audio_url": {"url": "data:audio/wav;base64,{audio}"}},
})
]
)
),
retriever=dict(type=ZeroRetriever),
inferencer=dict(type=GenInferencer)
)
ICLR-PARAM-004ο
Error Descriptionο
The id values in fix_id_list for constructing few-shots in the dataset configuration file exceed the range of selectable ids in the training set.
Solutionο
If the configuration for constructing few-shots in the dataset configuration file is as follows:
retriever=dict(type=FixKRetriever, fix_id_list=[1,2,5,8]),
The detailed error log is Fix-K retriever index 8 is out of range of [0, 8), indicating that the id value 8 in fix_id_list exceeds the range [0, 8) of selectable ids in the training set and needs to be corrected to a value within this range.
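The range rule can be checked ahead of time once the size of the training set is known. This is a minimal sketch: check_fix_id_list and train_size are hypothetical names, and the message text simply mirrors the log quoted above.

```python
def check_fix_id_list(fix_id_list, train_size: int) -> None:
    """Raise IndexError for the first id outside the half-open range [0, train_size)."""
    for index in fix_id_list:
        if not 0 <= index < train_size:
            raise IndexError(
                f"Fix-K retriever index {index} is out of range of [0, {train_size})"
            )


# With 8 training samples the valid ids are 0..7, so 8 must be replaced.
check_fix_id_list([1, 2, 5, 7], train_size=8)
```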
ICLR-IMPL-002ο
Error Descriptionο
The ice_token parameter is not configured in the prompt template of the dataset configuration file.
Solutionο
If both the prompt_template parameter and the ice_template parameter exist, and the log error is ice_token of prompt_template is not provided, then the ice_token parameter must exist in the prompt_template parameter. For example:
cmmlu_infer_cfg = dict(
ice_template=dict(
type=PromptTemplate,
template=dict(round=[
dict(
role='HUMAN',
prompt=prompt_prefix+QUERY_TEMPLATE,
),
dict(role='BOT', prompt="{answer}\n",)
]),
),
prompt_template=dict(
type=PromptTemplate,
template=dict(
begin='</E>',
round=[
dict(role='HUMAN', prompt=prompt_prefix+QUERY_TEMPLATE),
],
),
ice_token='</E>', # Must be set
),
retriever=dict(type=FixKRetriever, fix_id_list=[0, 1, 2, 3, 4]), # 5-shots specified
inferencer=dict(type=GenInferencer),
)
If only the ice_template parameter exists, and the log error is ice_token of ice_template is not provided, then the ice_token parameter must exist in the ice_template parameter. For example:
ceval_infer_cfg = dict(
ice_template=dict(
type=PromptTemplate,
template=f'Below are single-choice questions from the {_ch_name} exam in China. Please select the correct answer.\n</E>{{question}}\nA. {{A}}\nB. {{B}}\nC. {{C}}\nD. {{D}}\nAnswer: {{answer}}',
ice_token='</E>', # Must exist
),
retriever=dict(type=FixKRetriever, fix_id_list=[0, 1, 2, 3, 4]),
inferencer=dict(type=GenInferencer),
)
ICLR-IMPL-003ο
Error Descriptionο
Necessary template fields are missing in the dataset configuration file.
Solutionο
If the error log is Leaving prompt as empty is not supported, it means that at least one of the prompt_template parameter and the ice_template parameter must exist in the dataset configuration file.
For example:
cmmlu_infer_cfg = dict( # At least one of ice_template and prompt_template must exist
ice_template=dict(
type=PromptTemplate,
template=dict(round=[
dict(
role='HUMAN',
prompt=prompt_prefix+QUERY_TEMPLATE,
),
dict(role='BOT', prompt="{answer}\n",)
]),
),
prompt_template=dict(
type=PromptTemplate,
template=dict(
begin='</E>',
round=[
dict(role='HUMAN', prompt=prompt_prefix+QUERY_TEMPLATE),
],
),
ice_token='</E>', # Must be set
),
retriever=dict(type=FixKRetriever, fix_id_list=[0, 1, 2, 3, 4]), # 5-shots specified
inferencer=dict(type=GenInferencer),
)
MODEL-IMPL-001ο
Error Descriptionο
When implementing a new class based on the BaseAPIModel class, the parse_text_response method is not implemented, making it impossible to test the inference service through the text interface.
Solutionο
(For developers) When implementing a subclass based on the BaseAPIModel class, if you want to test the inference service through the text interface, you need to implement the parse_text_response method, which is used to parse the text response returned by the model and convert it into the output format of the model inference service.
MODEL-IMPL-002ο
Error Descriptionο
When implementing a new class based on the BaseAPIModel class, the parse_stream_response method is not implemented, making it impossible to test the inference service through the streaming interface.
Solutionο
(For developers) When implementing a subclass based on the BaseAPIModel class, if you want to test the inference service through the streaming interface, you need to implement the parse_stream_response method, which is used to parse the streaming response returned by the model and convert it into the output format of the model inference service.
MODEL-PARAM-002ο
Error Descriptionο
In the dataset configuration file, the chat-type prompt template does not contain the role or fallback_role field.
Solutionο
Refer to the following configuration file content:
cmmlu_infer_cfg = dict(
ice_template=dict(
type=PromptTemplate,
template=dict(round=[
dict(
role='HUMAN', # Contains the 'role' field
prompt=prompt_prefix+QUERY_TEMPLATE,
),
dict(role='BOT', prompt="{answer}\n",)
]),
),
prompt_template=dict(
type=PromptTemplate,
template=dict(
begin='</E>',
round=[
dict(role='HUMAN', prompt=prompt_prefix+QUERY_TEMPLATE),
],
),
ice_token='</E>',
),
retriever=dict(type=FixKRetriever, fix_id_list=[0, 1, 2, 3, 4]), # 5-shots specified
inferencer=dict(type=GenInferencer),
)
MODEL-PARAM-003ο
Error Descriptionο
In the dataset configuration file, the value of the role parameter in the chat template of prompt engineering is not within the legal range.
Solutionο
If the chat template-related configuration in the dataset configuration file is as follows:
# Take aime2024_gen_0_shot_chat_prompt.py as an example
aime2024_infer_cfg = dict(
prompt_template=dict(
type=PromptTemplate,
template=dict(
round=[
dict(
role="HUMAN1",
prompt="{question}\nPlease reason step by step, and put your final answer within \\boxed{}.",
),
],
),
),
# ......
)
The error log is Unknown role HUMAN1 in chat template, legal role chosen from ['HUMAN', 'BOT', 'SYSTEM']., indicating that the value of the role parameter in the chat template is HUMAN1, while the legal role values are HUMAN, BOT, and SYSTEM. Therefore, the value of the role parameter needs to be corrected to one of HUMAN, BOT, or SYSTEM.
MODEL-PARAM-004ο
Error Descriptionο
There is no direct solution available yet.
Solutionο
If you need to resolve this issue, please raise an issue and include this error code in the issue description.
MODEL-PARAM-005ο
Error Descriptionο
There is no direct solution available yet.
Solutionο
If you need to resolve this issue, please raise an issue and include this error code in the issue description.
MODEL-TYPE-001ο
Error Descriptionο
In the dataset configuration file, the prompt engineering template contains a list of bare strings (entries without an explicit role), which is not supported.
Solutionο
If the prompt template-related configuration in the dataset configuration file is as follows:
# Take aime2024_gen_0_shot_chat_prompt.py as an example
aime2024_infer_cfg = dict(
prompt_template=dict(
type=PromptTemplate,
template=dict(
round=[ # The list contains multiple strings
"{question}\nPlease reason step by step, and put your final answer within \\boxed{}.",
"{question}\nPlease reason step by step, and put your final answer within \\boxed{}."
],
),
),
# ......
)
This raises the error Mixing str without explicit role is not allowed in API models!. Modify round to a valid chat template, for example:
round=[
    dict(
        role="HUMAN",
        prompt="{question}\nPlease reason step by step, and put your final answer within \\boxed{}.",
    ),
],
MODEL-TYPE-002ο
Error Descriptionο
No direct solution is available at this time.
Solutionο
If you need to resolve this issue, please submit an issue and include this error code in the issue description.
MODEL-TYPE-003ο
Error Descriptionο
No direct solution is available at this time.
Solutionο
If you need to resolve this issue, please submit an issue and include this error code in the issue description.
MODEL-TYPE-004ο
Error Descriptionο
No direct solution is available at this time.
Solutionο
If you need to resolve this issue, please submit an issue and include this error code in the issue description.
MODEL-DATA-001ο
Error Descriptionο
The model task failed to retrieve model name information from the tested inference service.
Solutionο
If the error message is Failed to get service model path from http://url-to-infer-service. Error: XXXXXX, it indicates a failure to access http://url-to-infer-service/v1/models. You need to check if the tested inference service is running properly and if the /v1/models sub-service is enabled. You can also locate the cause of the access failure to http://url-to-infer-service/v1/models based on the specific error XXXXXX. If the URL http://url-to-infer-service/ does not support the v1/models sub-service, you can configure the model name in the model parameter of the model configuration file. For example:
# In vllm_api_stream_chat.py
models = [
    dict(
        # ......
        model="name_of_model",
        # ......
    )
]
MODEL-DATA-002ο
Error Descriptionο
The dataset configuration file lacks required parameters.
Solutionο
If the error message is Invalid prompt content: without 'prompt' or 'prompt_mm' param!, it means the dataset configuration file does not contain either the prompt or prompt_mm parameter. You need to add one of these two parameters to the dataset configuration file. For example:
# Take aime2024_gen_0_shot_chat_prompt.py as an example
aime2024_infer_cfg = dict(
prompt_template=dict(
type=PromptTemplate,
template=dict(
round=[
dict( # Must contain either the 'prompt' or 'prompt_mm' field
role="HUMAN",
prompt="{question}\nPlease reason step by step, and put your final answer within \\boxed{}.",
),
],
),
),
# ......
)
MODEL-DATA-003ο
Error Descriptionο
Failed to parse the returned result of the request in JSON format.
Solutionο
If the error message is Unexpected response format. Please check 'error_info' in {dataset_abbr}_failed.jsonl for more information., check the specific error information (the content of the error_info field) in the {dataset_abbr}_failed.jsonl file under the current inference task's output path (e.g., outputs/default/20250628_151326/performances/vllm-api-stream-chat/) and troubleshoot based on it.
MODEL-CFG-001ο
Error Descriptionο
The max_seq_len parameter is not configured in the local model configuration file.
Solutionο
If the error message is max_seq_len is not provided and cannot be inferred from the model config., it means you need to add the max_seq_len parameter to the local model configuration file. For example:
# In hf_chat_model.py
models = [
    dict(
        attr="local",
        # ......
        max_seq_len=2048,
        # ......
    )
]
MODEL-MOD-001ο
Error Descriptionο
Special dependencies required for model execution are not installed.
Solutionο
If the error message is fastchat module not found. Please install with\npip install "fschat[model_worker,webui]", it indicates that the fastchat dependency is missing. You can install it by executing pip install "fschat[model_worker,webui]".
DSET-CFG-001ο
Error Descriptionο
The dataset configuration file lacks the path field to specify the dataset path.
Solutionο
If the error message is The 'path' argument is required to load the dataset., it means the dataset configuration file does not contain the path field. You need to add the path field to the dataset configuration file. For example:
# In aime2024_gen_0_shot_chat_prompt.py
aime2024_datasets = [
    dict(
        abbr='aime2024',
        type=Aime2024Dataset,
        path='ais_bench/datasets/aime/aime.jsonl',  # Required field to configure the dataset path
        # ......
    )
]
DSET-FILE-001ο
Error Descriptionο
The dataset file does not exist.
Solutionο
If the error message is Path is not a directory or Parquet file: /path/to/dataset.jsonl, it means /path/to/dataset.jsonl is not a dataset in the required .parquet format. Please confirm that the dataset format meets expectations.
If the error message is No Parquet file found in /path/to/dataset/., it means no .parquet format dataset is found in the path /path/to/dataset/. Please confirm that the dataset format meets expectations.
If the error message is Dataset file not found: /path/to/dataset/, it means the dataset path /path/to/dataset/ itself does not exist. Please confirm that the dataset path matches the expected input path.
If the error message is Corpus file not found. Please ensure {DEFAULT_CORPUS_FILE} exists in one of: [...] when using the mooncake_trace dataset, the required corpus file was not found. Place assets/shakespeare.txt at ais_bench/third_party/aiperf/assets/shakespeare.txt (relative to the ais_bench package root), or under one of the paths listed in the error message.
DSET-DATA-002ο
Error Descriptionο
The content structure of the dataset is invalid.
Solutionο
Please check for format issues in the dataset content based on the detailed error message.
DSET-DATA-005ο
Error Descriptionο
No direct solution is available at this time.
Solutionο
If you need to resolve this issue, please submit an issue and include this error code in the issue description.
DSET-DATA-006ο
Error Descriptionο
No direct solution is available at this time.
Solutionο
If you need to resolve this issue, please submit an issue and include this error code in the issue description.
DSET-PARAM-002ο
Error Descriptionο
Invalid values for n (number of candidate solutions generated by inference) and k (number of samples collected from candidates).
Solutionο
If the error log shows:
Maximum value of `k` 4 must be less than or equal to `n` 8
It means k is greater than n. You need to configure k as an integer less than or equal to n.
For example, if both the n and k parameters are configured in the dataset configuration file, set their values within the valid range there:
# In aime2024_gen_0_shot_str.py, k must not exceed n
aime2024_datasets = [
    dict(
        abbr='aime2024',
        type=Aime2024Dataset,
        # ......
        k=4,
        n=8,
    )
]
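The constraint between k and n can be sketched as follows. The wording "Maximum value of `k`" in the log suggests that k may be a single integer or a collection of integers; the sketch assumes both forms are possible, and check_k_n is a hypothetical helper rather than the tool's own code.

```python
def check_k_n(k, n: int) -> None:
    """Raise ValueError if the largest k exceeds n (k may be an int or a list/tuple of ints)."""
    max_k = max(k) if isinstance(k, (list, tuple)) else k
    if max_k > n:
        raise ValueError(
            f"Maximum value of `k` {max_k} must be less than or equal to `n` {n}"
        )


check_k_n(4, 8)          # valid: 4 <= 8
check_k_n([1, 4, 8], 8)  # valid: the largest k equals n
```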
DSET-PARAM-004ο
Error Descriptionο
Invalid parameters in the dataset configuration file.
Solutionο
Please check for invalid parameter value issues in the dataset configuration file based on the detailed error message. Typical scenarios for mooncake_trace / timestamp-based scheduling:
timestamp: The timestamp field in trace data must be of type float or int and >= 0; otherwise an error is raised with a type or range message.
hash_ids and input_length incompatible: When the error message contains Input length: ..., Hash IDs: ..., Block size: 512 ... Final block size: ... must be > 0 and <= 512, ensure (len(hash_ids)-1)*512+1 <= input_length <= len(hash_ids)*512.
fixed_schedule parameters: When fixed_schedule_end_offset >= 0, fixed_schedule_start_offset must be <= fixed_schedule_end_offset.
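The hash_ids and input_length constraint is easier to see as arithmetic on the final block: every block except the last holds 512 tokens, and whatever remains must fit in the last one. The sketch below assumes this reading of the inequality; the function names are hypothetical, and rejecting an empty hash_ids list is a choice made for the sketch.

```python
BLOCK_SIZE = 512  # block size quoted in the error message


def final_block_size(input_length: int, num_hash_ids: int, block_size: int = BLOCK_SIZE) -> int:
    """Tokens left for the last block once all earlier blocks are completely full."""
    return input_length - (num_hash_ids - 1) * block_size


def is_compatible(input_length: int, hash_ids, block_size: int = BLOCK_SIZE) -> bool:
    """True when (len(hash_ids)-1)*block_size+1 <= input_length <= len(hash_ids)*block_size."""
    if not hash_ids:
        return False
    return 0 < final_block_size(input_length, len(hash_ids), block_size) <= block_size


# Three hash ids allow input lengths from 1025 to 1536 inclusive.
print(is_compatible(1025, [11, 12, 13]))
print(is_compatible(1024, [11, 12, 13]))
```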
DSET-PARAM-005ο
Error Descriptionο
Required parameters are missing during dataset loading or processing.
Solutionο
Check and supply the missing required parameters according to the detailed error message. For example:
If the error is mean must be provided, when using the mooncake_trace dataset you must provide the mean parameter (via the trace's input_length field) for prompt generation.
If the error is Either 'input_text' or 'input_length' must be provided, in a single JSONL record of mooncake trace data, you must provide the input_length field when input_text is not provided.
DSET-UNK-001ο
Error Descriptionο
Unknown error of the dataset or a dependent component due to incorrect initialization.
Solutionο
If the error is "RNG manager not initialized. Call init_rng() first.", the mooncake_trace prompt generator called derive_rng before init_rng was called. Normal loading via MooncakeTraceDataset.load() calls init_rng(random_seed) automatically; this error usually occurs when calling lower-level APIs or in tests without proper initialization order. Ensure init_rng(seed) is called before any RNG-dependent generation logic.
DSET-DEPENDENCY-002ο
Error Descriptionο
Missing dependencies required for the dataset task evaluation.
Solutionο
If the error message is:
Please install human_eval use following steps:
git clone git@github.com:open-compass/human-eval.git
cd human-eval && pip install -e .
Execute git clone git@github.com:open-compass/human-eval.git and cd human-eval && pip install -e . in sequence according to the error log content to install the human-eval library.
DSET-MTRC-001ο
Error Descriptionο
No direct solution is available at this time.
Solutionο
If you need to resolve this issue, please submit an issue and include this error code in the issue description.
DSET-MTRC-003ο
Error Descriptionο
No direct solution is available at this time.
Solutionο
If you need to resolve this issue, please submit an issue and include this error code in the issue description.
SWEB-DEPENDENCY-001ο
Error Descriptionο
mini-swe-agent dependency is missing when running SWEBench infer, so task initialization fails.
Solutionο
Install the dependency and retry:
pip install mini-swe-agent
If you use a virtual environment, make sure ais_bench and mini-swe-agent are installed in the same Python environment.
SWEB-DEPENDENCY-002ο
Error Descriptionο
SWE-bench harness dependency is missing when running SWEBench eval.
Solutionο
Install the harness from the official repository, then retry:
git clone https://github.com/SWE-bench/SWE-bench.git
cd SWE-bench
pip install -e .
SWEB-PARAM-001ο
Error Descriptionο
No valid model config is detected for SWEBench infer (required fields like model/url/api_key are missing or empty).
Solutionο
Check models[0] in your task config and provide at least:
model, for example hosted_vllm/qwen3
url, for example http://127.0.0.1:2998/v1
api_key; EMPTY is acceptable for local tests
SWEB-PARAM-002ο
Error Descriptionο
Invalid SWEBench dataset name that is not in the supported name set.
Solutionο
Set dataset name to one of: full, verified, lite, multilingual.
SWEB-DATA-001ο
Error Descriptionο
Prediction input contains instance_id values that do not exist in the current dataset.
Solutionο
Ensure the prediction file and eval dataset are fully aligned:
Run infer and eval with the same dataset config.
Check whether instance_id entries in predictions were manually changed or mixed from another run.
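The alignment check amounts to a set difference between the ids in the predictions and the ids in the dataset. This is a minimal sketch that works on plain collections of instance_id strings; how the ids are read from the prediction file and the dataset is left out, and find_unknown_instance_ids is a hypothetical helper. The ids in the example are made up for illustration.

```python
def find_unknown_instance_ids(prediction_ids, dataset_ids):
    """Return the sorted instance_id values present in predictions but absent from the dataset."""
    return sorted(set(prediction_ids) - set(dataset_ids))


# Illustrative ids only.
dataset_ids = ["repo-a__issue-1", "repo-a__issue-2", "repo-b__issue-7"]
prediction_ids = ["repo-a__issue-1", "repo-c__issue-9"]

print(find_unknown_instance_ids(prediction_ids, dataset_ids))
```

An empty result means every prediction belongs to the current dataset; any id that is returned points to a prediction produced with a different dataset config or edited by hand.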
SWEB-DATA-002ο
Error Descriptionο
Failed to load SWEBench dataset from Hugging Face online source.
Solutionο
Check network connectivity and Hugging Face access first. If your environment is restricted, download parquet files manually and configure a local path.
SWEB-DATA-003ο
Error Descriptionο
Failed to read or parse local SWEBench parquet files.
Solutionο
Validate local data integrity and format:
Confirm files are valid parquet files.
Confirm file naming matches the target split (for example test-*.parquet).
Re-download or re-export corrupted files and retry.
SWEB-FILE-001ο
Error Descriptionο
Prediction file is missing during SWEBench eval (*.json or preds.json not found).
Solutionο
Run infer successfully before eval, and confirm prediction files exist under work_dir/predictions for the target model.
SWEB-FILE-002ο
Error Descriptionο
Local SWEBench dataset path resolution failed (path does not exist or is not accessible).
Solutionο
Check whether path in config is correct, and verify the current user has read permission for that directory/file.
SWEB-FILE-003ο
Error Descriptionο
No parquet file for the target split is found under the local dataset path.
Solutionο
Ensure one of the following exists:
<root>/data/<split>-*.parquet
<root>/<split>-*.parquet
For single-file use cases, path can point directly to that parquet file.
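The lookup described above can be reproduced locally to see which files would be found for a given path and split. This is a sketch, not the tool's own resolution logic: find_split_parquet is a hypothetical helper, and trying <root>/data before <root> is an assumed order.

```python
from pathlib import Path


def find_split_parquet(path, split: str):
    """Return parquet files for split under path, or [path] if path is itself a parquet file."""
    root = Path(path)
    if root.is_file():
        return [root] if root.suffix == ".parquet" else []
    # Assumed lookup order: <root>/data/<split>-*.parquet, then <root>/<split>-*.parquet.
    for base in (root / "data", root):
        if base.is_dir():
            matches = sorted(base.glob(f"{split}-*.parquet"))
            if matches:
                return matches
    return []


# An empty list means no parquet file for the split was found under the path.
print(find_split_parquet("/path/to/swebench", "test"))
```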
SWEB-RUNTIME-001ο
Error Descriptionο
Required Docker image for SWEBench is unavailable locally and pulling failed.
Solutionο
Check Docker daemon status and network, then run docker pull for the image shown in logs. Retry after image is available.
SWEB-RUNTIME-002ο
Error Descriptionο
Runtime error occurs during SWEBench execution (for example harness execution failure or future task exception).
Solutionο
Use detailed logs to triage:
Check dependency installation, Docker availability, and prediction file format first.
If it persists, keep full logs and open an issue with the error code: https://github.com/AISBench/benchmark/issues.