# Error Code Description

## TMAN-CMD-001

### Error Description

This error indicates that a required input parameter is missing when executing a command.
When launching the ais_bench evaluation tool via the command line, you must specify the model configuration and dataset configuration.
Examples of valid scenarios:

```bash
# When using an open-source dataset, you must specify the model task via `--models` and the dataset task via `--datasets`
ais_bench --models vllm_api_stream_chat --datasets gsm8k_gen
# When using a custom dataset, you must specify the model task via `--models` and the custom dataset path via `--custom_dataset_path`
ais_bench --models vllm_api_stream_chat --custom_dataset_path /path/to/custom/dataset
```

### Solution

Refer to the examples of valid scenarios to supplement the missing parameters.

## TMAN-CMD-002

### Error Description

This error indicates that the value of a command-line parameter is not within the valid range.

### Solution

Search this document for the specific command line that appears in the log, and find the constraints on parameter values specified in the command line description.
For example, if this error occurs when executing `ais_bench --models vllm_api_stream_chat --datasets gsm8k_gen --num-prompts -1 --mode perf`, search for `--num-prompts` in the document to find the constraints in the parameter description.

| Parameter | Description | Example |
| ---- | ---- | ---- |
| `--num-prompts` | Specifies the number of test cases to evaluate in the dataset. A positive integer must be entered. If the value exceeds the number of dataset cases or is not specified, the entire dataset will be evaluated. | `--num-prompts 500` |

The parameter description specifies that the value must be a positive integer (greater than 0).

## TMAN-CFG-001

### Error Description

There is a syntax error in the .py configuration file, causing parsing failure.

### Solution

Check the Python syntax errors in the configuration file printed in the log (all configurable files for the ais_bench evaluation tool follow Python syntax), such as missing quotation marks or mismatched parentheses, and correct them.

## TMAN-CFG-002

### Error Description

A required parameter is missing from the .py configuration file, causing parsing failure.

For example, the specific error log is: `Config file /path/to/vllm_api_stream_chat.py does not contain 'models' param!`, which indicates that the `models` parameter is missing from the configuration file.

A valid `vllm_api_stream_chat.py` file contains the `models` parameter:

```python
# ......
models = [
    dict(
        attr="service",
        type=VLLMCustomAPIChat,
        abbr="vllm-api-stream-chat",
        # ......
    )
]
```

### Solution

In the .py configuration file printed in the error log, add the parameter that the log indicates is missing.

## TMAN-CFG-003

### Error Description

A parameter in the .py configuration file has an incorrect type, causing parsing failure.

For example, the relevant configuration in the `vllm_api_stream_chat.py` configuration file is:

```python
# ......
models = dict(
    attr="service",
    type=VLLMCustomAPIChat,
    abbr="vllm-api-stream-chat",
    # ......
)
```

The specific error log is: `In config file /path/to/vllm_api_stream_chat.py, 'models' param must be a list!`, which indicates that the `models` parameter in the configuration file has an incorrect type. It should be a list type (but is actually a dictionary type).

### Solution

In the .py configuration file printed in the error log, correct the incorrect parameter type to the required type as indicated by the log.

## UTILS-MATCH-001

### Error Description

The task name specified via `--models`, `--datasets`, or `--summarizer` cannot be matched to a .py configuration file with the same name as the task.

### Solution

Check the task name that the log indicates cannot be matched. For example, if `xxxx` cannot be matched, the following log will be printed:

```
+------------------------+
| Not matched patterns   |
|------------------------|
| xxxx                   |
+------------------------+
```

#### Scenario 1: The configuration file folder path is not specified

First, execute `pip3 show ais_bench_benchmark | grep "Location:"` to check the installation path of the ais_bench evaluation tool. For example, the following information is obtained after execution:

```bash
Location: /usr/local/lib/python3.10/dist-packages
```

The configuration file path is then `/usr/local/lib/python3.10/dist-packages/ais_bench/benchmark/configs`. Navigate to this path and perform the following checks:

1. If the unmatchable task name is specified via `--models`, check whether there is a .py configuration file with the same name as the task in the `models/` path (including subdirectories).
2. If the unmatchable task name is specified via `--datasets`, check whether there is a .py configuration file with the same name as the task in the `datasets/` path (including subdirectories).
3. If the unmatchable task name is specified via `--summarizer`, check whether there is a .py configuration file with the same name as the task in the `summarizers/` path (including subdirectories).
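
The Scenario 1 checks above can also be scripted. The following is a minimal sketch and not part of the ais_bench tool: `find_task_configs` and `SUBDIR_BY_OPTION` are hypothetical names, and the sketch assumes only the folder layout described above (`models/`, `datasets/`, and `summarizers/` under the configuration path).

```python
from pathlib import Path

# Sub-folder searched for each command-line option, as described above.
SUBDIR_BY_OPTION = {
    "--models": "models",
    "--datasets": "datasets",
    "--summarizer": "summarizers",
}


def find_task_configs(configs_root, option, task_name):
    """Return all <task_name>.py files under the sub-folder for `option`.

    Hypothetical helper for manual troubleshooting; it only mirrors the
    manual check and is not how ais_bench itself resolves task names.
    """
    search_dir = Path(configs_root) / SUBDIR_BY_OPTION[option]
    if not search_dir.is_dir():
        return []
    # rglob also descends into subdirectories.
    return sorted(search_dir.rglob(f"{task_name}.py"))


if __name__ == "__main__":
    # Example root taken from the `pip3 show` output above; adjust as needed.
    root = "/usr/local/lib/python3.10/dist-packages/ais_bench/benchmark/configs"
    matches = find_task_configs(root, "--models", "vllm_api_stream_chat")
    print(matches or "no matching .py configuration file found")
```

An empty result means that no .py configuration file with the task's name exists under that folder, which is the condition this error code reports.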
#### Scenario 2: The configuration file folder path is specified

If you specified the configuration file folder path via `--config-dir` when executing the command, navigate to this path and perform the following checks:

1. If the unmatchable task name is specified via `--models`, check whether there is a .py configuration file with the same name as the task in the `models/` path (including subdirectories).
2. If the unmatchable task name is specified via `--datasets`, check whether there is a .py configuration file with the same name as the task in the `datasets/` path (including subdirectories).
3. If the unmatchable task name is specified via `--summarizer`, check whether there is a .py configuration file with the same name as the task in the `summarizers/` path (including subdirectories).

## UTILS-CFG-001

### Error Description

When using the [randomly synthesized dataset](../advanced_tutorials/synthetic_dataset.md) in the `tokenid` scenario, the model configuration file must specify the tokenizer path.

### Solution

Assume the ais_bench evaluation tool command is `ais_bench --models vllm_api_stream_chat --datasets synthetic_gen_tokenid --mode perf`. Then every `path` parameter in the `models` section of the `vllm_api_stream_chat.py` configuration file (refer to [Modifying Configuration Files for Corresponding Tasks](../get_started/quick_start.md#Modifying Configuration Files for Corresponding Tasks) for how to locate the configuration file) must be set to the tokenizer path (usually the model weight folder path).

```python
# ......
models = [
    dict(
        attr="service",
        type=VLLMCustomAPIChat,
        abbr="vllm-api-stream-chat",
        path="/path/to/tokenizer",  # Enter the tokenizer path
        # ......
    )
]
```

## UTILS-CFG-002

### Error Description

Initializing a model instance using parameters in the model configuration file failed due to invalid parameter content.
### Solution

Check the log for `build failed with the following errors:{error_content}`, and correct the parameters in the model configuration file according to the prompts in `error_content`.

For example, if the `batch_size` parameter value in the model configuration file is 100001, and `error_content` is `"batch_size must be an integer in the range (0, 100000]"`, this indicates that the `batch_size` parameter exceeds the valid range (0, 100000]. You need to correct the `batch_size` parameter to a value within this range, for example 100000.

## UTILS-CFG-003

### Error Description

The value of a parameter in the model configuration file is outside the range limited by the tool.

### Solution

Configure the parameter value within the range limited by the tool according to the prompts in the detailed log. For example, if the configuration file content is:

```python
# In vllm_api_stream_chat.py
models = [
    dict(
        attr="service1",
        # ......
    )
]
```

The detailed error log is:

```bash
Model config contain illegal attr, 'attr' in model config is 'service1', only 'local' and 'service' are supported!
```

This indicates that the value of the `attr` parameter in the model configuration is `'service1'`, but the tool only supports the values `'local'` and `'service'`. You need to set the `attr` parameter to one of the valid values.

## UTILS-CFG-004

### Error Description

Some configuration items for model parameters must be consistent across all model configurations and cannot have different values.

### Solution

Unify the configuration values according to the prompts in the detailed log. For example, if the configuration file content is:

```python
# In vllm_api_stream_chat.py
models = [
    dict(
        attr="service",
        # ......
    ),
    dict(
        attr="local"
    )
]
```

The detailed error log is:

```bash
Cannot run local and service model together! Please check 'attr' parameter of models
```

The `models` configuration contains two different `attr` values, `'service'` and `'local'`, but the tool requires a single value across all entries. You need to set the `attr` parameter of every entry in the `models` configuration to the same value, either `'service'` or `'local'`.

## UTILS-CFG-008

### Error Description

The loaded multimodal dataset contains invalid content.

### Solution

1. If the error log is `Invalid dataset: /path/to/non-mm-dataset , please check whether the dataset is a MM-style dataset!`, it means the specified dataset `/path/to/non-mm-dataset` is not a valid multimodal dataset. Each piece of data in a valid dataset must contain at least a `type` or `path` field. If it contains a `type` field, the value of the `type` field must be one of `["image", "video", "audio"]`.
2. If the error log is `Param 'mm_type' does not match the data type of dataset: /path/to/mm-dataset , please check it!`, it means the specified dataset `/path/to/mm-dataset` is a valid multimodal dataset, but the value of the `mm_type` field in the prompt engineering configuration of the dataset configuration file is invalid. The value of the `mm_type` field must be one of `["image", "video", "audio"]`.

## UTILS-DEPENDENCY-001

### Error Description

A required dependency module is missing during execution.

### Solution

If the detailed error log is `Failed to import required modules. Please install the necessary packages: pip install math_verify`, follow the guidance in the detailed log and execute `pip install math_verify` to install the dependent library.

## UTILS-TYPE-001

### Error Description

No direct solution is available yet.

### Solution

If you need to resolve this issue, [please submit an issue](https://github.com/AISBench/benchmark/issues) and include this error code in the issue description.

## UTILS-TYPE-002

### Error Description

No direct solution is available yet.
### Solution

If you need to resolve this issue, [please submit an issue](https://github.com/AISBench/benchmark/issues) and include this error code in the issue description.

## UTILS-TYPE-003

### Error Description

No direct solution is available yet.

### Solution

If you need to resolve this issue, [please submit an issue](https://github.com/AISBench/benchmark/issues) and include this error code in the issue description.

## UTILS-TYPE-004

### Error Description

No direct solution is available yet.

### Solution

If you need to resolve this issue, [please submit an issue](https://github.com/AISBench/benchmark/issues) and include this error code in the issue description.

## UTILS-TYPE-005

### Error Description

No direct solution is available yet.

### Solution

If you need to resolve this issue, [please submit an issue](https://github.com/AISBench/benchmark/issues) and include this error code in the issue description.

## UTILS-TYPE-006

### Error Description

No direct solution is available yet.

### Solution

If you need to resolve this issue, [please submit an issue](https://github.com/AISBench/benchmark/issues) and include this error code in the issue description.

## UTILS-TYPE-007

### Error Description

No direct solution is available yet.

### Solution

If you need to resolve this issue, [please submit an issue](https://github.com/AISBench/benchmark/issues) and include this error code in the issue description.

## UTILS-TYPE-008

### Error Description

The command-line parameter value is too large.

### Solution

If the error log shows `'--max-num-workers' must be <= 8, but got 9 ......`, it indicates that the value of the command-line parameter `--max-num-workers` is 9. However, the tool only supports a maximum of 8 concurrent workers. Therefore, you need to adjust the value of `--max-num-workers` to be ≤ 8.

## UTILS-TYPE-009

### Error Description

The command-line parameter value is not an integer type.
### Solution

If the error log shows `'--max-num-workers' must be an integer, but got '9' ......`, it indicates that the value of the command-line parameter `--max-num-workers` is the string '9'. However, the tool only supports integer-type values. Therefore, you need to correct the value of `--max-num-workers` to an integer type.

## UTILS-TYPE-010

### Error Description

The command-line parameter value is too small.

### Solution

If the error log shows `'--max-num-workers' must be >= 1, but got 0 ......`, it indicates that the value of the command-line parameter `--max-num-workers` is 0. However, the tool requires at least 1 concurrent worker. Therefore, you need to adjust the value of `--max-num-workers` to be ≥ 1.

## UTILS-PARAM-001

### Error Description

No direct solution is available yet.

### Solution

If you need to resolve this issue, [please submit an issue](https://github.com/AISBench/benchmark/issues) and include this error code in the issue description.

## UTILS-PARAM-002

### Error Description

In the custom dataset scenario, the `request_count` parameter in the configuration file `*.meta.json` is outside the valid range.

### Solution

If the error message is `Please make sure that the value of parameter 'request_count' can be converted to int(greater than 0).`, it means the `request_count` parameter in `*.meta.json` must be set to a value that can be converted to an integer greater than 0.

## UTILS-PARAM-003

### Error Description

In the custom dataset scenario, the `min_value` parameter is greater than the `max_value` parameter in the configuration file `*.meta.json`.

### Solution

If the error message is `When the uniform distribution is set, parameter 'min_value' must be less than or equal to parameter 'max_value'.`, it means `min_value` is currently greater than `max_value` in `*.meta.json`. You need to correct the values so that `min_value` ≤ `max_value`.
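
The constraints described for UTILS-PARAM-002 and UTILS-PARAM-003 can be checked before launching the tool. The following is a minimal sketch under stated assumptions: `check_meta_values` is a hypothetical helper, and the three values are passed as plain arguments because the layout of `*.meta.json` is not described in this document. It is not the validation code used by ais_bench.

```python
def check_meta_values(request_count, min_value, max_value):
    """Return a list of problems found; an empty list means the values pass.

    Hypothetical pre-check that mirrors the two error messages above.
    """
    problems = []

    # UTILS-PARAM-002: request_count must convert to an int greater than 0.
    try:
        count = int(request_count)
    except (TypeError, ValueError):
        problems.append("request_count cannot be converted to int")
    else:
        if count <= 0:
            problems.append("request_count must be greater than 0")

    # UTILS-PARAM-003: min_value must not exceed max_value.
    if min_value > max_value:
        problems.append("min_value must be less than or equal to max_value")

    return problems


print(check_meta_values(100, 16, 512))  # values that satisfy both constraints
print(check_meta_values(0, 512, 16))    # values that violate both constraints
```

Returning a list of problems instead of raising on the first one lets you fix every invalid value in a single pass.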
## UTILS-PARAM-004

### Error Description

In the custom dataset scenario, the `min_value` and `max_value` parameters in the configuration file `*.meta.json` are outside the valid range.

### Solution

If you need to resolve this issue, [please submit an issue](https://github.com/AISBench/benchmark/issues) and include this error code in the issue description.

## UTILS-PARAM-005

### Error Description

In the custom dataset scenario, the configuration file `*.meta.json` lacks required parameters.

### Solution

For example, if the error message is `When the uniform distribution is set, parameter 'min_value' and 'max_value' must be provided.`, it means that in the uniform distribution scenario, both the `min_value` and `max_value` parameters need to be set in `*.meta.json`.

## UTILS-PARAM-006

### Error Description

In the custom dataset scenario, the `percentage_distribute` parameter in the configuration file `*.meta.json` is invalid.

### Solution

The valid value range for the `percentage_distribute` parameter is described in the detailed log as follows:

```
Ensure the configuration data follows the format [max_tokens, percentage], where:
- 'max_tokens' must be a positive number (greater than 0).
- 'percentage' must be a float between 0 and 1 (greater than 0 and inclusive 1).
- The sum of all 'percentage' values must equal exactly 1.
Example valid format: [[1000, 0.5],[500,0.5]] or [[2000, 1.0]]
Example invalid formats: [[0, 0.5]] (max_tokens <= 0), [[1000, 1.5]] (percentage > 1), [[1000, 0.3], [500,0.2]] (sum not 1)
```

## UTILS-PARAM-007

### Error Description

In the custom dataset scenario, the value of the `method` parameter (which defines the data distribution method) in the configuration file `*.meta.json` is outside the valid range.
### Solution

If the error message is `Type of data distribution(method): uniform1 not supported, legal methods chosen from ['uniform', 'percentage'].`, it means the value `uniform1` of the `method` parameter in `*.meta.json` is outside the valid range. You need to correct the `method` parameter value to either `uniform` or `percentage`.

## UTILS-PARAM-008

### Error Description

In the custom dataset scenario, the configuration file `*.meta.json` contains invalid fields.

### Solution

If the specific error message is `There are illegal keys: xxxxxx,yyyyyy`, it means the `*.meta.json` file contains the two invalid fields `xxxxxx` and `yyyyyy`. You need to delete these two fields from `*.meta.json`.

## UTILS-FILE-002

### Error Description

The tokenizer path specified by the `path` parameter in the model configuration file does not exist.

### Solution

If the content of the model configuration file is as follows:

```python
# In vllm_api_stream_chat.py
models = [
    dict(
        # ......
        path="/path/to/invalid",
        # ......
    ),
]
```

And the specific error log is `Tokenizer path '/path/to/invalid' does not exist`, it indicates that the tokenizer path `/path/to/invalid` specified by the `path` parameter in the model configuration file does not exist (an empty path is also considered non-existent). You need to correct it to an existing tokenizer path.

## UTILS-FILE-003

### Error Description

Failed to load the tokenizer file.

### Solution

If the error message is `Failed to load tokenizer from /path/to/tokenizer: ExceptionName: XXXXXX`, first confirm whether the tokenizer file under the path `/path/to/tokenizer` is compatible with the `transformers` version of the current runtime environment. If it is compatible, continue troubleshooting based on the specific error information represented by `XXXXXX`.

## UTILS-FILE-004

### Error Description

No direct solution is available yet.
### Solution

If you need to resolve this issue, [please submit an issue](https://github.com/AISBench/benchmark/issues) and include this error code in the issue description.

## PARTI-FILE-001

### Error Description

Insufficient permissions for the output path file; the tool cannot write results to it.

### Solution

For example, if the error log is:

```bash
Current user can't modify /path/to/workspace/outputs/default/20250628_151326/predictions/vllm-api-stream-chat/gsm8k.json, reuse will not enable.
```

Execute `ls -l /path/to/workspace/outputs/default/20250628_151326/predictions/vllm-api-stream-chat/gsm8k.json` to check the owner and permissions of this path. If the current user does not have write permission for the file, you need to add write permission for the current user (for example, execute `chmod u+w /path/to/workspace/outputs/default/20250628_151326/predictions/vllm-api-stream-chat/gsm8k.json` to add write permission for the current user).

## CALC-MTRC-001

### Error Description

The performance result data is invalid, and metrics cannot be calculated.

### Solution

#### Scenario 1: The original performance result data is empty

If you specified recalculation of performance results via `--mode perf_viz` when executing the command, and the base output path is `outputs/default/20250628_151326` (find `Current exp folder: ` in the console output), check whether all `*_details.jsonl` files in the `performances/` folder under this path are empty. If they are empty, you need to run the evaluation once first to generate performance result data.
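
To run the Scenario 1 check without opening each file by hand, a short script can list the empty files. The following is a minimal sketch that assumes the `performances/` folder sits directly under the base output path, as described above. `find_empty_details` is a hypothetical helper and not part of ais_bench.

```python
from pathlib import Path


def find_empty_details(exp_folder):
    """Return (all_files, empty_files) for *_details.jsonl under performances/."""
    perf_dir = Path(exp_folder) / "performances"
    if not perf_dir.is_dir():
        return [], []
    # Search recursively, since result files may sit in per-model subfolders.
    all_files = sorted(perf_dir.rglob("*_details.jsonl"))
    # A file counts as empty here when its size is zero bytes.
    empty_files = [p for p in all_files if p.stat().st_size == 0]
    return all_files, empty_files


if __name__ == "__main__":
    # Example base output path from the text above.
    all_files, empty_files = find_empty_details("outputs/default/20250628_151326")
    print(f"{len(empty_files)} of {len(all_files)} *_details.jsonl files are empty")
```

If every file found is reported as empty, or no file is found at all, run the evaluation first to generate performance result data.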
#### Scenario 2: The original performance result data contains no valid values

If you specified recalculation of performance results via `--mode perf_viz` when executing the command, and the base output path is `outputs/default/20250628_151326` (find `Current exp folder: ` in the console output), check whether the `*_details.jsonl` files in the `performances/` folder under this path contain no valid fields (they may have been tampered with). If so, you need to re-run the performance evaluation to generate new data.

## CALC-FILE-001

### Error Description

Failed to save performance result data to disk.

### Solution

If the detailed error log is:

```bash
Failed to write request level performance metrics to csv file '{/path/to/workspace/outputs/default/20250628_151326/performances/vllm-api-stream-chat/gsm8k.csv': XXXXXX
```

Here `XXXXXX` is the specific reason for the failure to save to disk. For example, `Permission denied` means the file already exists and the current user does not have write permission. You can either delete the file or add write permission for the current user to the existing file.

## CALC-DATA-001

### Error Description

No valid performance metric data was obtained from any of the completed inference requests, so metrics cannot be calculated.

### Solution

If the specific log is:

```bash
All requests failed, cannot calculate performance results. Please check the error logs from responses!
```

This indicates that all requests during the inference process failed. You need to further check the logs of failed requests to identify the cause of the failure.

1. If the command includes `--debug`, the logs of failed requests will be printed directly to the console, and you can view them in the console records.
2. If the command does not include `--debug`, the console records will contain logs similar to `[ERROR] [RUNNER-TASK-001]task failed. OpenICLApiInfervllm-api-stream-chat/synthetic failed with code 1, see outputs/default/20251125_160128/logs/infer/vllm-api-stream-chat/synthetic.out`. You can view the specific cause of the request failure in `outputs/default/20251125_160128/logs/infer/vllm-api-stream-chat/synthetic.out`.

## CALC-DATA-002

### Error Description

When calculating steady-state performance metrics, no requests belonging to the steady state were found among all request information, and steady-state metrics cannot be calculated.

### Solution

You can check the concurrency graph of inference requests (reference document: https://ais-bench-benchmark.readthedocs.io/en/latest/base_tutorials/results_intro/performance_visualization.html) to confirm whether the `Request Concurrency Count` in the concurrency step graph reaches the concurrency number set in the model configuration file (the `batch_size` parameter) **and at least two requests reach the maximum concurrency number**. If the above conditions are not met, you can try the following methods to achieve a steady state:

#### Scenario A: `Request Concurrency Count` in the concurrency step graph increases continuously and then decreases continuously

1. Reduce the concurrency number of inference requests (the `batch_size` parameter in the model configuration file).
2. Increase the total number of inference requests.

#### Scenario B: `Request Concurrency Count` in the concurrency step graph increases continuously, fluctuates for a period of time, and then decreases continuously

1. Reduce the concurrency number of inference requests (the `batch_size` parameter in the model configuration file).
2. Increase the frequency of sending inference requests (the `request_rate` parameter in the model configuration file).

## SUMM-TYPE-001

### Error Description

The `abbr` parameter configurations of all dataset tasks are mixed (i.e., use different types).
### Solution

For example, if the error log is:

```bash
mixed dataset_abbr type is not supported, dataset_abbr type only support (list, tuple) or str.
```

This indicates that in the `datasets` configuration, the `abbr` parameter configurations of all dataset tasks use different types (e.g., `list` and `str`). You need to unify the `abbr` parameter configurations of all dataset tasks to use the same type (e.g., `list` or `str`).

## SUMM-FILE-001

### Error Description

There are no performance data files (`*_details.jsonl`) in the output working path.

### Solution

1. Confirm whether you incorrectly specified recalculation of performance results via `--mode perf_viz` when executing the evaluation. If you want to run a complete performance test, specify `--mode perf`.
2. Confirm whether the base output path is correct (e.g., `outputs/default/20250628_151326`; find `Current exp folder: ` in the console output).
3. Confirm whether there are `*_details.jsonl` files in the `performances/` folder under this path. If not, check other error information in the previous console logs to confirm whether other errors caused the performance data files to not be generated, and perform further troubleshooting based on the guidance of other error logs.

## SUMM-MTRC-001

### Error Description

The number of valid fields is inconsistent across requests in the detailed performance data.

### Solution

Check whether the number of valid fields is consistent across all requests in the `*_details.jsonl` files under the base output path (e.g., `outputs/default/20250628_151326`; find `Current exp folder: ` in the console output). If inconsistent, check whether there are other errors in the historical console logs that caused the performance data files to not be generated, and perform further troubleshooting based on the guidance of other error logs.

## RUNNER-TASK-001

### Error Description

The evaluation task failed to execute.
### Solution

For example, if the specific error message is `[ERROR] [RUNNER-TASK-001]task failed. OpenICLApiInfervllm-api-stream-chat/synthetic failed with code 1, see outputs/default/20251125_160128/logs/infer/vllm-api-stream-chat/synthetic.out`, view the specific error information in `outputs/default/20251125_160128/logs/infer/vllm-api-stream-chat/synthetic.out` to identify the cause of the failure.

## TINFER-PARAM-001

### Error Description

The maximum concurrency value `batch_size` in the model configuration file is outside the valid range.

### Solution

If the error log shows `Concurrency must be greater than 0 and <= 100000, but got -1`, it means the maximum concurrency of the model is configured as -1. You need to set the `batch_size` parameter in the model configuration file to an integer greater than 0 and less than or equal to 100000. Example:

```python
# In vllm_api_stream_chat.py
models = [
    dict(
        attr="service",
        # ......
        batch_size=100,
        # ......
    ),
]
```

## TINFER-PARAM-002

### Error Description

The `num_return_sequences` parameter (number of returned sequences) of the `generation_kwargs` parameter in the model configuration file is outside the valid range.

### Solution

If the error log shows `num_return sequences must be a positive integer, but got {0}`, it means the number of returned sequences of the model is configured as 0. You need to set the `num_return_sequences` parameter in the model configuration file to an integer greater than 0. Example:

```python
# In vllm_api_stream_chat.py
models = [
    dict(
        attr="service",
        # ......
        generation_kwargs=dict(
            num_return_sequences=1,
        ),
        # ......
    ),
]
```

## TINFER-PARAM-004

### Error Description

The `ramp_up_strategy` parameter (ramp-up strategy) of the `traffic_cfg` parameter in the model configuration file is outside the valid range.
### Solution

If the error log shows `Invalid ramp_up_strategy: {constant} only support 'linear' and 'exponential'`, it means the request sending strategy of the model is configured to a value not in `['exponential', 'linear']`. You need to set the `ramp_up_strategy` parameter in the model configuration file to `'exponential'` or `'linear'`. Example:

```python
# In vllm_api_stream_chat.py
models = [
    dict(
        attr="service",
        # ......
        traffic_cfg=dict(
            ramp_up_strategy="linear",
        ),
        # ......
    ),
]
```

## TINFER-PARAM-005

### Error Description

Virtual memory usage is excessively high when the tool runs inference.

### Solution

If the specific error log is:

```bash
Virtual memory usage too high: 90% > 80% (Total memory: 50 GB "Used: 45 GB, Available: 5 GB, Dataset needed memory size: 3000 MB)
```

It indicates that the system has 50 GB of memory in total, with 45 GB used and 5 GB available, and that the dataset requires 3000 MB of memory. Memory usage (90%) therefore exceeds the 80% limit, which triggers this error. Solutions are divided into two cases:

1. If the total system memory is insufficient, increase the system memory.
2. If the total system memory is sufficient but the memory required by the dataset is greater than the available memory, clear the occupied memory or cache on the current server.

## TINFER-PARAM-006

### Error Description

The `use_timestamp` parameter in the model configuration file is True, but no timestamps were found in the dataset.

### Solution

If the error log is ``No timestamps found in datasets, but `use_timestamp` is True! Make sure your dataset contains `timestamp` field or set `use_timestamp` to False in model config.``, it means the dataset does not contain the `timestamp` field, but the `use_timestamp` parameter in the model configuration file is True. You need to either add the `timestamp` field to the dataset or set the `use_timestamp` parameter in the model configuration file to False. Example:

```python
# In vllm_api_stream_chat.py
models = [
    dict(
        attr="service",
        # ......
        use_timestamp=False,
        # ......
    ),
]
```

## TINFER-IMPL-001

### Error Description

When executing a service-based inference task that launches multiple processes, one of the processes fails to start.

### Solution

If the error log is:

```bash
Failed to start worker x: XXXXXX, total workers to launch: 4
```

Here `x` is the ID of the failed process, `XXXXXX` is the specific reason for the failure, and `4` is the total number of processes. Solutions:

1. If the number of occurrences of this error log is equal to the total number of processes, it means all processes failed to start. Check the specific failure reason, take corresponding measures, and retry.
2. If the number of occurrences of this error log is less than the total number of processes, it means some processes failed to start. Partial process startup failures do not affect the execution of the evaluation task but will reduce the actual maximum concurrency `batch_size`. Decide whether to manually interrupt the task to locate the specific failure reason based on actual circumstances.

## TINFER-RUNTIME-001

### Error Description

All requests fail during the warm-up phase when evaluating an inference service.

### Solution

If the error log is `Exit task because all warmup requests failed, failed reasons: XXXXXX`, locate the problem based on the specific failure reason `XXXXXX` (**error information from the service**), take corresponding measures, and retry.

## TEVAL-PARAM-001

### Error Description

The values of `n` (the number of candidate solutions generated by inference) and `k` (the number of samples collected from them) are invalid.

### Solution

If the error log is:

```bash
k and n must be greater than 0 and k <= n, but got k: 16, n: 8
```

It means `k` is greater than `n`. You need to configure `k` to an integer less than or equal to `n`. Examples:

1. If both `n` and `k` parameters are configured in the dataset configuration file, set their values to the valid range in the configuration file:

   ```python
   # In aime2024_gen_0_shot_str.py, the k parameter corresponds to `k`
   aime2024_datasets = [
       dict(
           abbr='aime2024',
           type=Aime2024Dataset,
           # ......
           k=4,
           n=8,
       )
   ]
   ```

2. If the `n` parameter is not configured in the dataset configuration file, the value of the `num_return_sequences` parameter in the model configuration file will be used as the value of `n`. You need to configure `k` in the dataset configuration file to an integer less than or equal to `num_return_sequences` in the model configuration file.

   ```python
   # In vllm_api_stream_chat.py, the num_return_sequences parameter corresponds to `n`
   models = [
       dict(
           attr="service",
           # ......
           generation_kwargs=dict(
               num_return_sequences=8,
           ),
           # ......
       ),
   ]

   # In aime2024_gen_0_shot_str.py, the k parameter corresponds to `k`
   aime2024_datasets = [
       dict(
           abbr='aime2024',
           type=Aime2024Dataset,
           # ......
           k=4,
       )
   ]
   ```

## ICLI-PARAM-001

### Error Description

The type of the retriever parameter for constructing prompt engineering in the dataset configuration file is not a subclass of `BaseRetriever` or a list of subclasses of `BaseRetriever`.

### Solution

1. If you want to use a custom retriever class `CustomedRetriever`, ensure that `CustomedRetriever` is a subclass of `BaseRetriever`.
2. If you want to use multiple custom retriever classes `CustomedRetriever1, CustomedRetriever2`, configure the `retriever` parameter in the dataset configuration file as `[CustomedRetriever1, CustomedRetriever2]`, and each class in the list must inherit from `BaseRetriever`.

## ICLI-PARAM-002

### Error Description

The value of the `infer_mode` parameter in the inferencer configuration in the multi-turn dialogue dataset configuration file is outside the valid range.
### Solution Taking the mtbench configuration file as an example, if the configuration of mtbench_gen.py is as follows: ```python mtbench_infer_cfg = dict( # ...... inferencer=dict(type=MultiTurnGenInferencer, infer_mode="every1") ) ``` The log error is: ```bash Multiturn dialogue infer model only supports every、last or every_with_gt, but got every1 ``` The correct configuration should set the infer_mode parameter to one of `every`, `last`, or `every_with_gt`. ## ICLI-PARAM-003 ### Error Description When specifying `--mode perf --pressure` in the command line for performance stress testing, the batch_size parameter is not specified in the model configuration file. ### Solution Taking the `vllm_stream_api_chat.py` configuration file as an example: ```python # In vllm_stream_api_chat.py models = [ dict( attr="service", # ...... batch_size=16, # ...... ), ] ``` ## ICLI-PARAM-004 ### Error Description The maximum concurrency value `batch_size` in the model configuration file is outside the valid range. ### Solution If the error log shows `The range of batch_size is [1, 100000], but got -1. Please set it in datasets config`, it means the maximum concurrency of the model is configured as -1. You need to set the `batch_size` parameter in the model configuration file to an integer greater than 0 and less than or equal to 100000. Example: ```python # In vllm_stream_api_chat.py models = [ dict( attr="service", # ...... batch_size=100, # ...... ), ] ``` ## ICLI-PARAM-006 ### Error Description PPL-type datasets do not support performance testing. ### Solution Check the used dataset configuration file, for example: ```python # In ARC_c_ppl_0_shot_str.py ARC_c_infer_cfg = dict( # ...... inferencer=dict(type=PPLInferencer)) ``` The type of `inferencer` is `PPLInferencer`. Such dataset configuration files do not support performance testing, so you need to replace them with other dataset configuration files or specify `--mode all` to execute accuracy evaluation. 
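The checks behind ICLI-PARAM-003, ICLI-PARAM-004, and ICLI-PARAM-006 above can be sketched as a small pre-flight function that you run on your own configuration dictionaries before starting a `--mode perf --pressure` run. This is only an illustration: `preflight_perf_check` and the two stand-in inferencer classes are hypothetical names that are not part of ais_bench, and the only values taken from this document are the `[1, 100000]` range and the `PPLInferencer` type name.

```python
# Hypothetical pre-flight check; not part of ais_bench.
BATCH_SIZE_MIN, BATCH_SIZE_MAX = 1, 100000  # range quoted in the ICLI-PARAM-004 log


class PPLInferencer:
    """Stand-in for the real PPLInferencer class, used only in this sketch."""


class GenInferencer:
    """Stand-in for the real GenInferencer class, used only in this sketch."""


def preflight_perf_check(model_cfg, infer_cfg):
    """Return a list of problems for a `--mode perf --pressure` run."""
    problems = []

    # ICLI-PARAM-003 / ICLI-PARAM-004: batch_size must exist and be in range.
    batch_size = model_cfg.get("batch_size")
    if batch_size is None:
        problems.append("batch_size is not set in the model configuration")
    elif not isinstance(batch_size, int) or not (
        BATCH_SIZE_MIN <= batch_size <= BATCH_SIZE_MAX
    ):
        problems.append(
            f"batch_size must be an integer in [1, 100000], got {batch_size!r}"
        )

    # ICLI-PARAM-006: PPL-type datasets cannot be used for performance testing.
    inferencer_type = infer_cfg.get("inferencer", {}).get("type")
    if getattr(inferencer_type, "__name__", None) == "PPLInferencer":
        problems.append("PPL-type datasets do not support performance testing")

    return problems
```

For example, `preflight_perf_check(dict(batch_size=-1), dict(inferencer=dict(type=PPLInferencer)))` reports both the out-of-range `batch_size` and the PPL inferencer, while a configuration with `batch_size=16` and a `GenInferencer` returns an empty list.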
## ICLI-PARAM-007

### Error Description

PPL-type datasets do not support inference using streaming model configurations.

### Solution

Check the dataset configuration file in use, for example:

```python
# In ARC_c_ppl_0_shot_str.py
ARC_c_infer_cfg = dict(
    # ......
    inferencer=dict(type=PPLInferencer))
```

The type of `inferencer` is `PPLInferencer`. Such dataset configuration files do not support inference using streaming model configurations, so you need to replace them with other dataset configuration files, or specify a non-streaming model configuration file via `--models`, such as `--models vllm_api_general_chat`.

## ICLI-IMPL-004

### Error Description

BFCL datasets do not support performance testing.

### Solution

1. If you want to use the BFCL dataset task for accuracy testing but mistakenly specified `--mode perf` in the command line (which triggers performance testing), change it to `--mode all` to run accuracy testing.
2. Using the BFCL dataset task for performance testing is not currently supported.

## ICLI-IMPL-006

### Error Description

Model tasks with streaming interfaces do not support accuracy evaluation using BFCL datasets.

### Solution

Refer to [Model Configuration Instructions](../base_tutorials/all_params/models.md) and select model tasks with text interfaces (e.g., `vllm_api_general_chat`) for inference.

## ICLI-IMPL-008

### Error Description

The model backend corresponding to the current model configuration file has not implemented the methods required for PPL inference.

### Solution

Refer to the documentation (not yet available) to check which model configurations support PPL inference, such as `vllm_api_general_chat`.

## ICLI-IMPL-010

### Error Description

The result of a PPL inference contains no token IDs, so the loss cannot be calculated.

### Solution

Verify that the tested inference object (inference service) supports PPL inference and returns the valid `prompt_logprobs` required for PPL inference.
## ICLI-RUNTIME-001

### Error Description

Failed to obtain inference results when accessing the inference service during warm-up.

### Solution

If the log shows `Get result from cache queue failed: XXXXXX` (where `XXXXXX` is the specific reason for the failure to obtain inference results), take corresponding measures based on that reason. For example, if it is a timeout-related exception, confirm whether the timeout setting of the inference service is reasonable, or check whether the current configuration can access the inference service normally.

## ICLI-FILE-001

### Error Description

Failed to write inference result files to disk.

### Solution

1. If the log shows `Failed to write results to /path/to/outputs/default/20250628_151326/*/*/*.json: XXXXXX`, the inference results failed to be written to disk in the accuracy scenario. Troubleshoot and resolve the issue based on the specific failure reason indicated by `XXXXXX` (e.g., permission issues, insufficient disk space, etc.).
2. If the log shows `Failed to write results to /path/to/outputs/default/20250628_151326/*/*/*.jsonl: XXXXXX`, the inference results failed to be written to disk in the performance scenario. Troubleshoot and resolve the issue based on the specific failure reason indicated by `XXXXXX` (e.g., permission issues, insufficient disk space, etc.).

## ICLI-FILE-002

### Error Description

Failed to save numpy-format data (e.g., ITL data for each request) to the database.

### Solution

If the log shows `Failed to save numpy array to database: XXXXXX`, the numpy-format data failed to be saved to the database. Troubleshoot and resolve the issue based on the specific failure reason indicated by `XXXXXX` (e.g., database connection issues, non-existent database tables, etc.).

## ICLE-DATA-002

### Error Description

The configured number of candidate solutions generated by inference `n` is inconsistent with the actual number of returned candidate solutions.

### Solution

1.
If `--mode all` is specified in the command line or `--mode` is not specified (indicating execution of infer + evaluate), triggering this exception means there is a bug in the tool itself. Please report it on the [issue tracker](https://github.com/AISBench/benchmark/issues).
2. If `--mode eval` is specified in the command line (evaluation based on previous inference results), and the error is `Replication length mismatch, len of replications: 4 != n: 8`, set the parameter `n` in the configuration file of the dataset task to the number of replications, `4`:

```python
# In aime2024_gen_0_shot_str.py, the n parameter corresponds to `n`
aime2024_datasets = [
    dict(
        abbr='aime2024',
        type=Aime2024Dataset,
        # ......
        n=4,
    )
]
```

## ICLR-TYPE-001

### Error Description

In the dataset configuration file, the type of the prompt template is incorrect. Only `str` and `dict` types are currently supported.

### Solution

Ensure that the type of the prompt template in the inference configuration of the dataset configuration file is `str` or `dict`, for example:

```python
# In aime2024_gen_0_shot_str.py
aime2024_infer_cfg = dict(
    prompt_template=dict(
        type=PromptTemplate,
        template='{question}\nPlease reason step by step, and put your final answer within \\boxed{}.'  # str type
    ),
    # ......
)

# In aime2024_gen_0_shot_chat_prompt.py
aime2024_infer_cfg = dict(
    prompt_template=dict(
        type=PromptTemplate,
        template=dict(  # dict type
            round=[
                dict(
                    role="HUMAN",
                    prompt="{question}\nPlease reason step by step, and put your final answer within \\boxed{}.",
                ),
            ],
        ),
    ),
    # ......
)
```

If the value of the `template` parameter has any other type, correct it to `str` or `dict`.

## ICLR-TYPE-002

### Error Description

In the dataset configuration file, when the type of the prompt template is `dict`, the value of one or more of its key-value pairs has an unsupported type. Currently, the only supported value types are `str`, `list`, and `dict`.
### Solution Ensure that in the dataset configuration file, the value type of all key-value pairs in the prompt template under the inference configuration is `str`, `list`, or `dict`. For example: ```python # In aime2024_gen_0_shot_chat_prompt.py aime2024_infer_cfg = dict( prompt_template=dict( type=PromptTemplate, template=dict( # dict type round=[ dict( role="HUMAN", # str type prompt="{question}\nPlease reason step by step, and put your final answer within \\boxed{}.", # str type ), ], ), ), # ...... ) ``` ## ICLR-PARAM-001 ### Error Description In the dataset configuration file, when the `ice_token` parameter is configured in the prompt template, the value of the `template` parameter does not contain the value of the `ice_token` parameter. ### Solution 1. When the type of the `template` parameter is `str`, ensure that the string value of `template` contains the value of the `ice_token` parameter. For example: ```python # In ceval_gen_5_shot_str.py ceval_infer_cfg = dict( ice_template=dict( type=PromptTemplate, template=f'Below are single-choice questions from the {_ch_name} exam in China. Please select the correct answer.\n{{question}}\nA. {{A}}\nB. {{B}}\nC. {{C}}\nD. {{D}}\nAnswer: {{answer}}', # The string contains '', the value of ice_token ice_token='', ), retriever=dict(type=FixKRetriever, fix_id_list=[0, 1, 2, 3, 4]), inferencer=dict(type=GenInferencer), ) ``` 2. When the type of the `template` parameter is `dict`, ensure that the value of at least one key-value pair in the dictionary of the `template` value contains the value of the `ice_token` parameter. For example: ```python # In aime2024_gen_0_shot_chat_prompt.py cmmlu_infer_cfg = dict( # ...... prompt_template=dict( type=PromptTemplate, template=dict( begin='', # Same as the value of ice_token round=[ dict(role='HUMAN', prompt=prompt_prefix+QUERY_TEMPLATE), ], ), ice_token='', ), # ...... 
) ``` ## ICLR-PARAM-002 ### Error Description The `ice_template` parameter is not specified when the dataset configuration file needs to construct few-shots based on the training set. ### Solution Take `cmmlu_gen_5_shot_cot_chat_prompt.py` as an example. This configuration specifies `retriever=dict(type=FixKRetriever, fix_id_list=[0, 1, 2, 3, 4]),` to construct few-shots, so the `ice_template` parameter must be specified. You can modify it with reference to the following content: ```python cmmlu_infer_cfg = dict( ice_template=dict( # ice_template must be configured type=PromptTemplate, template=dict(round=[ dict( role='HUMAN', prompt=prompt_prefix+QUERY_TEMPLATE, ), dict(role='BOT', prompt="{answer}\n",) ]), ), prompt_template=dict( type=PromptTemplate, template=dict( begin='', round=[ dict(role='HUMAN', prompt=prompt_prefix+QUERY_TEMPLATE), ], ), ice_token='', ), retriever=dict(type=FixKRetriever, fix_id_list=[0, 1, 2, 3, 4]), # 5-shots specified inferencer=dict(type=GenInferencer), ) ``` ## ICLR-PARAM-003 ### Error Description In the multimodal dataset configuration file, the key value of the `prompt_mm` parameter in the prompt template is not one of ["text", "image", "video", "audio"]. ### Solution Take `textvqa_gen_base64.py` as an example. In this configuration, the key value of the `prompt_mm` parameter in the prompt template is one of "text", "image", "video", "audio". 
You can modify it with reference to the following content: ```python textvqa_infer_cfg = dict( prompt_template=dict( type=MMPromptTemplate, template=dict( round=[ dict(role="HUMAN", prompt_mm={ # The key value of the prompt_mm parameter is one of "text", "image", "video", "audio" "text": {"type": "text", "text": "{question} Answer the question using a single word or phrase."}, "image": {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,{image}"}}, "video": {"type": "video_url", "video_url": {"url": "data:video/jpeg;base64,{video}"}}, "audio": {"type": "audio_url", "audio_url": {"url": "data:audio/wav;base64,{audio}"}}, }) ] ) ), retriever=dict(type=ZeroRetriever), inferencer=dict(type=GenInferencer) ) ``` ## ICLR-PARAM-004 ### Error Description The id values in `fix_id_list` for constructing few-shots in the dataset configuration file exceed the range of selectable ids in the training set. ### Solution If the configuration for constructing few-shots in the dataset configuration file is as follows: ```python retriever=dict(type=FixKRetriever, fix_id_list=[1,2,5,8]), ``` The detailed error log is `Fix-K retriever index 8 is out of range of [0, 8)`, indicating that the id value 8 in `fix_id_list` exceeds the range [0, 8) of selectable ids in the training set and needs to be corrected to a value within this range. ## ICLR-IMPL-002 ### Error Description The `ice_token` parameter is not configured in the prompt template of the dataset configuration file. ### Solution 1. If both the `prompt_template` parameter and the `ice_template` parameter exist, and the log error is `ice_token of prompt_template is not provided`, then the `ice_token` parameter must exist in the `prompt_template` parameter. 
For example: ```python cmmlu_infer_cfg = dict( ice_template=dict( type=PromptTemplate, template=dict(round=[ dict( role='HUMAN', prompt=prompt_prefix+QUERY_TEMPLATE, ), dict(role='BOT', prompt="{answer}\n",) ]), ), prompt_template=dict( type=PromptTemplate, template=dict( begin='', round=[ dict(role='HUMAN', prompt=prompt_prefix+QUERY_TEMPLATE), ], ), ice_token='', # Must be set ), retriever=dict(type=FixKRetriever, fix_id_list=[0, 1, 2, 3, 4]), # 5-shots specified inferencer=dict(type=GenInferencer), ) ``` 2. If only the `ice_template` parameter exists, and the log error is `ice_token of ice_template is not provided`, then the `ice_token` parameter must exist in the `ice_template` parameter. For example: ```python ceval_infer_cfg = dict( ice_template=dict( type=PromptTemplate, template=f'Below are single-choice questions from the {_ch_name} exam in China. Please select the correct answer.\n{{question}}\nA. {{A}}\nB. {{B}}\nC. {{C}}\nD. {{D}}\nAnswer: {{answer}}', ice_token='', # Must exist ), retriever=dict(type=FixKRetriever, fix_id_list=[0, 1, 2, 3, 4]), inferencer=dict(type=GenInferencer), ) ``` ## ICLR-IMPL-003 ### Error Description Necessary template fields are missing in the dataset configuration file. ### Solution If the error log is `Leaving prompt as empty is not supported`, it means that at least one of the `prompt_template` parameter and the `ice_template` parameter must exist in the dataset configuration file. 
For example: ```python cmmlu_infer_cfg = dict( # At least one of ice_template and prompt_template must exist ice_template=dict( type=PromptTemplate, template=dict(round=[ dict( role='HUMAN', prompt=prompt_prefix+QUERY_TEMPLATE, ), dict(role='BOT', prompt="{answer}\n",) ]), ), prompt_template=dict( type=PromptTemplate, template=dict( begin='', round=[ dict(role='HUMAN', prompt=prompt_prefix+QUERY_TEMPLATE), ], ), ice_token='', # Must be set ), retriever=dict(type=FixKRetriever, fix_id_list=[0, 1, 2, 3, 4]), # 5-shots specified inferencer=dict(type=GenInferencer), ) ``` ## MODEL-IMPL-001 ### Error Description When implementing a new class based on the `BaseAPIModel` class, the `parse_text_response` method is not implemented, making it impossible to test the inference service through the text interface. ### Solution (For developers) When implementing a subclass based on the `BaseAPIModel` class, if you want to test the inference service through the text interface, you need to implement the `parse_text_response` method, which is used to parse the text response returned by the model and convert it into the output format of the model inference service. ## MODEL-IMPL-002 ### Error Description When implementing a new class based on the `BaseAPIModel` class, the `parse_stream_response` method is not implemented, making it impossible to test the inference service through the streaming interface. ### Solution (For developers) When implementing a subclass based on the `BaseAPIModel` class, if you want to test the inference service through the streaming interface, you need to implement the `parse_stream_response` method, which is used to parse the streaming response returned by the model and convert it into the output format of the model inference service. ## MODEL-PARAM-002 ### Error Description In the dataset configuration file, the chat-type prompt template does not contain the `role` or `fallback_role` field. 
### Solution Refer to the following configuration file content: ```python cmmlu_infer_cfg = dict( ice_template=dict( type=PromptTemplate, template=dict(round=[ dict( role='HUMAN', # Contains the 'role' field prompt=prompt_prefix+QUERY_TEMPLATE, ), dict(role='BOT', prompt="{answer}\n",) ]), ), prompt_template=dict( type=PromptTemplate, template=dict( begin='', round=[ dict(role='HUMAN', prompt=prompt_prefix+QUERY_TEMPLATE), ], ), ice_token='', ), retriever=dict(type=FixKRetriever, fix_id_list=[0, 1, 2, 3, 4]), # 5-shots specified inferencer=dict(type=GenInferencer), ) ``` ## MODEL-PARAM-003 ### Error Description In the dataset configuration file, the value of the `role` parameter in the chat template of prompt engineering is not within the legal range. ### Solution If the chat template-related configuration in the dataset configuration file is as follows: ```python # Take aime2024_gen_0_shot_chat_prompt.py as an example aime2024_infer_cfg = dict( prompt_template=dict( type=PromptTemplate, template=dict( round=[ dict( role="HUMAN1", prompt="{question}\nPlease reason step by step, and put your final answer within \\boxed{}.", ), ], ), ), # ...... ) ``` The error log is `Unknown role HUMAN1 in chat template, legal role chosen from ['HUMAN', 'BOT', 'SYSTEM'].`, indicating that the value of the role parameter in the chat template is HUMAN1, while the legal role values are HUMAN, BOT, and SYSTEM. Therefore, the value of the role parameter needs to be corrected to one of HUMAN, BOT, or SYSTEM. ## MODEL-PARAM-004 ### Error Description There is no direct solution available yet. ### Solution If you need to resolve this issue, [please raise an issue](https://github.com/AISBench/benchmark/issues) and include this error code in the issue description. ## MODEL-PARAM-005 ### Error Description There is no direct solution available yet. 
### Solution

If you need to resolve this issue, [please raise an issue](https://github.com/AISBench/benchmark/issues) and include this error code in the issue description.

## MODEL-TYPE-001

### Error Description

In the dataset configuration file, a list of plain strings (without explicit roles) is not supported in the prompt engineering template.

### Solution

If the prompt template-related configuration in the dataset configuration file is as follows:

```python
# Take aime2024_gen_0_shot_chat_prompt.py as an example
aime2024_infer_cfg = dict(
    prompt_template=dict(
        type=PromptTemplate,
        template=dict(
            round=[  # The list contains multiple strings
                "{question}\nPlease reason step by step, and put your final answer within \\boxed{}.",
                "{question}\nPlease reason step by step, and put your final answer within \\boxed{}."
            ],
        ),
    ),
    # ......
)
```

An error will occur: `Mixing str without explicit role is not allowed in API models!`. Change `round` to a valid chat template, for example:

```python
round=[
    dict(
        role="HUMAN",
        prompt="{question}\nPlease reason step by step, and put your final answer within \\boxed{}.",
    ),
],
```

## MODEL-TYPE-002

### Error Description

No direct solution is available at this time.

### Solution

If you need to resolve this issue, [please submit an issue](https://github.com/AISBench/benchmark/issues) and include this error code in the issue description.

## MODEL-TYPE-003

### Error Description

No direct solution is available at this time.

### Solution

If you need to resolve this issue, [please submit an issue](https://github.com/AISBench/benchmark/issues) and include this error code in the issue description.

## MODEL-TYPE-004

### Error Description

No direct solution is available at this time.

### Solution

If you need to resolve this issue, [please submit an issue](https://github.com/AISBench/benchmark/issues) and include this error code in the issue description.
## MODEL-DATA-001 ### Error Description The model task failed to retrieve model name information from the tested inference service. ### Solution If the error message is `Failed to get service model path from http://url-to-infer-service. Error: XXXXXX`, it indicates a failure to access `http://url-to-infer-service/v1/models`. You need to check if the tested inference service is running properly and if the `/v1/models` sub-service is enabled. You can also locate the cause of the access failure to `http://url-to-infer-service/v1/models` based on the specific error `XXXXXX`. If the URL `http://url-to-infer-service/` does not support the `v1/models` sub-service, you can configure the model name in the `model` parameter of the model configuration file. For example: ```python # In vllm_api_stream_chat.py models = [ dict( # ...... model="name_of_model", # ...... ) ] ``` ## MODEL-DATA-002 ### Error Description The dataset configuration file lacks required parameters. ### Solution If the error message is `Invalid prompt content: without 'prompt' or 'prompt_mm' param!`, it means the dataset configuration file does not contain either the `prompt` or `prompt_mm` parameter. You need to add one of these two parameters to the dataset configuration file. For example: ```python # Take aime2024_gen_0_shot_chat_prompt.py as an example aime2024_infer_cfg = dict( prompt_template=dict( type=PromptTemplate, template=dict( round=[ dict( # Must contain either the 'prompt' or 'prompt_mm' field role="HUMAN", prompt="{question}\nPlease reason step by step, and put your final answer within \\boxed{}.", ), ], ), ), # ...... ) ``` ## MODEL-DATA-003 ### Error Description Failed to parse the returned result of the request in JSON format. ### Solution If the error message is `Unexpected response format. 
Please check 'error_info' in {dataset_abbr}_failed.jsonl for more information.`, you need to check the specific error information (content of the `error_info` field) in the `{dataset_abbr}_failed.jsonl` file under the current inference task's output path (e.g., `outputs/default/20250628_151326/performances/vllm-api-stream-chat/`) and further explore solutions. ## MODEL-CFG-001 ### Error Description The `max_seq_len` parameter is not configured in the local model configuration file. ### Solution If the error message is `max_seq_len is not provided and cannot be inferred from the model config.`, it means you need to add the `max_seq_len` parameter to the local model configuration file. For example: ```python # In hf_chat_model.py models = [ dict( attr="local", # ...... max_seq_len=2048, # ...... ) ] ``` ## MODEL-MOD-001 ### Error Description Special dependencies required for model execution are not installed. ### Solution If the error message is `fastchat module not found. Please install with\npip install "fschat[model_worker,webui]"`, it indicates that the `fastchat` dependency is missing. You can install it by executing `pip install "fschat[model_worker,webui]"`. ## DSET-CFG-001 ### Error Description The dataset configuration file lacks the `path` field to specify the dataset path. ### Solution If the error message is `The 'path' argument is required to load the dataset.`, it means the dataset configuration file does not contain the `path` field. You need to add the `path` field to the dataset configuration file. For example: ```python # In aime2024_gen_0_shot_chat_prompt.py aime2024_datasets = [ dict( abbr='aime2024', type=Aime2024Dataset, path='ais_bench/datasets/aime/aime.jsonl', # Required field to configure the dataset path # ...... ) ] ``` ## DSET-FILE-001 ### Error Description The dataset file does not exist. ### Solution 1. 
If the error message is `Path is not a directory or Parquet file: /path/to/dataset.jsonl`, it means `/path/to/dataset.jsonl` is not a dataset in the required `.parquet` format. Please confirm that the dataset format meets expectations.
2. If the error message is `No Parquet file found in /path/to/dataset/.`, it means no `.parquet` format dataset was found in the path `/path/to/dataset/`. Please confirm that the dataset format meets expectations.
3. If the error message is `Dataset file not found: /path/to/dataset/`, it means the dataset path `/path/to/dataset/` itself does not exist. Please confirm that the dataset path matches the expected input path.
4. If the error message is `Corpus file not found. Please ensure {DEFAULT_CORPUS_FILE} exists in one of: [...]` when using the mooncake_trace dataset, the required corpus file was not found. Place `assets/shakespeare.txt` under **`ais_bench/third_party/aiperf/assets/shakespeare.txt`** (relative to the ais_bench package root), or under one of the paths listed in the error message.

## DSET-DATA-002

### Error Description

The content structure of the dataset is invalid.

### Solution

Check the dataset content for format issues based on the detailed error message.

## DSET-DATA-005

### Error Description

No direct solution is available at this time.

### Solution

If you need to resolve this issue, [please submit an issue](https://github.com/AISBench/benchmark/issues) and include this error code in the issue description.

## DSET-DATA-006

### Error Description

No direct solution is available at this time.

### Solution

If you need to resolve this issue, [please submit an issue](https://github.com/AISBench/benchmark/issues) and include this error code in the issue description.

## DSET-PARAM-002

### Error Description

Invalid values for `n` (number of candidate solutions generated by inference) and `k` (number of samples collected from candidates).
### Solution

If the error log shows:

```bash
Maximum value of `k` 4 must be less than or equal to `n` 8
```

It means `k` is greater than `n`. You need to configure `k` as an integer less than or equal to `n`. For example:

1. If both `n` and `k` parameters are configured in the dataset configuration file, set their values within the valid range in the configuration file:

```python
# In aime2024_gen_0_shot_str.py, the k parameter corresponds to `k`
aime2024_datasets = [
    dict(
        abbr='aime2024',
        type=Aime2024Dataset,
        # ......
        k=4,
        n=8,
    )
]
```

## DSET-PARAM-004

### Error Description

Invalid parameters in the dataset configuration file.

### Solution

Please check for invalid parameter value issues in the dataset configuration file based on the detailed error message.

Typical scenarios for mooncake_trace / timestamp-based scheduling:

1. **timestamp**: The `timestamp` field in trace data must be of type float or int and >= 0; otherwise an error is raised with a type or range message.
2. **hash_ids and input_length incompatible**: When the error message contains `Input length: ..., Hash IDs: ..., Block size: 512 ... Final block size: ... must be > 0 and <= 512`, ensure `(len(hash_ids)-1)*512+1 <= input_length <= len(hash_ids)*512`.
3. **fixed_schedule parameters**: When `fixed_schedule_end_offset >= 0`, `fixed_schedule_start_offset` must be <= `fixed_schedule_end_offset`.

## DSET-PARAM-005

### Error Description

Required parameters are missing during dataset loading or processing.

### Solution

Check and supply the missing required parameters according to the detailed error message. For example:

1. If the error is `mean must be provided`, when using the mooncake_trace dataset you must provide the `mean` parameter (via the trace's `input_length` field) for prompt generation.
2. If the error is `Either 'input_text' or 'input_length' must be provided`, in a **single JSONL record** of mooncake trace data, you must provide the `input_length` field when `input_text` is not provided.
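The record-level rules from DSET-PARAM-004 and DSET-PARAM-005 above can be checked before a run with a small script. The following is a minimal sketch, not the tool's own validation: `check_trace_record` and `check_trace_lines` are hypothetical helper names, the block size of 512 is taken from the quoted error message, and treating a missing `timestamp` as acceptable is an assumption.

```python
import json

BLOCK_SIZE = 512  # block size quoted in the DSET-PARAM-004 error message


def check_trace_record(record):
    """Return a list of problems found in one mooncake trace record (a dict)."""
    problems = []

    # timestamp must be int or float and >= 0 (a missing timestamp is not
    # flagged here; that is an assumption of this sketch).
    ts = record.get("timestamp")
    if ts is not None:
        if not isinstance(ts, (int, float)):
            problems.append("timestamp must be int or float")
        elif ts < 0:
            problems.append("timestamp must be >= 0")

    # DSET-PARAM-005: each record needs input_text or input_length.
    if "input_text" not in record and "input_length" not in record:
        problems.append("either input_text or input_length must be provided")

    # DSET-PARAM-004: input_length must fit the number of hash_ids blocks.
    hash_ids = record.get("hash_ids")
    input_length = record.get("input_length")
    if hash_ids and input_length is not None:
        low = (len(hash_ids) - 1) * BLOCK_SIZE + 1
        high = len(hash_ids) * BLOCK_SIZE
        if not low <= input_length <= high:
            problems.append(
                f"input_length {input_length} must be in [{low}, {high}] "
                f"for {len(hash_ids)} hash_ids"
            )
    return problems


def check_trace_lines(lines):
    """Map 1-based line numbers to problems for an iterable of JSONL lines."""
    report = {}
    for number, line in enumerate(lines, start=1):
        if line.strip():
            found = check_trace_record(json.loads(line))
            if found:
                report[number] = found
    return report
```

To scan a trace file, pass an open file object, for example `check_trace_lines(open("trace.jsonl", encoding="utf-8"))`, where `trace.jsonl` is a placeholder path. An empty dictionary means no record violated the rules sketched here.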
## DSET-UNK-001

### Error Description

Unknown error of the dataset or a dependent component due to incorrect initialization.

### Solution

If the error is **"RNG manager not initialized. Call init_rng() first."**, the mooncake_trace prompt generator called `derive_rng` before `init_rng` was called. Normal loading via `MooncakeTraceDataset.load()` calls `init_rng(random_seed)` automatically; this error usually occurs when calling lower-level APIs or in tests without proper initialization order. Ensure `init_rng(seed)` is called before any RNG-dependent generation logic.

## DSET-DEPENDENCY-002

### Error Description

Missing dependencies required for the dataset task evaluation.

### Solution

If the error message is:

```bash
Please install human_eval use following steps:
git clone git@github.com:open-compass/human-eval.git
cd human-eval && pip install -e .
```

Execute `git clone git@github.com:open-compass/human-eval.git` and `cd human-eval && pip install -e .` in sequence according to the error log content to install the `human-eval` library.

## DSET-MTRC-001

### Error Description

No direct solution is available at this time.

### Solution

If you need to resolve this issue, [please submit an issue](https://github.com/AISBench/benchmark/issues) and include this error code in the issue description.

## DSET-MTRC-003

### Error Description

No direct solution is available at this time.

### Solution

If you need to resolve this issue, [please submit an issue](https://github.com/AISBench/benchmark/issues) and include this error code in the issue description.

## SWEB-DEPENDENCY-001

### Error Description

`mini-swe-agent` dependency is missing when running SWEBench infer, so task initialization fails.

### Solution

Install the dependency and retry:

```bash
pip install mini-swe-agent
```

If you use a virtual environment, make sure `ais_bench` and `mini-swe-agent` are installed in the same Python environment.
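One way to confirm that both packages are visible to the same interpreter is to ask that interpreter directly. The sketch below uses only the standard library. `missing_modules` is a hypothetical helper name, and the import name `minisweagent` for the `mini-swe-agent` package is an assumption that you should verify against your installation.

```python
import importlib.util
import sys


def missing_modules(module_names):
    """Return the top-level module names the running interpreter cannot find."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    # `minisweagent` is an assumed import name for the mini-swe-agent package.
    print("Interpreter:", sys.executable)
    print("Missing:", missing_modules(["ais_bench", "minisweagent"]))
```

Run the script with the same `python` executable that launches `ais_bench`. If a name appears in the missing list, install the corresponding package into the environment reported on the first line.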
## SWEB-DEPENDENCY-002

### Error Description

SWE-bench harness dependency is missing when running SWEBench eval.

### Solution

Install the harness from the official repository, then retry:

```bash
git clone https://github.com/SWE-bench/SWE-bench.git
cd SWE-bench
pip install -e .
```

## SWEB-PARAM-001

### Error Description

No valid model config is detected for SWEBench infer (required fields like `model/url/api_key` are missing or empty).

### Solution

Check `models[0]` in your task config and provide at least:

1. `model`, for example `hosted_vllm/qwen3`
2. `url`, for example `http://127.0.0.1:2998/v1`
3. `api_key`, `EMPTY` is acceptable for local tests

## SWEB-PARAM-002

### Error Description

Invalid SWEBench dataset name that is not in the supported name set.

### Solution

Set dataset `name` to one of: `full`, `verified`, `lite`, `multilingual`.

## SWEB-DATA-001

### Error Description

Prediction input contains `instance_id` values that do not exist in the current dataset.

### Solution

Ensure the prediction file and eval dataset are fully aligned:

1. Run infer and eval with the same dataset config.
2. Check whether `instance_id` entries in predictions were manually changed or mixed from another run.

## SWEB-DATA-002

### Error Description

Failed to load SWEBench dataset from Hugging Face online source.

### Solution

Check network connectivity and Hugging Face access first. If your environment is restricted, download parquet files manually and configure a local `path`.

## SWEB-DATA-003

### Error Description

Failed to read or parse local SWEBench parquet files.

### Solution

Validate local data integrity and format:

1. Confirm files are valid parquet files.
2. Confirm file naming matches the target `split` (for example `test-*.parquet`).
3. Re-download or re-export corrupted files and retry.

## SWEB-FILE-001

### Error Description

Prediction file is missing during SWEBench eval (`*.json` or `preds.json` not found).
### Solution

Run infer successfully before eval, and confirm prediction files exist under `work_dir/predictions` for the target model.

## SWEB-FILE-002

### Error Description

Local SWEBench dataset path resolution failed (path does not exist or is not accessible).

### Solution

Check whether `path` in the config is correct, and verify that the current user has read permission for that directory or file.

## SWEB-FILE-003

### Error Description

No parquet file for the target split is found under the local dataset path.

### Solution

Ensure one of the following exists:

1. `/data/-*.parquet`
2. `/-*.parquet`

For single-file use cases, `path` can point directly to that parquet file.

## SWEB-RUNTIME-001

### Error Description

The Docker image required for SWEBench is unavailable locally and pulling it failed.

### Solution

Check the Docker daemon status and network, then run `docker pull` for the image shown in the logs. Retry after the image is available.

## SWEB-RUNTIME-002

### Error Description

A runtime error occurs during SWEBench execution (for example a harness execution failure or a future task exception).

### Solution

Use detailed logs to triage:

1. Check dependency installation, Docker availability, and prediction file format first.
2. If it persists, keep the full logs and open an issue that includes the error code.
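For the SWEB-FILE-003 check above, a quick local lookup can show which parquet files a given `path` and split would resolve to. This is a minimal sketch under stated assumptions: `find_split_parquet` is a hypothetical helper, and the two search locations (a `data/` subdirectory and the directory itself, each matched with `<split>-*.parquet`) are inferred from the `test-*.parquet` naming mentioned in SWEB-DATA-003, not confirmed from the tool's source.

```python
from pathlib import Path


def find_split_parquet(path, split="test"):
    """List parquet files for `split` under `path` (assumed layout, see above)."""
    root = Path(path)

    # Single-file use case: `path` points directly at a parquet file.
    if root.is_file():
        return [root] if root.suffix == ".parquet" else []

    matches = []
    # Assumed search order: <path>/data first, then <path> itself.
    for folder in (root / "data", root):
        if folder.is_dir():
            matches.extend(sorted(folder.glob(f"{split}-*.parquet")))
    return matches
```

An empty result for your `path` and `split` suggests that the files are missing or named differently from what this sketch assumes.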