# Explanation of Running Modes
## Accuracy Evaluation Scenarios
### All Mode
In **All Mode**, the evaluation tool executes the complete workflow of **Inference → Evaluation → Summary**:
```mermaid
graph LR;
  A[Execute inference based on the given dataset] --> B((Inference Results))
  B --> C[Perform evaluation based on inference results]
  C --> D((Accuracy Data))
  D --> E[Generate a summary report based on accuracy data]
  E --> F((Display Results))
```
Command Example:
```shell
ais_bench --models vllm_api_general --datasets gsm8k_gen --mode all
```
Generated Directory Structure:
```bash
outputs/default/
├── 20250220_120000/        # Each experiment corresponds to a timestamp folder
├── 20250220_183030/
│   ├── configs/            # Dumped configuration files (may include configs for multiple experiments)
│   ├── logs/
│   │   ├── eval/           # Logs of the evaluation phase
│   │   └── infer/          # Logs of the inference phase
│   ├── predictions/        # Inference result data
│   ├── results/            # Evaluation results for each task
│   └── summary/            # Summary report of a single experiment
└── ...
```
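
As a rough illustration of how this layout can be inspected, the sketch below reports which phases have left output in one experiment folder. It assumes only the subfolder names shown in the tree above (`predictions/`, `results/`, `summary/`). The helper name `completed_phases` is hypothetical and is not part of `ais_bench`.

```python
from pathlib import Path

# Output subfolder written by each phase, taken from the directory tree above.
PHASE_DIRS = {
    "infer": "predictions",
    "eval": "results",
    "summary": "summary",
}


def completed_phases(experiment_dir):
    """Return the phases whose output folder exists and contains at least one entry.

    Hypothetical helper: it only looks at folder names, not at file contents.
    """
    root = Path(experiment_dir)
    done = []
    for phase, subfolder in PHASE_DIRS.items():
        folder = root / subfolder
        if folder.is_dir() and any(folder.iterdir()):
            done.append(phase)
    return done
```

For example, `completed_phases("outputs/default/20250220_183030")` could be called after a run to check that all three phases produced output.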

### Infer Mode
In **Infer Mode**, only the inference phase is executed, and the output results are saved:
```mermaid
graph LR;
  A[Execute inference based on the given dataset] --> B((Inference Results))
```
Command Example:
```shell
ais_bench --models vllm_api_general --datasets gsm8k_gen --mode infer
```
Generated Directory Structure:
```bash
outputs/default/
├── 20250220_120000/
├── 20250220_183030/
│   ├── configs/
│   ├── logs/
│   │   └── infer/
│   └── predictions/        # Contains only inference results
└── ...
```

### Eval Mode
In **Eval Mode**, evaluation and report generation are performed based on existing inference results. The `--reuse` parameter is required:
```mermaid
graph LR;
  B((Inference Results)) --> C[Perform evaluation based on inference results]
  C --> D((Accuracy Data))
  D --> E[Generate a summary report based on accuracy data]
  E --> F((Display Results))
```
Command Example:
```shell
ais_bench --models vllm_api_general --datasets gsm8k_gen --mode eval --reuse
```
Generated Directory Structure:
```bash
outputs/default/
├── 20250220_120000/
├── 20250220_183030/
│   ├── configs/
│   ├── logs/
│   │   ├── eval/           # Newly added eval logs
│   │   └── infer/
│   ├── predictions/
│   └── results/            # Newly added evaluation result files
└── ...
```

### Viz Mode
In **Viz Mode**, only a summary report is generated and displayed based on existing accuracy data. The `--reuse` parameter is also required:
```mermaid
graph LR;
  D((Accuracy Data)) --> E[Generate a summary report based on accuracy data]
  E --> F((Display Results))
```
Command Example:
```shell
ais_bench --models vllm_api_general --datasets gsm8k_gen --mode viz --reuse
```

Generated Directory Structure:
```bash
outputs/default/
├── 20250220_120000/
├── 20250220_183030/
│   ├── configs/
│   ├── logs/
│   │   ├── eval/
│   │   └── infer/
│   ├── predictions/
│   ├── results/
│   └── summary/            # Newly added summary report (output of viz mode)
└── ...
```
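
The four accuracy modes differ only in which phases they run and whether they need `--reuse`. The sketch below restates that mapping, following the workflow diagrams above, and assembles the matching command line. The names `MODE_PHASES`, `REUSE_MODES`, and `build_command` are hypothetical. Only the flags shown in the command examples above are used.

```python
import shlex

# Phases run by each accuracy mode, following the workflow diagrams above.
MODE_PHASES = {
    "all": ["infer", "eval", "summary"],
    "infer": ["infer"],
    "eval": ["eval", "summary"],
    "viz": ["summary"],
}

# Modes that start from existing results and therefore need --reuse.
REUSE_MODES = {"eval", "viz"}


def build_command(model, dataset, mode):
    """Return the ais_bench argument list for one accuracy mode (illustrative only)."""
    if mode not in MODE_PHASES:
        raise ValueError(f"unknown accuracy mode: {mode}")
    argv = ["ais_bench", "--models", model, "--datasets", dataset, "--mode", mode]
    if mode in REUSE_MODES:
        argv.append("--reuse")
    return argv


if __name__ == "__main__":
    # Print one command per mode so the differences are easy to compare.
    for mode in MODE_PHASES:
        print(shlex.join(build_command("vllm_api_general", "gsm8k_gen", mode)))
```

The returned list can be passed directly to `subprocess.run` if the commands are driven from a script.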


## Performance Evaluation Scenarios
### Perf Mode
In **Perf Mode**, the evaluation tool executes the complete workflow of **Performance Sampling → Calculation → Summary** and generates a [visualization report](../results_intro/performance_visualization.md):
```mermaid
graph LR;
  A[Execute inference based on the given dataset] --> B((Performance Sampling Data))
  B --> C[Calculate metrics based on sampling data]
  C --> D((Performance Data))
  D --> E[Generate a summary report based on performance data]
  E --> F((Display Results))
```
> ⚠️ **Note**: In the performance evaluation scenario, `--models` only supports streaming service-oriented inference APIs (refer to [Service-Oriented Inference Backend](./models.md#service-oriented-inference-backend)), such as [`vllm_api_general_stream`](https://github.com/AISBench/benchmark/tree/master/ais_bench/benchmark/configs/models/vllm_api/vllm_api_general_stream.py).

Command Example:
```shell
ais_bench --models vllm_api_general_stream --datasets synthetic_gen --mode perf
```

Example of Generated Directory Structure:
```bash
outputs/default/
├── 20250220_120000/
├── 20250220_183030/
│   ├── configs/
│   ├── logs/
│   │   └── performance/          # Performance evaluation logs
│   └── performance/              # Performance evaluation results
│       └── vllm-api-general-stream/
│           ├── syntheticdataset.csv        # Performance data of single inference requests
│           ├── syntheticdataset.json       # End-to-end performance data
│           ├── syntheticdataset_details.h5  # Full sampling ITL (Inter-Token Latency) data
│           ├── syntheticdataset_details.json  # Detailed full sampling data
│           └── syntheticdataset_plot.html     # Real-time concurrency and request visualization page
└── ...
```
- Performance sampling is based on `syntheticdataset.csv` and `syntheticdataset.json`.
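
For post-processing outside the tool, the two files named above can be read with the Python standard library. This is a minimal sketch: the helper `load_perf_outputs` is hypothetical, and it makes no assumption about column names or JSON keys, because the file schemas are not described here.

```python
import csv
import json
from pathlib import Path


def load_perf_outputs(result_dir, dataset_stem="syntheticdataset"):
    """Load <stem>.csv as a list of row dicts and <stem>.json as a Python object.

    Hypothetical helper: the file names follow the directory tree above,
    and the contents are returned as-is without interpreting any field.
    """
    folder = Path(result_dir)
    with open(folder / f"{dataset_stem}.csv", newline="", encoding="utf-8") as fh:
        csv_rows = list(csv.DictReader(fh))
    with open(folder / f"{dataset_stem}.json", encoding="utf-8") as fh:
        json_data = json.load(fh)
    return csv_rows, json_data
```

Based on the tree above, `result_dir` would be a path such as `outputs/default/20250220_183030/performance/vllm-api-general-stream`.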

### Perf_Viz Mode
In **Perf_Viz Mode**, only a summary report is generated and displayed based on existing performance data. The `--reuse` parameter is required:
```mermaid
graph LR;
  D((Performance Data)) --> E[Generate a summary report based on performance data]
  E --> F((Display Results))
```
Command Example:
```shell
ais_bench --models vllm_api_general_stream --datasets synthetic_gen --mode perf_viz --reuse
```
> **Explanation**: `perf_viz` reads `syntheticdataset.csv` and `syntheticdataset.json` from the most recent experiment folder and generates the visualization results from the performance metrics they contain.
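
To check which folder counts as the most recent experiment, the timestamp folder names can be compared directly. The sketch below assumes that folder names follow the `YYYYMMDD_HHMMSS` pattern shown in the directory trees and that "most recent" means the latest timestamp in the name. How `ais_bench` itself selects the folder is not specified here, and the helper `latest_experiment` is hypothetical.

```python
from datetime import datetime
from pathlib import Path


def latest_experiment(base_dir="outputs/default"):
    """Return the subfolder with the latest YYYYMMDD_HHMMSS name, or None if none match."""
    candidates = []
    for entry in Path(base_dir).iterdir():
        if not entry.is_dir():
            continue
        try:
            stamp = datetime.strptime(entry.name, "%Y%m%d_%H%M%S")
        except ValueError:
            # Skip folders that are not named like an experiment timestamp.
            continue
        candidates.append((stamp, entry))
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]
```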

For details on the performance evaluation results, see [Explanation of Performance Evaluation Results](../results_intro/performance_metric.md#explanation-of-performance-evaluation-results).