# Explanation of Performance Evaluation Results
The performance evaluation results consist of two parts: **performance output results for individual inference requests** and **end-to-end performance output results**. The parameters in each part are described below.


## 1. Performance Output Results for Individual Inference Requests
The key metrics and statistics are defined as follows:
- **P75 / P90 / P99**: The 75th, 90th, and 99th percentiles of a metric across all requests. Taking TPOT as an example, P90 is the value at or below which 90% of the per-request TPOT values fall.
- **E2EL (End-to-End Latency)**: The total latency of a single request from sending the request to receiving the complete response.
- **TTFT (Time To First Token)**: The latency from sending the request to receiving the first output token.
- **TPOT (Time Per Output Token)**: The average generation latency per token during the output phase (excluding the first token).
- **ITL (Inter-token Latency)**: The average interval latency between adjacent tokens (excluding the first token).
- **InputTokens**: The number of input tokens in the request.
- **OutputTokens**: The number of output tokens generated by the request.
- **OutputTokenThroughput**: The throughput of output tokens (in tokens per second, Token/s).
- **Tokenizer**: The time consumed for Tokenizer encoding.
- **Detokenizer**: The time consumed for Detokenizer decoding.
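
The relationships between the latency metrics above can be illustrated with a minimal sketch. It assumes a streamed response in which each output token arrives with its own timestamp; `RequestTrace` and `per_request_metrics` are hypothetical names used only for illustration, and dividing output tokens by E2EL to obtain the per-request throughput is an assumption, not a formula confirmed by the tool.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RequestTrace:
    """Hypothetical record of one streamed request (all times in seconds)."""
    send_time: float
    token_times: List[float]  # arrival time of each output token, in order


def per_request_metrics(trace: RequestTrace) -> Dict[str, float]:
    """Derive E2EL, TTFT, TPOT, ITL and output throughput for one request."""
    if not trace.token_times:
        raise ValueError("request produced no output tokens")

    n_out = len(trace.token_times)
    e2el = trace.token_times[-1] - trace.send_time  # send -> last token
    ttft = trace.token_times[0] - trace.send_time   # send -> first token

    # TPOT: decode time spread over every token except the first one.
    tpot = (e2el - ttft) / (n_out - 1) if n_out > 1 else 0.0

    # ITL: mean gap between adjacent tokens. With exactly one token per
    # timestamp this equals TPOT; the two differ when several tokens
    # arrive in a single chunk.
    gaps = [b - a for a, b in zip(trace.token_times, trace.token_times[1:])]
    itl = sum(gaps) / len(gaps) if gaps else 0.0

    # Assumption: per-request throughput = output tokens / E2EL.
    throughput = n_out / e2el if e2el > 0 else 0.0

    return {
        "E2EL": e2el,
        "TTFT": ttft,
        "TPOT": tpot,
        "ITL": itl,
        "OutputTokens": float(n_out),
        "OutputTokenThroughput": throughput,
    }


if __name__ == "__main__":
    trace = RequestTrace(send_time=0.0, token_times=[0.5, 0.75, 1.0, 1.25])
    for name, value in per_request_metrics(trace).items():
        print(f"{name}: {value}")
```

In this sketch, E2EL always equals TTFT plus the total decode time, which is why TPOT can be recovered as `(E2EL - TTFT) / (OutputTokens - 1)`.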

| Performance Parameters       | Stage                          | Average                          | Max                              | Min                              | Median                          | P75                              | P90                              | P99                              | N                                |
| ----------------------------- | ------------------------------ | -------------------------------- | -------------------------------- | -------------------------------- | -------------------------------- | -------------------------------- | -------------------------------- | -------------------------------- | -------------------------------- |
| E2EL                          | Stage for this parameter       | Average request latency          | Maximum request latency          | Minimum request latency          | Median request latency          | 75th-percentile request latency  | 90th-percentile request latency  | 99th-percentile request latency  | Test data volume (from input parameters) |
| TTFT                          | Stage for this parameter       | Average latency of first token   | Maximum latency of first token   | Minimum latency of first token   | Median latency of first token   | 75th-percentile latency of first token | 90th-percentile latency of first token | 99th-percentile latency of first token | Test data volume (from input parameters) |
| TPOT                          | Stage for this parameter       | Average per-token latency of Decode stage | Maximum per-token latency of Decode stage | Minimum per-token latency of Decode stage | Median per-token latency of Decode stage | 75th-percentile per-token latency of Decode stage | 90th-percentile per-token latency of Decode stage | 99th-percentile per-token latency of Decode stage | Test data volume (from input parameters) |
| ITL                           | Stage for this parameter       | Average inter-token latency      | Maximum inter-token latency      | Minimum inter-token latency      | Median inter-token latency      | 75th-percentile inter-token latency | 90th-percentile inter-token latency | 99th-percentile inter-token latency | Test data volume (from input parameters) |
| InputTokens                   | Stage for this parameter       | Average length of input tokens   | Maximum length of input tokens   | Minimum length of input tokens   | Median length of input tokens   | 75th-percentile length of input tokens | 90th-percentile length of input tokens | 99th-percentile length of input tokens | Test data volume (from input parameters) |
| OutputTokens                  | Stage for this parameter       | Average length of output tokens  | Maximum length of output tokens  | Minimum length of output tokens  | Median length of output tokens  | 75th-percentile length of output tokens | 90th-percentile length of output tokens | 99th-percentile length of output tokens | Test data volume (from input parameters) |
| OutputTokenThroughput         | Stage for this parameter       | Average output throughput        | Maximum output throughput        | Minimum output throughput        | Median output throughput        | 75th-percentile output throughput | 90th-percentile output throughput | 99th-percentile output throughput | Test data volume (from input parameters) |
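
The statistic columns in the table can be reproduced from a list of per-request values with a short sketch such as the one below. The helper names `percentile` and `summarize` are hypothetical, and the linear-interpolation percentile method is an assumption: the tool's actual interpolation rule is not stated here, so values may differ slightly on small samples.

```python
import statistics
from typing import Dict, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Return the q-th percentile (0-100) using linear interpolation."""
    if not values:
        raise ValueError("percentile of an empty sequence is undefined")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0  # fractional index into ordered
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def summarize(values: Sequence[float]) -> Dict[str, float]:
    """Compute one table row: Average, Max, Min, Median, P75, P90, P99, N."""
    return {
        "Average": statistics.fmean(values),
        "Max": max(values),
        "Min": min(values),
        "Median": statistics.median(values),
        "P75": percentile(values, 75),
        "P90": percentile(values, 90),
        "P99": percentile(values, 99),
        "N": len(values),
    }


if __name__ == "__main__":
    # Illustrative per-request TPOT values in seconds (made-up sample data).
    tpot_values = [0.021, 0.019, 0.025, 0.030, 0.022]
    for name, value in summarize(tpot_values).items():
        print(f"{name}: {value}")
```

Because P99 depends on the slowest 1% of requests, it is only meaningful when N is reasonably large; with a handful of requests it is close to the maximum.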


## 2. End-to-End Performance Output Results
| Parameter                     | Description                                                                 |
| ----------------------------- | --------------------------------------------------------------------------- |
| **Benchmark Duration**        | Total execution time of the test task                                       |
| **Total Requests**            | Total number of requests                                                   |
| **Failed Requests**           | Number of failed requests (including unresponsive requests or empty responses) |
| **Success Requests**          | Number of successfully returned requests (including empty and non-empty responses) |
| **Concurrency**               | Actual average concurrency                                                  |
| **Max Concurrency**           | Configured maximum concurrency                                              |
| **Request Throughput**        | Request-level throughput (requests per second, Requests/s)                  |
| **Total Input Tokens**        | Total number of input tokens across all requests                            |
| **Prefill Token Throughput**  | Token throughput during the Prefill stage (Token/s)                         |
| **Total Output Tokens**       | Total number of output tokens generated across all requests                 |
| **Input Token Throughput**    | Input token throughput (Token/s)                                            |
| **Output Token Throughput**   | Output token throughput (Token/s)                                           |
| **Total Token Throughput**    | Total token throughput (input + output) (Token/s)                           |
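
As a rough illustration of how the end-to-end figures relate to each other, the sketch below aggregates a list of per-request results. Everything in it is an assumption made for illustration: `RequestResult` and `end_to_end_summary` are hypothetical names, token totals and throughputs are computed over successful requests only, every throughput is divided by the benchmark duration, and the average concurrency is estimated as the sum of request latencies divided by that duration. Prefill Token Throughput is omitted because it depends on how the Prefill stage time is measured.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RequestResult:
    """Hypothetical per-request outcome used for aggregation."""
    success: bool
    input_tokens: int
    output_tokens: int
    e2el: float  # end-to-end latency in seconds


def end_to_end_summary(results: List[RequestResult],
                       benchmark_duration: float) -> Dict[str, float]:
    """Aggregate per-request results into end-to-end metrics."""
    if benchmark_duration <= 0:
        raise ValueError("benchmark_duration must be positive")

    ok = [r for r in results if r.success]
    total_in = sum(r.input_tokens for r in ok)
    total_out = sum(r.output_tokens for r in ok)

    return {
        "Benchmark Duration": benchmark_duration,
        "Total Requests": len(results),
        "Failed Requests": len(results) - len(ok),
        "Success Requests": len(ok),
        # Assumption: average number of requests in flight, estimated as
        # total busy time divided by wall-clock time.
        "Concurrency": sum(r.e2el for r in ok) / benchmark_duration,
        "Request Throughput": len(ok) / benchmark_duration,
        "Total Input Tokens": total_in,
        "Total Output Tokens": total_out,
        "Input Token Throughput": total_in / benchmark_duration,
        "Output Token Throughput": total_out / benchmark_duration,
        "Total Token Throughput": (total_in + total_out) / benchmark_duration,
    }


if __name__ == "__main__":
    sample = [
        RequestResult(True, 100, 50, 2.0),
        RequestResult(True, 200, 150, 4.0),
        RequestResult(False, 50, 0, 0.0),
    ]
    for name, value in end_to_end_summary(sample, 4.0).items():
        print(f"{name}: {value}")
```

Under these assumptions, Total Token Throughput is always the sum of Input Token Throughput and Output Token Throughput, which gives a quick consistency check on a reported result.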