# Explanation of Performance Evaluation Results

The performance evaluation results include **performance output results for individual inference requests** and **end-to-end performance output results**. The parameters are described below.

## 1. Performance Output Results for Individual Inference Requests

The key statistical indicators are:

- **P75 / P90 / P99**: Percentile statistics. Taking TPOT as an example, these are the 75th-, 90th-, and 99th-percentile TPOT values across all requests.
- **E2EL (End-to-End Latency)**: The total latency of a single request, from sending the request to receiving the complete response.
- **TTFT (Time To First Token)**: The latency until the first token is returned.
- **TPOT (Time Per Output Token)**: The average generation latency per token during the output phase (excluding the first token).
- **ITL (Inter-token Latency)**: The average interval between adjacent tokens (excluding the first token).
- **InputTokens**: The number of input tokens in the request.
- **OutputTokens**: The number of output tokens generated by the request.
- **OutputTokenThroughput**: The throughput of output tokens (in tokens per second, Token/s).
- **Tokenizer**: The time spent on Tokenizer encoding.
- **Detokenizer**: The time spent on Detokenizer decoding.

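
The latency definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: the function names, the input shape (a send timestamp plus a list of token arrival timestamps, in seconds), and the linear-interpolation percentile rule are all assumptions made for the example.

```python
from statistics import mean, median


def request_metrics(send_time, token_times):
    """Per-request latencies (seconds) from a send time and token arrival times.

    Hypothetical helper: the real tool may collect and define these differently.
    """
    if not token_times:
        raise ValueError("request produced no tokens")
    e2el = token_times[-1] - send_time   # send -> last token received
    ttft = token_times[0] - send_time    # send -> first token received
    n_out = len(token_times)
    # TPOT: output-phase time averaged over the tokens after the first one.
    tpot = (e2el - ttft) / (n_out - 1) if n_out > 1 else 0.0
    # Gaps between adjacent tokens; ITL is derived from these. How the gaps
    # are aggregated across requests is tool-specific and not assumed here.
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    return {
        "E2EL": e2el,
        "TTFT": ttft,
        "TPOT": tpot,
        "inter_token_gaps": gaps,
        "OutputTokens": n_out,
    }


def percentile(values, p):
    """p-th percentile (0-100) with linear interpolation between ranks (assumed rule)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values")
    rank = (len(ordered) - 1) * p / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def summarize(values):
    """One table row: the statistics of a metric across all requests."""
    return {
        "Average": mean(values),
        "Max": max(values),
        "Min": min(values),
        "Median": median(values),
        "P75": percentile(values, 75),
        "P90": percentile(values, 90),
        "P99": percentile(values, 99),
        "N": len(values),
    }
```

For example, calling `summarize` on the list of per-request `TPOT` values yields the kind of row that the TPOT entry in the results table describes. Other percentile conventions (nearest rank, for instance) give slightly different values on small samples.
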
| Performance Parameters | Stage | Average | Max | Min | Median | P75 | P90 | P99 | N |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| E2EL | Stage for this parameter | Average request latency | Maximum request latency | Minimum request latency | Median request latency | 75th-percentile request latency | 90th-percentile request latency | 99th-percentile request latency | Test data volume (from input parameters) |
| TTFT | Stage for this parameter | Average latency of first token | Maximum latency of first token | Minimum latency of first token | Median latency of first token | 75th-percentile latency of first token | 90th-percentile latency of first token | 99th-percentile latency of first token | Test data volume (from input parameters) |
| TPOT | Stage for this parameter | Average latency of Decode stage | Maximum latency of Decode stage | Minimum latency of Decode stage | Median latency of Decode stage | 75th-percentile latency of Decode stage | 90th-percentile average latency of Decode stage per request | 99th-percentile latency of Decode stage | Test data volume (from input parameters) |
| ITL | Stage for this parameter | Average inter-token latency | Maximum inter-token latency | Minimum inter-token latency | Median inter-token latency | 75th-percentile inter-token latency | 90th-percentile inter-token latency | 99th-percentile inter-token latency | Test data volume (from input parameters) |
| InputTokens | Stage for this parameter | Average length of input tokens | Maximum length of input tokens | Minimum length of input tokens | Median length of input tokens | 75th-percentile length of input tokens | 90th-percentile length of input tokens | 99th-percentile length of input tokens | Test data volume (from input parameters) |
| OutputTokens | Stage for this parameter | Average length of output tokens | Maximum length of output tokens | Minimum length of output tokens | Median length of output tokens | 75th-percentile length of output tokens | 90th-percentile length of output tokens | 99th-percentile length of output tokens | Test data volume (from input parameters) |
| OutputTokenThroughput | Stage for this parameter | Average output throughput | Maximum output throughput | Minimum output throughput | Median output throughput | 75th-percentile output throughput | 90th-percentile output throughput | 99th-percentile output throughput | Test data volume (from input parameters) |

## 2. End-to-End Performance Output Results

| Parameter | Description |
| --- | --- |
| **Benchmark Duration** | Total execution time of the test task |
| **Total Requests** | Total number of requests |
| **Failed Requests** | Number of failed requests (including unresponsive requests or empty responses) |
| **Success Requests** | Number of successfully returned requests (including empty and non-empty responses) |
| **Concurrency** | Actual average concurrency |
| **Max Concurrency** | Configured maximum concurrency |
| **Request Throughput** | Request-level throughput (requests per second, Requests/s) |
| **Total Input Tokens** | Total number of input tokens across all requests |
| **Prefill Token Throughput** | Token throughput during the Prefill stage (Token/s) |
| **Total Output Tokens** | Total number of output tokens generated across all requests |
| **Input Token Throughput** | Input token throughput (Token/s) |
| **Output Token Throughput** | Output token throughput (Token/s) |
| **Total Token Throughput** | Total token throughput (input + output) (Token/s) |
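
The end-to-end figures can be related to one another with a short Python sketch. The exact formulas are not stated in this document, so the ones below are assumptions: each throughput is taken as a total divided by Benchmark Duration, Request Throughput counts successful requests only, and Concurrency is estimated as the summed E2EL divided by the duration. `RunTotals` and `end_to_end_summary` are hypothetical names. Prefill Token Throughput is left out because it depends on Prefill-stage timing that this sketch does not model.

```python
from dataclasses import dataclass


@dataclass
class RunTotals:
    """Aggregated counters for one benchmark run (hypothetical container)."""
    duration_s: float          # Benchmark Duration, in seconds
    success_requests: int      # Success Requests
    failed_requests: int       # Failed Requests
    total_input_tokens: int    # Total Input Tokens
    total_output_tokens: int   # Total Output Tokens
    e2el_sum_s: float          # sum of E2EL over successful requests, in seconds


def end_to_end_summary(t: RunTotals) -> dict:
    """Derive end-to-end metrics; every formula here is an assumed definition."""
    if t.duration_s <= 0:
        raise ValueError("duration must be positive")
    total_tokens = t.total_input_tokens + t.total_output_tokens
    return {
        "Total Requests": t.success_requests + t.failed_requests,
        "Request Throughput": t.success_requests / t.duration_s,
        "Input Token Throughput": t.total_input_tokens / t.duration_s,
        "Output Token Throughput": t.total_output_tokens / t.duration_s,
        "Total Token Throughput": total_tokens / t.duration_s,
        # Average number of requests in flight over the run.
        "Concurrency": t.e2el_sum_s / t.duration_s,
    }
```

Under these assumptions, Total Token Throughput is always the sum of the input and output token throughputs, and Concurrency cannot exceed Max Concurrency, which makes both relationships useful sanity checks when reading a report.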