Explanation of Performance Evaluation Results

The performance evaluation results consist of two parts: per-request results for individual inference requests, and end-to-end results for the test task as a whole. The fields of each part are described below.

1. Performance Output Results for Individual Inference Requests

The key metrics and statistics are defined as follows:

P75 / P90 / P99: The 75th, 90th, and 99th percentiles of a metric across all requests. Taking TPOT as an example, P90 is the value at or below which 90% of the per-request TPOT values fall.
E2EL (End-to-End Latency): The total latency of a single request from sending the request to receiving the complete response.
TTFT (Time To First Token): The latency from sending the request to receiving the first output token.
TPOT (Time Per Output Token): The average generation latency per token during the output phase (excluding the first token).
ITL (Inter-token Latency): The average interval between adjacent output tokens (excluding the first token).
InputTokens: The number of input tokens in the request.
OutputTokens: The number of output tokens generated by the request.
OutputTokenThroughput: The throughput of output tokens (in tokens per second, Token/s).
Tokenizer: The time spent on tokenizer encoding.
Detokenizer: The time spent on detokenizer decoding.
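
The latency definitions above can be sketched in code. This is a minimal illustration, not the tool's actual implementation: the `RequestTrace` record, its field names, and the exact formulas (TPOT as decode time divided by the number of tokens after the first, ITL as the mean gap between streamed chunks) are assumptions based on the common meaning of these metrics.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RequestTrace:
    """Hypothetical record of one streamed request; all times are in seconds."""
    send_time: float          # when the request was sent
    chunk_times: List[float]  # arrival time of each streamed chunk, in order
    output_tokens: int        # total number of output tokens in the response


def request_metrics(trace: RequestTrace) -> dict:
    """Derive E2EL, TTFT, TPOT and ITL for one request (assumed formulas)."""
    if not trace.chunk_times or trace.output_tokens < 1:
        raise ValueError("the request produced no output")
    e2el = trace.chunk_times[-1] - trace.send_time  # send -> complete response
    ttft = trace.chunk_times[0] - trace.send_time   # send -> first token
    decode_time = e2el - ttft
    # TPOT: decode time averaged over every token after the first one.
    if trace.output_tokens > 1:
        tpot = decode_time / (trace.output_tokens - 1)
    else:
        tpot = 0.0
    # ITL: mean gap between adjacent arrivals, excluding the first token.
    gaps = [b - a for a, b in zip(trace.chunk_times, trace.chunk_times[1:])]
    itl = sum(gaps) / len(gaps) if gaps else 0.0
    return {"E2EL": e2el, "TTFT": ttft, "TPOT": tpot, "ITL": itl,
            "OutputTokens": trace.output_tokens}


if __name__ == "__main__":
    # Three chunks carrying four tokens in total (one chunk holds two tokens).
    example = RequestTrace(send_time=0.0, chunk_times=[0.5, 0.7, 1.1],
                           output_tokens=4)
    print(request_metrics(example))
```

In this example TPOT is about 0.2 s while ITL is about 0.3 s. Under these assumed formulas the two values coincide only when every streamed chunk carries exactly one token.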

Performance Parameters	Stage	Average	Max	Min	Median	P75	P90	P99	N
E2EL	Stage for this parameter	Average request latency	Maximum request latency	Minimum request latency	Median request latency	75th-percentile request latency	90th-percentile request latency	99th-percentile request latency	Test data volume (from input parameters)
TTFT	Stage for this parameter	Average latency of first token	Maximum latency of first token	Minimum latency of first token	Median latency of first token	75th-percentile latency of first token	90th-percentile latency of first token	99th-percentile latency of first token	Test data volume (from input parameters)
TPOT	Stage for this parameter	Average latency of Decode stage	Maximum latency of Decode stage	Minimum latency of Decode stage	Median latency of Decode stage	75th-percentile latency of Decode stage	90th-percentile latency of Decode stage	99th-percentile latency of Decode stage	Test data volume (from input parameters)
ITL	Stage for this parameter	Average inter-token latency	Maximum inter-token latency	Minimum inter-token latency	Median inter-token latency	75th-percentile inter-token latency	90th-percentile inter-token latency	99th-percentile inter-token latency	Test data volume (from input parameters)
InputTokens	Stage for this parameter	Average length of input tokens	Maximum length of input tokens	Minimum length of input tokens	Median length of input tokens	75th-percentile length of input tokens	90th-percentile length of input tokens	99th-percentile length of input tokens	Test data volume (from input parameters)
OutputTokens	Stage for this parameter	Average length of output tokens	Maximum length of output tokens	Minimum length of output tokens	Median length of output tokens	75th-percentile length of output tokens	90th-percentile length of output tokens	99th-percentile length of output tokens	Test data volume (from input parameters)
OutputTokenThroughput	Stage for this parameter	Average output throughput	Maximum output throughput	Minimum output throughput	Median output throughput	75th-percentile output throughput	90th-percentile output throughput	99th-percentile output throughput	Test data volume (from input parameters)
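
Each row of the table above is a set of summary statistics over the per-request values of one metric. The sketch below shows one way to compute such a row. The helper names `percentile` and `summarize` are hypothetical, and the linear-interpolation percentile rule is an assumption; the tool may use a different interpolation method, which can change P75 / P90 / P99 slightly on small samples.

```python
import math
from statistics import mean, median
from typing import Dict, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Return the q-th percentile (0-100), interpolating linearly between ranks."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0  # fractional index into sorted data
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def summarize(values: Sequence[float]) -> Dict[str, float]:
    """Build one table row (Average ... N) from per-request values of a metric."""
    return {
        "Average": mean(values),
        "Max": max(values),
        "Min": min(values),
        "Median": median(values),
        "P75": percentile(values, 75),
        "P90": percentile(values, 90),
        "P99": percentile(values, 99),
        "N": len(values),
    }


if __name__ == "__main__":
    # Hypothetical per-request TPOT values in milliseconds.
    tpot_ms = [10.0, 20.0, 30.0, 40.0, 50.0]
    for name, value in summarize(tpot_ms).items():
        print(f"{name}: {value}")
```

With only five samples, P99 lands close to the maximum, so high percentiles are meaningful mainly when N is large.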

2. End-to-End Performance Output Results

Parameter	Description
Benchmark Duration	Total execution time of the test task
Total Requests	Total number of requests
Failed Requests	Number of failed requests (including unresponsive requests or empty responses)
Success Requests	Number of successfully returned requests (including empty and non-empty responses)
Concurrency	Actual average number of concurrent requests during the test
Max Concurrency	Configured maximum concurrency
Request Throughput	Request-level throughput (requests per second, Requests/s)
Total Input Tokens	Total number of input tokens across all requests
Prefill Token Throughput	Token throughput during the Prefill stage (Token/s)
Total Output Tokens	Total number of output tokens generated across all requests
Input Token Throughput	Input token throughput (Token/s)
Output Token Throughput	Output token throughput (Token/s)
Total Token Throughput	Total token throughput (input + output) (Token/s)
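
The end-to-end fields above can be related to the per-request data with a short sketch. Everything here is an assumption made for illustration: the `FinishedRequest` record, the choice to count only successful requests toward throughput, dividing by Benchmark Duration, and estimating average concurrency as the summed request latency divided by the duration. Prefill Token Throughput is left out because its formula is not stated in this document.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FinishedRequest:
    """Hypothetical per-request record kept by a benchmark client."""
    input_tokens: int
    output_tokens: int
    e2el_s: float        # end-to-end latency in seconds
    success: bool = True


def end_to_end_summary(requests: List[FinishedRequest], duration_s: float) -> dict:
    """Aggregate per-request records into end-to-end fields (assumed formulas)."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    ok = [r for r in requests if r.success]
    total_in = sum(r.input_tokens for r in ok)
    total_out = sum(r.output_tokens for r in ok)
    return {
        "Benchmark Duration": duration_s,
        "Total Requests": len(requests),
        "Failed Requests": len(requests) - len(ok),
        "Success Requests": len(ok),
        # Time-averaged number of requests in flight.
        "Concurrency": sum(r.e2el_s for r in ok) / duration_s,
        "Request Throughput": len(ok) / duration_s,
        "Total Input Tokens": total_in,
        "Total Output Tokens": total_out,
        "Input Token Throughput": total_in / duration_s,
        "Output Token Throughput": total_out / duration_s,
        "Total Token Throughput": (total_in + total_out) / duration_s,
    }


if __name__ == "__main__":
    sample = [
        FinishedRequest(input_tokens=100, output_tokens=50, e2el_s=2.0),
        FinishedRequest(input_tokens=200, output_tokens=150, e2el_s=4.0),
        FinishedRequest(input_tokens=50, output_tokens=0, e2el_s=1.0, success=False),
    ]
    for name, value in end_to_end_summary(sample, duration_s=10.0).items():
        print(f"{name}: {value}")
```

Note that the end-to-end Output Token Throughput is computed over the whole run, so it is not the same as the average of the per-request OutputTokenThroughput values.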