Request Sending Rate (RPS) Distribution Control and Visualization Guide
Background Introduction
In performance testing with the AISBench tool, the request sending rate can be made to fluctuate during the inference phase (while requests are being sent) to simulate traffic fluctuations seen in real-world business scenarios, such as bursty traffic and continuously growing traffic. For usage instructions, see the File Configuration section.
Core Features
Bursty Traffic Simulation
Purpose: Dynamically control fluctuations in the request sending rate to replicate sudden traffic surges during business peak hours, and evaluate the system’s ability to respond to sudden requests.
Method: Adjust the distribution shape with the burstiness parameter, which generates request sending intervals that follow different distributions, such as uniform, Poisson, and Gamma distributions.
Continuously Growing Traffic Simulation
Purpose: Simulate scenarios where business request volumes increase steadily to test the system’s ability to handle gradual load escalation.
Method: Configure linear or exponential growth using the ramp_up_strategy, ramp_up_start_rps, and ramp_up_end_rps parameters.
Visualization
Before sending requests: Display the expected distribution of the request sending rate.
After sending requests: Present the difference between the actual request sending rate distribution and the expected values, allowing users to intuitively understand traffic fluctuations and system performance.
🔍 Terminology Interpretation
RPS: Requests Per Second; unit: requests/second.
Burstiness: Short-term peaks in request traffic, where requests arrive in clusters instead of at evenly spaced intervals.
Ramp-up / Ramp-up Behavior: The process of the request sending rate increasing from a starting rate to an ending rate in a specific manner.
In formulas, burstiness is abbreviated as \(\beta\).
Parameter Details
File Configuration
To enable the feature, configure the following parameters in the corresponding 🔗 Model API Configuration File:
```python
models = [
    dict(
        # ... other parameters
        request_rate = 100,  # Existing parameter
        traffic_cfg = dict(  # Newly added `traffic_cfg` parameter
            burstiness = 0.5,
            ramp_up_strategy = "linear",
            ramp_up_start_rps = 10,
            ramp_up_end_rps = 200,
        ),
        # ... other parameters
    )
]
```
The values above are for reference only. When using the feature, configure the parameters according to the Parameter Meaning and Processing Logic columns of the table in the Parameter Interpretation section.
Parameter Interpretation
For the definitions of “ramp-up” and “ramp-up behavior”, please refer to Background Introduction > Terminology Interpretation.
| Parameter Name | Parameter Meaning | Processing Logic |
|---|---|---|
| burstiness | Burstiness factor (\(\beta\)), i.e., the shape parameter of the Gamma distribution. It works together with the three … | - If set to … |
| ramp_up_strategy | The mode of RPS growth. | - If set to … |
| ramp_up_start_rps | The starting value of RPS during ramp-up. | - If set to … |
| ramp_up_end_rps | The ending value of RPS during ramp-up. | - If set to … |
Table Summary
- Parameters controlling ramp-up behavior (three in total):
  - ramp_up_strategy: RPS growth mode, "linear" for linear growth or "exponential" for exponential growth.
  - ramp_up_start_rps: starting RPS during ramp-up.
  - ramp_up_end_rps: ending RPS during ramp-up.
- Parameter controlling burst fluctuation:
  - burstiness: fluctuation factor for traffic burstiness. \(\beta = 0\): no bursts; \(0 < \beta < 1\): dense bursts (Gamma distribution); \(\beta = 1\): Poisson-distributed bursts; \(\beta > 1\): more uniform traffic (approximately uniform distribution).
- Constraint Relationships:
  - Ramp-up behavior is enabled only if both of the following conditions are met; otherwise, it is disabled:
    1. ramp_up_strategy is set to "linear" or "exponential".
    2. \(\text{ramp\_up\_start\_rps} > 0\), \(\text{ramp\_up\_end\_rps} > 0.1\), and \(\text{ramp\_up\_end\_rps} \geq \text{ramp\_up\_start\_rps}\).
- Target Rate:
  - Definition: The final expected request sending rate.
  - Terminology Details:
    - “Final”: the sending rate that is ultimately to be reached.
    - “Expected”: a preset value, not the actual value; the actual sending rate is affected by factors such as the burstiness factor, concurrency, and service processing rate. For the actual sending rate, see 📚 Actual Request Sending Rate Chart.
  - Judgment Rules:
    - If ramp-up behavior is enabled, request_rate no longer takes effect, and ramp_up_end_rps becomes the target rate.
    - If ramp-up behavior is disabled, request_rate serves as the target rate.
    - When use_timestamp is True in the model config and the dataset contains timestamps, requests are scheduled by those timestamps, and neither request_rate nor traffic_cfg takes part in scheduling.
  - Instantaneous Scenario: When the target rate is \(< 0.1\), all requests are sent instantaneously, without rate control.
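The enablement and target-rate rules above can be summarized in a small Python sketch. The function names (ramp_up_enabled, resolve_target_rate, is_instantaneous) are hypothetical helpers for illustration, not part of AISBench's API, and the use_timestamp case is not modeled.

```python
from typing import Optional

VALID_STRATEGIES = ("linear", "exponential")
INSTANTANEOUS_THRESHOLD = 0.1  # target rates below this are sent instantaneously


def ramp_up_enabled(strategy: Optional[str],
                    start_rps: Optional[float],
                    end_rps: Optional[float]) -> bool:
    """Both conditions must hold: a valid strategy and valid start/end values."""
    if strategy not in VALID_STRATEGIES:
        return False
    if start_rps is None or end_rps is None:
        return False
    return start_rps > 0 and end_rps > 0.1 and end_rps >= start_rps


def resolve_target_rate(request_rate: float, traffic_cfg: dict) -> float:
    """ramp_up_end_rps wins when ramp-up is enabled; otherwise request_rate applies."""
    strategy = traffic_cfg.get("ramp_up_strategy")
    start = traffic_cfg.get("ramp_up_start_rps")
    end = traffic_cfg.get("ramp_up_end_rps")
    if ramp_up_enabled(strategy, start, end):
        return end
    return request_rate


def is_instantaneous(target_rate: float) -> bool:
    """True when the target rate falls below the 0.1 threshold."""
    return target_rate < INSTANTANEOUS_THRESHOLD


# Usage with the reference configuration from File Configuration:
cfg = dict(burstiness=0.5, ramp_up_strategy="linear",
           ramp_up_start_rps=10, ramp_up_end_rps=200)
print(resolve_target_rate(100, cfg))                 # 200 (ramp-up enabled)
print(resolve_target_rate(100, {"burstiness": 0}))   # 100 (ramp-up disabled)
```

An invalid strategy or an end rate below the start rate silently falls back to request_rate, which mirrors the "otherwise, it is disabled" wording rather than raising an error.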
Visualization: {datasetname}_rps_distribution_plot.html
Overview
When the target rate is \(\geq 0.1\) (see Parameter Interpretation > Table Summary > Target Rate for the definition of target rate), this visualization file is saved to:
Performance Testing:
output/default/{timestamp}/performances/{model_api_name}/
Special Notes
This chart is generated before requests are sent during the infer phase, and all calculated data represents the expected request sending rate.
In instantaneous scenarios (target rate \(< 0.1\)), the three charts in this file would carry no meaningful information, so this visualization file is not generated.
No visualization is provided for accuracy testing scenarios.
Detailed Chart Explanations
An example is shown below:

For operation methods, refer to 🔗 Basic Interactive Operations - View Control.
1. Time vs RPS Chart (Time vs RPS - Distribution)

X-axis: Time (seconds).
Y-axis: Requests Per Second (RPS).
Included Traces:

Normal RPS (blue line):
Judgment Logic: Request interval \(t_{\text{interval}} \geq 1\text{ms}\) and \(\frac{|t_{\text{actual}} - t_{\text{expected}}|}{t_{\text{expected}}} \leq 0.5\).
Expected Interval Calculation: \(t_{\text{expected}} = \frac{1}{\text{RPS}_{\text{current}}}\).
Data Source: Request intervals generated via Gamma distribution (when \(\beta > 0\)) or fixed intervals (when \(\beta = 0\)).
Time Interval Caused Anomaly (red triangles):
Judgment Logic: “Excessively short time intervals”, filtered by \(t_{\text{actual}} < 0.001\) seconds.
Cause: The burstiness parameter produces extremely short time intervals; the system cannot reliably handle intervals shorter than \(1\text{ms}\), which results in an abnormally high request sending rate.
Exclusion Condition: No such anomalies occur when \(\beta = 0\).
Priority: Higher than burstiness-induced anomalies. If both “excessively short time intervals” and “significant burstiness impact” conditions are met, this type of anomaly is marked first.
Special Handling: If there are more than 1,000 data points, time-based CDF (Cumulative Distribution Function) density sampling is applied so that only the trend is shown.
Burstiness Caused Anomaly (yellow squares):
Judgment Logic: “Significant burstiness impact”, filtered by \(\frac{|t_{\text{actual}} - t_{\text{expected}}|}{t_{\text{expected}}} > 0.5\).
Cause: Burst intervals generated by the Gamma distribution (controlled by the burstiness parameter) deviate significantly from the expected value.
Exclusion Condition: No such anomalies occur when \(\beta = 0\).
Special Handling: If there are more than 1,000 data points, time-based CDF density sampling is applied so that only the trend is shown.
N-point EWMA (purple line):
Calculation Principle: Exponentially Weighted Moving Average: \(\text{EWMA}_t = \alpha \cdot \text{RPS}_t + (1-\alpha) \cdot \text{EWMA}_{t-1}\), where \(\alpha = \frac{2}{N+1}\) and \(N\) is the adaptive window size, chosen from the number of data points (data threshold → window size: 1,000 → 20, 10,000 → 50, 100,000 → 100, larger → 200).
Purpose: Computes a weighted average of RPS values over a moving window of size \(N\) to reduce noise, forming a denoised fitting line for Normal RPS. Recent RPS values receive higher weights, which smooths the sequence and makes the trend easier to observe.
Theoretical Ramp-up (green dashed line):
Calculation Principle:
Linear ramp-up: \(\text{RPS} = R_{\text{start}} + (R_{\text{end}} - R_{\text{start}}) \times \text{progress}\)
Exponential ramp-up: \(\text{RPS} = R_{\text{start}} \times \left(\frac{R_{\text{end}}}{R_{\text{start}}}\right)^{\text{progress}}\)
Where \(\text{progress} = \frac{i}{N-1}\), \(i\) is the request index, and \(N\) is the total number of requests.
Prerequisite: Valid configuration of the three parameters related to ramp-up behavior (ramp_up_strategy, ramp_up_start_rps, ramp_up_end_rps).
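The N-point EWMA trace can be reproduced with a short Python sketch. It assumes the data thresholds are inclusive upper bounds and that the average is seeded with the first RPS value; AISBench's exact boundary handling and initialization may differ, and the function names are hypothetical.

```python
from typing import List


def adaptive_window(n_points: int) -> int:
    """Pick the EWMA window size N from the number of data points.

    Assumption: 1,000 / 10,000 / 100,000 are treated as inclusive upper bounds.
    """
    if n_points <= 1_000:
        return 20
    if n_points <= 10_000:
        return 50
    if n_points <= 100_000:
        return 100
    return 200


def ewma(rps: List[float], window: int) -> List[float]:
    """EWMA_t = alpha * RPS_t + (1 - alpha) * EWMA_{t-1}, with alpha = 2 / (N + 1)."""
    if not rps:
        return []
    alpha = 2.0 / (window + 1)
    smoothed = [rps[0]]  # assumption: seed with the first observation
    for value in rps[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# Usage: a step from 100 to 200 RPS is followed gradually, not instantly.
rps_series = [100.0] * 10 + [200.0] * 10
smoothed = ewma(rps_series, adaptive_window(len(rps_series)))
print(round(smoothed[-1], 1))
```

The \(\alpha = \frac{2}{N+1}\) choice is the common "span" convention, so a larger window gives a smaller \(\alpha\) and a smoother, slower-reacting line.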
2. Classic RPS Distribution Chart (Expected: RPS vs Request Count - Distribution)

X-axis: RPS value.
Y-axis: Request count (number of requests falling into the RPS interval).
Included Traces:

Normal Request Count (green histogram):
Judgment Logic: Same as the normal points in the Time vs RPS Chart.
Special Handling: To demonstrate the effect of the burstiness parameter, burst-induced anomalies are treated as normal values in this chart.
Time Interval Caused Anomaly (red triangles):
Judgment Logic: \(t_{\text{actual}} < 0.001\) seconds.
Distribution Feature: Typically appears in regions with extremely high RPS (since \(\text{RPS} = \frac{1}{t_{\text{interval}}}\), smaller \(t_{\text{interval}}\) leads to larger RPS).
3. Request Interval Distribution Chart (Expected: Gamma Distribution (burstiness: {burstiness}))

X-axis: Request interval (seconds).
Y-axis: Request count (number of requests falling into the interval).
Included Traces:

Normal Intervals (purple histogram):
Judgment Logic: \(t_{\text{interval}} \geq 0.001\) and \(\frac{|t_{\text{actual}} - t_{\text{expected}}|}{t_{\text{expected}}} \leq 0.5\).
Distribution Shape: Shows a Gamma distribution shape when \(\beta > 0\).
Special Handling: To demonstrate the effect of the burstiness parameter, burst-induced anomalies are treated as normal values in this chart.
Time Interval Caused Anomaly (red triangles):
Judgment Logic: \(t_{\text{interval}} < 0.001\) seconds.
Position Feature: Concentrated on the far left of the X-axis (near 0).
4. Legend Explanation Table (Legend Explanation)

Content:
Detailed explanations of the meaning, calculation principle, judgment method, and visual representation of each legend item (trace) in the charts.
The table includes 6 columns: Group Name, Legend Item, Meaning, Calculation Principle, Judgment Method, and Visual Representation.
Purpose: Help users understand the meaning of each trace in the charts, ensuring accurate and consistent interpretation.
5. Legend

Content:
Legend items (the trace controls) are arranged in columns.
The format of each legend item is:
Legend Style | Group Name | Legend Item
Function: Controls the display of traces in each chart (click a legend item to show/hide the corresponding trace), thereby improving the readability of the charts.
Detailed Explanation of Anomaly Points
Anomaly points are excluded only to improve the readability of the visualization charts, which is why anomalous data points are classified and handled separately. During global request sending rate control, these anomalous values are still retained to simulate real-world burstiness.
Time Interval Anomalies (Red Triangles)
Core Condition: \(t_{\text{interval}} < 0.001\)
Occurrence Scenarios:
In high-RPS scenarios, extremely short intervals may occur even when requests are sent as scheduled.
Instantaneous high load may appear at the end of ramp-up behavior (especially for exponential ramp-up).
Multi-process scheduling conflicts cause request backlogs, leading to bursty sending of accumulated requests.
Processing Priority: Higher than burstiness-induced anomalies. If both types of anomalies overlap, the point is marked as a time interval anomaly first.
Burstiness-Induced Anomalies (Yellow Squares)
Core Condition: \(\frac{|t_{\text{actual}} - t_{\text{expected}}|}{t_{\text{expected}}} > 0.5\)
Generation Mechanism:
When \(\beta > 0\), interval times are generated via the Gamma distribution:
\(t_{\text{interval}} \sim \Gamma(\beta, \theta)\)
where the scale parameter \(\theta = \frac{1}{\lambda \cdot \beta}\), and \(\lambda\) is the current request rate.
The \(\beta\) parameter controls the distribution shape:
\(\beta = 1\): Poisson distribution (intervals are exponentially distributed, i.e., a Poisson arrival process)
\(\beta > 1\): Approximately uniform distribution (bursty values may still occur, but the request sending rate is relatively uniform)
\(0 < \beta < 1\): Gamma distribution with more extreme values (denser bursts)
Special Handling: When \(\beta = 0\), a fixed interval \(t_{\text{interval}} = \frac{1}{\lambda}\) is used, and no such anomalies occur. The default value of \(\beta\) is 0.
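A minimal Python sketch of the interval generation described above, using the standard library's random.gammavariate(alpha, beta), where alpha is the shape (\(\beta\) here) and beta is the scale (\(\theta\) here). The helper name sample_interval is hypothetical; it only illustrates the formulas, not AISBench's actual implementation.

```python
import random


def sample_interval(rate: float, burstiness: float, rng: random.Random) -> float:
    """Return one request interval in seconds for the current rate lambda.

    burstiness > 0: t ~ Gamma(shape=burstiness, scale=theta),
                    theta = 1 / (lambda * burstiness), so the mean is 1 / lambda.
    burstiness == 0: fixed interval 1 / lambda.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if burstiness < 0:
        raise ValueError("burstiness must be >= 0")
    if burstiness == 0:
        return 1.0 / rate
    theta = 1.0 / (rate * burstiness)
    return rng.gammavariate(burstiness, theta)


# Usage: at 100 RPS, the mean interval stays near 0.01 s for any beta > 0.
rng = random.Random(42)
intervals = [sample_interval(100, 0.5, rng) for _ in range(20_000)]
print(sum(intervals) / len(intervals))
```

Because \(\beta \cdot \theta = \frac{1}{\lambda}\), changing \(\beta\) reshapes the spread of the intervals without changing the average rate.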
Calculation Formulas for Ramp-Up Behavior
Two Ramp-Up Modes
Linear Ramp-Up
\(\lambda_i = \lambda_{\text{start}} + (\lambda_{\text{end}} - \lambda_{\text{start}}) \times \frac{i}{N-1}\)
where \(i\) is the request index (ranging from 0 to N-1), and \(N\) is the total number of requests.
Exponential Ramp-Up
\(\lambda_i = \lambda_{\text{start}} \times \left(\frac{\lambda_{\text{end}}}{\lambda_{\text{start}}}\right)^{\frac{i}{N-1}}\)
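The two ramp-up formulas translate directly into Python. This is a hypothetical helper for illustration; it raises an error for \(N < 2\) because \(\frac{i}{N-1}\) is undefined there, which is an assumption rather than documented AISBench behavior.

```python
from typing import List


def ramp_up_rates(strategy: str, start: float, end: float, n: int) -> List[float]:
    """Per-request rate lambda_i for i = 0 .. n-1, with progress = i / (n - 1)."""
    if n < 2:
        raise ValueError("the formulas need at least two requests (N - 1 > 0)")
    rates = []
    for i in range(n):
        progress = i / (n - 1)
        if strategy == "linear":
            rates.append(start + (end - start) * progress)
        elif strategy == "exponential":
            rates.append(start * (end / start) ** progress)
        else:
            raise ValueError(f"unsupported ramp_up_strategy: {strategy!r}")
    return rates


# Usage: 5 requests ramping from 10 to 200 RPS.
print(ramp_up_rates("linear", 10, 200, 5))  # [10.0, 57.5, 105.0, 152.5, 200.0]
print(ramp_up_rates("exponential", 10, 200, 5))
```

The exponential mode multiplies the rate by a constant factor per request, so it stays close to the start rate for longer and climbs steeply near the end, which is consistent with the note that instantaneous high load tends to appear at the end of exponential ramp-up.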
Global RPS: Global Request Send-Time Offsets
Time Offset Calculation:
The main process calculates the global cumulative delay:
\(t_{\text{cumulative},i} = \sum_{k=0}^{i} t_{\text{interval},k}\)
Each worker process determines the request sending time based on the offset.
Anomaly Detection:
Anomaly points are detected synchronously when generating time offsets:
\(\text{timing\_anomaly} = \left\{i \mid t_{\text{interval},i} < 0.001\right\}\)
\(\text{burstiness\_anomaly} = \left\{i \mid \frac{|t_{\text{actual},i} - t_{\text{expected},i}|}{t_{\text{expected},i}} > 0.5\right\} \setminus \text{timing\_anomaly}\)
\(\setminus\) denotes the set difference, meaning elements in the former set that are not present in the latter set are retained.
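The two anomaly sets, including the priority of timing anomalies, can be expressed with Python sets. The sketch assumes actual holds the generated intervals and expected holds \(1/\text{RPS}_{\text{current}}\) for each request; the function name is hypothetical.

```python
from typing import List, Set, Tuple

TIMING_THRESHOLD_S = 0.001  # intervals shorter than 1 ms
MAX_REL_DEVIATION = 0.5     # |t_actual - t_expected| / t_expected


def classify_anomalies(actual: List[float],
                       expected: List[float]) -> Tuple[Set[int], Set[int], Set[int]]:
    """Return (normal, timing_anomaly, burstiness_anomaly) index sets."""
    if len(actual) != len(expected):
        raise ValueError("actual and expected must have the same length")
    timing = {i for i, t in enumerate(actual) if t < TIMING_THRESHOLD_S}
    deviating = {
        i for i, (t, e) in enumerate(zip(actual, expected))
        if abs(t - e) / e > MAX_REL_DEVIATION
    }
    burstiness = deviating - timing  # set difference: timing anomalies take priority
    normal = set(range(len(actual))) - timing - burstiness
    return normal, timing, burstiness


# Usage: expected interval 0.01 s (100 RPS) for every request.
normal, timing, burst = classify_anomalies([0.01, 0.0005, 0.02, 0.012], [0.01] * 4)
print(sorted(normal), sorted(timing), sorted(burst))  # [0, 3] [1] [2]
```

Index 1 meets both conditions (0.5 ms interval, 95% deviation) but ends up only in the timing set, which is exactly what the \(\setminus\) operation enforces.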
Normalization Processing (only when there is no ramp-up behavior and \(\beta = 0\)):
\(t_{\text{total}} = \frac{N}{\lambda}\)
\(k = \frac{t_{\text{total}}}{t_{\text{cumulative},N-1}}\)
\(t_{\text{cumulative},i} \leftarrow k \times t_{\text{cumulative},i}\)
Ensures the total time matches the expected request rate.
Corrects time deviations introduced by ramp-up behavior (note: normalization is not performed when ramp-up behavior is enabled).
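A Python sketch of the cumulative offset and normalization steps. The normalize flag stands in for the condition stated above (no ramp-up behavior and \(\beta = 0\)); the caller is assumed to decide it, and the function name is hypothetical. The usage example passes uneven intervals only to make the rescaling arithmetic visible.

```python
from itertools import accumulate
from typing import List


def cumulative_offsets(intervals: List[float], rate: float,
                       normalize: bool = False) -> List[float]:
    """t_cumulative,i = sum of t_interval,k for k = 0..i.

    When normalize is True, every offset is scaled by
    k = t_total / t_cumulative,N-1 with t_total = N / lambda.
    """
    offsets = list(accumulate(intervals))
    if normalize and offsets and offsets[-1] > 0:
        t_total = len(intervals) / rate
        scale = t_total / offsets[-1]
        offsets = [scale * t for t in offsets]
    return offsets


# Usage: a worker would send request i at (start_time + offsets[i]).
raw = cumulative_offsets([0.01, 0.02, 0.03], rate=100)
scaled = cumulative_offsets([0.01, 0.02, 0.03], rate=100, normalize=True)
print(raw, scaled)
```

After scaling, the last offset equals \(\frac{N}{\lambda}\) (0.03 s for 3 requests at 100 RPS), so the overall duration matches the configured rate.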
Visualization: {datasetname}_rps_distribution_plot_with_actual_rps.html
No visualization is provided for accuracy testing scenarios.
Chart Example
Newly Added Part

Detailed Diagrams

For operation methods, refer to 🔗 Basic Interactive Operations - View Control.
This chart is generated after the inference completion/interruption and data waiting phase of the infer stage. It is saved to the same path where {datasetname}_rps_distribution_plot.html is (or would be) saved.
Chart Interpretation
If {datasetname}_rps_distribution_plot.html was generated before requests were sent: a new orange trace is added to the original Time vs RPS - Distribution chart in this file, overlaid on the existing blue trace (expected normal RPS), as shown in the red-marked part of the example above. Toggle this trace with the Actual RPS: After Excluding Anomalies legend item to compare the actual sending rate with the expected sending rate.
If {datasetname}_rps_distribution_plot.html was not generated before requests were sent: only the Time vs RPS - Distribution chart of the actually sent requests is displayed.
Data Processing
Data Source: The sending timestamp of each request is recorded when the request is actually sent; the data is obtained from the timestamp records collected after all requests are sent.
When drawing the trace, anomalous values are filtered out using the same calculation logic as the Time vs RPS - Distribution chart in {datasetname}_rps_distribution_plot.html. If there are more than 5,000 requests, time-based CDF (Cumulative Distribution Function) density sampling is applied, so that the chart highlights only the difference in trends between the actual RPS and the original expected RPS.
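As an illustration of the data processing above, a Python sketch can turn recorded send timestamps into (elapsed time, RPS) points, dropping intervals under 1 ms as timing anomalies. It does not reproduce the burstiness filter (which needs the expected intervals) or the CDF density sampling, and the function name is hypothetical.

```python
from typing import List, Tuple

MIN_INTERVAL_S = 0.001  # intervals below 1 ms are treated as timing anomalies


def actual_rps_points(send_timestamps: List[float]) -> List[Tuple[float, float]]:
    """Convert send timestamps (seconds) into (elapsed_time, rps) points."""
    # Assumption: records collected from several workers may not be in order.
    ts = sorted(send_timestamps)
    if not ts:
        return []
    t0 = ts[0]
    points = []
    for prev, cur in zip(ts, ts[1:]):
        gap = cur - prev
        if gap < MIN_INTERVAL_S:
            continue  # excluded from the plotted trace, like the red-triangle points
        points.append((cur - t0, 1.0 / gap))
    return points


# Usage: the 0.5 ms gap between the 2nd and 3rd sends is dropped.
print(actual_rps_points([0.0, 0.01, 0.0105, 0.03]))
```

Each point uses \(\text{RPS} = \frac{1}{t_{\text{interval}}}\), the same relation the expected-RPS charts use, so the actual and expected traces are directly comparable.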
Summary
Anomaly points are identified by explicit mathematical conditions (\(t_{\text{interval}} < 0.001\) or \(\frac{|t_{\text{actual}} - t_{\text{expected}}|}{t_{\text{expected}}} > 0.5\)).
The shape of the RPS distribution is jointly determined by burstiness, ramp_up_strategy, ramp_up_start_rps, and ramp_up_end_rps.
The global time offsets computed for multi-process scheduling describe the expected sending rate (the content of {datasetname}_rps_distribution_plot.html), while the actual sending rate (the newly added trace in {datasetname}_rps_distribution_plot_with_actual_rps.html) may deviate from expectations due to factors such as concurrency, physical machine performance, service request processing efficiency, and multi-turn dialogue scenarios.
In stress testing scenarios, the frequency of connection creation is controlled, but the request sending rate is not: after each connection is created, requests are sent and responses are processed continuously without interruption.
In multi-turn dialogue scenarios, only the request distribution of the first turn is valid.
Configuration and Visualization Examples
burstiness
The following examples show the Expected: Gamma Distribution chart for different values of the burstiness factor burstiness. To show more clearly how the burstiness factor affects request sending rate fluctuations, ramp-up behavior is disabled in all examples (i.e., no ramp_up_* parameters are configured).
burstiness = 0
Parameter Configuration:
request_rate = 100, traffic_cfg = dict(burstiness = 0),
Gamma Distribution Chart:

burstiness = 0.5
Parameter Configuration:
request_rate = 100, traffic_cfg = dict(burstiness = 0.5),
Gamma Distribution Chart:

burstiness = 1
Parameter Configuration:
request_rate = 100, traffic_cfg = dict(burstiness = 1),
Gamma Distribution Chart:

burstiness = 10
Parameter Configuration:
request_rate = 100, traffic_cfg = dict(burstiness = 10),
Gamma Distribution Chart:

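To quantify what these four charts show: for a Gamma distribution with shape \(\beta\), the coefficient of variation of the intervals (standard deviation divided by mean) is \(1/\sqrt{\beta}\), independent of the rate. The Python sketch below estimates it by sampling with the same \(\theta = 1/(\lambda\beta)\) parameterization; smaller \(\beta\) gives more variable intervals and therefore burstier traffic. The helper name interval_cv is hypothetical.

```python
import math
import random
import statistics


def interval_cv(rate: float, burstiness: float, n: int = 20_000, seed: int = 0) -> float:
    """Estimate std/mean of request intervals for a given burstiness (beta)."""
    if burstiness == 0:
        return 0.0  # fixed interval 1 / rate, so there is no variation
    rng = random.Random(seed)
    theta = 1.0 / (rate * burstiness)
    samples = [rng.gammavariate(burstiness, theta) for _ in range(n)]
    return statistics.pstdev(samples) / statistics.fmean(samples)


# Compare the estimate with the theoretical value 1 / sqrt(beta) at 100 RPS.
for beta in (0, 0.5, 1, 10):
    theory = 0.0 if beta == 0 else 1 / math.sqrt(beta)
    print(f"beta={beta}: estimated CV={interval_cv(100, beta):.3f}, theory={theory:.3f}")
```

A CV above 1 (\(\beta < 1\)) means intervals cluster near zero with occasional long gaps, which appears as dense bursts; a CV well below 1 (\(\beta = 10\)) means intervals stay close to \(1/\lambda\), which appears as nearly uniform traffic.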