# Traffic Scenarios
## What are scenarios?
Scenarios describe how genai-bench should shape requests during a benchmark run. They define token distributions (for text), per-document token budgets (for embeddings), re-rank budgets (for rerank), or modality properties (for images). Scenarios are parsed from concise strings and used by samplers to construct inputs and expected output lengths.
Scenarios are optional. If you don’t provide any and you supply a dataset, genai-bench runs in dataset mode, sampling raw entries from your dataset without token shaping.
## How scenarios are used
- The CLI accepts one or more scenarios via `--traffic-scenario`. Each run iterates over the supplied scenarios and the selected iteration parameter (concurrency or batch size).
- Internally, each scenario string is parsed into a `Scenario` class and passed to samplers to control request construction.
- `--traffic-scenario` is a multi-value option in click, so you can pass it multiple times to benchmark different traffic shapes in a single run, as shown below.
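For example, a minimal sketch of a run that supplies two scenarios (the endpoint, model, and scenario values are illustrative and mirror the examples later on this page):

```bash
# Repeat --traffic-scenario to benchmark several traffic shapes in one run.
genai-bench \
  --task text-to-text \
  --api-backend openai \
  --api-base http://localhost:8000 \
  --api-model-name gpt-test \
  --model-tokenizer gpt2 \
  --traffic-scenario "D(100,1000)" \
  --traffic-scenario "N(480,240)/(300,150)"
```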
## Scenario types and formats
- Text distributions
    - Deterministic: `D(num_input_tokens,num_output_tokens)`
        - Example: `D(100,1000)`
    - Normal: `N(mean_input_tokens,stddev_input_tokens)/(mean_output_tokens,stddev_output_tokens)`
        - Example: `N(480,240)/(300,150)`
    - Uniform: `U(min_input_tokens,max_input_tokens)/(min_output_tokens,max_output_tokens)` or `U(max_input_tokens,max_output_tokens)`
        - Examples: `U(50,100)/(200,250)` or `U(100,200)`
    - Prefix Repetition (for KV cache benchmarking): `P(prefix_len,suffix_len)/output_len`
        - Example: `P(2000,500)/200`
        - All requests share the same prefix (the first request caches it; subsequent requests reuse the cached KV)
        - Each request has a unique suffix to ensure different completions
        - Useful for benchmarking automatic prefix caching (APC), chunked prefill, and TTFT improvements; see the sketch after this list
- Embeddings
    - Embedding: `E(tokens_per_document)`
        - Example: `E(1024)`
- Re-Rank
    - ReRank: `R(tokens_per_document,tokens_per_query)`
        - Example: `R(1024,100)`
- Image multi-modality
    - Image: `I(width,height)` or `I(width,height,num_images)`
        - Examples: `I(512,512)`, `I(2048,2048,2)`
- Special (no token shaping)
    - Dataset scenario: `dataset`
        - Uses raw dataset entries without token shaping; see the dataset mode section below
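For instance, a sketch of a prefix-repetition (KV cache) run, assuming the same OpenAI-compatible endpoint used in the examples below:

```bash
# Every request shares a 2000-token prefix and adds a unique 500-token suffix,
# requesting 200 output tokens; this exercises automatic prefix caching.
genai-bench \
  --task text-to-text \
  --api-backend openai \
  --api-base http://localhost:8000 \
  --api-model-name gpt-test \
  --model-tokenizer gpt2 \
  --traffic-scenario "P(2000,500)/200"
```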
**Note:** Scenario strings are validated. If you mistype a scenario, the error message lists the supported scenario types and the expected format for the type you chose.
## Dataset mode (scenarios are optional)
Scenarios are optional when a dataset is provided. If you don’t pass `--traffic-scenario`, genai-bench uses dataset mode automatically (equivalent to `--traffic-scenario dataset`).
Behavior by task in dataset mode:
- text-to-text
    - Picks a single line from your dataset as the prompt
    - Sets `max_tokens=None` and `ignore_eos=False` to avoid token shaping
- text-to-embeddings
    - Samples `batch_size` lines from your dataset per request
    - No token shaping is applied
- text-to-rerank
    - Samples a query and `batch_size` documents from your dataset
    - No token shaping is applied
- image-text-to-text and image-to-embeddings
    - Samples images from your dataset (defaults to 1 image per request in dataset mode)
**Tip:** If you prefer to be explicit, you can pass `--traffic-scenario dataset` even when providing a dataset.
## Examples
Run using dataset-only mode (no token shaping):
```bash
genai-bench \
  --task text-to-text \
  --dataset-path data.csv \
  --api-backend openai \
  --api-base http://localhost:8000 \
  --api-model-name gpt-test \
  --model-tokenizer gpt2
```
Explicit dataset sentinel:
```bash
genai-bench \
  --task image-text-to-text \
  --dataset-path images.json \
  --traffic-scenario dataset \
  --api-backend openai \
  --api-base http://localhost:8000 \
  --api-model-name gpt-test \
  --model-tokenizer gpt2
```
Scenario-based runs: