
User Guide

This guide covers everything you need to know to use GenAI Bench effectively for benchmarking LLM endpoints.

What You'll Learn

  • Running Benchmarks - learn how to run benchmarks against various LLM endpoints
  • Multi-Cloud Setup - configure authentication for AWS, Azure, GCP, OCI, Baseten, and more
  • Baseten Support - learn about dual format support and flexible response handling
  • Docker Usage - run GenAI Bench in containerized environments
  • Excel Reports - generate comprehensive Excel reports from benchmark results
  • Performance Troubleshooting - debug unexpected benchmark results and common pitfalls

Common Workflows

Basic Benchmarking

  1. Choose your model provider - OpenAI, AWS Bedrock, Azure OpenAI, etc.
  2. Configure authentication - API keys, IAM roles, or service accounts
  3. Run the benchmark - Specify task type and parameters
  4. Analyze results - View real-time dashboard or generate reports
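Putting the four steps together, a minimal run might look like the sketch below. Only `--api-backend` and `--api-key` appear elsewhere in this guide; the `--task` flag and its value are assumptions for illustration, so check `genai-bench benchmark --help` for the exact option names.

```shell
# Steps 1-3: OpenAI provider, API-key auth, chat/completion task.
# Flag names beyond --api-backend/--api-key are assumed; verify
# against `genai-bench benchmark --help` before running.
genai-bench benchmark \
  --api-backend openai \
  --api-key $OPENAI_KEY \
  --task text-to-text
```

Step 4 then happens in the live dashboard, or afterwards via report generation (see Excel Reports above).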

Cross-Cloud Benchmarking

Benchmark models from one provider while storing results in another:

```shell
# Benchmark OpenAI, store results in AWS S3
genai-bench benchmark \
  --api-backend openai \
  --api-key $OPENAI_KEY \
  --upload-results \
  --storage-provider aws \
  --storage-bucket my-results
```

Multi-Modal Tasks

GenAI Bench supports text, embedding, vision, and reranking tasks:

  • text-to-text - Chat and completion tasks
  • text-to-embeddings - Embedding generation
  • image-text-to-text - Vision-language tasks
  • text-to-rerank - Document reranking
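As a sketch, switching task types would just mean changing the task name passed at launch. Passing it via a `--task` flag is an assumption here, not a documented option; consult `genai-bench benchmark --help` for the real interface.

```shell
# Embedding benchmark sketch; the --task flag and its accepted
# values are assumed to match the task names listed above.
genai-bench benchmark \
  --api-backend openai \
  --api-key $OPENAI_KEY \
  --task text-to-embeddings
```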

Need Help?