ModelTRTLLMRuntimeConfiguration
- class baseten.client.modelconfig.ModelTRTLLMRuntimeConfiguration(*, kv_cache_free_gpu_mem_fraction=0.9, kv_cache_host_memory_bytes=None, enable_chunked_context=True, batch_scheduler_policy=ModelTRTLLMBatchSchedulerPolicy.guaranteed_no_evict, request_default_max_tokens=None, served_model_name=None, total_token_limit=500000, webserver_default_route=None, **extra_data)
Bases: BaseModel
- Parameters:
kv_cache_free_gpu_mem_fraction (float | None)
kv_cache_host_memory_bytes (KvCacheHostMemoryBytes | None)
enable_chunked_context (bool | None)
batch_scheduler_policy (ModelTRTLLMBatchSchedulerPolicy | None)
request_default_max_tokens (RequestDefaultMaxTokens | None)
served_model_name (str | None)
total_token_limit (int | None)
webserver_default_route (WebserverDefaultRoute | None)
extra_data (Any)
- model_config = {'extra': 'allow'}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
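With `extra` set to `'allow'`, pydantic accepts keyword arguments that do not match a declared field and keeps them as extra data instead of raising a validation error; this is why the signature ends in `**extra_data`. The sketch below illustrates that pydantic v2 behavior using a hypothetical two-field stand-in (`RuntimeConfigSketch` and the `custom_flag` key are illustrative names, not baseten's). It assumes pydantic v2 is installed.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict


class RuntimeConfigSketch(BaseModel):
    # Same setting as the documented model_config: unknown keys are kept, not rejected.
    model_config = ConfigDict(extra="allow")

    # Two fields copied from the documented signature, with the same defaults.
    enable_chunked_context: Optional[bool] = True
    total_token_limit: Optional[int] = 500000


cfg = RuntimeConfigSketch(total_token_limit=200000, custom_flag="on")
print(cfg.total_token_limit)  # declared field, validated as an int
print(cfg.model_extra)        # the undeclared key, stored as extra data
print(cfg.custom_flag)        # extras are also reachable as attributes
print(cfg.model_dump())       # serialization includes the extras
```

Under `extra='forbid'` the same call would raise a `ValidationError` for `custom_flag`, and under pydantic's default `'ignore'` the key would be silently dropped.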