TRTLLMRuntimeConfigurationV2
- class baseten.client.modelconfig.TRTLLMRuntimeConfigurationV2(*, max_seq_len=None, max_batch_size=256, max_num_tokens=8192, tensor_parallel_size=1, enable_chunked_prefill=True, served_model_name=None, patch_kwargs=None, **extra_data)
Bases: pydantic.BaseModel
- Parameters:
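  - max_seq_len (MaxSeqLen | None): default None.
  - max_batch_size (Annotated[int | None, Ge(ge=1), Le(le=2048)]): default 256; must satisfy 1 ≤ value ≤ 2048.
  - max_num_tokens (Annotated[int | None, Gt(gt=64), Le(le=131072)]): default 8192; must satisfy 64 < value ≤ 131072 (the lower bound is exclusive).
  - enable_chunked_prefill (bool | None): default True.
  - served_model_name (str | None): default None.
  - patch_kwargs (dict[str, str | int | float | dict[str, Any] | list[Any] | None] | None): default None.
  - extra_data (Any): any additional keyword arguments; these are accepted because model_config sets 'extra': 'allow'.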
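The range constraints on max_batch_size and max_num_tokens can be illustrated with a minimal standard-library sketch. RuntimeLimits and the BATCH_*/TOKENS_* constants are hypothetical names, not part of baseten; the real class enforces these bounds through pydantic validation, whose error type may differ from the ValueError raised here. The sketch also assumes that None skips the range check, as the `int | None` annotations suggest.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical constants mirroring the documented annotations.
BATCH_MIN, BATCH_MAX = 1, 2048              # Ge(ge=1), Le(le=2048): both inclusive
TOKENS_MIN_EXCL, TOKENS_MAX = 64, 131072    # Gt(gt=64) exclusive, Le(le=131072) inclusive


@dataclass
class RuntimeLimits:
    """Hypothetical stand-in for two constrained fields; not the baseten class."""

    max_batch_size: Optional[int] = 256     # default taken from the signature
    max_num_tokens: Optional[int] = 8192    # default taken from the signature

    def __post_init__(self) -> None:
        # Assumption: None is allowed and bypasses the range check.
        if self.max_batch_size is not None and not (
            BATCH_MIN <= self.max_batch_size <= BATCH_MAX
        ):
            raise ValueError(f"max_batch_size must be in [{BATCH_MIN}, {BATCH_MAX}]")
        if self.max_num_tokens is not None and not (
            TOKENS_MIN_EXCL < self.max_num_tokens <= TOKENS_MAX
        ):
            raise ValueError(
                f"max_num_tokens must be in ({TOKENS_MIN_EXCL}, {TOKENS_MAX}]"
            )
```

For example, RuntimeLimits(max_num_tokens=64) raises while max_num_tokens=65 is accepted, because max_num_tokens uses a strict lower bound (Gt) whereas max_batch_size uses an inclusive one (Ge).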
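- model_config = {'extra': 'allow'}
Configuration for the model; a dictionary conforming to pydantic.config.ConfigDict. Setting 'extra': 'allow' means keyword arguments that do not match a declared field are accepted and kept rather than rejected; these are what the signature collects as **extra_data.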
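To see which keyword arguments end up in **extra_data, here is a minimal standard-library sketch of the partition that 'extra': 'allow' implies. split_extra and DECLARED_FIELDS are hypothetical names; the field names are copied from the class signature. Assuming pydantic v2, the real instance would expose such undeclared values through its model_extra attribute instead.

```python
from typing import Any

# Declared fields, copied from the TRTLLMRuntimeConfigurationV2 signature.
DECLARED_FIELDS = frozenset({
    "max_seq_len",
    "max_batch_size",
    "max_num_tokens",
    "tensor_parallel_size",
    "enable_chunked_prefill",
    "served_model_name",
    "patch_kwargs",
})


def split_extra(kwargs: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Hypothetical helper: separate declared fields from extras.

    With extra='allow', undeclared keys are kept instead of raising an error;
    this function only mimics that partition and performs no validation.
    """
    declared = {k: v for k, v in kwargs.items() if k in DECLARED_FIELDS}
    extra = {k: v for k, v in kwargs.items() if k not in DECLARED_FIELDS}
    return declared, extra
```

For instance, split_extra({"max_batch_size": 32, "my_custom_flag": True}) places max_batch_size with the declared fields and my_custom_flag (a made-up key) with the extras.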