ModelTRTLLMBuildConfiguration

class baseten.client.modelconfig.ModelTRTLLMBuildConfiguration(*, base_model=ModelTRTLLMModel.decoder, max_seq_len=None, max_batch_size=256, max_num_tokens=8192, max_beam_width=1, max_prompt_embedding_table_size=0, checkpoint_repository=None, gather_all_token_logits=False, strongly_typed=False, quantization_type=ModelTRTLLMQuantizationType.no_quant, quantization_config=<factory>, tensor_parallel_count=1, pipeline_parallel_count=1, moe_expert_parallel_option=-1, sequence_parallel_count=1, plugin_configuration=<factory>, num_builder_gpus=None, speculator=None, lora_adapters=None, lora_configuration=None, skip_build_result=False, **extra_data)

Bases: BaseModel

Parameters:

base_model (ModelTRTLLMModel | None)
max_seq_len (MaxSeqLen | None)
max_batch_size (Annotated[int | None, Ge(ge=1), Le(le=2048)])
max_num_tokens (Annotated[int | None, Gt(gt=64), Le(le=1048576)])
max_beam_width (Annotated[int | None, Ge(ge=1), Le(le=1)])
max_prompt_embedding_table_size (int | None)
checkpoint_repository (CheckpointRepository | None)
gather_all_token_logits (bool | None)
strongly_typed (bool | None)
quantization_type (ModelTRTLLMQuantizationType | None)
quantization_config (ModelTRTQuantizationConfiguration | None)
tensor_parallel_count (Annotated[int | None, Ge(ge=1)])
pipeline_parallel_count (int | None)
moe_expert_parallel_option (int | None)
sequence_parallel_count (int | None)
plugin_configuration (ModelTRTLLMPluginConfiguration | None)
num_builder_gpus (NumBuilderGpus | None)
speculator (ModelSpeculatorConfiguration | None)
lora_adapters (dict[Annotated[str, pattern=r'^[a-zA-Z0-9_\-\.:]+$'], CheckpointRepository] | None)
lora_configuration (ModelTRTLLMLoraConfiguration | None)
skip_build_result (bool | None)
extra_data (Any)
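
The numeric bounds and the lora_adapters key pattern listed above can be checked before a configuration is constructed. The following is a minimal standard-library sketch of such a pre-check, not part of the Baseten client: the names DOCUMENTED_BOUNDS, LORA_ADAPTER_KEY, and find_violations are hypothetical, and skipping None values is an assumption of this sketch rather than documented behavior of the class.

```python
import re

# Bounds copied from the parameter list above; this table is a hypothetical helper.
# Each entry maps a field name to (lower bound, lower bound is inclusive, upper bound or None).
DOCUMENTED_BOUNDS = {
    "max_batch_size": (1, True, 2048),         # Ge(ge=1), Le(le=2048)
    "max_num_tokens": (64, False, 1048576),    # Gt(gt=64), Le(le=1048576)
    "max_beam_width": (1, True, 1),            # Ge(ge=1), Le(le=1)
    "tensor_parallel_count": (1, True, None),  # Ge(ge=1), no upper bound listed
}

# Pattern listed for the keys of lora_adapters.
LORA_ADAPTER_KEY = re.compile(r"^[a-zA-Z0-9_\-\.:]+$")


def find_violations(settings):
    """Return a list of problems; an empty list means no documented bound is broken."""
    problems = []
    for name, (low, inclusive, high) in DOCUMENTED_BOUNDS.items():
        value = settings.get(name)
        if value is None:
            # Assumption of this sketch: unset or None values are not checked.
            continue
        too_low = value < low if inclusive else value <= low
        if too_low or (high is not None and value > high):
            problems.append(f"{name}={value} is outside the documented range")
    for key in settings.get("lora_adapters") or {}:
        if not LORA_ADAPTER_KEY.match(key):
            problems.append(f"lora_adapters key {key!r} does not match the documented pattern")
    return problems
```

For example, find_violations({"max_num_tokens": 64}) reports one problem because the lower bound on max_num_tokens is exclusive (Gt), while max_batch_size=1 passes because its lower bound is inclusive (Ge). Note also that max_beam_width is bounded to exactly 1 by the listed constraints.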

model_config = {'extra': 'allow'}: Pydantic configuration for this model class. It should be a dictionary conforming to pydantic.config.ConfigDict.
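
With 'extra' set to 'allow', Pydantic keeps keyword arguments that do not match a declared field instead of rejecting them, which is what the **extra_data entry in the signature reflects. The sketch below illustrates that behavior on a hypothetical stand-in class, BuildConfigSketch, using Pydantic v2; it mirrors only the extra setting and one field, and is not the Baseten class itself.

```python
from pydantic import BaseModel, ConfigDict


class BuildConfigSketch(BaseModel):
    # Hypothetical stand-in: only the `extra` setting and one field are mirrored.
    model_config = ConfigDict(extra="allow")

    max_batch_size: int = 256


# `custom_flag` is not a declared field, yet construction succeeds.
config = BuildConfigSketch(max_batch_size=64, custom_flag=True)

# Undeclared keys are stored separately and exposed through `model_extra`.
extras = config.model_extra

# They are also included when the model is serialized.
dumped = config.model_dump()
```

One consequence worth keeping in mind is that a misspelled field name is accepted silently as extra data rather than raising a validation error, so it may be worth inspecting model_extra when a setting appears to have no effect.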