from dataclasses import dataclass, field
from typing import Optional

from transformers import TrainingArguments


@dataclass
class PRMConfig(TrainingArguments):
    r"""
    Configuration class for the [`PRMTrainer`].

    This class includes only the parameters that are specific to PRM training. For a full list of training arguments,
    please refer to the [`~transformers.TrainingArguments`] documentation. Note that default values in this class may
    differ from those in [`~transformers.TrainingArguments`].

    Using [`~transformers.HfArgumentParser`] we can turn this class into
    [argparse](https://docs.python.org/3/library/argparse#module-argparse) arguments that can be specified on the
    command line.

    Parameters:
        max_length (`int` or `None`, *optional*, defaults to `1024`):
            Maximum length of the sequences (prompt + completion) used for truncation.
        max_prompt_length (`int` or `None`, *optional*, defaults to `512`):
            Maximum length of the prompt used for truncation.
        max_completion_length (`int` or `None`, *optional*, defaults to `None`):
            Maximum length of the completion used for truncation. The completion is the concatenation of the steps.
        disable_dropout (`bool`, *optional*, defaults to `True`):
            Whether to disable dropout in the model.
        step_separator (`str`, *optional*, defaults to `"\n"`):
            Separator used to separate each step of the reasoning process.
        train_on_last_step_only (`bool`, *optional*, defaults to `False`):
            Whether to train only on the last step.
        dataset_num_proc (`int`, *optional*, defaults to `None`):
            Number of processes to use for processing the dataset.
    """

    # Parameters redefined from TrainingArguments; their defaults may differ from the parent class.
    learning_rate: float = field(
        default=1e-5,
        metadata={"help": "The initial learning rate for AdamW."},
    )
    logging_steps: float = field(
        default=10,
        metadata={
            "help": "Log every X updates steps. Should be an integer or a float in range `[0,1)`. If smaller than 1, "
            "will be interpreted as ratio of total training steps."
        },
    )
    bf16: Optional[bool] = field(
        default=None,
        metadata={
            "help": "Whether to use bf16 (mixed) precision instead of 32-bit. Requires Ampere or higher NVIDIA "
            "architecture or Intel XPU or using CPU (use_cpu) or Ascend NPU. If not set, it defaults to `True` if "
            "`fp16` is not set."
        },
    )
    average_tokens_across_devices: bool = field(
        default=True,
        metadata={
            "help": "Whether or not to average tokens across devices. If enabled, will use all_reduce to synchronize "
            "num_tokens_in_batch for precise loss calculation. Reference: "
            "https://github.com/huggingface/transformers/issues/34242"
        },
    )

    # Parameters specific to PRM training.
    max_length: Optional[int] = field(
        default=1024,
        metadata={"help": "Maximum length of the sequences (prompt + completion) used for truncation."},
    )
    max_prompt_length: Optional[int] = field(
        default=512,
        metadata={"help": "Maximum length of the prompt used for truncation."},
    )
    max_completion_length: Optional[int] = field(
        default=None,
        metadata={
            "help": "Maximum length of the completion used for truncation. The completion is the concatenation of the "
            "steps."
        },
    )
    disable_dropout: bool = field(
        default=True,
        metadata={"help": "Whether to disable dropout in the model."},
    )
    step_separator: str = field(
        default="\n",
        metadata={"help": "Separator used to separate each step of the reasoning process."},
    )
    train_on_last_step_only: bool = field(
        default=False,
        metadata={"help": "Whether to train only on the last step."},
    )
    dataset_num_proc: Optional[int] = field(
        default=None,
        metadata={"help": "Number of processes to use for processing the dataset."},
    )

    def __post_init__(self):
        # If `bf16` is left unset, enable it unless `fp16` is enabled.
        self.bf16 = not (self.fp16) if self.bf16 is None else self.bf16

        super().__post_init__()
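

# Illustrative sketch only, not TRL's implementation. It shows how `step_separator`
# and `train_on_last_step_only` are assumed to shape the per-token labels of a single
# PRM example:
#   - each step is followed by the separator;
#   - the step's label sits on its last token;
#   - every other position gets -100, the index PyTorch's cross-entropy loss ignores by default.
# The helper `_sketch_step_labels` and the character-level "tokenizer" (one `ord` per
# character) are hypothetical stand-ins chosen for illustration.
if __name__ == "__main__":

    def _sketch_step_labels(steps, step_labels, step_separator="\n", train_on_last_step_only=False):
        """Return (token_ids, labels) for one example under the assumptions above (hypothetical helper)."""
        if train_on_last_step_only:
            # Supervise only the final step; earlier step labels are masked with -100.
            step_labels = [-100] * (len(step_labels) - 1) + [int(step_labels[-1])]
        else:
            step_labels = [int(label) for label in step_labels]

        token_ids, labels = [], []
        for step, label in zip(steps, step_labels):
            ids = [ord(ch) for ch in step + step_separator]  # toy tokenization, one id per character
            token_ids.extend(ids)
            # Only the last token of the step (the separator here) carries the step label.
            labels.extend([-100] * (len(ids) - 1) + [label])
        return token_ids, labels

    _, labels = _sketch_step_labels(["a", "bc"], [True, False])
    print(labels)  # [-100, 1, -100, -100, 0]

    _, labels = _sketch_step_labels(["a", "bc"], [True, False], train_on_last_step_only=True)
    print(labels)  # [-100, -100, -100, -100, 0]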