from dataclasses import dataclass, field

from trl.trainer.online_dpo_config import OnlineDPOConfig


@dataclass
class XPOConfig(OnlineDPOConfig):
    r"""
    Configuration class for the [`XPOTrainer`].

    Subclass of [`OnlineDPOConfig`], so all of its arguments can be used, plus the following:

    Parameters:
        alpha (`float` or `list[float]`, *optional*, defaults to `1e-5`):
            Weight of the XPO loss term. If a list of floats is provided, a new alpha is selected from the list for
            each new epoch, and the last alpha is used for all remaining epochs.
    """

    alpha: list[float] = field(
        default_factory=lambda: [1e-5],
        metadata={
            "help": "Weight of the XPO loss term. If a list of floats is provided, a new alpha is selected from the "
            "list for each new epoch, and the last alpha is used for all remaining epochs."
        },
    )

    def __post_init__(self):
        super().__post_init__()
        # A one-element list (including the default [1e-5]) is collapsed to a plain float.
        if hasattr(self.alpha, "__len__") and len(self.alpha) == 1:
            self.alpha = self.alpha[0]
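The per-epoch `alpha` schedule described in the docstring can be illustrated with a small standalone sketch. It is written under stated assumptions. `alpha_for_epoch` is a hypothetical helper, not part of trl. It assumes 0-indexed integer epochs: entry `epoch` is used, and the last entry is reused once the epochs outrun the list. `AlphaOnlyConfig` is also hypothetical. It is a stand-in that copies only the `alpha` field and the `__post_init__` collapse rule from `XPOConfig`, so it runs without trl installed. How `XPOTrainer` actually reads the epoch is not shown here.

```python
from dataclasses import dataclass, field
from typing import Union


def alpha_for_epoch(alpha: Union[float, list[float]], epoch: int) -> float:
    """Hypothetical helper: pick the XPO loss weight for a 0-indexed epoch.

    A scalar is used for every epoch. For a list, entry `epoch` is used,
    and the last entry is reused once the epochs outrun the list.
    """
    if isinstance(alpha, (list, tuple)):
        return alpha[min(epoch, len(alpha) - 1)]
    return alpha


@dataclass
class AlphaOnlyConfig:
    """Hypothetical stand-in for XPOConfig that keeps only the `alpha` field."""

    alpha: list[float] = field(default_factory=lambda: [1e-5])

    def __post_init__(self):
        # Same rule as XPOConfig.__post_init__: a one-element list becomes a float.
        if hasattr(self.alpha, "__len__") and len(self.alpha) == 1:
            self.alpha = self.alpha[0]


if __name__ == "__main__":
    print(AlphaOnlyConfig().alpha)  # 1e-05
    schedule = AlphaOnlyConfig(alpha=[0.1, 0.05, 0.01]).alpha
    print([alpha_for_epoch(schedule, e) for e in range(5)])  # [0.1, 0.05, 0.01, 0.01, 0.01]
```

Collapsing a one-element list to a float in `__post_init__` means the common single-value case, including the default, ends up as a plain scalar. Downstream code then only needs to branch on "list or not".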