Merged
Changes from 8 commits
1 change: 1 addition & 0 deletions README.md
@@ -53,6 +53,7 @@ Using CLI for fine-tuning LLMs:

## What's New

- [PR 592](https://github.com/h2oai/h2o-llmstudio/pull/592) Started deprecating RLHF in favor of DPO/IPO optimization. Training is disabled, but old experiments remain viewable. RLHF will be fully removed in a future release.
Review comment (Contributor): Training is disabled
- [PR 530](https://github.com/h2oai/h2o-llmstudio/pull/530) Introduced a new problem type for DPO/IPO optimization, which can be used as an alternative to RLHF (a minimal sketch of both losses follows this diff).
- [PR 288](https://github.com/h2oai/h2o-llmstudio/pull/288) Introduced Deepspeed for sharded training, allowing larger models to be trained on machines with multiple GPUs. Requires NVLink. This feature replaces FSDP and offers more flexibility. Deepspeed requires a system installation of cudatoolkit; we recommend version 11.8. See [Recommended Install](#recommended-install).
- [PR 449](https://github.com/h2oai/h2o-llmstudio/pull/449) New problem type for Causal Classification Modeling allows training binary and multiclass models using LLMs.
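The DPO/IPO entries above only name the technique. For orientation, here is a minimal sketch of the DPO and IPO preference losses, assuming per-sequence log-probabilities have already been summed; the function and tensor names are illustrative assumptions and do not come from H2O LLM Studio's code.

```python
# Minimal sketch of the DPO and IPO objectives referenced by PR 530 / PR 592.
# Tensor and function names are illustrative assumptions, not the project's code.
import torch
import torch.nn.functional as F


def preference_loss(
    policy_chosen_logps: torch.Tensor,    # log p_theta(y_chosen | x), shape (batch,)
    policy_rejected_logps: torch.Tensor,  # log p_theta(y_rejected | x)
    ref_chosen_logps: torch.Tensor,       # same quantities under the frozen reference model
    ref_rejected_logps: torch.Tensor,
    beta: float = 0.1,
    loss_type: str = "dpo",
) -> torch.Tensor:
    """DPO (Rafailov et al., 2023) and IPO (Azar et al., 2023) preference losses."""
    # Log-ratio of policy vs. reference model for chosen and rejected completions.
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps
    logits = chosen_logratios - rejected_logratios

    if loss_type == "dpo":
        # -log sigmoid(beta * margin): increases the margin between chosen and rejected.
        loss = -F.logsigmoid(beta * logits)
    elif loss_type == "ipo":
        # IPO regresses the margin toward 1 / (2 * beta) instead of maximizing it.
        loss = (logits - 1.0 / (2.0 * beta)) ** 2
    else:
        raise ValueError(f"Unknown loss_type: {loss_type}")
    return loss.mean()


if __name__ == "__main__":
    b = 4
    print(preference_loss(torch.randn(b), torch.randn(b), torch.randn(b), torch.randn(b)).item())
```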
2 changes: 0 additions & 2 deletions documentation/docs/tooltips/experiments/_problem-type.mdx
@@ -4,8 +4,6 @@ Defines the problem type of the experiment, which also defines the settings H2O

- DPO Modeling: Used to fine-tune large language models using Direct Preference Optimization

- Rlhf Language Modeling: Used to fine-tune RLHF language models

- Sequence To Sequence Modeling: Used to fine-tune large sequence to sequence models

- Causal Classification Modeling: Used to fine-tune causal classification models
1 change: 0 additions & 1 deletion llm_studio/app_utils/config.py
@@ -60,7 +60,6 @@ def get_size(x):
"problem_types": [
"text_causal_language_modeling_config",
"text_dpo_modeling_config",
"text_rlhf_language_modeling_config",
"text_sequence_to_sequence_modeling_config",
"text_causal_classification_modeling_config",
],
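For context, the config.py hunk removes `text_rlhf_language_modeling_config` from the list of problem-type config modules, which is what hides the RLHF problem type from the app. Below is a minimal sketch of how such a string registry might be resolved to config modules; the package path and helper name are assumptions, not the project's actual code.

```python
# Hypothetical sketch of resolving problem-type names (as listed in config.py above)
# to their config modules. Package path and helper name are assumptions.
import importlib
from types import ModuleType

PROBLEM_TYPES = [
    "text_causal_language_modeling_config",
    "text_dpo_modeling_config",
    # "text_rlhf_language_modeling_config",  # removed by this PR
    "text_sequence_to_sequence_modeling_config",
    "text_causal_classification_modeling_config",
]


def load_problem_type(name: str, package: str = "llm_studio.python_configs") -> ModuleType:
    """Import the config module registered under `name` (hypothetical package path)."""
    if name not in PROBLEM_TYPES:
        raise KeyError(f"Unknown or deprecated problem type: {name}")
    return importlib.import_module(f"{package}.{name}")
```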