1 change: 1 addition & 0 deletions README.md
@@ -53,6 +53,7 @@ Using CLI for fine-tuning LLMs:

## What's New

- [PR 747](https://github.com/h2oai/h2o-llmstudio/pull/747) Fully removed RLHF in favor of DPO/IPO/KTO optimization.
- [PR 599](https://github.com/h2oai/h2o-llmstudio/pull/599) Added `KTOPairLoss` for DPO modeling, allowing models to be trained with simple preference data. The data currently needs to be prepared manually by randomly matching positive and negative examples as pairs (a minimal pairing sketch follows this list).
- [PR 592](https://github.com/h2oai/h2o-llmstudio/pull/592) Starting to deprecate RLHF in favor of DPO/IPO optimization. Training is disabled, but old experiments are still viewable. RLHF will be fully removed in a future release.
- [PR 530](https://github.com/h2oai/h2o-llmstudio/pull/530) Introduced a new problem type for DPO/IPO optimization. This optimization technique can be used as an alternative to RLHF.
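
The `KTOPairLoss` entry above says preference data has to be prepared by hand, randomly matching positive and negative examples into pairs. Below is a minimal sketch of that pairing step; the column names (`prompt`, `answer`, `label`, `chosen`, `rejected`) and the `build_preference_pairs` helper are illustrative assumptions, not part of the H2O LLM Studio API.

```python
# Sketch only: randomly pair positive and negative answers per prompt so the
# data has the chosen/rejected structure a pairwise preference loss expects.
# Column names are assumptions for illustration.
import random

import pandas as pd


def build_preference_pairs(df: pd.DataFrame, seed: int = 42) -> pd.DataFrame:
    """Randomly match positive and negative answers for each prompt.

    Expects one row per (prompt, answer) with a binary `label`
    (1 = preferred, 0 = not preferred); returns one row per pair
    with `chosen` and `rejected` columns.
    """
    rng = random.Random(seed)
    pairs = []
    for prompt, group in df.groupby("prompt"):
        positives = group.loc[group["label"] == 1, "answer"].tolist()
        negatives = group.loc[group["label"] == 0, "answer"].tolist()
        rng.shuffle(positives)
        rng.shuffle(negatives)
        # Pair as many positives with negatives as possible; leftovers are dropped.
        for chosen, rejected in zip(positives, negatives):
            pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pd.DataFrame(pairs)


# Toy usage example.
raw = pd.DataFrame(
    {
        "prompt": ["What is DPO?"] * 4,
        "answer": ["good answer A", "good answer B", "bad answer A", "bad answer B"],
        "label": [1, 1, 0, 0],
    }
)
print(build_preference_pairs(raw))
```

Each prompt contributes as many pairs as it has matching positives and negatives; unmatched leftovers are dropped, so every pair is a genuine positive/negative contrast.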
79 changes: 0 additions & 79 deletions documentation/docs/guide/experiments/experiment-settings.md
@@ -16,7 +16,6 @@ import DSanswerColumn from '../../tooltips/experiments/_answer-column.mdx';
import DSparentIdColumn from '../../tooltips/experiments/_parent-id-column.mdx';
import DStextPromptStart from '../../tooltips/experiments/_text-prompt-start.mdx';
import DStextAnswerSeparator from '../../tooltips/experiments/_text-answer-separator.mdx';
import DSadaptiveKlControl from '../../tooltips/experiments/_adaptive-kl-control.mdx';
import DSaddEosTokentoprompt from '../../tooltips/experiments/_add-eos-token-to-prompt.mdx';
import DSaddEosTokentoanswer from '../../tooltips/experiments/_add-eos-token-to-answer.mdx';
import DSmaskPromptlabels from '../../tooltips/experiments/_mask-prompt-labels.mdx';
@@ -53,20 +52,6 @@ import TSsavecheckpoint from '../../tooltips/experiments/_save-checkpoint.mdx';
import TSevaluationepochs from '../../tooltips/experiments/_evaluation-epochs.mdx';
import TSevaluationbeforetraining from '../../tooltips/experiments/_evaluate-before-training.mdx';
import TStrainvalidationdata from '../../tooltips/experiments/_train-validation-data.mdx';
import TSuseRHLF from '../../tooltips/experiments/_use-rlhf.mdx';
import TSrewardModel from '../../tooltips/experiments/_reward-model.mdx';
import TSinitialKlCoefficient from '../../tooltips/experiments/_initial-kl-coefficient.mdx';
import TSklTarget from '../../tooltips/experiments/_kl-target.mdx';
import TSklHorizon from '../../tooltips/experiments/_kl-horizon.mdx';
import TSadvantagesGamma from '../../tooltips/experiments/_advantages-gamma.mdx';
import TSadvantagesLambda from '../../tooltips/experiments/_advantages-lambda.mdx';
import TSppoClipPolicy from '../../tooltips/experiments/_ppo-clip-policy.mdx';
import TSppoClipValue from '../../tooltips/experiments/_ppo-clip-value.mdx';
import TSscalingFactorValueLoss from '../../tooltips/experiments/_scaling-factor-value-loss.mdx';
import TSppoEpochs from '../../tooltips/experiments/_ppo-epochs.mdx';
import TSppoBatchSize from '../../tooltips/experiments/_ppo-batch-size.mdx';
import TSppoGenerateTemp from '../../tooltips/experiments/_ppo-generate-temperature.mdx';
import TSoffloadRewardModel from '../../tooltips/experiments/_offload-reward-model.mdx';
import AStokenmaskprobability from '../../tooltips/experiments/_token-mask-probability.mdx';
import ASskipParentprobability from '../../tooltips/experiments/_skip-parent-probability.mdx';
import ASrandomparentprobability from '../../tooltips/experiments/_random-parent-probability.mdx';
@@ -174,10 +159,6 @@ The settings under each category are listed and described below.

<DStextAnswerSeparator/>

## Adaptive Kl control

<DSadaptiveKlControl/>

### Add EOS token to prompt

<DSaddEosTokentoprompt/>
@@ -328,66 +309,6 @@ The settings under each category are listed and described below.

<TStrainvalidationdata/>

### Use RLHF

<TSuseRHLF/>

### Reward model

<TSrewardModel/>

### Adaptive KL control

<DSadaptiveKlControl/>

### Initial KL coefficient

<TSinitialKlCoefficient/>

### KL target

<TSklTarget/>

### KL Horizon

<TSklHorizon/>

### Advantages gamma

<TSadvantagesGamma/>

### Advantages Lambda

<TSadvantagesLambda/>

### PPO clip policy

<TSppoClipPolicy/>

### PPO clip value

<TSppoClipValue/>

### Scaling factor value loss

<TSscalingFactorValueLoss/>

### PPO epochs

<TSppoEpochs/>

### PPO Batch Size

<TSppoBatchSize/>

### PPO generate temperature

<TSppoGenerateTemp/>

### Offload reward model

<TSoffloadRewardModel/>

## Augmentation settings

### Token mask probability

This file was deleted.

This file was deleted.

This file was deleted.

This file was deleted.

This file was deleted.

1 change: 0 additions & 1 deletion documentation/docs/tooltips/experiments/_kl-horizon.mdx

This file was deleted.

1 change: 0 additions & 1 deletion documentation/docs/tooltips/experiments/_kl-target.mdx

This file was deleted.

This file was deleted.

This file was deleted.

This file was deleted.

This file was deleted.

1 change: 0 additions & 1 deletion documentation/docs/tooltips/experiments/_ppo-epochs.mdx

This file was deleted.

This file was deleted.

3 changes: 0 additions & 3 deletions documentation/docs/tooltips/experiments/_reward-model.mdx

This file was deleted.

This file was deleted.

This file was deleted.

1 change: 0 additions & 1 deletion documentation/docs/tooltips/experiments/_use-rlhf.mdx

This file was deleted.
