StackLLaMA: fix supervised finetuning and reward model training #399

mnoukhov · 2023-06-01T19:19:39Z

for supervised finetuning

removed tokenizer hacks since they are no longer necessary with the updated LlamaTokenizer
black + isort

for reward modeling

added tokenizer_name so the tokenizer can be specified separately from the model
again removed old LLaMA tokenizer hacks
added eval_first_step option to run an eval loop after the first training step, which makes for nicer graphs
black + isort
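
To illustrate the two reward-modeling options above, here is a minimal sketch and not the actual diff. Only the names `tokenizer_name` and `eval_first_step` come from this PR; `ScriptArguments`, `resolve_tokenizer_name`, `EvaluateFirstStepCallback`, and the placeholder model ids are hypothetical. The sketch assumes the tokenizer falls back to the model name when `tokenizer_name` is unset, and that the first-step eval is requested through a Trainer-style callback. In a real script the callback would presumably subclass `transformers.TrainerCallback`; it is kept dependency-free here so it can be run in isolation.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Optional


@dataclass
class ScriptArguments:
    """Hypothetical subset of the reward-modeling script arguments."""

    model_name: str = "my-org/my-llama"  # placeholder id, not a real checkpoint
    tokenizer_name: Optional[str] = None  # None means "reuse model_name"
    eval_first_step: bool = False


def resolve_tokenizer_name(args: ScriptArguments) -> str:
    """Pick the tokenizer to load, falling back to the model name (assumed behavior)."""
    return args.tokenizer_name if args.tokenizer_name is not None else args.model_name


class EvaluateFirstStepCallback:
    """Ask the trainer for one evaluation right after the first training step.

    Mirrors the on_step_end(args, state, control) hook of transformers.TrainerCallback,
    but does not import transformers so the logic stays runnable on its own.
    """

    def on_step_end(self, args, state, control, **kwargs):
        if state.global_step == 1:
            control.should_evaluate = True
        return control


# Stand-ins for the trainer's state and control objects, for demonstration only.
script_args = ScriptArguments(tokenizer_name="my-org/my-tokenizer", eval_first_step=True)
state = SimpleNamespace(global_step=1)
control = SimpleNamespace(should_evaluate=False)

if script_args.eval_first_step:
    EvaluateFirstStepCallback().on_step_end(script_args, state, control)

print(resolve_tokenizer_name(script_args), control.should_evaluate)
```

With a real `Trainer`, the callback would be registered via `trainer.add_callback(...)` only when `eval_first_step` is set, so the default training loop is unchanged.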

HuggingFaceDocBuilderDev · 2023-06-02T12:31:35Z

The documentation is not available anymore as the PR was closed or merged.

younesbelkada

Thanks for your hard work @mnoukhov!
I have the same comments as here: #398 (review)
If you run the styling checks we should be all good!

mnoukhov · 2023-06-04T03:21:56Z

Ran the checks and added the configs to my workspace config for the future :)

younesbelkada

Thanks a lot!

lvwerra

Thanks a lot for updating!

…ingface#399)
* better reward modelling
  tokenizer can be separately specified from model
  removed old llama tokenizer hacks
  evaluate after first step option to make nicer graphs
  black + isort
* removed tokenizer hacks from supervised ft
* black and flake8

mnoukhov added 2 commits June 1, 2023 14:49

better reward modelling

eece2d1

tokenizer can be separately specified from model
removed old llama tokenizer hacks
evaluate after first step option to make nicer graphs
black + isort

removed tokenizer hacks from supervised ft

3cddd82

mnoukhov mentioned this pull request Jun 1, 2023

Reproducing StackLLaMA #401

Closed

younesbelkada reviewed Jun 2, 2023

black and flake8

d66c399

younesbelkada approved these changes Jun 5, 2023

younesbelkada requested a review from lvwerra June 5, 2023 08:26

lvwerra approved these changes Jun 6, 2023

younesbelkada merged commit ef57cdd into huggingface:main Jun 6, 2023
