
Bug: can't finetune #7643


Closed
cabfile opened this issue May 30, 2024 · 17 comments
Labels
bug-unconfirmed · critical severity (used to report critical severity bugs in llama.cpp, e.g. crashing, corruption, data loss) · stale

Comments


cabfile commented May 30, 2024

What happened?

GGML_ASSERT: D:\a\llama.cpp\llama.cpp\ggml.c:12853: ne2 == ne02

Name and Version

version: 2965 (03d8900e)
built with MSVC 19.39.33523.0 for x64

What operating system are you seeing the problem on?

Windows

Relevant log output

E:\slm\llama\other>finetune --model-base ..\..\tinyllama-1.1b-chat-v0.6-q4_0_2.gguf --checkpoint-in chk-piss-LATEST.gguf --checkpoint-out chk-piss-ITERATION.gguf --lora-out piss-ITERATION.bin --train-data traindata.txt --save-every 10 --threads 4 --adam-iter 30 --batch 4 --ctx 64 --use-checkpointing
main: seed: 1717079846
main: model base = '..\..\tinyllama-1.1b-chat-v0.6-q4_0_2.gguf'
llama_model_loader: loaded meta data with 21 key-value pairs and 201 tensors from ..\..\tinyllama-1.1b-chat-v0.6-q4_0_2.gguf (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = models
llama_model_loader: - kv   2:                       llama.context_length u32              = 2048
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 2048
llama_model_loader: - kv   4:                          llama.block_count u32              = 22
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 5632
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 64
llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 4
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                       llama.rope.freq_base f32              = 10000.000000
llama_model_loader: - kv  11:                          general.file_type u32              = 2
llama_model_loader: - kv  12:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  13:                      tokenizer.ggml.tokens arr[str,32000]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv  14:                      tokenizer.ggml.scores arr[f32,32000]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  15:                  tokenizer.ggml.token_type arr[i32,32000]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv  16:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  17:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  18:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - kv  19:            tokenizer.ggml.padding_token_id u32              = 2
llama_model_loader: - kv  20:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   45 tensors
llama_model_loader: - type q4_0:  155 tensors
llama_model_loader: - type q6_K:    1 tensors
llm_load_vocab: special tokens cache size = 259.
llm_load_print_meta: format           = GGUF V2
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 32000
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: n_ctx_train      = 2048
llm_load_print_meta: n_embd           = 2048
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 4
llm_load_print_meta: n_layer          = 22
llm_load_print_meta: n_rot            = 64
llm_load_print_meta: n_embd_head_k    = 64
llm_load_print_meta: n_embd_head_v    = 64
llm_load_print_meta: n_gqa            = 8
llm_load_print_meta: n_embd_k_gqa     = 256
llm_load_print_meta: n_embd_v_gqa     = 256
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 5632
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 2048
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = 1B
llm_load_print_meta: model ftype      = Q4_0
llm_load_print_meta: model params     = 1.10 B
llm_load_print_meta: model size       = 606.53 MiB (4.63 BPW)
llm_load_print_meta: general.name     = models
llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: PAD token        = 2 '</s>'
llm_load_print_meta: LF token         = 13 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.10 MiB
llm_load_tensors:        CPU buffer size =   606.53 MiB
.....................................................................................
llama_new_context_with_model: n_ctx      = 512
llama_new_context_with_model: n_batch    = 512
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:        CPU KV buffer size =    11.00 MiB
llama_new_context_with_model: KV self size  =   11.00 MiB, K (f16):    5.50 MiB, V (f16):    5.50 MiB
llama_new_context_with_model:        CPU  output buffer size =     0.12 MiB
llama_new_context_with_model:        CPU compute buffer size =    66.50 MiB
llama_new_context_with_model: graph nodes  = 710
llama_new_context_with_model: graph splits = 1
main: init model
print_params: n_vocab               : 32000
print_params: n_ctx                 : 64
print_params: n_embd                : 2048
print_params: n_ff                  : 5632
print_params: n_head                : 32
print_params: n_head_kv             : 4
print_params: n_layer               : 22
print_params: norm_rms_eps          : 0.000010
print_params: rope_freq_base        : 10000.000000
print_params: rope_freq_scale       : 1.000000
print_lora_params: n_rank_attention_norm : 1
print_lora_params: n_rank_wq             : 4
print_lora_params: n_rank_wk             : 4
print_lora_params: n_rank_wv             : 4
print_lora_params: n_rank_wo             : 4
print_lora_params: n_rank_ffn_norm       : 1
print_lora_params: n_rank_ffn_gate       : 4
print_lora_params: n_rank_ffn_down       : 4
print_lora_params: n_rank_ffn_up         : 4
print_lora_params: n_rank_tok_embeddings : 4
print_lora_params: n_rank_norm           : 1
print_lora_params: n_rank_output         : 4
main: total train_iterations 0
main: seen train_samples     0
main: seen train_tokens      0
main: completed train_epochs 0
main: lora_size = 28472224 bytes (27.2 MB)
main: opt_size  = 42223360 bytes (40.3 MB)
main: opt iter 0
main: input_size = 32769056 bytes (31.3 MB)
main: compute_size = 1507336544 bytes (1437.5 MB)
main: evaluation order = RIGHT_TO_LEFT
main: tokenize training data from traindata.txt
main: sample-start:
main: include-sample-start: false
tokenize_file: total number of samples: 1
main: number of training tokens: 12
main: number of unique tokens: 12
main: train data seems to have changed. restarting shuffled epoch.
main: begin training
main: work_size = 512240 bytes (0.5 MB)
train_opt_callback: iter=     0 sample=1/1 sched=0.000000 loss=0.000000 |->
train_opt_callback: reshuffle samples. completed epochs: 1
GGML_ASSERT: D:\a\llama.cpp\llama.cpp\ggml.c:12853: ne2 == ne02
GGML_ASSERT: D:\a\llama.cpp\llama.cpp\ggml.c:12853: ne2 == ne02
GGML_ASSERT: D:\a\llama.cpp\llama.cpp\ggml.c:12853: ne2 == ne02
GGML_ASSERT: D:\a\llama.cpp\llama.cpp\ggml.c:12853: ne2 == ne02
cabfile added the bug-unconfirmed and critical severity labels on May 30, 2024

adrian-afl commented May 31, 2024

I'm facing a very similar problem. Here is what I'm trying to do; it's almost a copy-paste from the original finetune README:
.\llama-b3058-bin-win-avx2-x64\finetune.exe --model-base .\models\llama3-8b-inst.gguf --checkpoint-in chk-lora-open-llama-3b-v2-q8_0-shakespeare-LATEST.gguf --checkpoint-out chk-lora-open-llama-3b-v2-q8_0-shakespeare-ITERATION.gguf --lora-out lora-open-llama-3b-v2-q8_0-shakespeare-ITERATION.bin --train-data "shake.txt" --save-every 10 --threads 12 --adam-iter 30 --batch 4 --ctx 64 --use-checkpointing
and the result is:

...
main: number of unique tokens: 3621
main: train data seems to have changed. restarting shuffled epoch.
main: begin training
main: work_size = 6157072 bytes (5.9 MB)
train_opt_callback: iter=     0 sample=1/22783 sched=0.000000 loss=0.000000 |->
GGML_ASSERT: D:\a\llama.cpp\llama.cpp\ggml.c:12849: ne2 == ne02
GGML_ASSERT: D:\a\llama.cpp\llama.cpp\ggml.c:12849: ne2 == ne02

and then the process exits.
I also tried the CLBlast build, with the same result.

The model I'm trying to finetune is this: https://huggingface.co/SanctumAI/Meta-Llama-3-8B-Instruct-GGUF, version q6_k.
Edit: I checked the q8 version of the same model; the result is the same.
Also, this: sample=1/22783 starts at sample=0/22783, then switches to sample=1/22783, and a few seconds later it crashes as shown above.


alyas77 commented Jun 3, 2024

The same issue occurs with a few other models (Phi, Mistral).
Here is one of the examples; running main works fine:

huggingface-cli download meta-llama/Meta-Llama-3-8B-Instruct  --local-dir ~/projects/models/meta-llama/Meta-Llama-3-8B-Instruct

python ./llama.cpp/convert-hf-to-gguf.py \
    --outfile ~/projects/models/meta-llama/Meta-Llama-3-8B-Instruct/Meta-Llama-3-8B-Instruct.gguf \
    ~/projects/models/meta-llama/Meta-Llama-3-8B-Instruct --outtype=q8_0

./llama.cpp/main -i -m ~/projects/models/meta-llama/Meta-Llama-3-8B-Instruct/Meta-Llama-3-8B-Instruct.gguf


./llama.cpp/finetune \
--model-base ~/projects/models/meta-llama/Meta-Llama-3-8B-Instruct/Meta-Llama-3-8B-Instruct.gguf \
--checkpoint-in  chk-lora-Meta-Llama-3-8B-Instruct-shakespeare-LATEST.gguf \
--checkpoint-out chk-lora-Meta-Llama-3-8B-Instruct-shakespeare-ITERATION.gguf \
--lora-out lora-Meta-Llama-3-8B-Instruct-shakespeare-ITERATION.bin \
--train-data "shakespeare.txt" \
--save-every 10 \
--threads 40 --adam-iter 30 --batch 4 --ctx 64 \
--use-checkpointing

main: seed: 1717387035
main: model base = '/home/alyas/projects/models/meta-llama/Meta-Llama-3-8B-Instruct/Meta-Llama-3-8B-Instruct.gguf'
llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from /home/alyas/projects/models/meta-llama/Meta-Llama-3-8B-Instruct/Meta-Llama-3-8B-Instruct.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = Meta-Llama-3-8B-Instruct
llama_model_loader: - kv   2:                          llama.block_count u32              = 32
llama_model_loader: - kv   3:                       llama.context_length u32              = 8192
llama_model_loader: - kv   4:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   7:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv   8:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                          general.file_type u32              = 7
llama_model_loader: - kv  11:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv  12:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  13:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  14:                         tokenizer.ggml.pre str              = llama-bpe
llama_model_loader: - kv  15:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  16:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  17:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  18:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  19:                tokenizer.ggml.eos_token_id u32              = 128009
llama_model_loader: - kv  20:                    tokenizer.chat_template str              = {% set loop_messages = messages %}{% ...
llama_model_loader: - kv  21:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type q8_0:  226 tensors
llm_load_vocab: special tokens cache size = 256
llm_load_vocab: token to piece cache size = 1.5928 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 128256
llm_load_print_meta: n_merges         = 280147
llm_load_print_meta: n_ctx_train      = 8192
llm_load_print_meta: n_embd           = 4096
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 4
llm_load_print_meta: n_embd_k_gqa     = 1024
llm_load_print_meta: n_embd_v_gqa     = 1024
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 14336
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 8192
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = 8B
llm_load_print_meta: model ftype      = Q8_0
llm_load_print_meta: model params     = 8.03 B
llm_load_print_meta: model size       = 7.95 GiB (8.50 BPW) 
llm_load_print_meta: general.name     = Meta-Llama-3-8B-Instruct
llm_load_print_meta: BOS token        = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token        = 128009 '<|eot_id|>'
llm_load_print_meta: LF token         = 128 'Ä'
llm_load_print_meta: EOT token        = 128009 '<|eot_id|>'
llm_load_tensors: ggml ctx size =    0.15 MiB
llm_load_tensors:        CPU buffer size =  8137.64 MiB
.........................................................................................
llama_new_context_with_model: n_ctx      = 512
llama_new_context_with_model: n_batch    = 512
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base  = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:        CPU KV buffer size =    64.00 MiB
llama_new_context_with_model: KV self size  =   64.00 MiB, K (f16):   32.00 MiB, V (f16):   32.00 MiB
llama_new_context_with_model:        CPU  output buffer size =     0.49 MiB
llama_new_context_with_model:        CPU compute buffer size =   258.50 MiB
llama_new_context_with_model: graph nodes  = 1030
llama_new_context_with_model: graph splits = 1
main: init model
print_params: n_vocab               : 128256
print_params: n_ctx                 : 64
print_params: n_embd                : 4096
print_params: n_ff                  : 14336
print_params: n_head                : 32
print_params: n_head_kv             : 8
print_params: n_layer               : 32
print_params: norm_rms_eps          : 0.000010
print_params: rope_freq_base        : 500000.000000
print_params: rope_freq_scale       : 1.000000
print_lora_params: n_rank_attention_norm : 1
print_lora_params: n_rank_wq             : 4
print_lora_params: n_rank_wk             : 4
print_lora_params: n_rank_wv             : 4
print_lora_params: n_rank_wo             : 4
print_lora_params: n_rank_ffn_norm       : 1
print_lora_params: n_rank_ffn_gate       : 4
print_lora_params: n_rank_ffn_down       : 4
print_lora_params: n_rank_ffn_up         : 4
print_lora_params: n_rank_tok_embeddings : 4
print_lora_params: n_rank_norm           : 1
print_lora_params: n_rank_output         : 4
main: total train_iterations 0
main: seen train_samples     0
main: seen train_tokens      0
main: completed train_epochs 0
main: lora_size = 94956320 bytes (90.6 MB)
main: opt_size  = 141731824 bytes (135.2 MB)
main: opt iter 0
main: input_size = 131335200 bytes (125.3 MB)
main: compute_size = 6164070752 bytes (5878.5 MB)
main: evaluation order = RIGHT_TO_LEFT
main: tokenize training data from shakespeare.txt
main: sample-start: 
main: include-sample-start: false
tokenize_file: total number of samples: 22783
main: number of training tokens: 22847
main: number of unique tokens: 3621
main: train data seems to have changed. restarting shuffled epoch.
main: begin training
main: work_size = 20523648 bytes (19.6 MB)
train_opt_callback: iter=     0 sample=1/22783 sched=0.000000 loss=0.000000 |->
GGML_ASSERT: ggml.c:12849: ne2 == ne02
GGML_ASSERT: ggml.c:12849: ne2 == ne02
GGML_ASSERT: ggml.c:12849: ne2 == ne02
GGML_ASSERT: ggml.c:12849: ne2 == ne02
GGML_ASSERT: ggml.c:12849: ne2 == ne02
GGML_ASSERT: ggml.c:12849: ne2 == ne02
GGML_ASSERT: ggml.c:12849: ne2 == ne02
GGML_ASSERT: ggml.c:12849: ne2 == ne02
GGML_ASSERT: ggml.c:12849: ne2 == ne02
GGML_ASSERT: ggml.c:12849: ne2 == ne02

@ggerganov (Member)

Find the first commit that stops working


opensignature commented Jun 3, 2024

Find the first commit that stops working

git reset --hard HEAD~100
HEAD is now at efc8f767 move ndk code to a new library (#6951)

with this commit it works:
...
train_opt_callback: iter= 0 sample=1/28013 sched=0.000000 loss=0.000000 |>
train_opt_callback: iter= 1 sample=5/28013 sched=0.010000 loss=9.638092
...
with HEAD~99 it does not work
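
For reference, a git bisect over that range could look roughly like this (a sketch only: the cmake flags and the reduced --adam-iter are illustrative, and HEAD@{1} assumes the pre-reset commit is still in the reflog):

git bisect start
git bisect bad HEAD@{1}      # the commit that was checked out before the reset above
git bisect good efc8f767     # the last known-good commit
# at each step git checks out a candidate; rebuild and re-run the failing command, e.g.:
cmake -B build && cmake --build build --config Release -j
build/bin/finetune --model-base model.gguf --train-data traindata.txt --lora-out lora.bin --adam-iter 1 --batch 4 --ctx 64
git bisect good              # or: git bisect bad, depending on whether it asserts
# repeat until git prints the first bad commit, then: git bisect reset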

@ggerganov (Member)

Does it work with -nkvo?

@opensignature

Does it work with -nkvo?

I don't think the -nkvo parameter is present in finetune. However, I recompiled everything, forcing bool no_kv_offload = true; in common.h, but it still doesn't work.
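
For reference, the manual override described above amounts to roughly this (a sketch only; the exact declaration in common.h may differ slightly):

sed -i 's/bool no_kv_offload = false;/bool no_kv_offload = true;/' common/common.h
make clean && make finetune    # or rebuild the finetune target with cmake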

@opensignature

if it helps, from commit c4ec9c0 to commit 3cbd23e the error is different, and it is this: GGML_ASSERT: examples/finetune/finetune.cpp:646: false && "TODO: ggml_flash_attn_ext() not yet supported"

@LucaKoval

Any updates on this issue? I'm facing the same problem unfortunately.


hwiorn commented Jun 4, 2024

if it helps, from commit c4ec9c0 to commit 3cbd23e the error is different, and it is this: GGML_ASSERT: examples/finetune/finetune.cpp:646: false && "TODO: ggml_flash_attn_ext() not yet supported"

This is not related. There was a fix for the flash_attention flag (ref: 9588f19) a few days ago. Now the default is false. You can use the --no-flash option.
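
For reference, the flag is simply appended to the finetune invocation, e.g. (a sketch reusing flags from the reports above; the model and data paths are placeholders):

finetune --model-base model.gguf --train-data traindata.txt --lora-out lora.bin --save-every 10 --adam-iter 30 --batch 4 --ctx 64 --use-checkpointing --no-flash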

UPDATED: Actually, flash_attention is related to this issue. See the comments below.


hwiorn commented Jun 4, 2024

I have the same issue on Linux. Finetuning Llama 3 models always hits this error, but prediction (main -m <model_gguf>) is okay.

Only finetuning the open_llama_3b_v2 model works okay.

@ggerganov (Member)

Find the first commit that causes GGML_ASSERT: ggml.c:12849: ne2 == ne02


hwiorn commented Jun 4, 2024

I'm git-bisecting this. Quite hard to find.

The --no-flash option also produces the error below on efc8f76. Without that option, finetuning seems to work.

GGML_ASSERT: llama.cpp/ggml.c:12262: ne2 == ne02

The latest commits (--no-flash is the default, according to 9588f19) produce the error below without the --no-flash option. So d48c88c may be the cause of this issue.

GGML_ASSERT: llama.cpp/examples/finetune/finetune.cpp:646: false && "TODO: ggml_flash_attn_ext() not yet supported"

When I rebuild e84b71c and run it, training works. But I'm not sure whether it works properly, because FA-related commits were merged frequently.

cd llama.cpp
git checkout d48c88cbd563b6cf0ce972e2f56796896e240736^
rm -rf build
cmake -B build -DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS
cmake --build build --config Debug -j
build/bin/finetune \
  --model-base $model \
  --train-data shakespeare.txt \
  --lora-out lora.gguf \
  --seed 1

@ggerganov Check out d48c88c


hwiorn commented Jun 4, 2024

With --no-flash on d48c88c,

GGML_ASSERT: llama.cpp/ggml.c:12809: ne2 == ne02
GGML_ASSERT: llama-cpp/llama.cpp/ggml.c:12809: ne2 == ne02
GGML_ASSERT: llama-cpp/llama.cpp/ggml.c:12809: ne2 == ne02
GGML_ASSERT: llama-cpp/llama.cpp/ggml.c:12809: ne2 == ne02
GGML_ASSERT: llama-cpp/llama.cpp/ggml.c:12809: ne2 == ne02
GGML_ASSERT: llama-cpp/llama.cpp/ggml.c:12809: ne2 == ne02

Without --no-flash (the default) on d48c88c,

GGML_ASSERT: llama.cpp/examples/finetune/finetune.cpp:646: false && "TODO: ggml_flash_attn_ext() not yet supported"


hwiorn commented Jun 4, 2024

Related to #7523

@opensignature

I think we should try with small base models and scale up to the ones that cause problems.
For example, with https://huggingface.co/Maykeye/TinyLLama-v0, finetune works correctly.
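
A minimal way to reproduce that baseline might look like this (a sketch; the output paths are arbitrary, and the conversion assumes the HF checkpoint converts cleanly with convert-hf-to-gguf.py as used earlier in this thread):

huggingface-cli download Maykeye/TinyLLama-v0 --local-dir ./TinyLLama-v0
python ./llama.cpp/convert-hf-to-gguf.py --outfile ./tinyllama-v0-f16.gguf ./TinyLLama-v0
./llama.cpp/finetune --model-base ./tinyllama-v0-f16.gguf --train-data shakespeare.txt --lora-out lora-tinyllama-v0.bin --save-every 10 --adam-iter 30 --batch 4 --ctx 64 --use-checkpointing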

github-actions bot added the stale label on Jul 8, 2024

Spider-netizen commented Jul 19, 2024

@hwiorn

Did you find anything? It seems the problem is only with Llama 3...
Did it ever work with Llama 3 models?

github-actions bot removed the stale label on Jul 20, 2024
github-actions bot added the stale label on Aug 19, 2024

github-actions bot commented Sep 2, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.
