Ubuntu 22.04 with OpenMPI installed and working correctly. git branch: b1079. Compiled with the command below:
make CC=mpicc CXX=mpicxx LLAMA_MPI=1
then started with the command:
mpirun -hostfile ./hostfile -n 8 /home/ubuntu/llama.cpp/main -m /home/ubuntu/llama.cpp/models/chinese-alpaca-2-7b-q4_0.gguf -n 128 -p "hello. "
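For context, the -hostfile argument points at a standard Open MPI hostfile, which lists one host per line with an optional slots count. A minimal sketch (the hostnames and slot counts here are placeholders, not taken from this report):

```
# Open MPI hostfile: one host per line; slots = ranks to place on that host
node1 slots=4
node2 slots=4
```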
but it crashed with the following error:
llama_new_context_with_model: kv self size = 256.00 MB
llama_new_context_with_model: compute buffer total size = 117.41 MB
(the two lines above were printed once per rank; duplicates omitted)
GGML_ASSERT: llama.cpp:2834: n_threads > 0
[vm10-100-1-215:376355] *** Process received signal ***
[vm10-100-1-215:376355] Signal: Aborted (6)
[vm10-100-1-215:376355] Signal code: (-6)
[vm10-100-1-215:376355] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f7b00be2520]
[vm10-100-1-215:376355] [ 1] /lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x7f7b00c36a7c]
[vm10-100-1-215:376355] [ 2] /lib/x86_64-linux-gnu/libc.so.6(raise+0x16)[0x7f7b00be2476]
[vm10-100-1-215:376355] [ 3] /lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x7f7b00bc87f3]
[vm10-100-1-215:376355] [ 4] /home/ubuntu/llama.cpp/main(+0x5a314)[0x55ff14970314]
[vm10-100-1-215:376355] [ 5] /home/ubuntu/llama.cpp/main(+0x5a47f)[0x55ff1497047f]
[vm10-100-1-215:376355] [ 6] /home/ubuntu/llama.cpp/main(+0x5ab92)[0x55ff14970b92]
[vm10-100-1-215:376355] [ 7] /home/ubuntu/llama.cpp/main(+0x90762)[0x55ff149a6762]
[vm10-100-1-215:376355] [ 8] /home/ubuntu/llama.cpp/main(+0xf7da)[0x55ff149257da]
[vm10-100-1-215:376355] [ 9] /lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7f7b00bc9d90]
[vm10-100-1-215:376355] [10] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7f7b00bc9e40]
[vm10-100-1-215:376355] [11] /home/ubuntu/llama.cpp/main(+0x12fc5)[0x55ff14928fc5]
[vm10-100-1-215:376355] *** End of error message ***

(PIDs 376352, 376353, 376354, 376356, 376358, and 376361 hit the same GGML_ASSERT and printed identical backtraces; those copies are omitted here.)

system_info: n_threads = 16 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 0 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = 128, n_keep = 0

--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
mpirun noticed that process rank 2 with PID 0 on node vm10-100-1-215 exited on signal 6 (Aborted).
--------------------------------------------------------------------------
The issue is that MPI calls llama_eval with 0 threads in the workers:
https://github.com/ggerganov/llama.cpp/blob/230d46c723edf5999752e4cb67fd94edb19ef9c7/llama.cpp#L5528-L5538
But an assert was added in llama_eval_internal to prevent this: https://github.com/ggerganov/llama.cpp/blob/230d46c723edf5999752e4cb67fd94edb19ef9c7/llama.cpp#L2848
Ideally, MPI would use the number of threads from the command line. In the meantime, I guess we could remove the assert.
Referenced commits c10704d and 7d6ef40: llama : fix MPI threads (close ggml-org#2827)