Multi-GPU error, ggml-cuda.cu:7036: invalid argument #886

Open

davidleo1984 opened this issue Nov 7, 2023 · 2 comments
Labels: bug

Comments

@davidleo1984

I used llama-cpp-python with LangChain and got an error when I tried to run the example code from the LangChain docs.
I installed it with:
CMAKE_ARGS="-DLLAMA_CUBLAS=on -DCMAKE_CUDA_FLAGS='-DGGML_CUDA_FORCE_CUSTOM_MEMORY_POOL'" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir
and I also upgraded LangChain to 0.0.330.
Then I ran the following example code from the LangChain docs:

from langchain.llms import LlamaCpp
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

template = """Question: {question}

Answer: Let's work this out in a step by step way to be sure we have the right answer."""

prompt = PromptTemplate(template=template, input_variables=["question"])

# Callbacks support token-wise streaming
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

n_gpu_layers = 32  # Change this value based on your model and your GPU VRAM pool.
n_batch = 4  # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.

# Make sure the model path is correct for your system!
llm = LlamaCpp(
    model_path="/home/xxxx/llama-2-7b-chat.Q4_K_M.gguf",
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    callback_manager=callback_manager,
    verbose=True,  # Verbose is required to pass to the callback manager
)

llm_chain = LLMChain(prompt=prompt, llm=llm)
question = "What NFL team won the Super Bowl in the year Justin Bieber was born?"
llm_chain.run(question)

Here is the output:

ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 2 CUDA devices:
Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6
Device 1: NVIDIA GeForce GTX 1080 Ti, compute capability 6.1

...

llm_load_tensors: ggml ctx size = 0.11 MB
llm_load_tensors: using CUDA for GPU acceleration
ggml_cuda_set_main_device: using device 0 (NVIDIA GeForce RTX 3060) as main device
llm_load_tensors: mem required = 172.97 MB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloaded 32/35 layers to GPU
llm_load_tensors: VRAM used: 3718.38 MB
..................................................................................................
llama_new_context_with_model: n_ctx = 512
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: kv self size = 256.00 MB
llama_build_graph: non-view tensors processed: 740/740
llama_new_context_with_model: compute buffer total size = 7.18 MB
llama_new_context_with_model: VRAM scratch buffer: 0.55 MB
llama_new_context_with_model: total VRAM used: 3718.93 MB (model: 3718.38 MB, context: 0.55 MB)

CUDA error 1 at /tmp/pip-install-2o911nrr/llama-cpp-python_7b2f2508c89b451280d9116461f3c9cf/vendor/llama.cpp/ggml-cuda.cu:7036: invalid argument
current device: 1

I have two different cards that work fine with llama.cpp compiled directly, but I hit this error when using llama-cpp-python. :(
The same issue has already been fixed in llama.cpp; I just don't know whether that fix has made it into llama-cpp-python yet.
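
To rule out LangChain, here is a minimal sketch that drives llama-cpp-python directly; it assumes the installed version exposes the main_gpu and tensor_split parameters (availability may vary by release), and the tensor_split values are only an illustrative guess:

from llama_cpp import Llama

# Load the same GGUF model without LangChain, pinning work to device 0 (the RTX 3060).
llm = Llama(
    model_path="/home/xxxx/llama-2-7b-chat.Q4_K_M.gguf",
    n_gpu_layers=32,
    main_gpu=0,               # assumed parameter: which CUDA device acts as the main device
    tensor_split=[1.0, 0.0],  # assumed parameter: keep all offloaded layers on device 0
)

out = llm("Q: Name the planets in the solar system. A:", max_tokens=32)
print(out["choices"][0]["text"])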

  • Physical (or virtual) hardware you are using, e.g. for Linux:

Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
Device 0: NVIDIA GeForce RTX 3060
Device 1: NVIDIA GeForce GTX 1080 Ti

  • Operating System, e.g. for Linux:

Linux localhost.localdomain 3.10.0-1160.90.1.el7.x86_64

  • SDK version, e.g. for Linux:

Python 3.9.16
GNU Make 4.2.1
g++ (GCC) 11.2.0

@abetlen added the bug label on Nov 8, 2023
@zhuofan-16

Facing the same issue; I need to export CUDA_VISIBLE_DEVICES=0 to use only a single GPU.
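
For anyone applying that workaround from Python rather than the shell, a small sketch follows (the variable must be set before llama_cpp/langchain are imported, and "0" assumes the RTX 3060 is device 0):

import os

# Hide every GPU except device 0 from CUDA; this must happen before importing llama_cpp.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

from langchain.llms import LlamaCpp

llm = LlamaCpp(
    model_path="/home/xxxx/llama-2-7b-chat.Q4_K_M.gguf",
    n_gpu_layers=32,
    n_batch=4,
)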

@pseudotensor

Still the same issue.
