I used llama-cpp-python with LangChain and got an error when I tried to run the example code from the LangChain docs.
I installed llama-cpp-python with:
CMAKE_ARGS="-DLLAMA_CUBLAS=on -DCMAKE_CUDA_FLAGS='-DGGML_CUDA_FORCE_CUSTOM_MEMORY_POOL'" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir
I also upgraded LangChain to 0.0.330.
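As a sanity check, llama-cpp-python can also be exercised directly, without LangChain. Below is a minimal sketch, assuming the same model path and settings as the example further down; with verbose=True it prints the same ggml_init_cublas / llm_load_tensors lines shown in the log later in this report:

from llama_cpp import Llama

# Load the GGUF model directly through llama-cpp-python.
llm = Llama(
    model_path="/home/xxxx/llama-2-7b-chat.Q4_K_M.gguf",
    n_gpu_layers=32,
    n_batch=4,
    verbose=True,
)

# Run a short completion; the result is a dict in OpenAI-style completion format.
out = llm("Q: Name the planets in the solar system. A:", max_tokens=32)
print(out["choices"][0]["text"])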
Then I ran the following example code from the LangChain docs:
from langchain.llms import LlamaCpp
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
template = """Question: {question}
Answer: Let's work this out in a step by step way to be sure we have the right answer."""
prompt = PromptTemplate(template=template, input_variables=["question"])
# Callbacks support token-wise streaming
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
n_gpu_layers = 32 # Change this value based on your model and your GPU VRAM pool.
n_batch = 4 # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.
# Make sure the model path is correct for your system!
llm = LlamaCpp(
    model_path="/home/xxxx/llama-2-7b-chat.Q4_K_M.gguf",
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    callback_manager=callback_manager,
    verbose=True,  # Verbose is required to pass to the callback manager
)
llm_chain = LLMChain(prompt=prompt, llm=llm)
question = "What NFL team won the Super Bowl in the year Justin Bieber was born?"
llm_chain.run(question)
Here is the output:
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 2 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6
  Device 1: NVIDIA GeForce GTX 1080 Ti, compute capability 6.1
...
llm_load_tensors: ggml ctx size = 0.11 MB
llm_load_tensors: using CUDA for GPU acceleration
ggml_cuda_set_main_device: using device 0 (NVIDIA GeForce RTX 3060) as main device
llm_load_tensors: mem required = 172.97 MB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloaded 32/35 layers to GPU
llm_load_tensors: VRAM used: 3718.38 MB
..................................................................................................
llama_new_context_with_model: n_ctx = 512
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: kv self size = 256.00 MB
llama_build_graph: non-view tensors processed: 740/740
llama_new_context_with_model: compute buffer total size = 7.18 MB
llama_new_context_with_model: VRAM scratch buffer: 0.55 MB
llama_new_context_with_model: total VRAM used: 3718.93 MB (model: 3718.38 MB, context: 0.55 MB)
CUDA error 1 at /tmp/pip-install-2o911nrr/llama-cpp-python_7b2f2508c89b451280d9116461f3c9cf/vendor/llama.cpp/ggml-cuda.cu:7036: invalid argument
current device: 1
I have two different cards that work fine with llama.cpp compiled on its own, but I hit this error when using llama-cpp-python. :(
The same issue has already been fixed in llama.cpp, but I don't know how quickly that fix propagates to llama-cpp-python.
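Until a llama-cpp-python release picks up that fix, one possible workaround (a sketch I have not verified, and it gives up multi-GPU entirely) is to hide the second card with CUDA_VISIBLE_DEVICES so that ggml only sees device 0:

import os

# Must be set before the CUDA runtime is initialized, i.e. before the model is loaded.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # keep only the RTX 3060 visible

from langchain.llms import LlamaCpp

llm = LlamaCpp(
    model_path="/home/xxxx/llama-2-7b-chat.Q4_K_M.gguf",
    n_gpu_layers=32,
    n_batch=4,
    verbose=True,
)

Setting CUDA_VISIBLE_DEVICES=0 in the shell before launching Python has the same effect.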
Physical (or virtual) hardware you are using, e.g. for Linux:
Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
Device 0: NVIDIA GeForce RTX 3060
Device 1: NVIDIA GeForce GTX 1080 Ti
Operating System, e.g. for Linux:
Linux localhost.localdomain 3.10.0-1160.90.1.el7.x86_64
SDK version, e.g. for Linux:
Python 3.9.16
GNU Make 4.2.1
g++ (GCC) 11.2.0