
Commit f2cb873

feat: save and restore a context sequence state (#460)
* feat: save and restore a context sequence state
* feat: stream function call parameters
* feat: configure Hugging Face remote endpoint for resolving URIs
* feat: Qwen 3 support
* feat(`QwenChatWrapper`): support discouraging the generation of thoughts
* feat(`getLlama`): `dryRun` option
* feat: `getLlamaGpuTypes` function
* fix: adapt to breaking `llama.cpp` changes
* fix: capture multi-token segment separators
* fix: race condition when reading extremely long gguf metadata
* fix: adapt memory estimation to new added model architectures
* fix: skip binary testing on certain problematic conditions
* fix: improve GPU backend loading error description
* fix: update gguf types
* fix: performance improvements
* docs: update the awesome list
* docs: solutions to more CUDA issues
1 parent c070e81 · commit f2cb873
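
The headline change lets a context sequence's evaluation state be saved to disk and restored later, so a long chat history doesn't have to be re-evaluated. A minimal sketch based on the documentation added in this commit (the model path and file name are placeholders; the full examples are in the `docs/guide/chat-session.md` changes below):

```typescript
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({modelPath: "path/to/model.gguf"}); // placeholder path
const contextSequence = (await model.createContext()).getSequence();
const session = new LlamaChatSession({contextSequence});

console.log(await session.prompt("Hi there, how are you?"));

// persist the evaluated state so a later run can skip re-evaluating this history
await contextSequence.saveStateToFile("state.bin");

// later, on a fresh context sequence created from the exact same model:
// await contextSequence.loadStateFromFile("state.bin", {acceptRisk: true});
```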

60 files changed: +3,520 additions, -1,237 deletions


.github/ISSUE_TEMPLATE/bug-report.yml

Lines changed: 2 additions & 0 deletions
@@ -3,6 +3,8 @@ description: Report a reproducible bug
 labels:
   - requires triage
   - bug
+title: "bug: "
+type: "Bug"
 body:
   - type: markdown
     attributes:

.github/ISSUE_TEMPLATE/documentation-issue.yml

Lines changed: 2 additions & 0 deletions
@@ -3,6 +3,8 @@ description: Documentation is unclear or otherwise insufficient.
 labels:
   - requires triage
   - documentation
+title: "docs: "
+type: "Documentation"
 body:
   - type: markdown
     attributes:

.github/ISSUE_TEMPLATE/feature-request.yml

Lines changed: 2 additions & 0 deletions
@@ -3,6 +3,8 @@ description: Suggest an new idea for this project
 labels:
   - requires triage
   - new feature
+title: "feat: "
+type: "Feature"
 body:
   - type: markdown
     attributes:

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -14,8 +14,10 @@ node_modules
 /.vitepress/.cache
 /test/.models
 /test/temp
+/test/.temp
 /temp
 /coverage
+/test-runner-profile
 
 /llama/compile_commands.json
 /llama/llama.cpp

.vitepress/config.ts

Lines changed: 3 additions & 3 deletions
@@ -470,8 +470,6 @@ export default defineConfig({
         }
     },
     sidebar: {
-        "/api/": getApiReferenceSidebar(),
-
         "/guide/": [{
             text: "Guide",
             base: "/guide",
@@ -550,7 +548,9 @@
                 ]
             }
         ]
-    }]
+    }],
+
+    "/api/": getApiReferenceSidebar()
     },
     socialLinks: [
         {icon: "npm", link: "https://www.npmjs.com/package/node-llama-cpp"},

docs/cli/pull.md

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ If a file already exists and its size matches the expected size, it will not be
 
 The supported URI schemes are:
 - **HTTP:** `https://`, `http://`
-- **Hugging Face:** `hf:<user>/<model>:<quant>` (`#<quant>` is optional, [but recommended](../guide/downloading-models.md#hf-scheme-specify-quant))
+- **Hugging Face:** `hf:<user>/<model>:<quant>` (`:<quant>` is optional, [but recommended](../guide/downloading-models.md#hf-scheme-specify-quant))
 - **Hugging Face:** `hf:<user>/<model>/<file-path>#<branch>` (`#<branch>` is optional)
 
 Learn more about using model URIs in the [Downloading Models guide](../guide/downloading-models.md#model-uris).
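
For reference, the same `hf:` URI form documented in this hunk can also be used programmatically. A small sketch, assuming the `resolveModelFile` helper accepts a URI and a target directory; the user, model, and quant names below are placeholders, not taken from this diff:

```typescript
import {fileURLToPath} from "url";
import path from "path";
import {getLlama, resolveModelFile} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

// downloads the model on first use and returns its local path on later runs;
// the URI below is a placeholder, not a real model reference
const modelPath = await resolveModelFile(
    "hf:<user>/<model>:Q4_K_M",
    path.join(__dirname, "models")
);

const llama = await getLlama();
const model = await llama.loadModel({modelPath});
```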

docs/guide/CUDA.md

Lines changed: 27 additions & 0 deletions
@@ -114,6 +114,33 @@ set NODE_LLAMA_CPP_CMAKE_OPTION_CMAKE_GENERATOR_TOOLSET=%CUDA_PATH%
 
 Then run the build command again to check whether setting the `CMAKE_GENERATOR_TOOLSET` cmake option fixed the issue.
 
+### Fix the `forward compatibility was attempted on non supported HW` Error {#fix-cuda-forward-compatibility}
+This error usually happens when the CUDA version you have installed on your machine is older than the CUDA version used in the prebuilt binaries supplied by `node-llama-cpp`.
+
+To resolve this issue, you can either [update your CUDA installation](https://developer.nvidia.com/cuda-downloads) to the latest version (recommended) or [build `node-llama-cpp` on your machine](#building) against the CUDA version you have installed.
+
+### Fix the `Binary GPU type mismatch. Expected: cuda, got: false` Error {#fix-cuda-gpu-type-mismatch}
+This error usually happens when you have multiple conflicting CUDA versions installed on your machine.
+
+To fix it, uninstall older CUDA versions and restart your machine (important).
+
+:::: details Check which CUDA libraries are picked up by `node-llama-cpp`'s prebuilt binaries on your machine
+
+Run this command inside of your project:
+
+::: code-group
+```shell [Linux]
+ldd ./node_modules/@node-llama-cpp/linux-x64-cuda/bins/linux-x64-cuda/libggml-cuda.so
+```
+
+```cmd [Windows]
+"C:\Program Files\Git\usr\bin\ldd.exe" node_modules\@node-llama-cpp\win-x64-cuda\bins\win-x64-cuda\ggml-cuda.dll
+```
+:::
+
+::::
+
+
 ## Using `node-llama-cpp` With CUDA
 It's recommended to use [`getLlama`](../api/functions/getLlama) without specifying a GPU type,
 so it'll detect the available GPU types and use the best one automatically.
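
As a companion to the recommendation at the end of this hunk, the backend that was actually picked can be inspected at runtime. A minimal sketch, assuming the `gpu` property on the object returned by `getLlama` reports the loaded backend:

```typescript
import {getLlama} from "node-llama-cpp";

// let node-llama-cpp detect the available GPU types and pick the best one
const llama = await getLlama();

// reports the backend that was loaded, e.g. "cuda", "vulkan", "metal", or false for CPU-only
console.log("GPU backend in use:", llama.gpu);
```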

docs/guide/awesome.md

Lines changed: 17 additions & 2 deletions
@@ -2,17 +2,32 @@
 description: Awesome projects that use node-llama-cpp
 ---
 # Awesome `node-llama-cpp`
-Awesome projects that use `node-llama-cpp`.
+:sunglasses: Awesome projects that use `node-llama-cpp`.
+
+<script setup lang="ts">
+import DataBadge from "../../.vitepress/components/DataBadge/DataBadge.vue";
+</script>
 
 ## Open Source
 * [CatAI](https://github.com/withcatai/catai) - a simplified AI assistant API for Node.js, with REST API support
+  <br /><DataBadge title="License" content="MIT"/>
+
+* [Manzoni](https://manzoni.app/) ([GitHub](https://github.com/gems-platforms/manzoni-app)) - a text editor running local LLMs
+  <br /><DataBadge title="License" content="AGPL-3.0"/>
+
 
 ## Proprietary
-> List your project here!
+* [BashBuddy](https://bashbuddy.run) ([GitHub](https://github.com/wosherco/bashbuddy)) - write bash commands with natural language
+  <br /><DataBadge title="Partially open source" content="Source available" href="https://github.com/wosherco/bashbuddy/blob/main/LICENSE.md"/>
+
+* [nutshell](https://withnutshell.com) - Private AI meeting notes processed completely on your device
+
 
 
 <br />
 
 ---
 
+> To add a project to this list, [open a PR](https://github.com/withcatai/node-llama-cpp/edit/master/docs/guide/awesome.md).
+>
 > To have a project listed here, it should clearly state that it uses `node-llama-cpp`.

docs/guide/chat-session.md

Lines changed: 81 additions & 0 deletions
@@ -446,6 +446,87 @@ console.log("AI: " + a2);
 ```
 :::
 
+:::: details Saving and restoring a context sequence evaluation state {#save-and-restore-with-context-sequence-state}
+You can also save and restore the context sequence evaluation state to avoid re-evaluating the chat history
+when you load it on a new context sequence.
+
+Please note that context sequence state files can get very large (109MB for only 1K tokens).
+Using this feature is only recommended when the chat history is very long and you plan to load it often,
+or when the evaluation is too slow due to hardware limitations.
+
+::: warning
+When loading a context sequence state from a file,
+always ensure that the model used to create the context sequence is exactly the same as the one used to save the state file.
+
+Loading a state file created from a different model can crash the process,
+thus you have to pass `{acceptRisk: true}` to the [`loadStateFromFile`](../api/classes/LlamaContextSequence.md#loadstatefromfile) method to use it.
+
+Use with caution.
+:::
+
+::: code-group
+```typescript [Save chat history and context sequence state]
+import {fileURLToPath} from "url";
+import path from "path";
+import fs from "fs/promises";
+import {getLlama, LlamaChatSession} from "node-llama-cpp";
+
+const __dirname = path.dirname(fileURLToPath(import.meta.url));
+
+const llama = await getLlama();
+const model = await llama.loadModel({
+    modelPath: path.join(__dirname, "models", "Meta-Llama-3.1-8B-Instruct.Q4_K_M.gguf")
+});
+const context = await model.createContext();
+const contextSequence = context.getSequence();
+const session = new LlamaChatSession({contextSequence});
+
+
+const q1 = "Hi there, how are you?";
+console.log("User: " + q1);
+
+const a1 = await session.prompt(q1);
+console.log("AI: " + a1);
+
+const chatHistory = session.getChatHistory();// [!code highlight]
+await Promise.all([// [!code highlight]
+    contextSequence.saveStateToFile("state.bin"),// [!code highlight]
+    fs.writeFile("chatHistory.json", JSON.stringify(chatHistory), "utf8")// [!code highlight]
+]);// [!code highlight]
+```
+:::
+
+::: code-group
+```typescript [Restore chat history and context sequence state]
+import {fileURLToPath} from "url";
+import path from "path";
+import fs from "fs/promises";
+import {getLlama, LlamaChatSession} from "node-llama-cpp";
+
+const __dirname = path.dirname(fileURLToPath(import.meta.url));
+// ---cut---
+const llama = await getLlama();
+const model = await llama.loadModel({
+    modelPath: path.join(__dirname, "models", "Meta-Llama-3.1-8B-Instruct.Q4_K_M.gguf")
+});
+const context = await model.createContext();
+const contextSequence = context.getSequence();
+const session = new LlamaChatSession({contextSequence});
+
+await contextSequence.loadStateFromFile("state.bin", {acceptRisk: true});// [!code highlight]
+const chatHistory = JSON.parse(await fs.readFile("chatHistory.json", "utf8"));// [!code highlight]
+session.setChatHistory(chatHistory);// [!code highlight]
+
+const q2 = "Summarize what you said";
+console.log("User: " + q2);
+
+const a2 = await session.prompt(q2);
+console.log("AI: " + a2);
+```
+:::
+
+::::
+
 ## Prompt Without Updating Chat History {#prompt-without-updating-chat-history}
 Prompt without saving the prompt to the chat history.

docs/guide/cmakeOptions.data.ts

Lines changed: 6 additions & 0 deletions
@@ -90,6 +90,12 @@ function parseCmakeOptions(cmakeListsTxt: string, optionFilter: ((key: string) =
         }
     } else if (option.defaultValue === "${BUILD_SHARED_LIBS_DEFAULT}")
         option.defaultValue = htmlEscapeWithCodeMarkdown("`OFF` on MinGW, `ON` otherwise");
+    else if (option.defaultValue === "${GGML_CUDA_GRAPHS_DEFAULT}")
+        option.defaultValue = htmlEscapeWithCodeMarkdown("`ON`");
+    else if (option.defaultValue === "${GGML_NATIVE_DEFAULT}")
+        option.defaultValue = htmlEscapeWithCodeMarkdown("`OFF` when building for a different architecture,\n`ON` otherwise");
+    else if (option.key === "LLAMA_CURL")
+        option.defaultValue = htmlEscapeWithCodeMarkdown("`OFF`");
     else
         option.defaultValue = htmlEscapeWithCodeMarkdown(
             option.defaultValue != null
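
The options named in this hunk (`GGML_CUDA_GRAPHS`, `GGML_NATIVE`, `LLAMA_CURL`) are regular `llama.cpp` cmake options, so they can be overridden when building from source. A hedged sketch, assuming `getLlama` still accepts a `cmakeOptions` map and a `build` mode as in earlier 3.x releases; the values shown are illustrative only:

```typescript
import {getLlama} from "node-llama-cpp";

// force a from-source build with explicit cmake options;
// these are among the options whose documented defaults this hunk touches
const llama = await getLlama({
    build: "forceRebuild",
    cmakeOptions: {
        GGML_NATIVE: "OFF",
        LLAMA_CURL: "OFF"
    }
});
```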
