Feature: Added api for getting/setting the complete state: rng, logits, embedding and kv_cache #1105

xaedes · 2023-04-21T15:27:25Z

I have implemented functions for getting and setting the rest of the model state.
It includes: random number generator state, logits, embedding and kv_cache.

The logits had to be stored so that we can eval tokens, save the state, restart the program, load the state, and then sample.
With only the kv_cache restored, sampling had no access to the required logits and segfaulted on the initially empty logits vector.

The initial capacity of the logits vector was reserved with a wrong value. As a result, the capacity changed after the first evaluation, in which the logits vector is actually resized. I fixed this bug because it propagated to the state size, resulting in unnecessary changes.
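To illustrate the capacity issue, here is a minimal standalone sketch. It is not code from this PR: the helper name `capacity_changes_on_resize` and the sizes used with it are hypothetical, and it assumes the reported state size is derived from the vector's capacity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of llama.cpp): reserve `reserved` floats up
// front, then resize to `needed` as the first evaluation would, and report
// whether the capacity changed in between.
bool capacity_changes_on_resize(std::size_t reserved, std::size_t needed) {
    std::vector<float> logits;
    logits.reserve(reserved);
    const std::size_t before = logits.capacity();

    logits.resize(needed);                  // what the first eval effectively does
    assert(logits.capacity() >= needed);    // resize guarantees enough room

    return logits.capacity() != before;
}
```

With a correct reservation (`reserved >= needed`) the capacity stays put, so anything computed from it is the same before and after the first evaluation. With a reservation that is too small, the resize reallocates and the capacity changes.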

The random number generator state is also included to ensure consistent sampling results.
Since the internal state of the rng is more than just the seed, it is serialized with the standard C++ stream operators by streaming it into a string buffer. For simplicity, I did not add further logic to parse and compress the serialized rng state.
For completeness I also stored the embedding vector.
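The stream-based rng serialization described above can be sketched roughly as follows. This is an illustration, not the PR's code: it assumes the generator is a `std::mt19937`, and the helper names `save_rng`, `load_rng` and `rng_round_trip_matches` are made up for the example.

```cpp
#include <cassert>
#include <random>
#include <sstream>
#include <string>

// Serialize the full engine state (not just the seed) as text.
std::string save_rng(const std::mt19937 & rng) {
    std::stringstream ss;
    ss << rng;
    return ss.str();
}

// Restore an engine from text produced by save_rng.
void load_rng(std::mt19937 & rng, const std::string & text) {
    std::stringstream ss(text);
    ss >> rng;
    assert(!ss.fail());
}

// Round trip: a restored engine continues with the same sequence.
bool rng_round_trip_matches(unsigned seed) {
    std::mt19937 a(seed);
    a.discard(10);                  // advance so the state differs from a fresh seed
    const std::string saved = save_rng(a);

    std::mt19937 b;                 // default-seeded, so it starts out different
    load_rng(b, saved);
    return a() == b() && a == b;    // same next value, same state afterwards
}
```

The textual form is much larger than the seed because it contains the engine's whole internal state, which is the trade-off mentioned above for keeping the logic simple.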

Because the whole state is not in one contiguous memory buffer, I decided on an output pointer parameter for getting the state data.
The user is responsible for allocating the memory the state is written to. To support this, the required number of bytes can be queried.
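The "query the size, allocate, then copy" pattern can be sketched with a toy context. Everything here is hypothetical (`toy_context`, `toy_state_size`, `toy_copy_state`, `toy_set_state` and the byte layout); it only mirrors the idea and is not the layout or the function set used in llama.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a context whose state is spread over several buffers.
struct toy_context {
    std::vector<float>        logits;
    std::vector<std::uint8_t> kv;
};

// Number of bytes the caller must allocate: two length fields plus both payloads.
std::size_t toy_state_size(const toy_context & ctx) {
    return 2 * sizeof(std::uint64_t)
         + ctx.logits.size() * sizeof(float)
         + ctx.kv.size();
}

// Copy n bytes to the output cursor and advance it; empty payloads are skipped.
static std::uint8_t * put(std::uint8_t * out, const void * src, std::size_t n) {
    if (n > 0) std::memcpy(out, src, n);
    return out + n;
}

// Copy n bytes from the input cursor and advance it; empty payloads are skipped.
static const std::uint8_t * get(const std::uint8_t * in, void * dst, std::size_t n) {
    if (n > 0) std::memcpy(dst, in, n);
    return in + n;
}

// Write the state into caller-owned memory; returns the number of bytes written.
std::size_t toy_copy_state(const toy_context & ctx, std::uint8_t * dst) {
    const std::uint64_t n_logits = ctx.logits.size();
    const std::uint64_t n_kv     = ctx.kv.size();
    std::uint8_t * out = dst;
    out = put(out, &n_logits, sizeof(n_logits));
    out = put(out, ctx.logits.data(), n_logits * sizeof(float));
    out = put(out, &n_kv, sizeof(n_kv));
    out = put(out, ctx.kv.data(), n_kv);
    assert(static_cast<std::size_t>(out - dst) == toy_state_size(ctx));
    return static_cast<std::size_t>(out - dst);
}

// Restore the state from a buffer written by toy_copy_state; returns bytes read.
std::size_t toy_set_state(toy_context & ctx, const std::uint8_t * src) {
    std::uint64_t n_logits = 0;
    std::uint64_t n_kv     = 0;
    const std::uint8_t * in = src;
    in = get(in, &n_logits, sizeof(n_logits));
    ctx.logits.resize(n_logits);
    in = get(in, ctx.logits.data(), n_logits * sizeof(float));
    in = get(in, &n_kv, sizeof(n_kv));
    ctx.kv.resize(n_kv);
    in = get(in, ctx.kv.data(), n_kv);
    return static_cast<std::size_t>(in - src);
}
```

A caller would size a `std::vector<std::uint8_t>` with `toy_state_size(ctx)`, pass its `data()` to `toy_copy_state`, and later hand the same bytes to `toy_set_state`. Keeping allocation on the caller's side lets the caller decide where the bytes live, for example a file buffer or a memory-mapped region.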

ggerganov

Great job!

ggerganov · 2023-04-22T07:41:45Z

llama.cpp

+
+// Returns the size of the state
+size_t llama_get_state_size(struct llama_context * ctx) {
+    const size_t s_bool = sizeof(int32_t);


s_bool is unused - is this expected?

Oh, I missed that during cleanup of unused stuff.
At one point during implementation I was saving the bool flags from llama_context, but I removed that because it didn't make much sense.

xaedes added 3 commits April 21, 2023 16:48

reserve correct size for logits

8ed3c3f

add functions to get and set the whole llama state:

8288b36

including rng, logits, embedding and kv_cache

remove unused variables

9d26580

xaedes mentioned this pull request Apr 21, 2023

Store KV cache of computed prompts to disk to avoid re-compute in follow-up runs #64

Closed

remove trailing whitespace

1c51e1f

xaedes force-pushed the state_persistence branch from cab1fe0 to 1c51e1f Compare April 21, 2023 15:36

fix comment

456aedc

snxraven mentioned this pull request Apr 21, 2023

Implement caching for evaluated prompts abetlen/llama-cpp-python#44

Closed

ggerganov approved these changes Apr 22, 2023

ggerganov merged commit b6e7f9b into ggml-org:master Apr 22, 2023

ggerganov reviewed Apr 22, 2023

ggerganov mentioned this pull request Apr 22, 2023

New kv_cache API insufficient to restore model state #730

Closed
