server : display token probabilities in the UI #2489
Conversation
Ah, this is fun. It works as expected on Android too.
Very nice! But maybe you could round the probabilities to 2 decimals or so?
Sure, I'll do it later. Thanks! Maybe just change the numbers to percentages.
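A minimal sketch of that conversion idea (the helper name is hypothetical, not taken from the PR):

```ts
// Hypothetical helper: format a raw probability (0..1) as a percentage
// with two decimals, keeping the raw value for a tooltip/title attribute.
function formatProb(p: number): { label: string; title: string } {
  return {
    label: `${(p * 100).toFixed(2)}%`, // e.g. 0.5 -> "50.00%"
    title: String(p),                  // original value, usable as a div title
  };
}
```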
I just had the craziest idea. I'm not requesting anything, just wondering about the viability of this idea. Do you think this could be used to create spelling suggestions while typing? Like on an Android keyboard?
If it were percentages, it would be cool. The byte thing should be fixed at the API level, but at the time we added the probabilities, we couldn't figure it out.
I have read PR #1962, and I'm a bit confused about this: shouldn't we improve it by converting bytes on the UI side? I'm thinking maybe we can merge the bytes to get a readable result (also helpful for Chinese and other language users), but I'm not sure if it will have other problems.
I've confirmed that the other byte pair is not able to decode successfully, so I just hide it. I see that the OpenAI playground does the same thing.
I'm not very sure why it happened; maybe the completion_probabilities of some partial responses is not an array, but as far as I know, server.cpp should have ensured that it is an array. I just removed the array check of completion_probabilities for messages and only check params.n_probs > 0 for that, which should avoid this problem.
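For illustration, a rough sketch of that guard, under the assumption that the streamed messages carry an optional completion_probabilities array (type and function names here are mine, not the PR's):

```ts
// Assumed shapes, based on this PR's discussion.
interface TokenProb { tok_str: string; prob: number }
interface CompletionProb { content: string; probs: TokenProb[] }
interface PartialMessage { content: string; completion_probabilities?: CompletionProb[] }

// Show probabilities only when the user requested them (params.n_probs > 0),
// instead of requiring every message to carry a completion_probabilities array.
function probsToRender(msg: PartialMessage, nProbs: number): CompletionProb[] {
  if (nProbs <= 0) return [];
  return Array.isArray(msg.completion_probabilities) ? msg.completion_probabilities : [];
}
```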
```cpp
{
    // Always send partial response
    // so we can get the correct partial response of the last to_send in the client
    const json data = format_partial_response(llama, to_send, probs_output);
```
I also made the last to_send produce a partial response, so we can correctly get the probabilities of the last message (the final response includes all probabilities).
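A hedged sketch of what the client side might do with this: accumulate probabilities from the partial responses and let the final response, which carries all of them, take precedence. Names and shapes are assumptions based on the comments above, not the PR's exact code:

```ts
// Assumed shapes; see the discussion above.
interface TokenProb { tok_str: string; prob: number }
interface CompletionProb { content: string; probs: TokenProb[] }

let allProbs: CompletionProb[] = [];

function onServerEvent(data: { completion_probabilities?: CompletionProb[]; stop?: boolean }) {
  if (!Array.isArray(data.completion_probabilities)) return;
  if (data.stop) {
    // The final response includes all probabilities, so just replace.
    allProbs = data.completion_probabilities;
  } else {
    allProbs = allProbs.concat(data.completion_probabilities);
  }
}
```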
Thank you for the fixes, my testing shows it's working.
Generally, the partial response includes a single … But things like …
Confirmed it was a problem from … (log collapsed).
I'll fix this later. UPDATE: the fix is here, but it was a problem with sent_token_probs_index; the above log is expected, as we need to wait for possible stop words.
examples/server/server.cpp (outdated)

```diff
-const std::string to_send = llama.generated_text.substr(pos, stop_pos);
+const std::string to_send = stop_pos == std::string::npos
+    ? llama.generated_text.substr(pos, std::string::npos)
+    : ""; // just don't send anything if we're not done
```
Before this fix, to_send is whitespace when it gets stop_pos 1 from L…, and then sent_token_probs_index will be incorrect.
Also, I merged the master branch, so we need to use GGUF models for testing here.
I got the same on master; in this case the model responds with content like …
After fixing the newline issue, I think this can be merged. Thank you guys!
Nice one. What I'd like to have in the future is a notebook mode (so basic completion instead of chat). Do you have any plans for that? I could maybe hack it together a few weeks from now when I'm a bit less busy.
I'm also thinking about having pure text completion in the web UI, but the plan is not very clear yet. Currently I'm using the vim plugin for that, but the web UI could provide more visual capabilities. It's a low priority for me, but interesting.
* server : add n_probs param in chat UI
* server : keep message data array & show in probabilites component
* server : add simple popover component
* server : fix completion_probabilities undefined if not set n_probs
* server : implement Probabilites
* server : handle bytes
* server : make n_probs max to 10 for easy scroll
* server : adjust for dark/light mode
* server : Fix regenerated prompt
* server : update index.html.hpp
* server : convert prob to percentage + show original value as div title
* server : fix Probabilites not used if included empty str
* server : skip byte pair in display probabilites
* server : remove array check of completion_probabilities in messages
* skip empty array or byte pair (> 1) in Probabilites
* generate index.html.hpp
* fix incorrect prob convert if the str is already a known token
* use final response to show probabilities on stop
* revert unnecessary change
* correct probabilites usage
* remove unused function
* always send partial response for get correct probs of last to_send
* fix typo
* fix content of format_final_response
* refactor probs render & make pColor transparent if not found
* send empty string when got stop_pos in partial
* avoid unnecessary empty data event & send rest of partial tokens on stop
* use <br /> for new line
* skip -1 tok in loop to avoid send '' on end
* trim last new lines on stop
* revert unnecessary change
#2423
This is a simple implementation for displaying the probabilities of the llama response.
It renders a popover for each token. The popover is based on preact-portal; it's short, so I made some modifications and copied it into index.html.
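As a rough illustration only (the actual UI lives in index.html and uses preact-portal with htm, not TSX), a popover-per-token component could look roughly like this:

```tsx
// Sketch, not the PR's code: one clickable span per generated token;
// clicking toggles a small popover listing the top-n alternatives.
import { useState } from 'preact/hooks';

interface TokenProb { tok_str: string; prob: number }

function Token({ text, probs }: { text: string; probs: TokenProb[] }) {
  const [open, setOpen] = useState(false);
  return (
    <span style={{ position: 'relative' }} onClick={() => setOpen(!open)}>
      {text}
      {open && (
        <div class="popover">
          {probs.map((p) => (
            <div>{p.tok_str}: {(p.prob * 100).toFixed(2)}%</div>
          ))}
        </div>
      )}
    </span>
  );
}
```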
Dark mode:

Light mode:

For bytes, I just add a bottom border line to split them:  (Screenshots updated after 04b6f2c.)
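A sketch of one possible way to do that byte handling on the client; the detection heuristic (looking for the UTF-8 replacement character) and the class name are assumptions, not necessarily what this PR does:

```ts
// Mark tokens that are undecodable raw bytes so the UI can give them a
// bottom border. Assumption: a lone byte that can't be decoded as UTF-8
// shows up containing the replacement character U+FFFD.
function isByteToken(tok: string): boolean {
  return tok.includes('\uFFFD');
}

// e.g. CSS: .byte-token { border-bottom: 1px dashed currentColor; }
function tokenClass(tok: string): string {
  return isByteToken(tok) ? 'byte-token' : '';
}
```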
We can set `More options` -> `Show Probabilities` to use the `n_probs` param.
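For reference, a hedged example of exercising the same option directly against the server's /completion endpoint (field names follow this PR's discussion; exact response shapes may differ):

```ts
// Sketch: request top-5 per-token probabilities from a local llama.cpp server.
async function completeWithProbs(prompt: string): Promise<void> {
  const res = await fetch('http://localhost:8080/completion', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt, n_predict: 16, n_probs: 5, stream: false }),
  });
  const data = await res.json();
  // Expected (per this PR): data.completion_probabilities is an array of
  // { content, probs: [{ tok_str, prob }] } entries.
  for (const entry of data.completion_probabilities ?? []) {
    console.log(entry.content, entry.probs);
  }
}
```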