Fix pull 1034 Changes #1042

Merged 1 commit on Aug 19, 2024

23 changes: 13 additions & 10 deletions README.md
@@ -181,16 +181,6 @@ This mode generates text based on an input prompt.
python3 torchchat.py generate llama3.1 --prompt "write me a story about a boy and his bear"
```

### Browser
This mode allows you to chat with the model using a UI in your browser.
Running the command automatically opens a tab in your browser.

[skip default]: begin

```
streamlit run torchchat.py -- browser llama3.1
```

[skip default]: end

### Server
@@ -252,6 +242,19 @@ curl http://127.0.0.1:5000/v1/chat \

</details>

### Browser
This command opens a basic browser interface for local chat by querying a local server.

First, follow the steps in the Server section above to start a local server. Then, in another terminal, launch the interface. Running the following will open a tab in your browser.

[skip default]: begin

```
streamlit run browser/browser.py
```

Use the "Max Response Tokens" slider to limit the maximum number of tokens generated by the model for each response. Click the "Reset Chat" button to remove the message history and start a fresh chat.


## Desktop/Server Execution

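Before launching the Streamlit UI, it can help to confirm that the local server is actually reachable on the endpoint browser/browser.py targets. The snippet below is a minimal sketch, not part of this PR; it assumes the server from the Server section is listening on 127.0.0.1:5000, and "llama3" is a placeholder for whichever model the server was started with.

```
# Sanity check for the local OpenAI-compatible endpoint (sketch; assumptions noted above).
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:5000/v1",
    api_key="unused",  # any non-empty string; the local server does not check it
)

response = client.chat.completions.create(
    model="llama3",  # placeholder; match the model the server is serving
    messages=[{"role": "user", "content": "Say hello in five words."}],
    max_tokens=32,
)
print(response.choices[0].message.content)
```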
70 changes: 48 additions & 22 deletions browser/browser.py
@@ -1,40 +1,66 @@
 import streamlit as st
 from openai import OpenAI

+st.title("torchchat")
+
+start_state = [
+    {
+        "role": "system",
+        "content": "You're an assistant. Answer questions directly, be brief, and have fun.",
+    },
+    {"role": "assistant", "content": "How can I help you?"},
+]
+
 with st.sidebar:
-    openai_api_key = st.text_input(
-        "OpenAI API Key", key="chatbot_api_key", type="password"
+    response_max_tokens = st.slider(
+        "Max Response Tokens", min_value=10, max_value=1000, value=250, step=10
     )
-    "[Get an OpenAI API key](https://platform.openai.com/account/api-keys)"
-    "[View the source code](https://github.com/streamlit/llm-examples/blob/main/Chatbot.py)"
-    "[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://codespaces.new/streamlit/llm-examples?quickstart=1)"
-
-st.title("💬 Chatbot")
+    if st.button("Reset Chat", type="primary"):
+        st.session_state["messages"] = start_state

 if "messages" not in st.session_state:
-    st.session_state["messages"] = [
-        {
-            "role": "system",
-            "content": "You're an assistant. Be brief, no yapping. Use as few words as possible to respond to the users' questions.",
-        },
-        {"role": "assistant", "content": "How can I help you?"},
-    ]
+    st.session_state["messages"] = start_state
+

 for msg in st.session_state.messages:
     st.chat_message(msg["role"]).write(msg["content"])

 if prompt := st.chat_input():
     client = OpenAI(
-        # This is the default and can be omitted
         base_url="http://127.0.0.1:5000/v1",
-        api_key="YOURMOTHER",
+        api_key="813",  # The OpenAI API requires an API key, but since we don't consume it, this can be any non-empty string.
     )

     st.session_state.messages.append({"role": "user", "content": prompt})
     st.chat_message("user").write(prompt)
-    response = client.chat.completions.create(
-        model="stories15m", messages=st.session_state.messages, max_tokens=64
-    )
-    msg = response.choices[0].message.content
-    st.session_state.messages.append({"role": "assistant", "content": msg})
-    st.chat_message("assistant").write(msg)
+
+    with st.chat_message("assistant"), st.status(
+        "Generating... ", expanded=True
+    ) as status:
+
+        def get_streamed_completion(completion_generator):
+            start = time.time()
+            tokcount = 0
+            for chunk in completion_generator:
+                tokcount += 1
+                yield chunk.choices[0].delta.content
+
+            status.update(
+                label="Done, averaged {:.2f} tokens/second".format(
+                    tokcount / (time.time() - start)
+                ),
+                state="complete",
+            )
+
+        response = st.write_stream(
+            get_streamed_completion(
+                client.chat.completions.create(
+                    model="llama3",
+                    messages=st.session_state.messages,
+                    max_tokens=response_max_tokens,
+                    stream=True,
+                )
+            )
+        )[0]
+
+    st.session_state.messages.append({"role": "assistant", "content": response})
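Note that get_streamed_completion times itself with time.time(), so the final file also needs `import time` at the top. The same streaming pattern can be exercised outside Streamlit; the loop below is a sketch only, with the endpoint, model name, and prompt as assumptions, mirroring the token counting done in the browser code.

```
# Sketch of consuming the local chat completions endpoint with stream=True,
# mirroring get_streamed_completion's tokens/second bookkeeping (not part of this PR).
import time

from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:5000/v1", api_key="unused")

start = time.time()
tokcount = 0
for chunk in client.chat.completions.create(
    model="llama3",  # placeholder model id
    messages=[{"role": "user", "content": "Tell me a one-sentence story."}],
    max_tokens=120,
    stream=True,
):
    piece = chunk.choices[0].delta.content
    if piece:  # the final chunk may carry no content
        print(piece, end="", flush=True)
        tokcount += 1

print("\nAveraged {:.2f} tokens/second".format(tokcount / (time.time() - start)))
```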
3 changes: 0 additions & 3 deletions server.py
@@ -75,9 +75,6 @@ def chunk_processor(chunked_completion_generator):
next_tok = ""
print(next_tok, end="", flush=True)
yield f"data:{json.dumps(_del_none(asdict(chunk)))}\n\n"
# wasda = json.dumps(asdict(chunk))
# print(wasda)
# yield wasda

resp = Response(
chunk_processor(gen.chunked_completion(req)),
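
chunk_processor streams Server-Sent-Events-style lines of the form data:{json}, one JSON-encoded chunk per event. The sketch below shows how a plain HTTP client might consume that stream without the OpenAI SDK; the route, request body, and chunk fields are assumptions based on the README's curl example and the browser code above.

```
# Sketch of reading the "data:" lines emitted by chunk_processor (not part of this PR).
import json

import requests

payload = {
    "model": "llama3",  # placeholder model id
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": True,
}

with requests.post("http://127.0.0.1:5000/v1/chat", json=payload, stream=True) as resp:
    for line in resp.iter_lines():
        if not line.startswith(b"data:"):
            continue  # skip blank separators between events
        chunk = json.loads(line[len(b"data:"):])
        # Mirrors chunk.choices[0].delta.content in the browser's streaming loop.
        content = chunk["choices"][0].get("delta", {}).get("content")
        if content:
            print(content, end="", flush=True)
print()
```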