Commit 76545ab

Merge pull request NousResearch#657 from NousResearch/feat/browser-screenshot-sharing
feat: browser screenshot sharing via MEDIA: on all messaging platforms
2 parents dfd37a4 + b8c3bc7 commit 76545ab

10 files changed

Lines changed: 585 additions & 93 deletions

AGENTS.md

Lines changed: 22 additions & 0 deletions
@@ -679,6 +679,28 @@ Key files:
 
 ---
 
+## Known Pitfalls
+
+### DO NOT use `simple_term_menu` for interactive menus
+
+`simple_term_menu` has rendering bugs in tmux, iTerm2, and other non-standard terminals. When the user scrolls with arrow keys, previously highlighted items "ghost" — duplicating upward and corrupting the display. This happens because the library uses ANSI cursor-up codes to redraw in place, and tmux/iTerm miscalculate positions when the menu is near the bottom of the viewport.
+
+**Rule:** All interactive menus in `hermes_cli/` must use `curses` (Python stdlib) instead. See `tools_config.py` for the pattern — both `_prompt_choice()` (single-select) and `_prompt_toolset_checklist()` (multi-select with space toggle) use `curses.wrapper()`. The numbered-input fallback handles Windows where curses isn't available.
+
+### DO NOT use `\033[K` (ANSI erase-to-EOL) in spinner/display code
+
+The ANSI escape `\033[K` leaks as literal `?[K` text when `prompt_toolkit`'s `patch_stdout` is active. Use space-padding instead to clear lines: `f"\r{line}{' ' * pad}"`. See `agent/display.py` `KawaiiSpinner`.
+
+### `_last_resolved_tool_names` is a process-global in `model_tools.py`
+
+The `execute_code` sandbox uses `_last_resolved_tool_names` (set by `get_tool_definitions()`) to decide which tool stubs to generate. When subagents run with restricted toolsets, they overwrite this global. After delegation returns to the parent, `execute_code` may see the child's restricted list instead of the parent's full list. This is a known bug — `execute_code` calls after delegation may fail with `ImportError: cannot import name 'patch' from 'hermes_tools'`.
+
+### Tests must not write to `~/.hermes/`
+
+The `autouse` fixture `_isolate_hermes_home` in `tests/conftest.py` redirects `HERMES_HOME` to a temp dir. Every test runs in isolation. If you add a test that creates `AIAgent` instances or writes session logs, the fixture handles cleanup automatically. Never hardcode `~/.hermes/` paths in tests.
+
+---
+
 ## Testing Changes
 
 After making changes:
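
The `\033[K` pitfall added to AGENTS.md recommends clearing a line with space-padding of the form `f"\r{line}{' ' * pad}"`. The following is a minimal sketch of that idea, not the `KawaiiSpinner` code (which is not part of this diff). The helper name `render_line` and the way the previous length is tracked are assumptions made for illustration.

```python
def render_line(line: str, prev_len: int) -> str:
    """Build a redraw string that overwrites the previous frame with spaces.

    Hypothetical helper: pads with enough spaces to cover whatever the
    previous, possibly longer, frame left on screen, so no ANSI
    erase-to-EOL sequence is needed.
    """
    pad = max(prev_len - len(line), 0)
    return f"\r{line}{' ' * pad}"


# Usage: remember the visible length of the last frame between redraws.
prev_len = 0
outputs = []
for frame in ["thinking...", "done"]:
    outputs.append(render_line(frame, prev_len))
    prev_len = len(frame)
```

The padding only needs to cover the difference between the old and new frame lengths, which keeps each write short when frames have similar widths.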

agent/prompt_builder.py

Lines changed: 16 additions & 4 deletions
@@ -103,12 +103,24 @@ def _scan_context_content(content: str, filename: str) -> str:
         "You are on a text messaging communication platform, Telegram. "
         "Please do not use markdown as it does not render. "
         "You can send media files natively: to deliver a file to the user, "
-        "include MEDIA:/absolute/path/to/file in your response. Audio "
-        "(.ogg) sends as voice bubbles. You can also include image URLs "
-        "in markdown format ![alt](url) and they will be sent as native photos."
+        "include MEDIA:/absolute/path/to/file in your response. Images "
+        "(.png, .jpg, .webp) appear as photos, audio (.ogg) sends as voice "
+        "bubbles, and videos (.mp4) play inline. You can also include image "
+        "URLs in markdown format ![alt](url) and they will be sent as native photos."
     ),
     "discord": (
-        "You are in a Discord server or group chat communicating with your user."
+        "You are in a Discord server or group chat communicating with your user. "
+        "You can send media files natively: include MEDIA:/absolute/path/to/file "
+        "in your response. Images (.png, .jpg, .webp) are sent as photo "
+        "attachments, audio as file attachments. You can also include image URLs "
+        "in markdown format ![alt](url) and they will be sent as attachments."
+    ),
+    "slack": (
+        "You are in a Slack workspace communicating with your user. "
+        "You can send media files natively: include MEDIA:/absolute/path/to/file "
+        "in your response. Images (.png, .jpg, .webp) are uploaded as photo "
+        "attachments, audio as file attachments. You can also include image URLs "
+        "in markdown format ![alt](url) and they will be uploaded as attachments."
     ),
     "cli": (
         "You are a CLI AI Agent. Try not to use markdown but simple text "

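The platform hints in `agent/prompt_builder.py` tell the model to embed `MEDIA:/absolute/path/to/file` in its reply. The gateway code that consumes those tags is not shown in this diff, so the following is only a sketch of how such tags might be pulled out of a response and classified. The regex, `extract_media`, and `is_image` are hypothetical names; only the extension list comes from the prompt text.

```python
import os
import re
from typing import List, Tuple

# Assumption: a tag is "MEDIA:" followed by an absolute path without whitespace.
MEDIA_RE = re.compile(r"MEDIA:(/\S+)")

# Image extensions named in the platform hints.
IMAGE_EXTS = {".png", ".jpg", ".webp"}


def extract_media(text: str) -> Tuple[str, List[str]]:
    """Return the text with MEDIA tags removed, plus the paths found."""
    paths = MEDIA_RE.findall(text)
    cleaned = MEDIA_RE.sub("", text).strip()
    return cleaned, paths


def is_image(path: str) -> bool:
    """Decide whether a path should go through an image-sending method."""
    return os.path.splitext(path)[1].lower() in IMAGE_EXTS


cleaned, paths = extract_media("Here is the page. MEDIA:/tmp/shot.png")
```

A path that passes `is_image` would presumably be routed to `send_image_file()`, while other files would need a different send method.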
docs/send_file_integration_map.md

Lines changed: 5 additions & 4 deletions
@@ -115,8 +115,9 @@
 - `edit_message(chat_id, message_id, content)` — edit sent messages
 
 ### What's missing:
-- **Telegram:** No override for `send_document` or `send_image_file` — falls back to text!
-- **Discord:** No override for `send_document` — falls back to text!
+- **Telegram:** No override for `send_document` — falls back to text! (`send_image_file` ✅ added)
+- **Discord:** No override for `send_document` — falls back to text! (`send_image_file` ✅ added)
+- **Slack:** No override for `send_document` — falls back to text! (`send_image_file` ✅ added)
 - **WhatsApp:** Has `send_document` and `send_image_file` via bridge — COMPLETE.
 - The base class defaults just send "📎 File: /path" as text — useless for actual file delivery.
 
@@ -126,13 +127,13 @@
 - `send()` — MarkdownV2 text with fallback to plain
 - `send_voice()` — `.ogg`/`.opus` as `send_voice()`, others as `send_audio()`
 - `send_image()` — URL-based via `send_photo()`
+- `send_image_file()` — local file via `send_photo(photo=open(path, 'rb'))`
 - `send_animation()` — GIF via `send_animation()`
 - `send_typing()` — "typing" chat action
 - `edit_message()` — edit text messages
 
 ### MISSING:
 - **`send_document()` NOT overridden** — Need to add `self._bot.send_document(chat_id, document=open(file_path, 'rb'), ...)`
-- **`send_image_file()` NOT overridden** — Need to add `self._bot.send_photo(chat_id, photo=open(path, 'rb'), ...)`
 - **`send_video()` NOT overridden** — Need to add `self._bot.send_video(...)`
 
 ## 8. gateway/platforms/discord.py — Send Method Analysis
@@ -141,12 +142,12 @@
 - `send()` — text messages with chunking
 - `send_voice()` — discord.File attachment
 - `send_image()` — downloads URL, creates discord.File attachment
+- `send_image_file()` — local file via discord.File attachment ✅
 - `send_typing()` — channel.typing()
 - `edit_message()` — edit text messages
 
 ### MISSING:
 - **`send_document()` NOT overridden** — Need to add discord.File attachment
-- **`send_image_file()` NOT overridden** — Need to add discord.File from local path
 - **`send_video()` NOT overridden** — Need to add discord.File attachment
 
 ## 9. gateway/run.py — User File Attachment Handling
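
The Telegram analysis in the hunk above notes that `.ogg`/`.opus` files go out through `send_voice()` and other audio through `send_audio()`. A minimal sketch of that routing decision follows. The function name `pick_audio_method` and the case-insensitive extension match are assumptions, since the adapter's own check is not shown in this diff.

```python
import os

# Extensions the integration map lists as voice-bubble formats.
VOICE_EXTS = {".ogg", ".opus"}


def pick_audio_method(audio_path: str) -> str:
    """Return the name of the bot method that would carry this audio file."""
    ext = os.path.splitext(audio_path)[1].lower()  # assumption: ignore case
    return "send_voice" if ext in VOICE_EXTS else "send_audio"


method_for_ogg = pick_audio_method("/tmp/reply.ogg")
method_for_mp3 = pick_audio_method("/tmp/song.mp3")
```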

gateway/platforms/discord.py

Lines changed: 37 additions & 0 deletions
@@ -267,6 +267,43 @@ async def send_voice(
             print(f"[{self.name}] Failed to send audio: {e}")
             return await super().send_voice(chat_id, audio_path, caption, reply_to)
 
+    async def send_image_file(
+        self,
+        chat_id: str,
+        image_path: str,
+        caption: Optional[str] = None,
+        reply_to: Optional[str] = None,
+    ) -> SendResult:
+        """Send a local image file natively as a Discord file attachment."""
+        if not self._client:
+            return SendResult(success=False, error="Not connected")
+
+        try:
+            import io
+
+            channel = self._client.get_channel(int(chat_id))
+            if not channel:
+                channel = await self._client.fetch_channel(int(chat_id))
+            if not channel:
+                return SendResult(success=False, error=f"Channel {chat_id} not found")
+
+            if not os.path.exists(image_path):
+                return SendResult(success=False, error=f"Image file not found: {image_path}")
+
+            filename = os.path.basename(image_path)
+
+            with open(image_path, "rb") as f:
+                file = discord.File(io.BytesIO(f.read()), filename=filename)
+                msg = await channel.send(
+                    content=caption if caption else None,
+                    file=file,
+                )
+                return SendResult(success=True, message_id=str(msg.id))
+
+        except Exception as e:
+            print(f"[{self.name}] Failed to send local image: {e}")
+            return await super().send_image_file(chat_id, image_path, caption, reply_to)
+
     async def send_image(
         self,
         chat_id: str,

gateway/platforms/slack.py

Lines changed: 29 additions & 0 deletions
@@ -179,6 +179,35 @@ async def send_typing(self, chat_id: str) -> None:
         """Slack doesn't have a direct typing indicator API for bots."""
         pass
 
+    async def send_image_file(
+        self,
+        chat_id: str,
+        image_path: str,
+        caption: Optional[str] = None,
+        reply_to: Optional[str] = None,
+    ) -> SendResult:
+        """Send a local image file to Slack by uploading it."""
+        if not self._app:
+            return SendResult(success=False, error="Not connected")
+
+        try:
+            import os
+            if not os.path.exists(image_path):
+                return SendResult(success=False, error=f"Image file not found: {image_path}")
+
+            result = await self._app.client.files_upload_v2(
+                channel=chat_id,
+                file=image_path,
+                filename=os.path.basename(image_path),
+                initial_comment=caption or "",
+                thread_ts=reply_to,
+            )
+            return SendResult(success=True, raw_response=result)
+
+        except Exception as e:
+            print(f"[{self.name}] Failed to send local image: {e}")
+            return await super().send_image_file(chat_id, image_path, caption, reply_to)
+
     async def send_image(
         self,
         chat_id: str,

gateway/platforms/telegram.py

Lines changed: 28 additions & 0 deletions
@@ -306,6 +306,34 @@ async def send_voice(
             print(f"[{self.name}] Failed to send voice/audio: {e}")
             return await super().send_voice(chat_id, audio_path, caption, reply_to)
 
+    async def send_image_file(
+        self,
+        chat_id: str,
+        image_path: str,
+        caption: Optional[str] = None,
+        reply_to: Optional[str] = None,
+    ) -> SendResult:
+        """Send a local image file natively as a Telegram photo."""
+        if not self._bot:
+            return SendResult(success=False, error="Not connected")
+
+        try:
+            import os
+            if not os.path.exists(image_path):
+                return SendResult(success=False, error=f"Image file not found: {image_path}")
+
+            with open(image_path, "rb") as image_file:
+                msg = await self._bot.send_photo(
+                    chat_id=int(chat_id),
+                    photo=image_file,
+                    caption=caption[:1024] if caption else None,
+                    reply_to_message_id=int(reply_to) if reply_to else None,
+                )
+            return SendResult(success=True, message_id=str(msg.message_id))
+        except Exception as e:
+            print(f"[{self.name}] Failed to send local image: {e}")
+            return await super().send_image_file(chat_id, image_path, caption, reply_to)
+
     async def send_image(
         self,
         chat_id: str,
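
The three `send_image_file()` overrides in this commit share one shape: try the native upload, and on any exception log the error and defer to the base class, which the integration map says only sends "📎 File: /path" as text. The sketch below reproduces that shape with stand-in classes so it runs without any platform SDK. `BaseAdapter`, `NativeAdapter`, `_upload`, and the simplified `SendResult` are all hypothetical and are not the gateway's real definitions.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class SendResult:
    # Simplified stand-in for the gateway's SendResult.
    success: bool
    message_id: Optional[str] = None
    error: Optional[str] = None


class BaseAdapter:
    """Stand-in base class whose default only sends the path as text."""

    def __init__(self) -> None:
        self.sent = []  # records text messages for inspection

    async def send(self, chat_id: str, text: str) -> SendResult:
        self.sent.append(text)
        return SendResult(success=True)

    async def send_image_file(self, chat_id, image_path, caption=None):
        return await self.send(chat_id, f"📎 File: {image_path}")


class NativeAdapter(BaseAdapter):
    """Hypothetical platform adapter with a native upload path."""

    async def _upload(self, chat_id: str, image_path: str) -> str:
        # Placeholder for a real platform call; rejects non-PNG paths here
        # purely to exercise the fallback branch.
        if not image_path.endswith(".png"):
            raise ValueError("unsupported file")
        return "msg-1"

    async def send_image_file(self, chat_id, image_path, caption=None):
        try:
            message_id = await self._upload(chat_id, image_path)
            return SendResult(success=True, message_id=message_id)
        except Exception as exc:
            print(f"Failed to send local image: {exc}")
            return await super().send_image_file(chat_id, image_path, caption)


adapter = NativeAdapter()
native = asyncio.run(adapter.send_image_file("1", "/tmp/a.png"))
fallback = asyncio.run(adapter.send_image_file("1", "/tmp/a.txt"))
```

One consequence of this design is that a failed upload still reports success whenever the text fallback goes through, so callers cannot tell the two outcomes apart from `success` alone.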

hermes_cli/tools_config.py

Lines changed: 76 additions & 74 deletions
@@ -358,46 +358,88 @@ def _toolset_has_keys(ts_key: str) -> bool:
 # ─── Menu Helpers ─────────────────────────────────────────────────────────────
 
 def _prompt_choice(question: str, choices: list, default: int = 0) -> int:
-    """Single-select menu (arrow keys)."""
-    print(color(question, Colors.YELLOW))
+    """Single-select menu (arrow keys). Uses curses to avoid simple_term_menu
+    rendering bugs in tmux, iTerm, and other non-standard terminals."""
 
+    # Curses-based single-select — works in tmux, iTerm, and standard terminals
     try:
-        from simple_term_menu import TerminalMenu
-        menu = TerminalMenu(
-            [f" {c}" for c in choices],
-            cursor_index=default,
-            menu_cursor="→ ",
-            menu_cursor_style=("fg_green", "bold"),
-            menu_highlight_style=("fg_green",),
-            cycle_cursor=True,
-            clear_screen=False,
-        )
-        idx = menu.show()
-        if idx is None:
-            return default
-        print()
-        return idx
-    except (ImportError, NotImplementedError):
-        for i, c in enumerate(choices):
-            marker = "●" if i == default else "○"
-            style = Colors.GREEN if i == default else ""
-            print(color(f" {marker} {c}", style) if style else f" {marker} {c}")
-        while True:
-            try:
-                val = input(color(f" Select [1-{len(choices)}] ({default + 1}): ", Colors.DIM))
-                if not val:
-                    return default
-                idx = int(val) - 1
-                if 0 <= idx < len(choices):
-                    return idx
-            except (ValueError, KeyboardInterrupt, EOFError):
-                print()
+        import curses
+        result_holder = [default]
+
+        def _curses_menu(stdscr):
+            curses.curs_set(0)
+            if curses.has_colors():
+                curses.start_color()
+                curses.use_default_colors()
+                curses.init_pair(1, curses.COLOR_GREEN, -1)
+                curses.init_pair(2, curses.COLOR_YELLOW, -1)
+            cursor = default
+
+            while True:
+                stdscr.clear()
+                max_y, max_x = stdscr.getmaxyx()
+                try:
+                    stdscr.addnstr(0, 0, question, max_x - 1,
+                                   curses.A_BOLD | (curses.color_pair(2) if curses.has_colors() else 0))
+                except curses.error:
+                    pass
+
+                for i, c in enumerate(choices):
+                    y = i + 2
+                    if y >= max_y - 1:
+                        break
+                    arrow = "→" if i == cursor else " "
+                    line = f" {arrow} {c}"
+                    attr = curses.A_NORMAL
+                    if i == cursor:
+                        attr = curses.A_BOLD
+                        if curses.has_colors():
+                            attr |= curses.color_pair(1)
+                    try:
+                        stdscr.addnstr(y, 0, line, max_x - 1, attr)
+                    except curses.error:
+                        pass
+
+                stdscr.refresh()
+                key = stdscr.getch()
+
+                if key in (curses.KEY_UP, ord('k')):
+                    cursor = (cursor - 1) % len(choices)
+                elif key in (curses.KEY_DOWN, ord('j')):
+                    cursor = (cursor + 1) % len(choices)
+                elif key in (curses.KEY_ENTER, 10, 13):
+                    result_holder[0] = cursor
+                    return
+                elif key in (27, ord('q')):
+                    return
+
+        curses.wrapper(_curses_menu)
+        return result_holder[0]
+
+    except Exception:
+        pass
+
+    # Fallback: numbered input (Windows without curses, etc.)
+    print(color(question, Colors.YELLOW))
+    for i, c in enumerate(choices):
+        marker = "●" if i == default else "○"
+        style = Colors.GREEN if i == default else ""
+        print(color(f" {marker} {i+1}. {c}", style) if style else f" {marker} {i+1}. {c}")
+    while True:
+        try:
+            val = input(color(f" Select [1-{len(choices)}] ({default + 1}): ", Colors.DIM))
+            if not val:
                 return default
+            idx = int(val) - 1
+            if 0 <= idx < len(choices):
+                return idx
+        except (ValueError, KeyboardInterrupt, EOFError):
+            print()
+            return default
 
 
 def _prompt_toolset_checklist(platform_label: str, enabled: Set[str]) -> Set[str]:
     """Multi-select checklist of toolsets. Returns set of selected toolset keys."""
-    import platform as _platform
 
     labels = []
     for ts_key, ts_label, ts_desc in CONFIGURABLE_TOOLSETS:
@@ -411,48 +453,8 @@ def _prompt_toolset_checklist(platform_label: str, enabled: Set[str]) -> Set[str
         if ts_key in enabled
     ]
 
-    # simple_term_menu multi-select has rendering bugs on macOS terminals,
-    # so we use a curses-based fallback there.
-    use_term_menu = _platform.system() != "Darwin"
-
-    if use_term_menu:
-        try:
-            from simple_term_menu import TerminalMenu
-
-            print(color(f"Tools for {platform_label}", Colors.YELLOW))
-            print(color(" SPACE to toggle, ENTER to confirm.", Colors.DIM))
-            print()
-
-            menu_items = [f" {label}" for label in labels]
-            menu = TerminalMenu(
-                menu_items,
-                multi_select=True,
-                show_multi_select_hint=False,
-                multi_select_cursor="[✓] ",
-                multi_select_select_on_accept=False,
-                multi_select_empty_ok=True,
-                preselected_entries=pre_selected_indices if pre_selected_indices else None,
-                menu_cursor="→ ",
-                menu_cursor_style=("fg_green", "bold"),
-                menu_highlight_style=("fg_green",),
-                cycle_cursor=True,
-                clear_screen=False,
-                clear_menu_on_exit=False,
-            )
-
-            menu.show()
-
-            if menu.chosen_menu_entries is None:
-                return enabled
-
-            selected_indices = list(menu.chosen_menu_indices or [])
-            return {CONFIGURABLE_TOOLSETS[i][0] for i in selected_indices}
-
-        except (ImportError, NotImplementedError):
-            pass  # fall through to curses/numbered fallback
-
     # Curses-based multi-select — arrow keys + space to toggle + enter to confirm.
-    # Used on macOS (where simple_term_menu ghosts) and as a fallback.
+    # simple_term_menu has rendering bugs in tmux, iTerm, and other terminals.
     try:
         import curses
         selected = set(pre_selected_indices)