A modern, modular shell assistant powered by LLMs with first-class Model Context Protocol (MCP) support.
Meet your shell copilot — the AI bestie for anyone who lives in the terminal.
Features:
- Async-first core
- Configurable providers via LiteLLM (OpenAI, Claude, Perplexity, Azure, etc.)
- MCP tools discovery and invocation with resilient lifecycle
- Interactive REPL with persistent session, history, and multi-line input support
- Clean CLI UX with progress spinners and Markdown rendering
- Multimodal inputs (images, audio, PDFs) when supported by the model
- Dual multi-line modes: auto-continuation (default) or explicit Ctrl+S submission
- Predictable I/O for maximum flexibility
There are already many CLI tools for interacting with LLMs. Some are designed for coding (e.g., Aider, Opencode, Codex); others are meant for sysadmin or generic use (e.g., shell-gpt). Having a tool is no longer an issue: with LLMs, almost anyone can vibe-code anything they like, even without prior experience. We can argue about the quality and security of the resulting products, but over time, as LLMs rapidly improve and people find new approaches, that argument will become irrelevant.
As this world evolves quickly, it is clear that it is not about tools; it is about human creativity, ideas, and building a modular architecture using blocks that can be replaced at any time.
gptsh aims to be a versatile, simple, and extensible tool built around the idea of agents, where an agent is an LLM with a role-specific prompt that defines its behavior and an assigned set of tools (using MCP).
It is meant to be simple—mostly plug-and-play—with examples and proven setups and usage patterns shared by others.
You can easily use it with a single default agent and a Claude-compatible `mcp_servers.json` as-is.
Or you can define multiple agents with different roles and tools and use them as needed.
Or you can set up a more complex environment with multiple subagents (e.g., Software Developer, QA Engineer) and one agent (Manager) that receives the user prompt, orchestrates work, and delegates tasks to these agents. An agent can even be invoked as a tool by another agent. A configuration sketch follows the diagram below.
```mermaid
flowchart TD
    %% User interaction
    U[User Prompt] --> M[Manager Agent]

    %% Manager orchestrates tasks
    M -->|Delegates task| Dev[Software Developer Agent]
    M -->|Delegates task| QA[QA Engineer Agent]

    %% Agents can call other agents as tools
    Dev -.->|Calls as tool| QA
    QA -.->|Calls as tool| Dev
    M -.->|Calls as tool| Dev
    M -.->|Calls as tool| QA

    %% Agents use LLM via LiteLLM
    subgraph LLM_Stack[LLM Access]
        direction TB
        Lite[LiteLLM Provider Router]
        Model[(LLM Model)]
        Lite --> Model
    end

    Dev -->|Chat / Tool selection| Lite
    QA -->|Chat / Tool selection| Lite
    M -->|Coordination / Planning| Lite

    %% MCP Servers and Tools
    subgraph MCP[MCP Tooling Layer]
        direction TB
        subgraph Builtins[Builtin Tools]
            T1[[time]]
            T2[[shell]]
            T3[[clipboard]]
        end
        subgraph Ext[External MCP Servers]
            FS[[filesystem]]
            GIT[[git]]
            WEB[[tavily/web]]
            OTHER[[...]]
        end
    end

    Dev -->|Invoke tool| MCP
    QA -->|Invoke tool| MCP
    M -->|Invoke tool| MCP

    %% Results aggregation
    Dev -->|Work output| M
    QA -->|Test results / feedback| M
    M -->|Aggregated answer| U
```
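The diagram above maps onto plain agent configuration. A hedged sketch, using only the agent keys documented below (names, models, tool labels, and prompts are illustrative; how delegation actually happens depends on your prompts and tool setup):

```yaml
agents:
  manager:
    model: gpt-4.1
    prompt:
      system: "You are a manager. Break the user's request into tasks and delegate them."
  developer:
    model: gpt-4.1
    tools: ["filesystem", "git"]
    prompt:
      system: "You are a software developer. Implement the task you are assigned."
  qa:
    model: gpt-4.1-mini
    tools: ["shell"]
    prompt:
      system: "You are a QA engineer. Test and review the work you are given."
```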
We use uv/uvx for environment management and for running the tool.

The best way to install gptsh is with `uv tool`. For the latest stable release:
```bash
uv tool install gptsh-cli
```

To install the latest unreleased (main branch) version:
```bash
uv tool install git+https://github.com/fpytloun/gptsh.git@main
```

This puts executables into `~/.local/bin`, so make sure it is in your `$PATH`:
```bash
export PATH="${PATH}:${HOME}/.local/bin"
```

Warning: macOS + Python 3.13 build issue with `pasteboard==0.4.0`
Installing gptsh-cli can fail on macOS with Python 3.13 due to an upstream C extension compatibility issue in pasteboard==0.4.0:
- Symptom: build fails with clang errors like:
  - `cast-function-type-mismatch`
  - errors in `pasteboard.m` when building `pasteboard._native`
- Cause: older pasteboard uses a function signature incompatible with Python 3.13; warnings are treated as errors.
Workaround (temporary): bypass the error by relaxing the specific warning during build:
```bash
# Install gptsh-cli with the warning demoted from an error
CFLAGS="-Wno-error=cast-function-type-mismatch" uv tool install gptsh-cli@git+https://github.com/fpytloun/gptsh.git@c86f275507b8ae63edf974985b3336facd88a815
```

If you prefer uvx, use this command:
```bash
uvx --from gptsh-cli gptsh --help
```

You can also set an alias:
```bash
alias gptsh="uvx --from gptsh-cli gptsh"
```

If you do not pin a version, uvx will try to update on each run, which increases startup time. You can pin a version like this (check for the latest release first):
```bash
uvx --from gptsh-cli==<version> gptsh
```

Single-shot prompt:

```bash
gptsh "Summarize the latest project changes"
```

Pipe input from stdin:

```bash
git diff | gptsh "Explain the changes and suggest a commit message"
```

Binary stdin (images, PDFs, audio) is automatically detected and sent to capable models:
```bash
# Images with vision models (gpt-4o, claude-3.5-sonnet, etc.)
cat image.png | gptsh "What is in this image?"
# → Sends as multimodal content with image data

# PDFs with PDF-capable models
cat document.pdf | gptsh "Summarize this document"
# → Sends as multimodal content with PDF data

# Audio files with audio-capable models (gpt-4o, gpt-4o-mini)
cat recording.mp3 | gptsh "What did they say?"
# → Optionally transcribes with Whisper API, or sends as multimodal audio

# Other binaries fall back to text markers
cat archive.zip | gptsh "What is this?"
# → Sends: "[Attached: application/zip, 1234 bytes]"
```

Plain text output (the default is Markdown):
```bash
gptsh -o text --no-progress "Generate shell command to rename all files in directory and prefix them with xxx_"
```

```text
Usage: gptsh [OPTIONS] [PROMPT]

  gptsh: Modular shell/LLM agent client.

Options:
  --provider TEXT               Override LiteLLM provider from config
  --model TEXT                  Override LLM model
  --agent TEXT                  Named agent preset from config
  --config TEXT                 Specify alternate config path
  --stream / --no-stream
  --progress / --no-progress
  --debug
  -v, --verbose                 Enable verbose logging (INFO)
  --mcp-servers TEXT            Override path to MCP servers file
  --list-tools
  --list-providers              List configured providers
  --list-agents                 List configured agents and their tools
  --list-sessions               List saved sessions (supports filters)
  -o, --output [text|markdown]  Output format
  --no-tools                    Disable MCP tools (discovery and execution)
  --tools TEXT                  Comma/space-separated MCP server labels to
                                allow (others skipped)
  -i, --interactive             Run in interactive REPL mode
  --multiline                   Enable full multi-line mode (Ctrl+S to submit)
  -s, --session TEXT            Session reference (index or id)
  --show-session TEXT           Show a saved session by id or index and exit
  --print-session               Print saved session (requires --session) and continue
  --summarize-session TEXT      Summarize a saved session and print only the summary
  --cleanup-sessions            Remove older saved sessions, keeping only the most recent ones
  --keep-sessions INTEGER       How many most recent sessions to keep with --cleanup-sessions
  --delete-session TEXT         Delete a saved session by id or index
  --copy                        Auto-copy last assistant message to clipboard on exit
  -h, --help                    Show this message and exit.
```
- List sessions (filtered; indices preserved):
```bash
gptsh --list-sessions
gptsh --list-sessions --agent dev
gptsh --list-sessions --provider openai --model gpt-5
```
- Show full session (header + transcript; pager-friendly):
```bash
gptsh --show-session 0 | less
```
- Print then continue:
```bash
gptsh --print-session -s 0 -i                # REPL
gptsh --print-session -s 0 "Continue here"   # one more non-interactive turn
```
- Summarize only:
```bash
gptsh --summarize-session 0
```
- Cleanup/delete:
```bash
gptsh --cleanup-sessions                     # keeps 10 by default
gptsh --cleanup-sessions --keep-sessions 3
gptsh --delete-session 0
```
- Precedence: CLI `--no-sessions` > per-agent `agents.<name>.sessions.enabled` > global `sessions.enabled` > default `True`
- Example:
```yaml
sessions:
  enabled: true

agents:
  committer:
    model: gpt-5-mini
    sessions:
      enabled: false
```

List available tools discovered from configured MCP servers:
```bash
gptsh --list-tools
```

Disable tools entirely:
```bash
gptsh --no-tools "Explain which tools are available to you"
```

Allow only specific MCP servers (whitelist):
```bash
gptsh --tools serena --list-tools
gptsh --tools serena "Only Serena tools will be available"
```

This flag supports multiple labels with comma/space separation:
```bash
uvx gptsh --tools serena,tavily
```

gptsh includes several builtin tools, available by default:
- `time` — Access system time and timezone operations
- `shell` — Execute shell commands with history search
- `clipboard` — Read from and write to the system clipboard, with OSC52 support over SSH
The clipboard tool provides native clipboard access on macOS and Linux with optional OSC52 support for SSH sessions.
Features:
- Native clipboard access (no subprocess overhead)
- macOS: Uses the `pasteboard` library (Cocoa bindings)
- Linux: Uses `tkinter` (built-in, no external dependencies)
- OSC52 support: Works over SSH to update the local clipboard
- Smart auto-detection: Automatically uses optimal method for local vs remote sessions
Usage examples:
```bash
# Read clipboard content
gptsh "Analyze the code I copied to my clipboard"

# Write to clipboard
gptsh "Generate a docker command and put it in my clipboard"

# Over SSH - will work via OSC52 and update your local clipboard
ssh user@server
gptsh "Generate a backup command and write to clipboard"
# → Clipboard updated on your LOCAL machine!
```

Configuration:
```yaml
clipboard:
  enabled: true   # Enable/disable clipboard tool
  mode: "auto"    # Options: "auto", "native", "both", "osc52"
  # auto (default): Smart detection - uses both in SSH, native locally
  # native: Never use OSC52, native clipboard only
  # both: Always attempt both methods (redundant but guaranteed)
  # osc52: Only OSC52, useful for remote-only environments
```

Platform Support:
| Platform | Read | Write | Method |
|---|---|---|---|
| macOS | ✅ | ✅ | pasteboard (Cocoa) |
| Linux | ✅ | ✅ | tkinter (stdlib) |
| SSH (any) | ✅ OSC52 | ✅ OSC52+native | Over terminal |
Installation for macOS (optional):
On macOS, the clipboard tool tries to use the pasteboard library for better Cocoa integration. It's optional but recommended:
```bash
uv tool install --with clipboard-macos gptsh-cli
```

If not installed, the tool will provide a clear error message on use.
Config is merged from:
- Global: ~/.config/gptsh/config.yml
- Global snippets: ~/.config/gptsh/config.d/*.yml (merged in lexicographic order)
- Project: ./.gptsh/config.yml
Merge semantics:
- The per-project config is merged into the global config (project overrides global where keys overlap).
- MCP server definitions follow precedence (see below) and are not merged across files; the first matching source wins. Practically, a project-local `./.gptsh/mcp_servers.json` takes precedence over the user-level `~/.config/gptsh/mcp_servers.json`.
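For example (a hedged sketch using keys from the example config below): if the global config pins a provider model and the project config overrides it, the project value wins for that key while everything else is inherited:

```yaml
# ~/.config/gptsh/config.yml (global)
default_agent: default
providers:
  openai:
    model: openai/gpt-4.1
```

```yaml
# ./.gptsh/config.yml (project) - overrides overlapping keys
providers:
  openai:
    model: openai/gpt-4.1-mini   # project wins for this key
```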
```mermaid
flowchart TD
    B["Load global config (~/.config/gptsh/config.yml)"] --> C["Load global snippets (~/.config/gptsh/config.d/*.yml)"]
    C --> D["Load project config (./.gptsh/config.yml)"]
    D --> E["Merge configs (project overrides global)"]
    E --> F["Expand ${ENV} and process !include"]
    F --> G{"Select agent (--agent or default_agent)"}
    G --> H["Resolve provider/model (CLI > agent > provider)"]
```
Environment variables may be referenced using ${VAR_NAME} (and ${env:VAR_NAME} in mcp_servers.json is normalized to ${VAR_NAME}). YAML also supports a custom !include tag resolved relative to the including file, with wildcard support. For example:
- `agents: !include agents.yml`
- `agents: !include agents/*`
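For instance, a hypothetical `agents.yml` pulled in by the first form above would contain the same mapping you would otherwise write inline under `agents`:

```yaml
# agents.yml (hypothetical include target)
default:
  model: gpt-4.1
  prompt:
    system: "You are a helpful assistant called gptsh."
```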
You can configure MCP servers inline in YAML or via a Claude-compatible JSON file. Only one servers definition is used at a time, with this precedence:

1. CLI parameter (e.g., `--mcp-servers mcp_servers.json`)
2. Per-agent inline YAML `agents.<name>.mcp.servers`
3. Global inline YAML `mcp.servers`
4. Servers file (first existing): `./.gptsh/mcp_servers.json`, then `~/.config/gptsh/mcp_servers.json`
Inline YAML is equivalent in structure to the JSON file and enables self-contained agents: you can define the required MCP servers directly on an agent and avoid relying on global server files. This lets a project ship agents that "just work" without external setup.
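A hedged sketch of such a self-contained agent, using the per-agent `agents.<name>.mcp.servers` key (the agent name is illustrative; the server entry mirrors the inline mapping shown further below):

```yaml
agents:
  researcher:
    model: gpt-4.1
    mcp:
      servers:
        tavily:
          transport: { type: sse, url: "https://api.tavily.com/mcp" }
          credentials:
            headers:
              Authorization: "Bearer ${TAVILY_API_KEY}"
```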
```mermaid
flowchart TD
    B{"MCP servers source"} --> B1["CLI --mcp-servers PATH"]
    B --> B2["Agent inline mcp.servers"]
    B --> B3["Global inline mcp.servers"]
    B --> B4["Servers file (./.gptsh/mcp_servers.json or ~/.config/gptsh/mcp_servers.json)"]
    B1 --> C["Servers selected"]
    B2 --> C
    B3 --> C
    B4 --> C
    C --> D["Add built-ins: time, shell, clipboard"]
    D --> E{"Tools policy"}
    E --> E1["Apply agent.tools allow-list"]
    E --> E2["Apply CLI --tools override"]
    E --> E3["--no-tools disables tools"]
    E1 --> F["Discover tools"]
    E2 --> F
    E3 --> F
```
Inline YAML mapping (preferred):
```yaml
mcp:
  servers:
    tavily:
      transport: { type: sse, url: "https://api.tavily.com/mcp" }
      credentials:
        headers:
          Authorization: "Bearer ${TAVILY_API_KEY}"
    filesystem:
      transport: { type: stdio }
      command: uvx
      args: ["mcp-filesystem", "--root", "."]
      env: {}
```

You can also embed JSON as a string. If the JSON includes a top-level `mcpServers`, it will be unwrapped automatically:
```yaml
mcp:
  servers: |
    {"mcpServers": {"tavily": {"transport": {"type": "sse", "url": "https://api.tavily.com/mcp"}}}}
```

Built-in in-process servers `time`, `shell`, and `clipboard` are always available and are merged into your configuration (inline or file-based). To limit which servers/tools are used at runtime, use the `tools` allow-list on the agent (e.g., `tools: ["git"]`); this filters the merged set to only those servers. To completely override or effectively disable built-ins for an agent, set `tools` to a list without them.
Per-agent tools allow-list:

- Define `agents.<name>.tools` as a list of MCP server labels to expose to that agent (e.g., `tools: ["tavily", "serena"]`).
- This is a filter over all configured MCP servers (from inline/global config or a servers file). You can override it at runtime with `--tools`. See the sketch below.
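A minimal sketch of such an allow-list (the agent name and server label are illustrative):

```yaml
agents:
  reviewer:
    model: gpt-4.1-mini
    tools: ["git"]   # expose only the git MCP server; built-ins are filtered out
```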
```mermaid
flowchart TD
    A["Config resolved and MCP tools discovered"] --> B["Build approval policy (autoApprove, safety)"]
    B --> C["Construct Agent (LLM + tools + policy + prompts)"]
    C --> D["CLI one-shot run"]
    C --> E["Interactive REPL"]
```
Project layout:

```text
gptsh/
  cli/
    entrypoint.py      # thin CLI, defers to core
    utils.py           # CLI helpers: agent resolution, listings
  core/
    approval.py        # DefaultApprovalPolicy
    config_api.py      # config helpers (now use core.models)
    config_resolver.py # build_agent
    exceptions.py      # ToolApprovalDenied
    logging.py         # logger setup
    models.py          # typed config models (moved from domain/)
    progress.py        # RichProgressReporter
    repl.py            # interactive REPL (uses runner)
    runner.py          # unified run_turn (stream + tools + fallback)
    session.py         # ChatSession (tool loop, params)
    stdin_handler.py   # safe stdin handling
  llm/
    litellm_client.py  # LiteLLMClient + stream chunk logging
    chunk_utils.py     # extract_text
    tool_adapter.py    # tool specs + tool_calls parsing
  mcp/
    client.py          # persistent sessions
    manager.py         # MCPManager
    api.py             # facade
    tools_resolver.py  # ToolHandle resolver
    builtin/
      time.py, shell.py, clipboard.py  # builtin tools
tests/                 # pytest suite (unit tests)
```
Example `config.yml`:

```yaml
default_agent: default
default_provider: openai

providers:
  openai:
    model: openai/gpt-4.1

agents:
  default:
    model: gpt-4.1
    output: markdown
    autoApprove: ["time"]
    prompt:
      system: "You are a helpful assistant called gptsh."
  cli:
    output: text
    model: "gpt-4.1-mini"
    tools: ["shell"]
    prompt:
      system: |
        You are an expert system administrator with deep knowledge of Linux and Unix-based systems.
        You operate in two modes: either you execute a command using a tool (the default if a tool is available), or you provide a command that the user can execute.

        **Instructions (tool):**
        - If a shell execution tool is available, call it to execute the command yourself
        - After the command completes, check the exit code and return the tool output
        - Return only the tool output, as it would appear if executed directly
        - Do not make up tool output and pretend you executed something!

        **Instructions (without tool):**
        - Return a command that can be executed as-is on the given system and does what the user wants
        - Make sure your output is compatible with a POSIX-compliant shell
        - Return only a ready-to-execute command and nothing else!
        - It is likely to be passed to sh/bash via stdin
        - If the command is destructive, use echo/read for user confirmation unless the user asks to skip it
  hello:
    tools: []
    prompt:
      user: "Hello, are you here?"
```

Global Prompt Configuration:
You can configure REPL prompt behavior globally:
```yaml
prompt:
  format: "{agent}|{model}> "   # Templated prompt (see below for placeholders)
  multiline: false              # Enable Ctrl+S multi-line mode
  hint: true                    # Show "Press Ctrl+S to submit" on startup (multiline mode)
```

Prompt Template Placeholders:
- `{agent}` — Agent name with cyan bold color
- `{model}` — Model name with magenta color
- `{agent_plain}` — Agent name without color
- `{model_plain}` — Model name without color
Examples:
```yaml
# Custom separator
prompt:
  format: "[{agent_plain}:{model_plain}] "

# Different layout
prompt:
  format: "({agent}) {model}→ "

# With multi-line mode
prompt:
  format: "{agent}|{model}> "
  multiline: true
  hint: true
```

You can configure instruction files to be automatically loaded into the session on startup. This is useful for providing context, guidelines, or documentation that the LLM should always have available.
Instructions are specified as file paths and support `~` expansion for the home directory. If an instructions file does not exist, it is ignored.
Global Instructions:
```yaml
# Apply to all agents
instructions:
  - AGENTS.md
```

Per-Agent Instructions:
```yaml
instructions: []   # Global instructions (can be empty)

agents:
  default:
    model: gpt-4.1
  developer:
    model: gpt-5
    instructions:
      - AGENTS.md              # Specific instructions for this agent
      - docs/architecture.md
      - docs/code-style.md
  simple:
    instructions: []           # This agent has no instructions
```

Precedence:
- Agent-level `instructions` overrides global `instructions`
- An empty list `[]` explicitly disables instructions for that agent
- Missing files are silently skipped (logged at DEBUG level)
- Files are loaded in order and sent to the LLM on REPL startup
- Instructions are preserved across turns and included in session history
Example Use Cases:
- Load project documentation (README, ARCHITECTURE, API docs)
- Provide coding guidelines and standards
- Include prompt templates or examples
- Set context about the project domain or goals
Agents at a glance:
- An agent bundles LLM + tools + prompt. The prompt includes a system prompt and may also include a pre-defined user prompt, so you do not have to pass a prompt for routine tasks (e.g., a `changelog` or `committer` agent). A sketch of such an agent follows.
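A hedged sketch of a `committer`-style agent (model, tool label, and prompt wording are illustrative):

```yaml
agents:
  committer:
    model: gpt-4.1-mini
    tools: ["git"]
    prompt:
      system: "You write concise, conventional commit messages."
      user: "Review the staged changes and propose a commit message."
```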
For a full example, see the examples directory.

Example `mcp_servers.json`:
```json
{
  "mcpServers": {
    "sequentialthinking": {
      "args": [
        "run",
        "--rm",
        "-i",
        "mcp/sequentialthinking"
      ],
      "autoApprove": [
        "sequentialthinking"
      ],
      "command": "docker"
    },
    "filesystem": {
      "args": [
        "run",
        "-i",
        "--rm",
        "--mount",
        "type=bind,src=${HOME},dst=${HOME}",
        "mcp/filesystem",
        "${HOME}"
      ],
      "autoApprove": [
        "directory_tree",
        "get_file_info",
        "list_allowed_directories",
        "list_directory",
        "read_file",
        "read_multiple_files",
        "search_files"
      ],
      "command": "docker"
    },
    "git": {
      "args": [
        "mcp-server-git"
      ],
      "autoApprove": [
        "git_diff",
        "git_diff_staged",
        "git_diff_unstaged",
        "git_log",
        "git_show",
        "git_status",
        "git_branch"
      ],
      "command": "uvx"
    },
    "tavily": {
      "args": [
        "run",
        "-i",
        "--rm",
        "-e",
        "TAVILY_API_KEY",
        "mcp/tavily"
      ],
      "autoApprove": [
        "tavily-search",
        "tavily-extract",
        "tavily-crawl",
        "tavily-map"
      ],
      "command": "docker",
      "env": {
        "TAVILY_API_KEY": "${TAVILY_API_KEY}"
      }
    }
  }
}
```

- Use `${VAR}` for env expansion.
- `autoApprove` lists tools that should be pre-approved by the UI.
You can override servers files with the CLI:
```bash
gptsh --mcp-servers ./.gptsh/mcp_servers.json --list-tools
```

You can restrict which servers load by using:
```bash
gptsh --tools serena "Only serena's tools are available"
```

Ask with project context piped in:
```bash
rg -n "async def" -S | gptsh "What async entry points exist and what do they do?"
```

Use text output for plain logs:
```bash
gptsh -o text "Return a one-line status summary"
```

Use a different provider/model:
```bash
gptsh --provider openai --model gpt-5-mini "Explain MCP in a paragraph"
```

- Compact the current conversation history (preserving the system prompt): in the REPL, run `/compact` to summarize with the small model and replace history with a single labeled USER summary message. This reduces context size for subsequent turns.
- Start a REPL:

```bash
gptsh -i
```

- Provide an initial prompt and continue in the REPL:

```bash
gptsh -i "Say hello"
```

- Pipe stdin as the initial prompt and continue in the REPL:

```bash
echo "Summarize this input" | gptsh -i
```

REPL slash-commands:
- /exit — Exit the REPL
- /quit — Exit the REPL (alias)
- /model — Override the current model
- /agent — Switch to a configured agent
- /reasoning_effort [minimal|low|medium|high] — Set reasoning effort for current agent
- /tools — List discovered MCP tools for current agent
- /no-tools [on|off] — Toggle or set MCP tool usage for this session
- /info — Show session/model info and usage
- /file — Attach a file to the conversation:
- Text files (.txt, .md, .json, .yml, etc.): inlined as plain text
- Images (.png, .jpg, .gif, .webp, .bmp): sent as multimodal content if model supports vision
- PDFs (.pdf): sent as multimodal content if model supports PDF input
- Audio files (.mp3, .wav, .ogg, .flac, .m4a): transcribed or sent as multimodal audio if model supports
- Supports files of any size — even very large images or documents
- /compact — Summarize and compact history (keeps system prompt, inserts labeled USER summary)
- /copy — Copy the last assistant message to clipboard (uses native clipboard or OSC52 over SSH)
- /help — Show available commands

Tab completion works for slash-commands and agent names.
gptsh supports two multi-line input modes, controlled by the `prompt.multiline` config option.

Auto-continuation mode (default): automatically detects continuation lines and prompts for more input. Perfect for natural, intuitive interactions.
Continuation triggers:
- Trailing backslash — Explicitly continue on the next line:

```text
> Explain this concept \
...> in simple terms
```

- Unclosed brackets/parentheses — Automatically detect incomplete grouping:

```text
> Process these items: [
...> "item1",
...> "item2"
...> ]
```

- Markdown code blocks — Detect triple backticks for code:

````text
> Here's the code:
...> ```python
...> def hello():
...>     print("world")
...> ```
````
Explicit multi-line mode (Ctrl+S): enables true multi-line editing with explicit submission. Press Ctrl+S to submit.
Configuration:
```yaml
# ~/.config/gptsh/config.yml or ./.gptsh/config.yml
prompt:
  multiline: true
```

Usage:
```text
> Line 1
> Line 2
> Line 3
> [Press Ctrl+S to submit]
```
Features:
- Enter key inserts newlines (doesn't submit)
- Ctrl+S submits the accumulated input
- Full line editing until submission
- Useful for complex multi-paragraph prompts
CLI Override:
You can enable multi-line mode from the command line without config:
```bash
gptsh -i --multiline              # Interactive REPL with Ctrl+S mode
gptsh --multiline "Your prompt"   # Single-shot with Ctrl+S mode
```

The `--multiline` CLI flag overrides the config file setting (which defaults to false).

Default: `prompt.multiline: false` (auto-continuation mode)
Disable progress:
```bash
gptsh --no-progress "Describe current repo structure"
```

Use the `/copy` command in the REPL or the `--copy` flag in one-shot mode to copy the last assistant message to your clipboard.
REPL example:
```text
gptsh -i
> Generate a docker command for me
[LLM generates docker run command]
> /copy
# Message copied to clipboard!
```

One-shot mode with auto-copy:
```bash
gptsh --copy "Generate a docker command"
# Output is printed, then automatically copied to clipboard on exit
```

Features:
- Automatic method selection: Uses native clipboard on local sessions, falls back to OSC52 over SSH
- OSC52 support: Works seamlessly over SSH to update your local clipboard
- Error handling: Silently continues if copy fails (doesn't interrupt workflow)
- Works with multimodal: Copies text content from the assistant message
You can also copy the last message from a previous session without continuing:
```bash
# List sessions to find the one you want
gptsh --list-sessions
# [0] abc123 2024-01-15 10:30 "Summarize docs" (default|gpt-4.1)

# Copy from the most recent session
gptsh -s 0 --copy
Copied to clipboard (342 chars) via native

# Copy from a specific session by ID
gptsh -s abc123 --copy
Copied to clipboard (1024 chars) via native

# Over SSH - clipboard will be updated on the local machine via OSC52
ssh user@remote
gptsh -s 0 --copy
Copied to clipboard (500 chars) via osc52
```

This is useful for quickly retrieving outputs from recent conversations without needing to re-run them or start a new REPL.
Use the /file command in REPL to attach files as context for the LLM. Files are added to the conversation history and available to the LLM in subsequent messages.
Text File Example:
```text
gptsh -i
> /file README.md
File attached: README.md
> Now you know the project structure. What's the main purpose?
```

Image Example:
```text
gptsh -i
> /file screenshot.png
File attached: screenshot.png
> What issues do you see in this screenshot?
# LLM analyzes the image (if model supports vision)
```

PDF Example:
```text
gptsh -i
> /file documentation.pdf
File attached: documentation.pdf
> Summarize the key points from this document
# LLM reads and analyzes the PDF (if model supports PDF input)
```

Audio Example:
```text
gptsh -i
> /file meeting-notes.mp3
File attached: meeting-notes.mp3
> Transcribe and summarize this audio
# LLM transcribes or analyzes the audio (if model supports audio)
```

Instructions Configuration:
Pre-load files automatically on REPL startup using the instructions config:
```yaml
# ~/.config/gptsh/config.yml
instructions:
  - README.md
  - docs/architecture.md

agents:
  developer:
    instructions:
      - AGENTS.md
      - docs/coding-standards.md
```

Files specified in `instructions` are loaded automatically when the REPL starts, providing context for all subsequent turns.
Features:
- Automatic type detection — Images, PDFs, audio, and text are handled appropriately
- Any file size — Even very large files (multiple MB) can be attached
- Multiple files — Attach multiple files in sequence; each is added to conversation history
- Persistent context — Files remain in history across turns in the same REPL session
- Instructions config — Pre-load files on startup for consistent context
- stdin — If available (e.g., from a pipe), non-interactive stdin is read and appended to the active prompt. Binary content (images, audio, PDFs) is auto-detected via magic bytes and injected as a concise marker. In REPL mode, stdin is then switched to /dev/tty to accept further interactive input.
- stderr — Progress bar, tool-approval prompts, and logs.
- stdout — Only LLM output is written to stdout.
This provides great flexibility and many possible uses in your shell session.
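Because only the LLM output goes to stdout, results compose cleanly with shell redirection. A hedged sketch (file names are illustrative):

```bash
# Progress and logs go to stderr; only the answer lands in the file
git diff | gptsh -o text "Write a commit message" > commit-msg.txt

# Discard progress/log output entirely, keeping just the answer
gptsh -o text "Summarize this input" < notes.txt 2>/dev/null
```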
Exit codes:

- 0 — success
- 1 — generic failure
- 2 — configuration error (invalid/missing)
- 3 — MCP connection/spawn failure (after retries)
- 4 — tool approval denied
- 124 — operation timeout
- 130 — interrupted (Ctrl-C)
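These codes make gptsh scriptable; a hedged sketch of branching on them:

```bash
gptsh -o text --no-progress "Summarize the repo" > summary.txt
rc=$?
case "$rc" in
  0)   echo "ok" ;;
  2)   echo "configuration error - check config.yml" >&2 ;;
  124) echo "operation timed out" >&2 ;;
  *)   echo "gptsh failed with exit code $rc" >&2 ;;
esac
```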
```bash
uv venv
UV_CACHE_DIR=.uv-cache uv pip install -e .[dev]
```

Run:
```bash
UV_CACHE_DIR=.uv-cache uv run gptsh --help
```

Ruff is configured as the primary linter in pyproject.toml (line length 100, isort enabled). Run lint + tests before committing:
```bash
UV_CACHE_DIR=.uv-cache uv run ruff check
UV_CACHE_DIR=.uv-cache uv run pytest
```

Project scripts:
- Entry point: gptsh = "gptsh.cli.entrypoint:main"
- Keep code async; don’t log secrets; prefer uv/uvx for all dev commands.
For full development instructions, read AGENTS.md.
gptsh supports audio file processing through two complementary features:
Audio files can be automatically transcribed using OpenAI's Whisper API:
```bash
# Transcribe audio file
cat recording.mp3 | gptsh "Summarize the transcript"
```

Configuration:
Transcription uses the provider system for credentials and API endpoints. Configure any LiteLLM provider to handle transcription:
```yaml
providers:
  openai:
    model: gpt-4o
    api_key: ${OPENAI_API_KEY}   # Or set OPENAI_API_KEY env var
    base_url: https://api.openai.com/v1

transcribe:
  enabled: true                  # Explicitly enable, or auto-enable if provider has API key
  provider: openai               # Reference the provider name
  model: whisper-1               # Whisper model to use
  language: null                 # Optional language hint (ISO-639-1, e.g., "en", "es")
  max_file_size: 25000000        # 25 MB (OpenAI limit)
  detect_non_speech: true        # Filter music/noise
```

Custom Endpoint Example:
```yaml
providers:
  custom_whisper:
    api_key: ${CUSTOM_API_KEY}
    base_url: https://api.example.com   # Custom Whisper-compatible endpoint

transcribe:
  enabled: true
  provider: custom_whisper
  model: whisper-1
```

Supported Formats:
- MP3, WAV, OGG, FLAC, M4A, AAC, WebM
Speech Detection:
- Automatically filters out music and noise
- Prevents sending irrelevant audio to the LLM
- Detects markers like `[MUSIC]`, `[NOISE]`, `[SILENCE]`
If transcription is disabled or unavailable, audio files are sent directly to the LLM as multimodal content (when supported):
```bash
# Send audio directly to GPT-4o (no transcription)
cat recording.wav | gptsh --model gpt-4o "What did they say?"
```

Models with Audio Support:
- ✅ `gpt-4o` — Native audio input support
- ✅ `gpt-4o-mini` — Native audio input support (more affordable)
- ✅ `gpt-4-turbo` — Native audio input support
- ❌ `gpt-4.1-mini` — No audio support
- ❌ Azure OpenAI — Limited audio support (depends on API version)
Priority Order:

1. Transcription (if enabled and available)
2. Multimodal audio (if the model supports it)
3. Text marker (fallback for unsupported models)
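For example, to skip step 1 and send audio straight to an audio-capable model, transcription can be turned off with the keys shown above (a hedged sketch; assumes disabling transcription falls through to multimodal audio):

```yaml
transcribe:
  enabled: false   # audio goes directly to the model when it supports audio input
```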
In the REPL, use /file to attach audio files:
```text
gptsh -i
> /file recording.mp3
# Audio is automatically transcribed or sent as multimodal content
```

Troubleshooting:

- Audio not being processed: ensure the model supports audio (use `gpt-4o` or `gpt-4o-mini`)
- Transcription errors: check that the configured provider has a valid API key and quota available
  - Verify `transcribe.provider` references an existing provider in `providers`
  - Verify the provider has `api_key` and `base_url` configured
- No tools found: check the `--mcp-servers` path, server definitions, and network access.
- Stuck spinner: use `--no-progress` to disable the UI, or run with `--debug` for logs.
- Markdown output looks odd: try `-o text` to inspect the raw content.
- Workflow orchestration: define runnable workflows composed of steps (shell/Python/agents), similar to invoking targets with simple task runners.
- SRE copilot focus: practical day-to-day automation with safe approvals and rich tool integrations.
For the full roadmap, see TODO.md.
Feedback and contributions are welcome!

