A Model Context Protocol (MCP) server for intelligent handling of large files with smart chunking, navigation, and streaming capabilities.
- Smart Chunking - Automatically determines optimal chunk size based on file type
- Intelligent Navigation - Jump to specific lines with surrounding context
- Powerful Search - Regex support with context lines before/after matches
- File Analysis - Comprehensive metadata and statistical analysis
- Memory Efficient - Stream files of any size without loading into memory
- Performance Optimized - Built-in LRU caching for frequently accessed chunks
- Type Safe - Written in TypeScript with strict typing
- Cross-Platform - Works on Windows, macOS, and Linux
Install globally with npm:

```bash
npm install -g @willianpinho/large-file-mcp
```

Or use directly with npx:

```bash
npx @willianpinho/large-file-mcp
```

Add the MCP server using the CLI:

```bash
# Add for current project only (local scope)
claude mcp add --transport stdio --scope local large-file-mcp -- npx -y @willianpinho/large-file-mcp

# Add globally for all projects (user scope)
claude mcp add --transport stdio --scope user large-file-mcp -- npx -y @willianpinho/large-file-mcp
```

Verify installation:

```bash
claude mcp list
claude mcp get large-file-mcp
```

Remove if needed:
```bash
# Remove from local scope
claude mcp remove large-file-mcp -s local

# Remove from user scope
claude mcp remove large-file-mcp -s user
```

MCP scopes:

- `local` - Available only in the current project directory
- `user` - Available globally for all projects
- `project` - Defined in `.mcp.json` for team sharing
Add to your `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "large-file": {
      "command": "npx",
      "args": ["-y", "@willianpinho/large-file-mcp"]
    }
  }
}
```

Config file locations:

- macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
- Windows: `%APPDATA%\Claude\claude_desktop_config.json`

Restart Claude Desktop after editing.
Gemini:

```json
{
  "tools": [
    {
      "name": "large-file-mcp",
      "command": "npx @willianpinho/large-file-mcp",
      "protocol": "mcp"
    }
  ]
}
```

Once configured, you can use natural language to interact with large files:

- Read the first chunk of /var/log/system.log
- Find all ERROR messages in /var/log/app.log
- Show me line 1234 of /code/app.ts with context
- Get the structure of /data/sales.csv
Read a specific chunk of a large file with intelligent chunking.

Parameters:

- `filePath` (required): Absolute path to the file
- `chunkIndex` (optional): Zero-based chunk index (default: 0)
- `linesPerChunk` (optional): Lines per chunk (auto-detected if not provided)
- `includeLineNumbers` (optional): Include line numbers (default: false)

Example:

```json
{
  "filePath": "/var/log/system.log",
  "chunkIndex": 0,
  "includeLineNumbers": true
}
```

Search for patterns in large files with context.
Parameters:

- `filePath` (required): Absolute path to the file
- `pattern` (required): Search pattern
- `caseSensitive` (optional): Case-sensitive search (default: false)
- `regex` (optional): Use regex pattern (default: false)
- `maxResults` (optional): Maximum results (default: 100)
- `contextBefore` (optional): Context lines before match (default: 2)
- `contextAfter` (optional): Context lines after match (default: 2)

Example:

```json
{
  "filePath": "/var/log/error.log",
  "pattern": "ERROR.*database",
  "regex": true,
  "maxResults": 50
}
```

Analyze file structure and get comprehensive metadata.
Parameters:

- `filePath` (required): Absolute path to the file
Returns: File metadata, line statistics, recommended chunk size, and sample lines.
Jump to a specific line with surrounding context.
Parameters:

- `filePath` (required): Absolute path to the file
- `lineNumber` (required): Line number to navigate to (1-indexed)
- `contextLines` (optional): Context lines before/after (default: 5)
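An example request, matching the "Show me line 1234 of /code/app.ts with context" prompt above (values are illustrative):

```json
{
  "filePath": "/code/app.ts",
  "lineNumber": 1234,
  "contextLines": 5
}
```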
Get comprehensive statistical summary of a file.
Parameters:

- `filePath` (required): Absolute path to the file
Returns: File metadata, line statistics, character statistics, and word count.
Stream a file in chunks for processing very large files.
Parameters:

- `filePath` (required): Absolute path to the file
- `chunkSize` (optional): Chunk size in bytes (default: 64KB)
- `startOffset` (optional): Starting byte offset (default: 0)
- `maxChunks` (optional): Maximum chunks to return (default: 10)
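An example request (values are illustrative; 1048576 bytes = 1 MB per chunk):

```json
{
  "filePath": "/data/huge_dataset.json",
  "chunkSize": 1048576,
  "startOffset": 0,
  "maxChunks": 5
}
```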
The server intelligently detects and optimizes for:
- Text files (.txt) - 500 lines/chunk
- Log files (.log) - 500 lines/chunk
- Code files (.ts, .js, .py, .java, .cpp, .go, .rs, etc.) - 300 lines/chunk
- CSV files (.csv) - 1000 lines/chunk
- JSON files (.json) - 100 lines/chunk
- XML files (.xml) - 200 lines/chunk
- Markdown files (.md) - 500 lines/chunk
- Configuration files (.yml, .yaml, .sh, .bash) - 300 lines/chunk
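The detection above amounts to an extension-to-chunk-size lookup. A minimal sketch of that idea (names here are illustrative, not the server's actual internals):

```typescript
// Map file extensions to a default lines-per-chunk value,
// mirroring the table above. Unknown extensions fall back to 500.
const DEFAULT_LINES_PER_CHUNK = 500;

const CHUNK_SIZES: Record<string, number> = {
  ".txt": 500, ".log": 500, ".md": 500,
  ".ts": 300, ".js": 300, ".py": 300, ".java": 300,
  ".cpp": 300, ".go": 300, ".rs": 300,
  ".yml": 300, ".yaml": 300, ".sh": 300, ".bash": 300,
  ".csv": 1000,
  ".json": 100,
  ".xml": 200,
};

function linesPerChunk(filePath: string): number {
  const dot = filePath.lastIndexOf(".");
  const ext = dot >= 0 ? filePath.slice(dot).toLowerCase() : "";
  return CHUNK_SIZES[ext] ?? DEFAULT_LINES_PER_CHUNK;
}
```

Structured formats like JSON get smaller chunks because individual lines tend to carry more nesting context, while row-oriented CSV tolerates much larger chunks.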
Customize behavior using environment variables:
| Variable | Description | Default |
|---|---|---|
| `CHUNK_SIZE` | Default lines per chunk | 500 |
| `OVERLAP_LINES` | Overlap between chunks | 10 |
| `MAX_FILE_SIZE` | Maximum file size in bytes | 10GB |
| `CACHE_SIZE` | Cache size in bytes | 100MB |
| `CACHE_TTL` | Cache TTL in milliseconds | 5 minutes |
| `CACHE_ENABLED` | Enable/disable caching | true |
Example with custom settings (Claude Desktop):
```json
{
  "mcpServers": {
    "large-file": {
      "command": "npx",
      "args": ["-y", "@willianpinho/large-file-mcp"],
      "env": {
        "CHUNK_SIZE": "1000",
        "CACHE_ENABLED": "true"
      }
    }
  }
}
```

Example with custom settings (Claude Code CLI):

```bash
claude mcp add --transport stdio --scope user large-file-mcp \
  --env CHUNK_SIZE=1000 \
  --env CACHE_ENABLED=true \
  -- npx -y @willianpinho/large-file-mcp
```

Analyze /var/log/nginx/access.log and find all 404 errors
The AI will use the search tool to find patterns and provide context around each match.
Find all function definitions in /project/src/main.py
Uses regex search to locate function definitions with surrounding code context.
Show me the structure of /data/sales.csv
Returns metadata, line count, sample rows, and recommended chunk size.
Stream the first 100MB of /data/huge_dataset.json
Uses streaming mode to handle very large files efficiently.
- LRU Cache with configurable size (default 100MB)
- TTL-based expiration (default 5 minutes)
- 80-90% hit rate for repeated access
- Significant performance improvement for frequently accessed files
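The LRU-plus-TTL combination can be sketched with a `Map`, whose insertion order doubles as the recency order. This is only an illustrative sketch under those assumptions, not the server's actual cache implementation:

```typescript
// Entries expire after ttlMs; when capacity is exceeded, the least
// recently used entry (the Map's first key) is evicted.
type Entry<V> = { value: V; expiresAt: number };

class LruTtlCache<K, V> {
  private map = new Map<K, Entry<V>>();
  constructor(private maxEntries: number, private ttlMs: number) {}

  get(key: K): V | undefined {
    const entry = this.map.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.map.delete(key); // expired: drop and report a miss
      return undefined;
    }
    // Re-insert to mark the entry as most recently used.
    this.map.delete(key);
    this.map.set(key, entry);
    return entry.value;
  }

  set(key: K, value: V): void {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, { value, expiresAt: Date.now() + this.ttlMs });
    if (this.map.size > this.maxEntries) {
      // Evict the least recently used entry (first in insertion order).
      const oldest = this.map.keys().next().value as K;
      this.map.delete(oldest);
    }
  }
}
```

The real cache is sized in bytes rather than entry count, but the eviction and expiry logic follows the same shape.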
- Streaming architecture - files are read line-by-line, never fully loaded
- Configurable chunk sizes - adjust based on your use case
- Smart buffering - minimal memory footprint for search operations
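The line-by-line reading described above can be sketched with Node's `readline` module over a read stream, so only one line is held in memory at a time. An illustrative sketch, not the server's actual code:

```typescript
import { createReadStream } from "node:fs";
import { createInterface } from "node:readline";

// Yield lines one at a time; the file is never fully loaded.
async function* streamLines(filePath: string): AsyncGenerator<string> {
  const rl = createInterface({
    input: createReadStream(filePath),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });
  for await (const line of rl) {
    yield line;
  }
}
```

Because the generator is lazy, a consumer can stop after the first N lines and the rest of the file is never read from disk.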
| File Size | Operation Time | Method |
|---|---|---|
| < 1MB | < 100ms | Direct read |
| 1-100MB | < 500ms | Streaming |
| 100MB-1GB | 1-3s | Streaming + cache |
| > 1GB | Progressive | AsyncGenerator |
```bash
git clone https://github.com/willianpinho/large-file-mcp.git
cd large-file-mcp
npm install
npm run build
```

```bash
npm run dev    # Watch mode
npm run lint   # Run linter
npm start      # Run server
```

```
src/
├── index.ts          # Entry point
├── server.ts         # MCP server implementation
├── fileHandler.ts    # Core file handling logic
├── cacheManager.ts   # Caching implementation
└── types.ts          # TypeScript type definitions
```
Ensure the file path is absolute and the file has read permissions:

```bash
chmod +r /path/to/file
```

- Reduce the `CHUNK_SIZE` environment variable
- Disable the cache with `CACHE_ENABLED=false`
- Use `stream_large_file` for very large files

- Reduce the `maxResults` parameter
- Use `startLine` and `endLine` to limit the search range
- Ensure caching is enabled
Check if the server is installed:

```bash
claude mcp list
```

If not listed, reinstall:

```bash
claude mcp add --transport stdio --scope user large-file-mcp -- npx -y @willianpinho/large-file-mcp
```

Check server health:

```bash
claude mcp get large-file-mcp
```

Contributions are welcome! Please feel free to submit issues or pull requests.
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Ensure the code builds and lints successfully
5. Submit a pull request
See CONTRIBUTING.md for detailed guidelines.
MIT
- Issues: GitHub Issues
- Documentation: This README and inline code documentation
- Examples: Check the `examples/` directory
Built with the Model Context Protocol SDK.
Made for the AI developer community.