feat(mcp): MCP Service POC - Phase 1 #33976
Conversation
Codecov Report

❌ Patch coverage report:

```
@@            Coverage Diff             @@
##           master    #33976      +/-  ##
===========================================
+ Coverage        0    70.66%   +70.66%
===========================================
  Files           0       592      +592
  Lines           0     42601    +42601
  Branches        0      4446     +4446
===========================================
+ Hits            0     30106    +30106
- Misses          0     11351    +11351
- Partials        0      1144     +1144
```
**Inputs:**
- `filters`: `Optional[List[DashboardFilter]]` — List of filter objects
- `columns`: `Optional[List[str]]` — Columns to include in the response
Is the LLM good at choosing which columns will be required? Would returning a fixed set of columns that covers the most relevant use cases be a better option here?
The current approach we are taking is that the default response is limited to a set of columns we set as defaults in the tool. Perhaps a prompt tool can help figure out how to augment these columns if needed, based on the user's need?
One thing I ran into is that it's not possible to get a pydantic schema used by an MCP server to only include the attributes that are non-null - at least I am still working on figuring that out. pydantic/pydantic#5461
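On the serialization side of that pydantic issue, dropping None-valued attributes at dump time is at least possible, even though the generated JSON schema still advertises the optional fields. A minimal sketch, assuming pydantic v2 and an illustrative model that is not the PR's actual schema:

```python
from typing import Optional

from pydantic import BaseModel  # assumes pydantic v2


# Illustrative model, not the PR's actual schema.
class DashboardInfo(BaseModel):
    id: int
    dashboard_title: str
    css: Optional[str] = None  # nullable field we want to hide when unset


info = DashboardInfo(id=1, dashboard_title="Sales")
# exclude_none drops null attributes from the serialized payload,
# though the generated JSON schema itself still lists `css` as optional.
payload = info.model_dump(exclude_none=True)
```

This only trims serialized responses; the schema the MCP server advertises is generated separately, which is what the linked pydantic issue is about.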
```python
datasource_type: Literal["table"] = Field("table", description="Datasource type (usually 'table')")
metrics: List[str] = Field(..., description="List of metric names to display")
dimensions: List[str] = Field(..., description="List of dimension (column) names to group by")
filters: Optional[List[Dict[str, Any]]] = Field(None, description="List of filter objects (column, operator, value)")
```
These can get pretty complex. Do we have a sense of how good the LLM is with these, without additional prompting techniques?
It seems that, with the pydantic schemas kept as static as possible (using concrete types instead of dicts and strings where possible), it does figure things out well. But I did notice a couple of instances where it added extra attributes that didn't exist, received an error, and then corrected itself.
I gave it "list all dashboards with z at the end the name " and it ran this:
```json
{
  "filters": [
    {
      "col": "dashboard_title",
      "opr": "ilike",
      "value": "%z"
    }
  ],
  "select_columns": [
    "id",
    "dashboard_title"
  ],
  "order_column": "dashboard_title",
  "order_direction": "asc",
  "page": 1,
  "page_size": 100
}
```
"list all datasets related to population"
{
"filters": [
{
"col": "table_name",
"opr": "ilike",
"value": "%population%"
}
],
"select_columns": [
"id",
"table_name"
],
"order_column": "table_name",
"order_direction": "asc",
"page": 1,
"page_size": 100
}
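The point above about keeping schemas static with concrete types instead of `Dict[str, Any]` can be sketched like this (hypothetical model and field names, assuming pydantic v2; the PR's actual schemas may differ). A typed filter model gives the LLM an explicit contract, so invented attributes or operators fail validation early:

```python
from typing import List, Literal, Optional

from pydantic import BaseModel, Field  # assumes pydantic v2


# Hypothetical concrete filter model: the operator is a closed Literal
# set, so an invented operator is rejected at validation time.
class Filter(BaseModel):
    col: str = Field(..., description="Column name to filter on")
    opr: Literal["eq", "ne", "gt", "lt", "like", "ilike", "in"] = Field(
        ..., description="Filter operator"
    )
    value: str = Field(..., description="Comparison value")


class ListParams(BaseModel):
    filters: Optional[List[Filter]] = None
    page: int = 1
    page_size: int = 100


# The first payload above validates cleanly against the typed model.
params = ListParams(
    filters=[Filter(col="dashboard_title", opr="ilike", value="%z")]
)
```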
superset/mcp_service/mcp_app.py (outdated diff):
```python
mcp = FastMCP(
    "Superset MCP Server",
    instructions="""
```
How can we hook auth here, for example with BearerAuthProvider? Is it possible to use add_middleware later, on init_fastmcp_server?
Great question about auth integration!

Yes, our MCP service architecture is designed to support BearerAuthProvider integration. Looking at the FastMCP Bearer auth docs, we can integrate it in two ways:

Option 1: Server initialization (cleanest approach)

```python
auth = BearerAuthProvider(
    jwks_uri=os.getenv("MCP_JWKS_URI"),
    issuer=os.getenv("MCP_JWT_ISSUER"),
    audience="superset-mcp-server"
)
mcp = FastMCP("Superset MCP Server", auth=auth)
```

Option 2: Environment-based configuration

Since our server is already modular with middleware support, we can add auth as an optional feature controlled by environment variables, making it easy to enable/disable per deployment.

The BearerAuthProvider supports JWT validation via JWKS endpoints, which aligns well with enterprise SSO systems. We'd get access to user context via get_access_token() in our tools for fine-grained permissions.
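As a dependency-free sketch of Option 2, the env-driven wiring could look roughly like this. The env var names follow the snippet above plus a hypothetical MCP_AUTH_ENABLED toggle; the actual BearerAuthProvider/FastMCP call is left as a comment so the snippet stands on its own:

```python
import os


def build_auth_config():
    """Return BearerAuthProvider kwargs from env vars, or None if disabled."""
    # MCP_AUTH_ENABLED is a hypothetical on/off toggle for this sketch.
    if os.getenv("MCP_AUTH_ENABLED", "false").lower() != "true":
        return None  # auth disabled for this deployment
    return {
        "jwks_uri": os.environ["MCP_JWKS_URI"],
        "issuer": os.environ["MCP_JWT_ISSUER"],
        "audience": os.getenv("MCP_JWT_AUDIENCE", "superset-mcp-server"),
    }


# With fastmcp installed, the wiring would then be roughly:
#   cfg = build_auth_config()
#   mcp = FastMCP("Superset MCP Server",
#                 auth=BearerAuthProvider(**cfg) if cfg else None)
```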
I like option 1. Can you make it configurable also? Similar to: https://github.com/apache/superset/blob/master/superset/config.py#L1526.

Actually, we can't make it exactly like that, since __init__ can take different parameters. WDYT about using a configurable factory function?

```python
def create_auth(app):
    jwks_uri = app.config["MCP_JWKS_URI"]
    issuer = app.config["MCP_JWT_ISSUER"]
    audience = app.config["MCP_JWT_AUDIENCE"]
    return BearerAuthProvider(
        jwks_uri=jwks_uri,
        issuer=issuer,
        audience=audience
    )
```

config.py:

```python
MCP_AUTH_FACTORY: Callable[[Flask], Any] = create_auth
```

Then:

```python
mcp = FastMCP("Superset MCP Server", auth=app.config["MCP_AUTH_FACTORY"](app))
```
@dpgaspar Thanks for the excellent suggestion! I've implemented the configurable factory pattern as you outlined.

The MCP service now uses app.config["MCP_AUTH_FACTORY"](app) to create the auth provider, following the same pattern as MACHINE_AUTH_PROVIDER_CLASS.

Implementation details:

- Default factory in superset/mcp_service/config.py:

```python
def create_default_mcp_auth_factory(app: Flask) -> Optional[Any]:
    """Default MCP auth factory that uses app.config values."""
    if not app.config.get("MCP_AUTH_ENABLED", False):
        return None
    jwks_uri = app.config.get("MCP_JWKS_URI")
    # ... create and return BearerAuthProvider
```

- Usage in the MCP service initialization:

```python
auth_factory = app.config.get("MCP_AUTH_FACTORY")
if auth_factory and callable(auth_factory):
    return auth_factory(app)
```

- User configuration in superset_config.py:

```python
# Simple approach - just set values
MCP_AUTH_ENABLED = True
MCP_JWKS_URI = "https://your-provider.com/.well-known/jwks.json"

# Or provide a custom factory
def create_auth(app):
    jwks_uri = app.config["MCP_JWKS_URI"]
    return BearerAuthProvider(jwks_uri=jwks_uri, ...)

MCP_AUTH_FACTORY = create_auth
```
superset/mcp_service/auth.py (outdated diff):
```python
def wrapper(*args: Any, **kwargs: Any) -> Any:
    # --- Setup user context (was _setup_user_context) ---
    admin_username = current_app.config.get("MCP_ADMIN_USERNAME", "admin")
    admin_user = security_manager.get_user_by_username(admin_username)
```
To my understanding, by default full admin is given, as a sudoer. Also, no auth process is in place (for example, JWT verification, or using JWT claims to get user info).
Can you give an example of how this can be implemented?
Excellent observation! You're absolutely right - the current implementation hardcodes admin access without proper JWT verification. Here's how we can implement proper JWT-based authentication with user claims:
Option 1: FastMCP Bearer Token Integration

```python
def get_user_from_request() -> Any:
    """Extract user from JWT token claims via FastMCP's BearerAuthProvider."""
    from fastmcp.auth import get_access_token
    from superset.extensions import security_manager
    try:
        # Get validated JWT token from FastMCP auth
        access_token = get_access_token()
        # Extract user identifier from JWT claims
        username = access_token.subject  # or access_token.client_id
        user_email = access_token.claims.get("email")
        # Look up the actual Superset user
        user = security_manager.get_user_by_username(username)
        if not user and user_email:
            user = security_manager.get_user_by_email(user_email)
        return user or AnonymousUserMixin()
    except Exception:
        # Fall back to anonymous if there is no valid token
        return AnonymousUserMixin()
```
Option 2: Enhanced RBAC with Scope-Based Permissions

```python
def has_permission(user: Any, tool_func: Any) -> bool:
    """Check permissions using JWT scopes + Superset RBAC."""
    from fastmcp.auth import get_access_token
    try:
        access_token = get_access_token()
        user_scopes = access_token.scopes
        # Map tool functions to required scopes
        required_scopes = {
            'list_dashboards': ['dashboard:read'],
            'create_chart': ['chart:write'],
            'get_dataset_info': ['dataset:read'],
        }
        tool_name = tool_func.__name__
        if required := required_scopes.get(tool_name):
            if not any(scope in user_scopes for scope in required):
                return False
        # Also check Superset's native RBAC
        return user and hasattr(user, 'is_active') and user.is_active
    except Exception:
        # No token = anonymous user, check Superset perms only
        return user and hasattr(user, 'is_active') and user.is_active
```
Updated Wrapper Implementation

```python
@functools.wraps(tool_func)
def wrapper(*args: Any, **kwargs: Any) -> Any:
    # Get the authenticated user from the JWT (replaces hardcoded admin)
    user = get_user_from_request()
    # Set the Flask context with the actual authenticated user
    g.user = user
    # Apply impersonation if requested and allowed
    if run_as := kwargs.get("run_as"):
        user = impersonate_user(user, run_as)
    # Check both JWT scopes and Superset RBAC
    if not has_permission(user, tool_func):
        raise PermissionError(
            f"User {getattr(user, 'username', 'anonymous')} lacks permission "
            f"for {tool_func.__name__}"
        )
    # Enhanced audit logging with JWT context
    log_access(user, tool_func.__name__, args, kwargs)
    return tool_func(*args, **kwargs)
```
This approach removes the hardcoded admin escalation and uses actual JWT-validated user identity with proper scope-based authorization!
Both are valid approaches.
Superset auth is integrated with multiple providers and scenarios, so it's important to guarantee that this is configurable and flexible.
@dpgaspar Thanks for the feedback on authentication flexibility! I've already implemented the JWT-based authentication as discussed, and now made it fully configurable as you suggested.

What's implemented:

- JWT User Authentication - no more hardcoded admin:
  - Extracts user from JWT token claims via FastMCP's get_access_token()
  - Maps JWT identity to actual Superset users in the database
  - Configurable user resolver for different JWT claim structures
- Scope-Based Permissions:
  - JWT scopes mapped to tool permissions (e.g., dashboard:read, chart:write)
  - Falls back gracefully when no JWT is present (dev mode)
  - Enhanced audit logging with JWT context
- Full Configurability via superset_config.py:

```python
# Configure auth factory
MCP_AUTH_FACTORY = my_custom_auth_factory

# Configure how to extract the username from the JWT
def custom_user_resolver(access_token):
    # Handle your specific JWT structure
    return access_token.payload.get('preferred_username')

MCP_USER_RESOLVER = custom_user_resolver

# Configure scopes, fallback users, etc.
MCP_REQUIRED_SCOPES = ["superset:read", "superset:admin"]
MCP_DEV_USERNAME = "dev_user"  # For local development
```

- Flexible Integration:
  - Works with any JWT provider (Auth0, Keycloak, Okta, etc.)
  - Supports both JWKS and direct public key validation
  - Compatible with Superset's existing auth providers
Would ❤️ instructions as to how to set this up locally to try it, maybe in the PR body, or better, in docs, maybe under …
Oh, also: if/when you rebase I could add a new service in the new …
…ation

BaseDAO Enhancements:
- Add find_by_id() and find_by_ids() methods with flexible column support (id, uuid, slug)
- Add list() and count() methods with column operators for advanced filtering
- Implement ColumnOperatorEnum with 14 operators (eq, ne, gt, lt, like, in, etc.)
- Add get_filterable_columns_and_operators() for dynamic column introspection
- Support UUID column type conversion and error handling

Code Organization:
- Move imports to module level (sqlalchemy as sa, hybrid_property)
- Extract TYPE_OPERATOR_MAP as a module-level constant for reusability
- Add custom field mappings as module constants in Chart, Dashboard, Dataset DAOs
- Follow Superset conventions for import organization

Test Reorganization:
- Separate 52 integration tests from 6 unit tests for better organization
- Move database-dependent tests to tests/integration_tests/dao/
- Keep only mock-based tests in tests/unit_tests/dao/
- Add proper fixtures for integration tests with minimal database setup
- Fix app context and Flask g.user mocking issues

Dependencies:
- Add pytest-mock to development requirements for better test fixtures

This provides a robust foundation for flexible data access patterns across Superset.

…nships

Key improvements:
- Convert operator_map from a method to a module-level dict for better performance
- Fix pagination count inflation when loading many-to-many relationships
- Add type consistency validation for the find_by_ids method
- Replace f-strings in logging with % formatting for performance
- Remove invalid 'favorite' field from dashboard custom fields
- Improve test assertions with exact SQL fragment validation

The pagination fix ensures that SQL JOINs from relationship loading don't affect the total count, which was causing incorrect pagination behavior.

…pendencies

- Remove all Anthropic API key related code from the mcp.py CLI
- Delete redundant requirements-mcp.txt (deps managed in pyproject.toml)
- Remove unused flask_appbuilder_compat.py module
- Add comprehensive test suite for the MCP CLI with 100% coverage
- Fix User import to use flask_appbuilder.security.sqla.models
- Update tests to handle Flask context properly
- Fix mypy type errors and code quality issues
- Implement BoundedScreenshotCache with LRU eviction and TTL
- Add RedisScreenshotCache for distributed deployments
- Create a factory function that uses Redis if available, falling back to in-memory
- Replace the unbounded dict with a proper cache interface
- Fix logging to use % formatting instead of f-strings for performance
- Add thread safety with locks for concurrent access

The cache now:
- Limits memory usage to at most 100 screenshots
- Automatically expires entries after 5 minutes
- Uses LRU eviction when at capacity
- Works in both development (in-memory) and production (Redis)
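The bounded cache this commit describes can be sketched roughly as follows. This is illustrative, not the PR's actual BoundedScreenshotCache: LRU eviction at a size cap plus per-entry TTL expiry, guarded by a lock for thread safety.

```python
import time
from collections import OrderedDict
from threading import Lock


class BoundedScreenshotCache:
    """Toy LRU + TTL cache (sketch; the real class may differ)."""

    def __init__(self, max_size=100, ttl=300.0):
        self.max_size = max_size
        self.ttl = ttl
        self._data = OrderedDict()  # key -> (inserted_at, value)
        self._lock = Lock()

    def set(self, key, value):
        with self._lock:
            self._data.pop(key, None)
            self._data[key] = (time.monotonic(), value)
            while len(self._data) > self.max_size:
                self._data.popitem(last=False)  # evict least recently used

    def get(self, key):
        with self._lock:
            item = self._data.get(key)
            if item is None:
                return None
            inserted_at, value = item
            if time.monotonic() - inserted_at > self.ttl:
                del self._data[key]  # entry expired past its TTL
                return None
            self._data.move_to_end(key)  # mark as most recently used
            return value
```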
- Fix logger.info call that was using % formatting instead of comma separation
- Logger methods should pass parameters as separate arguments, not format strings
- This allows lazy evaluation only when the log level is active

…rvice

Major improvements to address critical issues found in code review:

**Memory Management & Caching:**
- Implement hybrid screenshot cache with Redis for production, in-memory for dev
- Add BoundedScreenshotCache with LRU eviction, TTL (5min), and size limits (100 items)
- Add RedisScreenshotCache for distributed deployments
- Prevent memory leaks with proper cache cleanup and bounded growth

**Rate Limiting:**
- Implement hybrid rate limiting with Redis for production environments
- Add InMemoryRateLimiter with sliding window for development
- Add RedisRateLimiter for distributed rate limiting across instances
- Support per-user and per-tool rate limit configurations

**Performance & Architecture:**
- Implement singleton pattern for Flask app creation
- Prevent multiple Flask app instances with a thread-safe singleton
- Fix critical logging performance issues by replacing f-strings with % formatting
- Reduce app initialization overhead

**Security & Reliability:**
- Fix SQL injection risks in logging statements
- Add proper error handling in authentication provider creation
- Improve error messages and logging throughout

**Technical Details:**
- Both caches use a factory pattern to select Redis or in-memory based on availability
- Rate limiter uses a sliding window algorithm for accurate request tracking
- All implementations are thread-safe with proper locking
- Flask singleton ensures consistent configuration across the MCP service

This makes the MCP service production-ready with proper resource management, distributed system support, and improved performance.
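The sliding-window limiter mentioned under Rate Limiting could be sketched like this (illustrative; the PR's InMemoryRateLimiter may differ). Each key keeps a deque of request timestamps, and a request is allowed only if fewer than max_requests fall inside the trailing window:

```python
import time
from collections import deque
from threading import Lock


class InMemoryRateLimiter:
    """Toy sliding-window rate limiter (sketch; the real class may differ)."""

    def __init__(self, max_requests=10, window_seconds=60.0):
        self.max_requests = max_requests
        self.window = window_seconds
        self._hits = {}  # key -> deque of request timestamps
        self._lock = Lock()

    def allow(self, key):
        now = time.monotonic()
        with self._lock:
            hits = self._hits.setdefault(key, deque())
            # Drop timestamps that have slid out of the window
            while hits and now - hits[0] > self.window:
                hits.popleft()
            if len(hits) >= self.max_requests:
                return False  # over the limit for this window
            hits.append(now)
            return True
```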
- Add ctx: Context parameter to all 20 MCP tools for enhanced observability
- Implement comprehensive Context usage, including:
  - Structured logging with ctx.info(), ctx.debug(), ctx.warning(), ctx.error()
  - Progress reporting with ctx.report_progress() for long operations
  - Proper error handling with Context error logging
- Convert all f-string logging to % formatting for security and performance
- Fix all 151 MCP unit tests to work with Context enhancements
- Update test fixtures and mocks to support Context parameters
- Fix schema field references (schema vs schema_name) in execute_sql
- Make functions async where needed for Context compatibility

This change improves observability, debugging, and monitoring capabilities for all MCP tools while maintaining backward compatibility.

- Fix mock decorator parameter order in chart tool tests (11 tests)
- Add async/await support for dataset tool tests (2 tests)
- Update auth test expectations for new security restrictions (2 tests)
- Ensure proper Context parameter mocking throughout the test suite
- Fix mypy type annotations and ruff formatting issues
- All 306 MCP service unit tests now pass (301 passed + 5 expected failures)

Specific fixes:
- Corrected decorator-to-parameter mapping in generate_chart tests
- Added @pytest.mark.asyncio and await for async Context methods
- Updated impersonation and JWT validation test expectations
- Fixed Context method mocking as AsyncMock where needed
- Added type annotations for _timeout_handler and _has_dataset_access
- Fixed list sanitization in error_builder.py
- Renamed WebDriverCreationTimeout to WebDriverCreationError
- Added proper exception logging and raise from None
- Shortened long lines for ruff compliance

…late

- Move chart/error_handling/error_builder.py to utils/error_builder.py for better reusability
- Update all import statements across validation and tool modules
- Remove the empty error_handling directory
- Add missing "generation_failed" error template to fix generic "An error occurred" messages
- Template now provides specific debugging info for chart generation failures

This fixes a critical bug where XY chart generation errors showed unhelpful generic messages.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>

- Merge sql_lab/tool/ files into a single tools.py module
- Rename mcp_app.py to app.py for standard naming
- Consolidate config examples into a single file
- Remove redundant directory structure
- Update all import references across the codebase

- Fixed import paths from mcp_app to the app module across all test files
- Fixed sql_lab.tool import to sql_lab.tools in app.py
- Added missing options object to WebDriver kwargs in webdriver.py
- All MCP service unit tests now passing (301 passed, 5 xfailed)
- All MCP CLI integration tests passing (27 passed)

…ig.py

- Added pytest fixtures to use temp directories for all config file operations
- Tests now use the tmp_path fixture instead of writing to the project root
- Fixes a critical bug where tests would destroy the user's actual superset_config.py
- All tests are now properly isolated and cannot affect real configuration files

- Add Flask blueprint for proxying MCP requests to the FastMCP service
- Implement circuit breaker and rate limiting for resilience
- Support SSE streaming for the MCP protocol
- Add configuration module for MCP settings
- Include test scripts for direct and proxy validation
- Enable multi-tenant context injection via headers
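A toy version of the circuit breaker this proxy commit mentions (illustrative, not the PR's actual implementation): after `threshold` consecutive failures the circuit opens and calls are rejected fast until `reset_timeout` elapses, at which point one trial request is allowed through.

```python
import time


class CircuitBreaker:
    """Toy circuit breaker (sketch; the real implementation may differ)."""

    def __init__(self, threshold=5, reset_timeout=30.0):
        self.threshold = threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # monotonic time when the circuit opened

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: request rejected")
            # Half-open: let one trial request through
            self.opened_at = None
            self.failures = 0
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0  # any success resets the failure count
        return result
```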
- Add a clear warning that the file is not auto-loaded
- Document minimum required settings (only 3 needed)
- List all available settings with defaults
- Add a quick start snippet for easy setup
- Include examples for different environments

- Add /api/v1/mcp-native/ endpoint for direct MCP service integration
- Eliminates subprocess overhead by using fastmcp.Client directly
- Both HTTP and native endpoints support the full MCP protocol (20+ tools)
- Update initialization to register the native proxy blueprint
- Add comprehensive test suite for the native proxy endpoint
- Include Claude Code integration script for the native proxy
- Clean up unused files and update the existing proxy structure

The native approach provides better performance and reliability compared to the HTTP proxy while maintaining protocol compatibility.

Rename HTTP and native proxy endpoints:
- /api/v1/mcp/ → /api/v1/mcp-proxy/
- /api/v1/mcp-native/ → /api/v1/mcp-client/

Update related files:
- mcp_native_proxy.py → mcp_client_proxy.py
- test_mcp_native_proxy.sh → test_mcp_client_proxy.sh
- run_proxy_to_mcp_native.sh → run_proxy_to_mcp_client.sh
- simple_proxy_to_mcp_native.py → simple_proxy_to_mcp_client.py

Update all route decorators, blueprint names, imports, and test scripts.

- Rename mcp_proxy.py → mcp_http_proxy.py for clarity
- Update all imports and blueprint references
- Simplify config_mcp.py for the Preset-hosted environment:
  - Remove unnecessary settings handled by infrastructure
  - Keep only essential MCP connection settings
  - Remove CORS, auth, and advanced multi-tenant configs
- Update related shell scripts and proxy files

Co-authored-by: bito-code-review[bot] <188872107+bito-code-review[bot]@users.noreply.github.com>

- Bump fastmcp version to match the working environment
- Add commented staging URL for future reference
PR Summary: MCP Service Implementation
SUMMARY
This PR implements Phase 1 of the Model Context Protocol (MCP) service for Apache Superset, as outlined in SIP-171. The MCP service provides programmatic access to Superset dashboards, charts, datasets, and instance metadata via standardized tools and schemas.
Available Tools (20 total):
list_dashboards
,get_dashboard_info
,get_dashboard_available_filters
,generate_dashboard
,add_chart_to_existing_dashboard
list_datasets
,get_dataset_info
,get_dataset_available_filters
list_charts
,get_chart_info
,get_chart_available_filters
,generate_chart
,update_chart
,update_chart_preview
,get_chart_data
,get_chart_preview
generate_explore_link
get_superset_instance_info
open_sql_lab_with_context
,execute_sql
Key Features:
- streamable-http transport

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
TESTING INSTRUCTIONS
Development Environment Setup:
git checkout mcp_service_amin_dev
make mcp-setup   # Complete setup (deps, DB, frontend)

Start Services:
make mcp-run     # Starts Superset if needed, then the MCP service

Health Check:
make mcp-check   # Verify configuration and services

MCP Client Configuration:
- Desktop App: add to ~/Library/Application Support/Claude/claude_desktop_config.json
- Web Interface: http://localhost:5008 (Streamable HTTP)
- MCP Inspector:

Testing:

Cleanup:
make mcp-stop    # Stop background processes
ADDITIONAL INFORMATION