Description
Bug description
It would be ideal that the BigQuery support allowed for a separation between GCP projects. Often the datasets live in one project which is notionally owned by a different team. Allowing for this provides:
- better isolation of concerns and therefore a better understanding of security roles and permissions
- a clear separation of usage where the visualisation cost of superset (namely BQ slots) can be clearly recognised and budgeted for
- simplified incident resolution in case of 'slowness' where BQ backend usage can be correlated to activity on a given superset instance
IMPLEMENTATION
So this would effectively need to formalise the separation between
- project_id - the project in which the data is held
- job_project_id - the project in which queries are run, where the slot allocation is used
From an implementation perspective this would require:
- all BQ queries to be fully qualified like
select * from {project}.{dataset}.{table}
- all non-query usage to be able to differentiate when you are making queries (default project job_project_id) and when you are querying metadata (default project project_id)
From experimentation it seems that [1] is relatively easy to solve. If you create a dataset with a schema of project.dataset
instead of just dataset
then any chart that uses it will work correctly. However, this does not work when building a dataset through the UI, but a little use of SQLA_TABLE_MUTATOR
seems to work (a shortened version minus error handling of what I've used is below):
def SQLA_TABLE_MUTATOR(table):
if table.database.driver == 'bigquery':
table.schema = f'{table.catalog}.{table.schema}'
return table
However point [2] is a harder nut to crack. I've been experimenting with DB_CONNECTION_MUTATOR
to achieve this but not sure this will be reliable enough. The other alternative would be a small monkey patch within superset/db_engine_specs/bigquery.py
It would be great if this logic was available out of the box.
Superset version
master / latest-dev
Python version
3.9
Node version
16
Checklist
- I have searched Superset docs and Slack and didn't find a solution to my problem.
- I have searched the GitHub issue tracker and didn't find a similar bug report.
- I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.