Conversation

@steinitzu
Contributor

Description

Continued from #1436 without fork

Related Issues

  • Fixes #...
  • Closes #...
  • Resolves #...

Additional Context

@netlify

netlify bot commented Jun 5, 2024

Deploy Preview for dlt-hub-docs canceled.

Name Link
🔨 Latest commit fda4608
🔍 Latest deploy log https://app.netlify.com/sites/dlt-hub-docs/deploys/6668397765a9f100086cc878

@steinitzu steinitzu changed the base branch from devel to master June 5, 2024 12:51
@steinitzu steinitzu force-pushed the sthor/fix-databricks-pandas-error branch from e3c5d68 to b22f4aa Compare June 5, 2024 12:52
@steinitzu steinitzu changed the base branch from master to devel June 5, 2024 13:41
@steinitzu steinitzu force-pushed the sthor/fix-databricks-pandas-error branch 4 times, most recently from 4cfbb87 to 06de0b3 Compare June 5, 2024 13:47
@steinitzu steinitzu force-pushed the sthor/fix-databricks-pandas-error branch from 06de0b3 to 41a8a6f Compare June 5, 2024 13:51
@steinitzu steinitzu marked this pull request as ready for review June 5, 2024 13:51
@rudolfix rudolfix added the bug Something isn't working label Jun 5, 2024
  super().__init__(schema, config, sql_client)
  self.config: DatabricksClientConfiguration = config
- self.sql_client: DatabricksSqlClient = sql_client
+ self.sql_client: DatabricksSqlClient = sql_client  # type: ignore[assignment]
Collaborator

how did it work before?

Contributor Author

I have no idea tbh. Maybe something to do with 2.9 driver not having type hints? 3.x does.

  def open_connection(self) -> DatabricksSqlConnection:
      conn_params = self.credentials.to_connector_params()
-     self._conn = databricks_lib.connect(**conn_params, schema=self.dataset_name)
+     self._conn = databricks_lib.connect(
Collaborator

This is my only issue here: could we support the 2.9 driver? The 3.x driver has

paramstyle = "named"

and 2.9 has pyformat, so it is trivial to distinguish them.
Why: I bet there are people with constrained environments (e.g. Airflow) where upgrading dependencies is hard, and 3.0 was released only in March.
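
The paramstyle check suggested above could be sketched roughly as follows. This is only an illustration: `driver_generation` is a hypothetical helper, and the `SimpleNamespace` objects stand in for the real driver module so the snippet runs without the connector installed. The mapping of "named" to 3.x and "pyformat" to 2.9 is taken from the comment above.

```python
from types import SimpleNamespace


def driver_generation(dbapi_module) -> str:
    """Classify a DB-API module by its module-level `paramstyle` attribute.

    Hypothetical helper: the mapping below follows the review comment
    ("named" for 3.x, "pyformat" for 2.9) and is not verified here.
    """
    style = getattr(dbapi_module, "paramstyle", None)
    if style == "named":
        return "3.x"
    if style == "pyformat":
        return "2.x"
    return "unknown"


# Stand-ins for the real driver module (assumption: illustration only).
legacy_driver = SimpleNamespace(paramstyle="pyformat")
current_driver = SimpleNamespace(paramstyle="named")

print(driver_generation(legacy_driver))
print(driver_generation(current_driver))
```

With the real connector, the imported driver module would be passed in place of the stand-ins.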

Contributor Author

2.9 still works if you have it installed. There's no change in our code except this argument, which is ignored in 2.9.
paramstyle doesn't matter with use_inline_params: it makes 3.x work like 2.x, which does not use parametrized queries at all. It's some custom escaping and %s string formatting under the hood. 🤷‍♂️
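
For readers following along, a minimal sketch of how the extra argument might be assembled before calling connect. `build_connect_kwargs` is a hypothetical helper, the value `True` is an assumption (the thread does not show the value that was actually passed), and the sample parameters are made up.

```python
from typing import Any, Dict


def build_connect_kwargs(conn_params: Dict[str, Any], dataset_name: str) -> Dict[str, Any]:
    """Return keyword arguments for the driver's connect() call.

    Hypothetical helper. Per the comment above, `use_inline_params` is
    ignored by the 2.9 driver and makes 3.x inline parameters like 2.x.
    """
    kwargs = dict(conn_params)  # copy so the caller's dict is not mutated
    kwargs["schema"] = dataset_name
    kwargs["use_inline_params"] = True  # assumed value, not confirmed by the thread
    return kwargs


# Made-up parameters for illustration only.
params = {"server_hostname": "example.invalid", "http_path": "/sql/1.0/warehouses/abc"}
kwargs = build_connect_kwargs(params, "my_dataset")
print(sorted(kwargs))
# The result would then be passed on as: databricks_lib.connect(**kwargs)
```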

Contributor Author

If we want to switch to named paramstyle we can detect driver version and cluster version. But I don't know enough about it to tell if this is reliable in all environments: https://docs.databricks.com/en/sql/language-manual/functions/current_version.html
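
The driver half of that detection could look something like the sketch below, which reads the installed package version through the standard library. The helper names are hypothetical, and the cluster half (querying `current_version()` over SQL) is not covered here.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def major_version(version_string: str) -> int:
    """Extract the leading major number from a version string like '2.9.3'."""
    return int(version_string.split(".")[0])


def installed_driver_major(dist_name: str = "databricks-sql-connector") -> Optional[int]:
    """Return the installed major version of `dist_name`, or None if absent.

    Hypothetical helper: it only inspects package metadata and says
    nothing about what the cluster itself supports.
    """
    try:
        return major_version(version(dist_name))
    except PackageNotFoundError:
        return None


print(major_version("2.9.3"))
print(installed_driver_major())
```

Whether this is reliable in all environments is the open question raised in the comment; vendored or repackaged installs may not expose the metadata.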

pyproject.toml Outdated
qdrant-client = {version = "^1.6.4", optional = true, extras = ["fastembed"]}
databricks-sql-connector = {version = ">=2.9.3,<3.0.0", optional = true}
dbt-databricks = {version = ">=1.7.3", optional = true}
databricks-sql-connector = {version = "^3", optional = true}
Collaborator

OK this is good

Collaborator

@rudolfix rudolfix left a comment

LGTM!

I cleaned up the dbt stuff (moved from extras to a dependency group), allowed databricks > 2.9.3, and bumped the dependency to >3.

It all seems to work together.

@rudolfix rudolfix merged commit d4340d8 into devel Jun 11, 2024
@rudolfix rudolfix deleted the sthor/fix-databricks-pandas-error branch June 11, 2024 14:31
Labels

bug Something isn't working

4 participants