Skip to content

Drop redundant project id null checks. #8145

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 13, 2020

Conversation

jmcarp
Copy link
Contributor

@jmcarp jmcarp commented Apr 4, 2020

The fallback_to_default_project_id method already raises an exception
if both self.project_id and kwargs["project_id"] are missing, so
skip redundant null checks in hook methods.


Make sure to mark the boxes below before creating PR: [x]


In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.

@boring-cyborg boring-cyborg bot added the provider:google Google (including GCP) related issues label Apr 4, 2020
@jmcarp jmcarp force-pushed the drop-redundant-project-check branch from 01499ea to 37c79f6 Compare April 4, 2020 22:46
@turbaszek
Copy link
Member

@jmcarp those checks are only to please mypy. Initially, we used assert project_id but since we don't use assert in production code we have if clause...

@potiuk
Copy link
Member

potiuk commented Apr 5, 2020

Just thinking - maybe we can do it differently and find a way to both - keep mypy happy and remove the ifs... Let me think about it a bit :)

@turbaszek
Copy link
Member

@potiuk I think @mik-laj already gave it a lot of thought

@potiuk
Copy link
Member

potiuk commented Apr 5, 2020

Some more would not hurt :)

@turbaszek
Copy link
Member

Sure, would be awesome to remove it. But typehints with decorators are tricky :)

@potiuk
Copy link
Member

potiuk commented Apr 5, 2020

Hold my 🍺 and 🐱

@turbaszek
Copy link
Member

Hold my 🍺 and 🐱

Are you going to propose PEP to make Python statically typed language? 😅

@potiuk
Copy link
Member

potiuk commented Apr 5, 2020

Yep. That's my secret plan. Just started to write it :)

@jmcarp jmcarp force-pushed the drop-redundant-project-check branch from 37c79f6 to 325f379 Compare April 5, 2020 17:28
@jmcarp
Copy link
Contributor Author

jmcarp commented Apr 5, 2020

I added a simple mypy plugin to preserve types for decorated functions, and to update types when decorators change function signatures. For example, fallback_to_default_project_id effectively changes the type of the project_id argument from str to Optional[str]. This also fixes the problem of decorators changing the types of all decorated callables to Callable[..., Any]. Still a WIP, but what do you think?

By the way, mypy might support this kind of behavior natively at some point, but this issue has been open for a long time: python/mypy#3157.

@jmcarp jmcarp force-pushed the drop-redundant-project-check branch 2 times, most recently from 8b41a5a to 988afe8 Compare April 5, 2020 18:27
@turbaszek
Copy link
Member

I have mixed feelings... First, I am not a big fan of adding more and more custom plugins (especially to solve problems that should be solved on language level #static_types_fan) . Especially to solve problems that I am not even sure are real problems. Since we have the if clause I even started to like the "explicit" information about project_id ("at this point, it will be provided by decorator").

On the other hand, this change does no harm. It makes our codebase bigger and adds next moving part to it. So, I am not sure.

I am curious about what others think? @mik-laj @kaxil @ashb @potiuk

@ashb
Copy link
Member

ashb commented Apr 5, 2020

Quick thought: The same issue about project_id also applies to session/provide_session

@jmcarp
Copy link
Contributor Author

jmcarp commented Apr 5, 2020

I was also hesitant about writing a mypy plugin, even a simple one, but as far as I can tell, it's the only way to write decorators that don't produce callables with type Callable[..., Any]. I also think it's not ideal that all decorated functions in the airflow codebase have that type. For example, users might see that PubSubHook.publish has type annotations and expect that mypy will catch type errors. But that's not the case, since the fallback_to_default_project_id decorator doesn't preserve function types. Adding the mypy plugin also caused the type checker to find some type errors in the codebase that it couldn't catch previously, again because decorated functions are effectively untyped.

@jmcarp jmcarp force-pushed the drop-redundant-project-check branch 7 times, most recently from aae4fb9 to e5ece17 Compare April 5, 2020 22:24
@jmcarp
Copy link
Contributor Author

jmcarp commented Apr 5, 2020

One more thing: one of the mypy developers suggests using mypy plugins to preserve types for decorated functions until mypy supports this kind of thing: python/mypy#1927 (comment).

@turbaszek
Copy link
Member

Quick thought: The same issue about project_id also applies to session/provide_session

As well as any other decorator that will inject arguments. @jmcarp will we be able to somehow check if all decorators that require it use this plugin? project_id is only GCP stuff an probably not so much problematic but provide_session is common and its case can be worth considering.

@jmcarp jmcarp force-pushed the drop-redundant-project-check branch 4 times, most recently from 4eced5c to 1161ea6 Compare April 6, 2020 00:19
@jmcarp jmcarp force-pushed the drop-redundant-project-check branch 3 times, most recently from 6f29451 to c0983b4 Compare April 6, 2020 16:49
@jmcarp
Copy link
Contributor Author

jmcarp commented Apr 6, 2020

I don't know if we can programmatically audit use of the plugin, but the number of decorators is small and shouldn't change often, and it's simple enough to check by hand by grepping for wraps. I can add more decorators to the plugin here, but this patch is already pretty big, so I was thinking of adding provide_session and other decorators in a follow-up if we accept this change.

@jmcarp jmcarp force-pushed the drop-redundant-project-check branch from c0983b4 to 0f6020a Compare April 6, 2020 19:23
@@ -139,7 +139,6 @@ def get_target_column_spec(columns_specs: List[Dict], column_name: str) -> str:
task_id="update_dataset_task",
dataset=update,
location=GCP_AUTOML_LOCATION,
project_id=GCP_PROJECT_ID,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change necessary?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I think we should leave that in.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This parameter was unused in CloudAutoMLHook.update_dataset, so I dropped the parameter from that method, as well as AutoMLTablesUpdateDatasetOperator and the example code here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I thik it's good :)

@@ -54,7 +54,7 @@ def _get_client(self, project_id: str):
return self._client

@GoogleBaseHook.fallback_to_default_project_id
def get_instance(self, instance_id: str, project_id: Optional[str] = None) -> Instance:
def get_instance(self, instance_id: str, project_id: str) -> Instance:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's decide what we want to use: str or Optional[str] = None and be cosistent

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. Agree. project_id: str is better IMHO. it is semantically correct - we expect project_id to be set. And it is used most of the time below.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmcarp I checked the changes and in some places where we use the decorator we are using project_id: str and in others project_id: Optional[str] = None. I think we should be consistent :)

Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few small thigs. I like it :)

@@ -139,7 +139,6 @@ def get_target_column_spec(columns_specs: List[Dict], column_name: str) -> str:
task_id="update_dataset_task",
dataset=update,
location=GCP_AUTOML_LOCATION,
project_id=GCP_PROJECT_ID,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I think we should leave that in.

@@ -54,7 +54,7 @@ def _get_client(self, project_id: str):
return self._client

@GoogleBaseHook.fallback_to_default_project_id
def get_instance(self, instance_id: str, project_id: Optional[str] = None) -> Instance:
def get_instance(self, instance_id: str, project_id: str) -> Instance:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. Agree. project_id: str is better IMHO. it is semantically correct - we expect project_id to be set. And it is used most of the time below.

@jmcarp jmcarp force-pushed the drop-redundant-project-check branch from 0f6020a to e0f61d7 Compare April 11, 2020 01:06
@mik-laj
Copy link
Member

mik-laj commented Apr 11, 2020

If you would like to add support for more decorators, the most popular decorator is `apply_defaullts. However, this will be best done in a separate PR. This is very problematic and has made it difficult for me to introduce metaclasses - #7450

@potiuk
Copy link
Member

potiuk commented Apr 11, 2020

@turbaszek @mik-laj -> looks good to me. Can you take another look?

@codecov-io
Copy link

codecov-io commented Apr 11, 2020

Codecov Report

Merging #8145 into master will decrease coverage by 0.40%.
The diff coverage is 50.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #8145      +/-   ##
==========================================
- Coverage   88.33%   87.93%   -0.41%     
==========================================
  Files         936      937       +1     
  Lines       45321    45197     -124     
==========================================
- Hits        40036    39743     -293     
- Misses       5285     5454     +169     
Impacted Files Coverage Δ
airflow/mypy/plugin/decorators.py 0.00% <0.00%> (ø)
...google/cloud/example_dags/example_automl_tables.py 92.72% <ø> (ø)
airflow/providers/google/cloud/hooks/automl.py 98.94% <ø> (+10.42%) ⬆️
...rflow/providers/google/cloud/hooks/bigquery_dts.py 86.00% <ø> (+6.68%) ⬆️
.../providers/google/cloud/hooks/cloud_memorystore.py 74.07% <ø> (ø)
...irflow/providers/google/cloud/hooks/datacatalog.py 91.74% <ø> (ø)
airflow/providers/google/cloud/hooks/datafusion.py 84.53% <ø> (+3.22%) ⬆️
airflow/providers/google/cloud/hooks/dataproc.py 98.60% <ø> (ø)
airflow/providers/google/cloud/hooks/dlp.py 98.55% <ø> (ø)
airflow/providers/google/cloud/hooks/mlengine.py 82.74% <ø> (+1.27%) ⬆️
... and 32 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 77412ac...f33e29e. Read the comment docs.

@turbaszek
Copy link
Member

I still have concerns about type annotations consistency:
#8145 (comment)

@jmcarp jmcarp force-pushed the drop-redundant-project-check branch from e0f61d7 to f8d4c4b Compare April 11, 2020 14:51
@boring-cyborg boring-cyborg bot added the k8s label Apr 11, 2020
Copy link
Member

@turbaszek turbaszek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jmcarp jmcarp force-pushed the drop-redundant-project-check branch 3 times, most recently from fde9de9 to 81e549a Compare April 11, 2020 20:07
@jmcarp
Copy link
Contributor Author

jmcarp commented Apr 11, 2020

Thanks for reviewing. Is it all right to merge while TestCliTaskBackfill.test_run_ignores_all_dependencies is failing? It looks unrelated to this patch.

@jmcarp jmcarp force-pushed the drop-redundant-project-check branch 2 times, most recently from e032024 to 535efb5 Compare April 12, 2020 21:16
* Preserve types for decorators that don't change signatures
* Augment types for decorators that change signatures
* Drop redundant type checks
@jmcarp jmcarp force-pushed the drop-redundant-project-check branch from 535efb5 to f33e29e Compare April 12, 2020 21:39
@potiuk potiuk merged commit 1fd9ed3 into apache:master Apr 13, 2020
@potiuk
Copy link
Member

potiuk commented Apr 13, 2020

Thanks @jmcarp! Great change!

@zachliu
Copy link
Contributor

zachliu commented Jun 22, 2020

@jmcarp do you plan to work on apply_defaults? 😁

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
provider:google Google (including GCP) related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants