Skip to content

Add PYTEST_DONT_REWRITE to type_codes module #1142

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

potiuk
Copy link

@potiuk potiuk commented Mar 27, 2023

Solves the problem that collecting on running test without exclusion on Pytest test name, importing cassandra driver causes a very strangely looking error:

from cassandra.cluster import Cluster
cassandra/cluster.py:48: in init cassandra.cluster ???
cassandra/connection.py:40: in init cassandra.connection ???
cassandra/protocol.py:698: in genexpr
???
cassandra/protocol.py:698: in genexpr
???
E KeyError: '@py_builtins'

This happened for Apache Airflow when we wanted to enable pytest collection to include all files, rather than only "test_*.py" files, because we found that there were at least few modules in airflow that did not start with "test".

It turned out that the culprit was Assert Rewrite done by pytest that tried to rewrite "type_codes.py" module of Cassandra.

One of the ways to avoid the problem is to disable assert rewrite by "–assert=plain". However then asserts are not as useful. Better solution (PR is coming) is to add PYTEST_DONT_REWRITE in the docstring:

https://docs.pytest.org/en/latest/how-to/assert.html#disabling-assert-rewriting

Reference: https://issues.apache.org/jira/browse/CASSANDRA-18369

Solves the problem that collecting on running test without exclusion on
Pytest test name, importing cassandra driver causes a very strangely
looking error:

from cassandra.cluster import Cluster
cassandra/cluster.py:48: in init cassandra.cluster
???
cassandra/connection.py:40: in init cassandra.connection
???
cassandra/protocol.py:698: in genexpr
???
cassandra/protocol.py:698: in genexpr
???
E KeyError: '@py_builtins'

This happened for Apache Airflow when we wanted to enable pytest
collection to include all files, rather than only "test_*.py" files,
because we found that there were at least few modules in airflow that
did not start with "test".

It turned out that the culprit was Assert Rewrite done by pytest that
tried to rewrite "type_codes.py" module of Cassandra.

One of the ways to avoid the problem is to disable assert rewrite by
"–assert=plain".  However then asserts are not as useful. Better
solution (PR is coming) is to add PYTEST_DONT_REWRITE in the docstring:

https://docs.pytest.org/en/latest/how-to/assert.html#disabling-assert-rewriting

Reference: https://issues.apache.org/jira/browse/CASSANDRA-18369
@potiuk
Copy link
Author

potiuk commented Mar 28, 2023

Not sure why continuous integration failed ( looked like the CI fails with Python 2.7 ?)

FYI. This PR contains auto-patching of cassandra for Airflow CI image as a workaround apache/airflow#30315 - so there is no big hurry with merging and releasing this one. But it would be nice if we don't have to patch cassandra code to make it runs with pytest tests.

@absurdfarce
Copy link
Collaborator

Hey @potiuk, thanks for the PR!

Have you signed the Contributor License Agreement for contributions to DataStax open source projects? If not you can find it at https://cla.datastax.com/. Thanks!

@potiuk
Copy link
Author

potiuk commented Mar 30, 2023

Hey @potiuk, thanks for the PR!

Have you signed the Contributor License Agreement for contributions to DataStax open source projects? If not you can find it at https://cla.datastax.com/. Thanks!

Now yes.

potiuk added a commit to potiuk/airflow that referenced this pull request Mar 30, 2023
Currently we had a requirement to use `test_` prefix for modules for all
our tests, however this means that we could have missed some of the
tests from pytest collection when they were placed in a module without
the `test_` prefix. This happened already: see apache#30311 and apache#30306

The previous attempts to remove it failed, because cassandra tests
seemed to be incompatible with pytest collection when we allowed all
Python files, also there were a few names that caused the pytest
collection to fail with selecting all Python files for collection.

The move is accompanied by converting pytest configuration to
pyproject.toml.

After long investigation, it turned out that the cause of cassandra
failures was pytest assert rewrite with collided with Cython-related
type_codes.py definition of types.

See datastax/python-driver#1142 for PR
that is aimed to fix it on cassandra side, in the meantime we
are manually patching the type_codes.py file from Cassandra by adding
PYTEST_DONT_REWRITE to its docstring when building the CI image for
testing.

Another error was using run_module method which also suffers from
assert rewriting incompatibilities but this could be fixed by using
run_path instead. pytest-dev/pytest#9007

Yet another assert-rewrite problem was that ``test_airflow_context``
also failed with ``--assert=rewrite`` turned on when python_files
were not filtered. No solution was found on how we can disable
the rewrite (apparently it happens for yaml.py file but adding
PYTEST_DONT_REWRITE to yaml.py and related files did not help,
so we had to extract the test to separate test type)

Also test in docker_tests/kubernetes_tests should be run with working
directory being in their folders in order to apply pyproject.toml from
their directores (without specifying asyncio mode).

The following changes are applied:

* conversion of pytest configuration from pytest.ini to
  pyproject.toml (for main project, docker_tests and breeze)

* addedd pyproject.toml to breeze, docker_tests, kubernetes_tests
  to reflect they need different configuration for pytest (lack
  of pytest-asyncio for example)

* adding automated patching of Cassandra type_codes.py with
  PYTEST_DONT_REWRITE docstring marker

* add pyproject.toml to docker context in order to include it in
  sources of airflow in the image

* changed working directory for k8s suite of breeze commands to be
  kubernetes_tests

* remove pytest.ini from docker compose mounting and automated removal
  in entrypoin in case old image is used

* rename few breeze function to not start with "test_" so that they
  are not collected as test methods

* remove few test dags from collecton by pytest by name

* replace run_module with run_path usage in EKS test and test_www.

* CI workflow is updated to use docker_tests as working directory
  for those tests

* perf dags were moved out of the tests/test_utils dir to dev
  directory.

* PlainAsserts test type was added and the ``test_airflow_context``
  test is run in this separate test type when running parallel
  tests by default.
@potiuk
Copy link
Author

potiuk commented Mar 30, 2023

Should I do something more @absurdfarce :) ? Doesn't seem that one is very risky :).

BTW, I also created an issue in Pytest for that pytest-dev/pytest#10844 so maybe they will be able to fix it - but I presume, adding the workaround here will be quite a bit faster and we won't have to add hacks to patch cassandra-driver installed in our image.

potiuk added a commit to apache/airflow that referenced this pull request Mar 31, 2023
Currently we had a requirement to use `test_` prefix for modules for all
our tests, however this means that we could have missed some of the
tests from pytest collection when they were placed in a module without
the `test_` prefix. This happened already: see #30311 and #30306

The previous attempts to remove it failed, because cassandra tests
seemed to be incompatible with pytest collection when we allowed all
Python files, also there were a few names that caused the pytest
collection to fail with selecting all Python files for collection.

The move is accompanied by converting pytest configuration to
pyproject.toml.

After long investigation, it turned out that the cause of cassandra
failures was pytest assert rewrite with collided with Cython-related
type_codes.py definition of types.

See datastax/python-driver#1142 for PR
that is aimed to fix it on cassandra side, in the meantime we
are manually patching the type_codes.py file from Cassandra by adding
PYTEST_DONT_REWRITE to its docstring when building the CI image for
testing.

Another error was using run_module method which also suffers from
assert rewriting incompatibilities but this could be fixed by using
run_path instead. pytest-dev/pytest#9007

Yet another assert-rewrite problem was that ``test_airflow_context``
also failed with ``--assert=rewrite`` turned on when python_files
were not filtered. No solution was found on how we can disable
the rewrite (apparently it happens for yaml.py file but adding
PYTEST_DONT_REWRITE to yaml.py and related files did not help,
so we had to extract the test to separate test type)

Also test in docker_tests/kubernetes_tests should be run with working
directory being in their folders in order to apply pyproject.toml from
their directores (without specifying asyncio mode).

The following changes are applied:

* conversion of pytest configuration from pytest.ini to
  pyproject.toml (for main project, docker_tests and breeze)

* addedd pyproject.toml to breeze, docker_tests, kubernetes_tests
  to reflect they need different configuration for pytest (lack
  of pytest-asyncio for example)

* adding automated patching of Cassandra type_codes.py with
  PYTEST_DONT_REWRITE docstring marker

* add pyproject.toml to docker context in order to include it in
  sources of airflow in the image

* changed working directory for k8s suite of breeze commands to be
  kubernetes_tests

* remove pytest.ini from docker compose mounting and automated removal
  in entrypoin in case old image is used

* rename few breeze function to not start with "test_" so that they
  are not collected as test methods

* remove few test dags from collecton by pytest by name

* replace run_module with run_path usage in EKS test and test_www.

* CI workflow is updated to use docker_tests as working directory
  for those tests

* perf dags were moved out of the tests/test_utils dir to dev
  directory.

* PlainAsserts test type was added and the ``test_airflow_context``
  test is run in this separate test type when running parallel
  tests by default.
@absurdfarce
Copy link
Collaborator

I don't think there's much else to do on the PR @potiuk, I'm just not sure if this is what we want to do.

Please correct me if I'm wrong, but if I understand correctly we're seeing the rewriting in question because pytest is eval'ing all your source and not just the test files. If that's correct then we'd be putting in a workaround in a random source file to address an issue seen by the particulars of your code base. I can imagine other code bases might have similar issues with other parts of the driver code base (based on which parts of the code base they use and/or how they use it). Should I add these tags for them as well?

There's also a concern that this tag appears as a random string at the bottom of a docstring. It's the kind of thing that could easily be deleted by accident in a subsequent PR, especially six months from now when nobody remembers why this random string appears at the bottom of this docstring.

Perhaps most importantly there's an alternative to this approach, again assuming I've understood the problem space correctly. Either a re-organization of the tests in your code base or a modification of the pytest discovery mechanism to pull in the source files you want to use would also address this problem in a way that doesn't involve modifying the Python driver code base. On first blush it seems like that's the more straightforward fix... but I'm happy to discuss.

@potiuk
Copy link
Author

potiuk commented Mar 31, 2023

I don't think there's much else to do on the PR @potiuk, I'm just not sure if this is what we want to do.

Please correct me if I'm wrong, but if I understand correctly we're seeing the rewriting in question because pytest is eval'ing all your source and not just the test files. If that's correct then we'd be putting in a workaround in a random source file to address an issue seen by the particulars of your code base. I can imagine other code bases might have similar issues with other parts of the driver code base (based on which parts of the code base they use and/or how they use it). Should I add these tags for them as well?

I am not sure of that to be honest. I am not 100% sure what actually the bug is, and I hope I am going to find out with pytest-dev/pytest#10844 I hope (and maybe even a fix). It seems that type_codes is the only file that triggers the problem. I believe type_codes.pxd is pretty special (I do not know cpython intrinsic details and the root cause of the problem. For sure it is the only file that triggers it in our imports, but yes - it could be that other files trigger it as well. And I agree - it's likely not cassandra problem but pytest, but my presumption is that it might save hours of debugging for your users in case they stumble upon similar problem. I personally would err for empathy and just merging such change for the sake of "no harm for us, might safe time for my users" but, well, it's just me.

There's also a concern that this tag appears as a random string at the bottom of a docstring. It's the kind of thing that could easily be deleted by accident in a subsequent PR, especially six months from now when nobody remembers why this random string appears at the bottom of this docstring.

Good point, I am happy to add a comment above explaining it - we usually do that (when I created the PR I did not open the issue in pytest yet, but what we usually do is to add a comment simillar to

# This line is needed to workaround pytest issue https://github.com/pytest-dev/pytest/issues/10844
# It can be safely removed when the issue is solved

If that will help with merging it, I am happy to add it.

Perhaps most importantly there's an alternative to this approach, again assuming I've understood the problem space correctly. Either a re-organization of the tests in your code base or a modification of the pytest discovery mechanism to pull in the source files you want to use would also address this problem in a way that doesn't involve modifying the Python driver code base. On first blush it seems like that's the more straightforward fix... but I'm happy to discuss.

Yes, but not entirely. There is a (rather likely - it happened twice for us recently) case where it might slip through the cracks.

We already had our tests organized in the way that we could add a "test_*" restriction. But the root cause was that one (actually two) of our (currently around 2400) contributors, contributed PR with module name not following the convention. And it's slipped through code review. This is the example file from one of our PRs that caused the very problem: https://github.com/apache/airflow/pull/29968/files#diff-adb7a9dd79326f9730b6069804e9334151755a5a615a4a3a8a2d59a5d9d418ee

The problem with adding "test_*" convention to pytest is that if you do not follow the convention, and your files are in modules named differently ("smtp" in our case) you can still as contributor happily run the tests (and they will run and succed despite not following the convention). Pytest should discard such tests when run individually (via IDE or even via pytest your_package.your_module.py), but this convention is only in effect when collecting tests via packages. This happens in CI necessarily - in CI you run pytest your_package - and there - unlike in the individual package case - your tests will be skipped if the module does not follow the convention.

And automation of checking this is not easy. The problem is that if your "tests" package also contains some util code, you cannot easily detect the case. If your package (from our example) is named "smtp" - but this could be easily an "utill" file not containing any tests, so having an "smtp.py" file in there is nothing unusual, we would have to literally parse the file and check if there are any 'Test*' classes and 'test_*' methods in order to be able to flag it as a wrongly named file (basically running pytest collection checks).

In effect, if you do not have more automation enforcing the conventions, Pytest is not warning you that your tests will be silently skipped when you tests the whole projects in CI, while the contributor might happily think they work. Changing the collection rule to all python files fixes this problem - you will no longer have the disconned

BTW. I am not super-strong on merging those changes, we have the workaround and possibly pytest will fix the bug eventually, but I thought it might be a good idea for you to merge it if you want to save some potential troubles for your users. We had a fair share of it in Airflow - but this is a "sunken cost", but maybe by merging single line you can safe a few hours of your users's debugging, who knows ?

@potiuk
Copy link
Author

potiuk commented Mar 31, 2023

BTW. Regardless if you will decide to merge - I added a fixup to replace the link of CASSANDRA jira as fixup with :

# This line is needed to workaround pytest issue https://github.com/pytest-dev/pytest/issues/10844
# It can be safely removed when the issue is solved

Which I think is a better and more complete description (and points to pytest being the "error" source").

@absurdfarce
Copy link
Collaborator

@potiuk Apologies for the delay in getting back to you on this. I've been thinking about it a bit, specifically the following:

BTW. I am not super-strong on merging those changes, we have the workaround and possibly pytest will fix the bug eventually, but I thought it might be a good idea for you to merge it if you want to save some potential troubles for your users.

I think this idea is exactly right; the idea of having something discoverable which can save other users from the potential time sink of having to address this issue doesn't seem unreasonable at all. But the more I thought about it the more I kept coming back to the idea that this PR itself seems to serve that goal exactly. It's easily discoverable; if you search for "pytest" within this project on Github you come across it readily. And because of the work you've already put in (and documented so thoroughly in the comments above) it seems to me that this issue is at least as relevant as a link to a pytest issue in source, if not more so.

I'm still not convinced about the idea of merging this PR directly (for the reasons outlined above) but it seems to me that the context here is what really matters for this specific issue. So if it works for you I suppose I'd like to just close this PR out and know that we've documented the issue better than we ever possibly could with any comments in the driver source.

Oh, and I guess if anybody asks in the future I can just tell them "go look at what Airflow did" :)

Thanks again for all the work you put in on this, not only diagnosing the problem in the first place but also for the extensive (and very clear) write-up in comments above. If those comments did not describe the issue as clearly and completely as they do I'd feel less settled about my proposed approach... but they do, and so I do!

@absurdfarce absurdfarce closed this May 9, 2023
@potiuk
Copy link
Author

potiuk commented May 9, 2023

I'm still not convinced about the idea of merging this PR directly (for the reasons outlined above) but it seems to me that the context here is what really matters for this specific issue. So if it works for you I suppose I'd like to just close this PR out and know that we've documented the issue better than we ever possibly could with any comments in the driver source.
Oh, and I guess if anybody asks in the future I can just tell them "go look at what Airflow did" :)

Yeah. Up to you. Sorry I missed the response, but yeah, I agree it could be enough. I hope pytest will solve it eventually, though the seem to ignore it largely even if I provided a minimum reproducible case as they asked. This is eventually Pytest problem not yours.

ahidalgob pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Jun 12, 2023
Currently we had a requirement to use `test_` prefix for modules for all
our tests, however this means that we could have missed some of the
tests from pytest collection when they were placed in a module without
the `test_` prefix. This happened already: see #30311 and #30306

The previous attempts to remove it failed, because cassandra tests
seemed to be incompatible with pytest collection when we allowed all
Python files, also there were a few names that caused the pytest
collection to fail with selecting all Python files for collection.

The move is accompanied by converting pytest configuration to
pyproject.toml.

After long investigation, it turned out that the cause of cassandra
failures was pytest assert rewrite with collided with Cython-related
type_codes.py definition of types.

See datastax/python-driver#1142 for PR
that is aimed to fix it on cassandra side, in the meantime we
are manually patching the type_codes.py file from Cassandra by adding
PYTEST_DONT_REWRITE to its docstring when building the CI image for
testing.

Another error was using run_module method which also suffers from
assert rewriting incompatibilities but this could be fixed by using
run_path instead. pytest-dev/pytest#9007

Yet another assert-rewrite problem was that ``test_airflow_context``
also failed with ``--assert=rewrite`` turned on when python_files
were not filtered. No solution was found on how we can disable
the rewrite (apparently it happens for yaml.py file but adding
PYTEST_DONT_REWRITE to yaml.py and related files did not help,
so we had to extract the test to separate test type)

Also test in docker_tests/kubernetes_tests should be run with working
directory being in their folders in order to apply pyproject.toml from
their directores (without specifying asyncio mode).

The following changes are applied:

* conversion of pytest configuration from pytest.ini to
  pyproject.toml (for main project, docker_tests and breeze)

* addedd pyproject.toml to breeze, docker_tests, kubernetes_tests
  to reflect they need different configuration for pytest (lack
  of pytest-asyncio for example)

* adding automated patching of Cassandra type_codes.py with
  PYTEST_DONT_REWRITE docstring marker

* add pyproject.toml to docker context in order to include it in
  sources of airflow in the image

* changed working directory for k8s suite of breeze commands to be
  kubernetes_tests

* remove pytest.ini from docker compose mounting and automated removal
  in entrypoin in case old image is used

* rename few breeze function to not start with "test_" so that they
  are not collected as test methods

* remove few test dags from collecton by pytest by name

* replace run_module with run_path usage in EKS test and test_www.

* CI workflow is updated to use docker_tests as working directory
  for those tests

* perf dags were moved out of the tests/test_utils dir to dev
  directory.

* PlainAsserts test type was added and the ``test_airflow_context``
  test is run in this separate test type when running parallel
  tests by default.

GitOrigin-RevId: 34d11542a01cf3a843a75a738294a444505f7fba
@potiuk
Copy link
Author

potiuk commented Jul 15, 2023

@absurdfarce : the discussion on the original problem in Pytest pytest-dev/pytest#10844 (comment) continued and @bluetech suggested a solution that you might use. I have 0 context to be able to asses if it makes sense or not, but maybe you should consider doing that:

Their problematic code is

dict((v, globals()[k]) for k, v in type_codes.dict.items() if not k.startswith('_'))
They might be more receptive to changes in this line if they are not obviously pytest specific. For example

dict((v, globals()[k]) for k, v in type_codes.dict.items() if k.endswith('Type'))
(should also allow them to remove CUSTOM_TYPE = object() line from the protocol.py file)

or

dict((v, globals()[k]) for k, v in type_codes.dict.items() if not k.startswith(('_', '@'))
(kinda pytest specific but hopefully OK).

Again - no idea if that's good solution or why, but seems that it's a way to avoid the problem while slightly refactoring the code in question.

potiuk added a commit to potiuk/airflow that referenced this pull request Jul 15, 2023
The cassandra hack added in apache#30315 does not seem to have a chance to get
away. Neither Pytest pytest-dev/pytest#10844
nor Datastax datastax/python-driver#1142
want to own the problem for now (though there is a proposal from
pytest contributors on how Datastax could refactor their code to
avoid the problem)

However during the discussion an idea popped in my head on how
we could come back to test_* pattern with far less probability of
missing some tests that are added to wrong files. Seems that we
can add a fixture that will outright fail tests if they are
placed if files not following the test_* pattern. While it would
not help in case test would be wrongly named in the first place,
it would definitely help to not to add more tests in wrongly named
files because it will be literally impossible to run the tests
added in a wrong file, even if you manualy do `pytest somefile.py`
and avoid running collection.

I also did a quick check to try to find cases where the test_*
file name was already violated and I found (and renamed) two that
I have found. It seems it is quite likely that similar mistake
could be done in the future - but with the fixture I added it
should be far less likely someone adds tests in a wrongly named
file.
potiuk added a commit to apache/airflow that referenced this pull request Jul 15, 2023
…32626)

The cassandra hack added in #30315 does not seem to have a chance to get
away. Neither Pytest pytest-dev/pytest#10844
nor Datastax datastax/python-driver#1142
want to own the problem for now (though there is a proposal from
pytest contributors on how Datastax could refactor their code to
avoid the problem)

However during the discussion an idea popped in my head on how
we could come back to test_* pattern with far less probability of
missing some tests that are added to wrong files. Seems that we
can add a fixture that will outright fail tests if they are
placed if files not following the test_* pattern. While it would
not help in case test would be wrongly named in the first place,
it would definitely help to not to add more tests in wrongly named
files because it will be literally impossible to run the tests
added in a wrong file, even if you manualy do `pytest somefile.py`
and avoid running collection.

I also did a quick check to try to find cases where the test_*
file name was already violated and I found (and renamed) two that
I have found. It seems it is quite likely that similar mistake
could be done in the future - but with the fixture I added it
should be far less likely someone adds tests in a wrongly named
file.
ahidalgob pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Nov 7, 2023
Currently we had a requirement to use `test_` prefix for modules for all
our tests, however this means that we could have missed some of the
tests from pytest collection when they were placed in a module without
the `test_` prefix. This happened already: see #30311 and #30306

The previous attempts to remove it failed, because cassandra tests
seemed to be incompatible with pytest collection when we allowed all
Python files, also there were a few names that caused the pytest
collection to fail with selecting all Python files for collection.

The move is accompanied by converting pytest configuration to
pyproject.toml.

After long investigation, it turned out that the cause of cassandra
failures was pytest assert rewrite with collided with Cython-related
type_codes.py definition of types.

See datastax/python-driver#1142 for PR
that is aimed to fix it on cassandra side, in the meantime we
are manually patching the type_codes.py file from Cassandra by adding
PYTEST_DONT_REWRITE to its docstring when building the CI image for
testing.

Another error was using run_module method which also suffers from
assert rewriting incompatibilities but this could be fixed by using
run_path instead. pytest-dev/pytest#9007

Yet another assert-rewrite problem was that ``test_airflow_context``
also failed with ``--assert=rewrite`` turned on when python_files
were not filtered. No solution was found on how we can disable
the rewrite (apparently it happens for yaml.py file but adding
PYTEST_DONT_REWRITE to yaml.py and related files did not help,
so we had to extract the test to separate test type)

Also test in docker_tests/kubernetes_tests should be run with working
directory being in their folders in order to apply pyproject.toml from
their directores (without specifying asyncio mode).

The following changes are applied:

* conversion of pytest configuration from pytest.ini to
  pyproject.toml (for main project, docker_tests and breeze)

* addedd pyproject.toml to breeze, docker_tests, kubernetes_tests
  to reflect they need different configuration for pytest (lack
  of pytest-asyncio for example)

* adding automated patching of Cassandra type_codes.py with
  PYTEST_DONT_REWRITE docstring marker

* add pyproject.toml to docker context in order to include it in
  sources of airflow in the image

* changed working directory for k8s suite of breeze commands to be
  kubernetes_tests

* remove pytest.ini from docker compose mounting and automated removal
  in entrypoin in case old image is used

* rename few breeze function to not start with "test_" so that they
  are not collected as test methods

* remove few test dags from collecton by pytest by name

* replace run_module with run_path usage in EKS test and test_www.

* CI workflow is updated to use docker_tests as working directory
  for those tests

* perf dags were moved out of the tests/test_utils dir to dev
  directory.

* PlainAsserts test type was added and the ``test_airflow_context``
  test is run in this separate test type when running parallel
  tests by default.

GitOrigin-RevId: 34d11542a01cf3a843a75a738294a444505f7fba
ahidalgob pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request May 15, 2024
…32626)

The cassandra hack added in #30315 does not seem to have a chance to get
away. Neither Pytest pytest-dev/pytest#10844
nor Datastax datastax/python-driver#1142
want to own the problem for now (though there is a proposal from
pytest contributors on how Datastax could refactor their code to
avoid the problem)

However during the discussion an idea popped in my head on how
we could come back to test_* pattern with far less probability of
missing some tests that are added to wrong files. Seems that we
can add a fixture that will outright fail tests if they are
placed if files not following the test_* pattern. While it would
not help in case test would be wrongly named in the first place,
it would definitely help to not to add more tests in wrongly named
files because it will be literally impossible to run the tests
added in a wrong file, even if you manualy do `pytest somefile.py`
and avoid running collection.

I also did a quick check to try to find cases where the test_*
file name was already violated and I found (and renamed) two that
I have found. It seems it is quite likely that similar mistake
could be done in the future - but with the fixture I added it
should be far less likely someone adds tests in a wrongly named
file.

GitOrigin-RevId: c6594480e2722513fd082a6c65e30e2504698ba2
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Sep 19, 2024
Currently we had a requirement to use `test_` prefix for modules for all
our tests, however this means that we could have missed some of the
tests from pytest collection when they were placed in a module without
the `test_` prefix. This happened already: see #30311 and #30306

The previous attempts to remove it failed, because cassandra tests
seemed to be incompatible with pytest collection when we allowed all
Python files, also there were a few names that caused the pytest
collection to fail with selecting all Python files for collection.

The move is accompanied by converting pytest configuration to
pyproject.toml.

After long investigation, it turned out that the cause of cassandra
failures was pytest assert rewrite with collided with Cython-related
type_codes.py definition of types.

See datastax/python-driver#1142 for PR
that is aimed to fix it on cassandra side, in the meantime we
are manually patching the type_codes.py file from Cassandra by adding
PYTEST_DONT_REWRITE to its docstring when building the CI image for
testing.

Another error was using run_module method which also suffers from
assert rewriting incompatibilities but this could be fixed by using
run_path instead. pytest-dev/pytest#9007

Yet another assert-rewrite problem was that ``test_airflow_context``
also failed with ``--assert=rewrite`` turned on when python_files
were not filtered. No solution was found on how we can disable
the rewrite (apparently it happens for yaml.py file but adding
PYTEST_DONT_REWRITE to yaml.py and related files did not help,
so we had to extract the test to separate test type)

Also test in docker_tests/kubernetes_tests should be run with working
directory being in their folders in order to apply pyproject.toml from
their directores (without specifying asyncio mode).

The following changes are applied:

* conversion of pytest configuration from pytest.ini to
  pyproject.toml (for main project, docker_tests and breeze)

* addedd pyproject.toml to breeze, docker_tests, kubernetes_tests
  to reflect they need different configuration for pytest (lack
  of pytest-asyncio for example)

* adding automated patching of Cassandra type_codes.py with
  PYTEST_DONT_REWRITE docstring marker

* add pyproject.toml to docker context in order to include it in
  sources of airflow in the image

* changed working directory for k8s suite of breeze commands to be
  kubernetes_tests

* remove pytest.ini from docker compose mounting and automated removal
  in entrypoin in case old image is used

* rename few breeze function to not start with "test_" so that they
  are not collected as test methods

* remove few test dags from collecton by pytest by name

* replace run_module with run_path usage in EKS test and test_www.

* CI workflow is updated to use docker_tests as working directory
  for those tests

* perf dags were moved out of the tests/test_utils dir to dev
  directory.

* PlainAsserts test type was added and the ``test_airflow_context``
  test is run in this separate test type when running parallel
  tests by default.

GitOrigin-RevId: 34d11542a01cf3a843a75a738294a444505f7fba
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Sep 19, 2024
…32626)

The cassandra hack added in #30315 does not seem to have a chance to get
away. Neither Pytest pytest-dev/pytest#10844
nor Datastax datastax/python-driver#1142
want to own the problem for now (though there is a proposal from
pytest contributors on how Datastax could refactor their code to
avoid the problem)

However during the discussion an idea popped in my head on how
we could come back to test_* pattern with far less probability of
missing some tests that are added to wrong files. Seems that we
can add a fixture that will outright fail tests if they are
placed if files not following the test_* pattern. While it would
not help in case test would be wrongly named in the first place,
it would definitely help to not to add more tests in wrongly named
files because it will be literally impossible to run the tests
added in a wrong file, even if you manualy do `pytest somefile.py`
and avoid running collection.

I also did a quick check to try to find cases where the test_*
file name was already violated and I found (and renamed) two that
I have found. It seems it is quite likely that similar mistake
could be done in the future - but with the fixture I added it
should be far less likely someone adds tests in a wrongly named
file.

GitOrigin-RevId: c6594480e2722513fd082a6c65e30e2504698ba2
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Nov 8, 2024
Currently we had a requirement to use `test_` prefix for modules for all
our tests, however this means that we could have missed some of the
tests from pytest collection when they were placed in a module without
the `test_` prefix. This happened already: see #30311 and #30306

The previous attempts to remove it failed, because cassandra tests
seemed to be incompatible with pytest collection when we allowed all
Python files, also there were a few names that caused the pytest
collection to fail with selecting all Python files for collection.

The move is accompanied by converting pytest configuration to
pyproject.toml.

After long investigation, it turned out that the cause of cassandra
failures was pytest assert rewrite with collided with Cython-related
type_codes.py definition of types.

See datastax/python-driver#1142 for PR
that is aimed to fix it on cassandra side, in the meantime we
are manually patching the type_codes.py file from Cassandra by adding
PYTEST_DONT_REWRITE to its docstring when building the CI image for
testing.

Another error was using run_module method which also suffers from
assert rewriting incompatibilities but this could be fixed by using
run_path instead. pytest-dev/pytest#9007

Yet another assert-rewrite problem was that ``test_airflow_context``
also failed with ``--assert=rewrite`` turned on when python_files
were not filtered. No solution was found on how we can disable
the rewrite (apparently it happens for yaml.py file but adding
PYTEST_DONT_REWRITE to yaml.py and related files did not help,
so we had to extract the test to separate test type)

Also test in docker_tests/kubernetes_tests should be run with working
directory being in their folders in order to apply pyproject.toml from
their directores (without specifying asyncio mode).

The following changes are applied:

* conversion of pytest configuration from pytest.ini to
  pyproject.toml (for main project, docker_tests and breeze)

* addedd pyproject.toml to breeze, docker_tests, kubernetes_tests
  to reflect they need different configuration for pytest (lack
  of pytest-asyncio for example)

* adding automated patching of Cassandra type_codes.py with
  PYTEST_DONT_REWRITE docstring marker

* add pyproject.toml to docker context in order to include it in
  sources of airflow in the image

* changed working directory for k8s suite of breeze commands to be
  kubernetes_tests

* remove pytest.ini from docker compose mounting and automated removal
  in entrypoin in case old image is used

* rename few breeze function to not start with "test_" so that they
  are not collected as test methods

* remove few test dags from collecton by pytest by name

* replace run_module with run_path usage in EKS test and test_www.

* CI workflow is updated to use docker_tests as working directory
  for those tests

* perf dags were moved out of the tests/test_utils dir to dev
  directory.

* PlainAsserts test type was added and the ``test_airflow_context``
  test is run in this separate test type when running parallel
  tests by default.

GitOrigin-RevId: 34d11542a01cf3a843a75a738294a444505f7fba
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Nov 8, 2024
…32626)

The cassandra hack added in #30315 does not seem to have a chance to get
away. Neither Pytest pytest-dev/pytest#10844
nor Datastax datastax/python-driver#1142
want to own the problem for now (though there is a proposal from
pytest contributors on how Datastax could refactor their code to
avoid the problem)

However during the discussion an idea popped in my head on how
we could come back to test_* pattern with far less probability of
missing some tests that are added to wrong files. Seems that we
can add a fixture that will outright fail tests if they are
placed if files not following the test_* pattern. While it would
not help in case test would be wrongly named in the first place,
it would definitely help to not to add more tests in wrongly named
files because it will be literally impossible to run the tests
added in a wrong file, even if you manualy do `pytest somefile.py`
and avoid running collection.

I also did a quick check to try to find cases where the test_*
file name was already violated and I found (and renamed) two that
I have found. It seems it is quite likely that similar mistake
could be done in the future - but with the fixture I added it
should be far less likely someone adds tests in a wrongly named
file.

GitOrigin-RevId: c6594480e2722513fd082a6c65e30e2504698ba2
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request May 4, 2025
Currently we had a requirement to use `test_` prefix for modules for all
our tests, however this means that we could have missed some of the
tests from pytest collection when they were placed in a module without
the `test_` prefix. This happened already: see #30311 and #30306

The previous attempts to remove it failed, because cassandra tests
seemed to be incompatible with pytest collection when we allowed all
Python files, also there were a few names that caused the pytest
collection to fail with selecting all Python files for collection.

The move is accompanied by converting pytest configuration to
pyproject.toml.

After long investigation, it turned out that the cause of cassandra
failures was pytest assert rewrite with collided with Cython-related
type_codes.py definition of types.

See datastax/python-driver#1142 for PR
that is aimed to fix it on cassandra side, in the meantime we
are manually patching the type_codes.py file from Cassandra by adding
PYTEST_DONT_REWRITE to its docstring when building the CI image for
testing.

Another error was using run_module method which also suffers from
assert rewriting incompatibilities but this could be fixed by using
run_path instead. pytest-dev/pytest#9007

Yet another assert-rewrite problem was that ``test_airflow_context``
also failed with ``--assert=rewrite`` turned on when python_files
were not filtered. No solution was found on how we can disable
the rewrite (apparently it happens for yaml.py file but adding
PYTEST_DONT_REWRITE to yaml.py and related files did not help,
so we had to extract the test to separate test type)

Also test in docker_tests/kubernetes_tests should be run with working
directory being in their folders in order to apply pyproject.toml from
their directores (without specifying asyncio mode).

The following changes are applied:

* conversion of pytest configuration from pytest.ini to
  pyproject.toml (for main project, docker_tests and breeze)

* addedd pyproject.toml to breeze, docker_tests, kubernetes_tests
  to reflect they need different configuration for pytest (lack
  of pytest-asyncio for example)

* adding automated patching of Cassandra type_codes.py with
  PYTEST_DONT_REWRITE docstring marker

* add pyproject.toml to docker context in order to include it in
  sources of airflow in the image

* changed working directory for k8s suite of breeze commands to be
  kubernetes_tests

* remove pytest.ini from docker compose mounting and automated removal
  in entrypoin in case old image is used

* rename few breeze function to not start with "test_" so that they
  are not collected as test methods

* remove few test dags from collecton by pytest by name

* replace run_module with run_path usage in EKS test and test_www.

* CI workflow is updated to use docker_tests as working directory
  for those tests

* perf dags were moved out of the tests/test_utils dir to dev
  directory.

* PlainAsserts test type was added and the ``test_airflow_context``
  test is run in this separate test type when running parallel
  tests by default.

GitOrigin-RevId: 34d11542a01cf3a843a75a738294a444505f7fba
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request May 4, 2025
…32626)

The cassandra hack added in #30315 does not seem to have a chance to get
away. Neither Pytest pytest-dev/pytest#10844
nor Datastax datastax/python-driver#1142
want to own the problem for now (though there is a proposal from
pytest contributors on how Datastax could refactor their code to
avoid the problem)

However during the discussion an idea popped in my head on how
we could come back to test_* pattern with far less probability of
missing some tests that are added to wrong files. Seems that we
can add a fixture that will outright fail tests if they are
placed if files not following the test_* pattern. While it would
not help in case test would be wrongly named in the first place,
it would definitely help to not to add more tests in wrongly named
files because it will be literally impossible to run the tests
added in a wrong file, even if you manualy do `pytest somefile.py`
and avoid running collection.

I also did a quick check to try to find cases where the test_*
file name was already violated and I found (and renamed) two that
I have found. It seems it is quite likely that similar mistake
could be done in the future - but with the fixture I added it
should be far less likely someone adds tests in a wrongly named
file.

GitOrigin-RevId: c6594480e2722513fd082a6c65e30e2504698ba2
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants