Skip to content

Conversation

@scemama
Copy link
Member

@scemama scemama commented Nov 10, 2024

  • The text backend used 10 characters for storing 64-bit integers in text format, but this was too small. I changed it to 20 characters. Warning: This breaks backward-compatibility with the text backend!
  • When you pass a list of determinants to the write function, it now checks that the number of alpha and beta occupied orbitals in the determinant is consistent with nup and ndn, and it also checks that the MO indices are in the correct range [0:mo_num[
  • I had to update the Fortran test

@scemama scemama requested a review from q-posev November 10, 2024 18:23
@q-posev
Copy link
Member

q-posev commented Nov 10, 2024

Thanks @scemama !

Fortran tests fail in the CI: Segmentation fault. Some illegal memory access?

@q-posev
Copy link
Member

q-posev commented Nov 10, 2024

And one general concern: the new determinant check requires that number of up and down electrons to be present in the file. If they are missing - the new error code can be added to TREXIO and returned from trexio_write_determinant_list in this case to tell the user that they forgot to write up/down electrons. This should also be stated explicitly in the documentation and we need to inform @NastaMauger to modify the PySCF CI dumper accordingly.

@NastaMauger
Copy link

@q-posev I am confused regarding your comment.
I naively used the bitfield format to register the determinants in this PR. I was thinking it might a better suit since the determinant list can be huge

@q-posev
Copy link
Member

q-posev commented Nov 10, 2024

@NastaMauger what you did was right. In this PR @scemama introduced an additional correctness check for the CI determinants (see this line). This correctness check requires that the number of up and down electrons are written in the TREXIO file before you start writing the determinants (otherwise this check cannot be performed). What you implemented in pyscf-forge is correct with the current version of TREXIO but will break once this PR is merged. But if in your PySCF CASCI producer you also perform trexio_write_electron_up_num and trexio_write_electron_dn_num - then your CI producer will remain compatible with both pre-PR and post-PR version of TREXIO.

@scemama
Copy link
Member Author

scemama commented Nov 11, 2024

And one general concern: the new determinant check requires that number of up and down electrons to be present in the file. If they are missing - the new error code can be added to TREXIO and returned from trexio_write_determinant_list in this case to tell the user that they forgot to write up/down electrons. This should also be stated explicitly in the documentation and we need to inform @NastaMauger to modify the PySCF CI dumper accordingly.

Now the function returns TREXIO_INVALID_ELECTRON_NUM if the number of electrons is wrong in a determinant, and TREXIO_INVALID_MO_INDEX if the MO index is out of bounds.
I added it to the documentation.

@scemama
Copy link
Member Author

scemama commented Nov 11, 2024

Thanks @scemama !

Fortran tests fail in the CI: Segmentation fault. Some illegal memory access?

I don't understand what is wrong. I can't reproduce the problem on my computer, and valgrind doesn't detect anything.
In the artifact file, the segfault happens at line 24 of test_f.f90, which seems to be just a print statement. I really don't see anything wrong.. :-(

@q-posev
Copy link
Member

q-posev commented Nov 11, 2024

@scemama it must be some non-standard Fortran stuff or the way you iterate when printing. Can you please

  • Install the same gfortran as in the CI (e.g. gfortran-12 which we use macos)
  • make a clean clone of the repo, checkout to your branch, compile and test via make check? Maybe you have smth locally which allows the Fortran test to pass

@q-posev
Copy link
Member

q-posev commented Nov 11, 2024

@scemama I confirm that following approach 2 from above (clean repo run) and gfortran-11, I can reproduce this error in the Fortran test on your branch on Ubuntu. The master branch is OK so clearly the bug is present only on this branch.

It fails right after the call to trexio_cp function in the Fortran test, so maybe there is a memory leak somewhere on this branch when this function is executed? Or perhaps a memory leak happened during the determinant writing and manifested itself only later?

@q-posev
Copy link
Member

q-posev commented Nov 11, 2024

@scemama I found a bug in the C code (unclosed comment) but the Fortran test bug is still there. It must be a memory leak somewhere.

@scemama
Copy link
Member Author

scemama commented Nov 12, 2024

  • ./configure FC=ifort FCFLAGS="-C -traceback" => OK.
  • ./configure FCFLAGS="-g -fcheck=all -Waliasing -Wampersand -Wconversion -Wsurprising -Wintrinsics-std -Wno-tabs -Wintrinsic-shadow -Wline-truncation -Wreal-q-constant -Wuninitialized -fbacktrace -ffpe-trap=zero,overflow -finit-real=nan" => OK
  • ./configure FCFLAGS="-g -fcheck=all -Waliasing -Wampersand -Wconversion -Wsurprising -Wintrinsics-std -Wno-tabs -Wintrinsic-shadow -Wline-truncation -Wreal-q-constant -Wuninitialized -fbacktrace -ffpe-trap=zero,overflow -finit-real=nan" => OK
  • ./configure FCFLAGS="" => OK
  • ./configure FCFLAGS="-O2" => ERROR!
  • ./configure FC=ifort FCFLAGS="-O2" => OK

I suspect that there is a bug in how gfortran11 handles strings internally with -O2. Valgrind reports a NULL pointer in error in libgfortran.so.5.0.0

I removed the -O2 flag from configure.ac for the fortran compiler.

@scemama
Copy link
Member Author

scemama commented Nov 12, 2024

OK, now this modification broke the python test because the determinant data in the test is dummy, and now we are checking that the numbers of up and dn electrons in the determinants are OK. This means that the modification works well! But we need to change the python tests :-(

@q-posev
Copy link
Member

q-posev commented Nov 12, 2024

./configure FCFLAGS="-O2" => ERROR!

We have always used -O2 as our compiler flag by default and the code worked fine, there were no memory leaks! I am pretty sure that there is something about changes in this branch that break the Fortran test and not the compiler.

OK, now this modification broke the python test because the determinant data in the test is dummy, and now we are checking that the numbers of up and dn electrons in the determinants are OK. This means that the modification works well! But we need to change the python tests :-(

Good news then! Can you translate the commands from io_determinant.c into the Python test?

@scemama
Copy link
Member Author

scemama commented Nov 12, 2024

We have always used -O2 as our compiler flag by default and the code worked fine, there were no memory leaks! I am pretty sure that there is something about changes in this branch that break the Fortran test and not the compiler.

You are right. Address-sanitizer helped me find it. I fixed it, and it now works with -O2. Sorry.

Good news then! Can you translate the commands from io_determinant.c into the Python test?

I won't have time in the two coming weeks.

@q-posev
Copy link
Member

q-posev commented Nov 12, 2024

Thanks a lot @scemama !
I will see if I can find time to translate the tests. I have to make release of trexio-tools ASAP following recent additions and changes.

Looks like we are converging to the complete PySCF-TREXIO integration! 😊

@scemama scemama mentioned this pull request Nov 12, 2024
@q-posev
Copy link
Member

q-posev commented Jan 2, 2025

@scemama I fixed the Python tests. Merging this PR as completed now.

@q-posev q-posev merged commit ccb9bf8 into master Jan 2, 2025
4 checks passed
@scemama
Copy link
Member Author

scemama commented Jan 2, 2025

@scemama I fixed the Python tests. Merging this PR as completed now.

Great! Thanks!

@q-posev q-posev deleted the det-checks branch January 9, 2025 14:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants