Hi @guuuj. Is there a way for you to run this in an interactive session on the HPC cluster and monitor CPU activity while QE runs? My suspicion is that QE somehow only gets one hardware CPU core, and the more ranks you spawn, the slower it gets if you are limited to a single core. Another way to check this is to run QE in solid_dmft on only 1 core and see whether that is faster than 4. Can you try this? If this turns out to be the problem, you need to make sure that you are allowed to spawn a second MPI process on your HPC system and, if needed, add special flags for oversubscription. But let's first figure out if this is the problem here. Best,
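PS: a quick way to check the core count from inside the job is to print the CPU affinity of the running process. This is only a minimal sketch (assuming a Linux node; the script name is made up, it is not part of solid_dmft):

```python
# check_affinity.py -- hypothetical helper script, not part of solid_dmft
# Prints how many CPU cores this process is actually allowed to run on.
# If the scheduler pins the job to a single core, every additional MPI rank
# only oversubscribes that core and the DFT step slows down.
import os

allowed = os.sched_getaffinity(0)  # set of CPU ids available to this process (Linux only)
print(f"process may run on {len(allowed)} core(s): {sorted(allowed)}")
```

If this reports 1 core while your batch script requests more, the job's CPU binding (or the allocation itself) is the problem rather than solid_dmft.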
I have encountered an issue similar to the one discussed above. My Quantum ESPRESSO (QE) is compiled with Intel MKL + Intel MPI, and I am running it within a conda environment that includes TRIQS (installed via conda) and solid_dmft (installed via conda). When I run the QE calculation inside solid_dmft (the Ce₂O₃ example), using
However, when I open the corresponding QE output file, there is no explicit error message; it simply stops before any SCF iteration begins. I also tried using the mpirun from the conda environment before sourcing the Intel MPI environment, but in that case the program does not occupy any CPU cores. Another issue is that when I set
Could anyone advise what the root cause might be, or how to properly configure the MPI environment so that QE runs correctly inside solid_dmft?
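In case it helps with the diagnosis, this is a small check of which MPI launcher the environment resolves first; the script name and the version probing are my own additions (not part of solid_dmft or TRIQS):

```python
# which_mpi.py -- hypothetical diagnostic, not part of solid_dmft or TRIQS
# Shows which MPI launcher the current environment resolves first. A pw.x
# built against Intel MPI but started with the conda environment's launcher
# (or the other way round) can stall without printing any error message.
import shutil
import subprocess

for launcher in ("mpirun", "mpiexec", "srun"):
    path = shutil.which(launcher)
    print(f"{launcher}: {path}")
    if path:
        # first line of the version output identifies the MPI flavour
        out = subprocess.run([path, "--version"], capture_output=True, text=True)
        lines = (out.stdout or out.stderr).splitlines()
        print("   ", lines[0] if lines else "(no version output)")
```

Mixing MPI implementations (launching an Intel MPI binary with a different mpirun) is a common cause of silent hangs, so the launcher reported here should match the build that QE was compiled with.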
Dear developers,


Recently, I have been using solid_dmft to run the CSC DFT+DMFT calculations (the Ce2O3 case from the tutorial) on our HPC cluster, with QE as the DFT code, and I found that the QE SCF part is very slow.
With 4 cores, the QE SCF step takes 1m27s CPU / 6m7s WALL time; with 12 cores it is even worse: 50s CPU but 10m23s WALL time. In the latter case, most of the wall time comes from the "electrons" part (515.29s), and within "electrons" most of it comes from "sum_band" (269.56s). See below.
It seems that the MPI efficiency is very low, so I checked the code of qe_manager.py. I found that QE is called via "qe_exec += f'pw.x -nk {number_cores}'", i.e. with k-point parallelization. I also found that QE seems to use the diagonalization parallelization by default when I call it through Slurm directly. So I modified the line to "qe_exec += f'pw.x -nd {number_cores}'" and reran the calculation with solid_dmft. However, the situation got even worse: with 9 cores (see below), the QE SCF step takes about 2h CPU time and 6h wall time.
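For reference, this is the change I tried, sketched in isolation; only the two quoted f-strings correspond to the line I found and edited in qe_manager.py, the rest (variable values, prints) is just illustration:

```python
# Simplified sketch of the pw.x call-string change described above.
number_cores = 9

# original: '-nk' splits the k-points into pools (k-point parallelization)
qe_exec = f'pw.x -nk {number_cores}'

# my modification: '-nd' instead sets the size of the linear-algebra group
# used for the subspace diagonalization
qe_exec_modified = f'pw.x -nd {number_cores}'

print(qe_exec)           # pw.x -nk 9
print(qe_exec_modified)  # pw.x -nd 9
```

As far as I understand, -nk and -nd are not interchangeable: -nk distributes k-points over pools, while -nd only controls the diagonalization group within each pool, so swapping one for the other changes the parallelization strategy rather than just the core count.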
This is very strange. For comparison, I also ran a plain QE SCF calculation through Slurm directly (i.e. a normal QE SCF run, not via solid_dmft), again with 9 cores. That run takes 26s CPU / 28s wall time, which is very fast.
So my question is: why is the QE SCF calculation so slow when it is called by solid_dmft?