Creating an empty DataFrame (no data, no index) and then filling it key by key can cause performance issues in some situations. I believe the issue is due to the way that pandas computes joins on the index. This happens in two places in pvlib: Location.get_airmass and ModelChain.prepare_inputs (found with grep -r 'pd.DataFrame()' pvlib).
Location.get_airmass: this is most relevant with shorter input lengths, especially if solar_position is not supplied. I discovered this bottleneck when profiling a loop that called ModelChain.run_model on daily weather data.
ModelChain.prepare_inputs: this is only an issue if the user does not supply any weather data, in which case clear sky calculations will be run and the results assigned to the empty DataFrame. It is less likely that anyone is running into a significant performance issue here, because the additional calculations, including a Linke turbidity lookup, dominate the run time.
Here's the key part of Location.get_airmass using an input of 1440 times, followed by two alternative implementations:
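The original snippets did not survive in this copy of the issue. As a rough sketch of the three patterns under discussion (not the pvlib source), assuming hypothetical stand-in functions fake_relative and fake_absolute in place of the real airmass calculations:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the real airmass calculations. They only
# need to return a Series aligned with the input index.
def fake_relative(zenith):
    return 1.0 / np.cos(np.radians(zenith))

def fake_absolute(relative, pressure=101325.0):
    return relative * pressure / 101325.0

# 1440 one-minute timestamps, matching the input length in the issue.
times = pd.date_range("2018-07-08", periods=1440, freq="min")
zenith = pd.Series(np.linspace(10.0, 80.0, len(times)), index=times)

def current_pattern(zenith):
    # Empty DataFrame (no data, no index), filled key by key.
    airmass = pd.DataFrame()
    airmass["airmass_relative"] = fake_relative(zenith)
    airmass["airmass_absolute"] = fake_absolute(airmass["airmass_relative"])
    return airmass

def alternative_1(zenith):
    # Same key-by-key assignment, but the index is supplied up front.
    airmass = pd.DataFrame(index=zenith.index)
    airmass["airmass_relative"] = fake_relative(zenith)
    airmass["airmass_absolute"] = fake_absolute(airmass["airmass_relative"])
    return airmass

def alternative_2(zenith):
    # Collect the results in a dict, then build the DataFrame once.
    relative = fake_relative(zenith)
    data = {
        "airmass_relative": relative,
        "airmass_absolute": fake_absolute(relative),
    }
    airmass = pd.DataFrame(data)
    # Pin the column order explicitly.
    return airmass[["airmass_relative", "airmass_absolute"]]
```

All three functions should return the same values. Timing each one with timeit on inputs of different lengths is one way to check whether the difference shows up with a given pandas version.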
wholmgren changed the title from "slow performance when creating empty DataFrame in Location.get_airmass" to "slow performance when creating empty DataFrame in Location.get_airmass and ModelChain.prepare_inputs" on Jul 8, 2018
wholmgren changed the title from "slow performance when creating empty DataFrame in Location.get_airmass and ModelChain.prepare_inputs" to "slow performance when assigning to empty DataFrame in Location.get_airmass and ModelChain.prepare_inputs" on Jul 8, 2018
Solution #1 uses an explicit index, while solution #2 doesn't. Why? Also, solution #2 re-selects the columns with airmass = airmass[['airmass_relative', 'airmass_absolute']]. Is that mandatory?
I would recommend solution #2. I always use dicts as an intermediate step in DataFrame creation and they work reliably.
Solution 1 uses an explicit index, while solution 2 doesn't. Why?
The explicit index in solution 1 speeds up the index join for the existing DataFrame due to something about pandas internal workings. No explicit index is necessary for solution 2 because pandas already needs to compute the join between the dict values.
Also, solution 2 re-selects the columns with airmass = airmass[['airmass_relative', 'airmass_absolute']]. Is that mandatory?
The assignment pattern in the existing code guarantees order. This line in solution 2 is necessary to guarantee that the column order is the same on all python/pandas version combinations. I hesitate to say it's mandatory, but we try to keep order consistent.
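A small sketch of the column-order point, using made-up values rather than real airmass results. With older Python or pandas versions, a DataFrame built from a dict may come back with its columns sorted by key, which would put airmass_absolute first; selecting the columns explicitly (or passing columns= to the constructor) pins the order either way:

```python
import pandas as pd

index = pd.date_range("2018-07-08", periods=3, freq="D")

# Illustrative values only.
data = {
    "airmass_relative": pd.Series([1.0, 1.5, 2.0], index=index),
    "airmass_absolute": pd.Series([0.9, 1.4, 1.9], index=index),
}

# Build from the dict, then select columns to fix their order.
airmass = pd.DataFrame(data)
airmass = airmass[["airmass_relative", "airmass_absolute"]]

# Equivalent: state the column order in the constructor.
airmass_alt = pd.DataFrame(
    data, columns=["airmass_relative", "airmass_absolute"]
)
```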
Alternative 1:
Alternative 2:
Either 1 or 2 could work for Location.get_airmass. Only 1 would easily work for ModelChain.run_model.
Versions:
pvlib.__version__: '0.5.2+16.g58f95e0'
pandas.__version__: '0.23.1'
Approximately reproduced on python 3.5 and pandas 0.17.
line profiler: