-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Support for unequal time intervals #1678
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Just wondering, what would be the pros and cons of this solution vs. resampling the data to equal time intervals? |
Hi @adriesse, I think that one of the pros of this solution over resampling is that it provides support for the use case where there are multiple discrete periods over which data is collected. For example, if instruments only collect data between certain hours of each day. Or if there is a break between testing periods. Additionally, resampling would sacrifice some level of resolution, which might be of concern for some use cases. |
Resampling and interpolation can prepare data that are almost (but not exactly) on equal times steps, or that have short gaps. I think the unaddressed use case is a GHI Series that has long gaps, e.g., several hours. |
Resampling doesn't require interpolation. I customarily leave larger data gaps full of nans, which has the minor side benefit of leaving gaps on time series line charts. Would the existing clear_sky functions fail on such a time series? |
The existing code may propagate nans and return a lot of incorrect not-clear labels. @eccoope do you know? |
Perhaps the first step should be to define what kind of behavior is wanted in different situations, as this is not necessarily obvious: explicit or implicit assumptions could be optimistic or pessimistic with regard to clearness. Perhaps an argument like Come to think of it, could this somehow be rewritten to work as: |
We tried |
@cwhanse, yes, the existing code propagates nans in the stage where statistics for each interval are calculated, but then the nans are (incorrectly) typecast into boolean values when the criteria is evaluated |
I still think it would be valuable to enumerate the different situations that can arise, and for each, explain what the function should and will do. |
Although instrument failure often creates temporal gaps in real-world irradiance data, pvlib.clearsky.detect_clearsky( ) does not support unequal time intervals. I propose adding support for unequal time intervals by masking intervals with temporal gaps when identifying clearsky periods. This can be done by removing columns corresponding to periods with gaps from the Hankel matrix, while maintaining the original index of the DataFrame by returning NaNs for the periods with gaps. This would require the modification of the supporting methods in pvlib.clearsky: _calc_stats( ), _slope_nstd_windowed( ), max_diff_windowed( ), _line_length_windowed( ), _to_centered_series( ), and _clear_sample_index( ). The method _get_sample_intervals( ) in pvlib.tools would also need to be modified.
The text was updated successfully, but these errors were encountered: