Skip to content

Support for unequal time intervals #1678

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
eccoope opened this issue Feb 27, 2023 · 9 comments · May be fixed by #1708
Open

Support for unequal time intervals #1678

eccoope opened this issue Feb 27, 2023 · 9 comments · May be fixed by #1708

Comments

@eccoope
Copy link

eccoope commented Feb 27, 2023

Although instrument failure often creates temporal gaps in real-world irradiance data, pvlib.clearsky.detect_clearsky( ) does not support unequal time intervals. I propose adding support for unequal time intervals by masking intervals with temporal gaps when identifying clearsky periods. This can be done by removing columns corresponding to periods with gaps from the Hankel matrix, while maintaining the original index of the DataFrame by returning NaNs for the periods with gaps. This would require the modification of the supporting methods in pvlib.clearsky: _calc_stats( ), _slope_nstd_windowed( ), max_diff_windowed( ), _line_length_windowed( ), _to_centered_series( ), and _clear_sample_index( ). The method _get_sample_intervals( ) in pvlib.tools would also need to be modified.

@eccoope eccoope linked a pull request Mar 24, 2023 that will close this issue
8 tasks
@adriesse
Copy link
Member

Just wondering, what would be the pros and cons of this solution vs. resampling the data to equal time intervals?

@eccoope
Copy link
Author

eccoope commented Mar 30, 2023

Hi @adriesse, I think that one of the pros of this solution over resampling is that it provides support for the use case where there are multiple discrete periods over which data is collected. For example, if instruments only collect data between certain hours of each day. Or if there is a break between testing periods. Additionally, resampling would sacrifice some level of resolution, which might be of concern for some use cases.

@cwhanse
Copy link
Member

cwhanse commented Mar 30, 2023

Resampling and interpolation can prepare data that are almost (but not exactly) on equal times steps, or that have short gaps. I think the unaddressed use case is a GHI Series that has long gaps, e.g., several hours.

@adriesse
Copy link
Member

Resampling doesn't require interpolation. I customarily leave larger data gaps full of nans, which has the minor side benefit of leaving gaps on time series line charts. Would the existing clear_sky functions fail on such a time series?

@cwhanse
Copy link
Member

cwhanse commented Mar 30, 2023

The existing code may propagate nans and return a lot of incorrect not-clear labels. @eccoope do you know?

@adriesse
Copy link
Member

Perhaps the first step should be to define what kind of behavior is wanted in different situations, as this is not necessarily obvious: explicit or implicit assumptions could be optimistic or pessimistic with regard to clearness. Perhaps an argument like min_periods that is found in pandas rolling() could be helpful.

Come to think of it, could this somehow be rewritten to work as: df.rolling(10, min_periods=5).detect_clearsky()?

@cwhanse
Copy link
Member

cwhanse commented Mar 31, 2023

We tried rolling in the work leading to #1074. It could work but would be very slow. If the method isn't built in, rolling defaults to looping through the index.

@eccoope
Copy link
Author

eccoope commented Apr 3, 2023

@cwhanse, yes, the existing code propagates nans in the stage where statistics for each interval are calculated, but then the nans are (incorrectly) typecast into boolean values when the criteria is evaluated

@adriesse
Copy link
Member

I still think it would be valuable to enumerate the different situations that can arise, and for each, explain what the function should and will do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants