Skip to content

Detect clearsky modifications #510

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
25 changes: 17 additions & 8 deletions pvlib/clearsky.py
Original file line number Diff line number Diff line change
Expand Up @@ -687,7 +687,7 @@ def detect_clearsky(measured, clearsky, times, window_length,
raise NotImplementedError('algorithm does not yet support unequal ' \
'times. consider resampling your data.')

samples_per_window = int(window_length / sample_interval)
samples_per_window = int(window_length / sample_interval) + 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert this change please. samples_per_window is counting intervals, not endpoints. Because it's an intermediate it could be renamed to be more clear.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that not adding 1 gives the incorrect number of samples per window. For example, if my data is 30-minute frequency (sample_interval=30) and I want 60 minute windows (window_length=60), the current implementation would only give 2 points per window (when the Hankel matrix is constructed in the following lines). In this case, 2 points per window would only span a 30 minute window, not the intended 60.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your example makes my point. The algorithm operates on intervals not on points in time. The value at a timestamp is considered as the value for the following (?) interval - I'd have to look carefully at the Hankel matrix and the diff to see if we adopted a left- or right- endpoint convention.


# generate matrix of integers for creating windows with indexing
from scipy.linalg import hankel
Expand All @@ -697,25 +697,27 @@ def detect_clearsky(measured, clearsky, times, window_length,
# calculate measurement statistics
meas_mean = np.mean(measured[H], axis=0)
meas_max = np.max(measured[H], axis=0)
meas_slope = np.diff(measured[H], n=1, axis=0)
meas_ghi_diff = np.diff(measured[H], n=1, axis=0)
meas_slope = np.diff(measured[H], n=1, axis=0) / sample_interval
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The non-uniform time steps could be handled here by

  • converting the time to UNIX timestamps
  • diffing the time to get time intervals
  • element-wise division in the calculation of meas_slope
  • using the diffed time in the calculation of meas_line_length , 'clear_slope`, etc.

This could be the subject of a subsequent pull request. In hindsight, I could have separated #507 into two issues (different from 1 minute, and non-uniform time steps.)

# matlab std function normalizes by N-1, so set ddof=1 here
meas_slope_nstd = np.std(meas_slope, axis=0, ddof=1) / meas_mean
meas_slope_max = np.max(np.abs(meas_slope), axis=0)
# meas_slope_max = np.max(np.abs(meas_slope), axis=0)
meas_line_length = np.sum(np.sqrt(
meas_slope*meas_slope + sample_interval*sample_interval), axis=0)
meas_ghi_diff*meas_ghi_diff + sample_interval*sample_interval), axis=0)

# calculate clear sky statistics
clear_mean = np.mean(clearsky[H], axis=0)
clear_max = np.max(clearsky[H], axis=0)
clear_slope = np.diff(clearsky[H], n=1, axis=0)
clear_slope_max = np.max(np.abs(clear_slope), axis=0)
clear_ghi_diff = np.diff(clearsky[H], n=1, axis=0)
clear_slope = np.diff(clearsky[H], n=1, axis=0) / sample_interval
# clear_slope_max = np.max(np.abs(clear_slope), axis=0)

from scipy.optimize import minimize_scalar

alpha = 1
for iteration in range(max_iterations):
clear_line_length = np.sum(np.sqrt(
alpha*alpha*clear_slope*clear_slope +
alpha*alpha*clear_ghi_diff*clear_ghi_diff +
sample_interval*sample_interval), axis=0)

line_diff = meas_line_length - clear_line_length
Expand All @@ -725,7 +727,7 @@ def detect_clearsky(measured, clearsky, times, window_length,
c2 = np.abs(meas_max - alpha*clear_max) < max_diff
c3 = (line_diff > lower_line_length) & (line_diff < upper_line_length)
c4 = meas_slope_nstd < var_diff
c5 = (meas_slope_max - alpha*clear_slope_max) < slope_dev
c5 = np.max(np.abs(meas_slope - alpha*clear_slope), axis=0) < slope_dev
c6 = (clear_mean != 0) & ~np.isnan(clear_mean)
clear_windows = c1 & c2 & c3 & c4 & c5 & c6

Expand Down Expand Up @@ -761,6 +763,13 @@ def rmse(alpha):
components['slope_max'] = c5
components['mean_nan'] = c6
components['windows'] = clear_windows

components['mean_diff_array'] = np.abs(meas_mean - alpha*clear_mean)
components['max_diff_array'] = np.abs(meas_max - alpha*clear_max)
components['line_length_array'] = meas_line_length - clear_line_length
components['slope_nstd_array'] = meas_slope_nstd
components['slope_max_array'] = (np.max(meas_slope - alpha*clear_slope, axis=0))

return clear_samples, components, alpha
else:
return clear_samples
Expand Down