Skip to content

Refactor detect_clearsky #1074

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 68 commits into from
Jan 5, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
68 commits
Select commit Hold shift + click to select a range
5c94c64
refactor detect_clearsky
cwhanse Oct 5, 2020
af9a314
formatting
cwhanse Oct 5, 2020
ea5021f
make array input work
cwhanse Oct 5, 2020
2a5cdeb
lint, make clearsky a Series
cwhanse Oct 5, 2020
6f39e6b
make work for py36min
cwhanse Oct 5, 2020
153c7fd
update arguments in call
cwhanse Oct 5, 2020
8e719ae
fit another diff
cwhanse Oct 5, 2020
d680f77
fix ddof, add test for _calc_stats with window=3
cwhanse Oct 6, 2020
c5a15b9
actually add test for _calc_stats with window=3
cwhanse Oct 6, 2020
c16eed5
add test__calc_c5, simplify _calc_c5
cwhanse Oct 6, 2020
a14eb2c
really add test__calc_c5 this time
cwhanse Oct 6, 2020
84cb165
fix _calc_c5, use shift
cwhanse Oct 8, 2020
e4cf350
fix test__calc_stats
cwhanse Oct 8, 2020
f1ae2d4
remove align from public function kwargs
cwhanse Oct 8, 2020
1e390c9
fiddly window and shifting
cwhanse Oct 8, 2020
0eff949
Merge branch 'master' of https://github.com/pvlib/pvlib-python into d…
cwhanse Oct 12, 2020
d39218a
fix _calc_c5
cwhanse Oct 12, 2020
86c4e79
can now label points at the end of the time series
cwhanse Oct 12, 2020
ace81cf
fix formatting
cwhanse Oct 12, 2020
ffc7fac
only labels end of data if window is short enough
cwhanse Oct 12, 2020
25c9c8b
improve coverage
cwhanse Oct 12, 2020
62a3775
Merge branch 'master' of https://github.com/pvlib/pvlib-python into d…
cwhanse Oct 15, 2020
ce4abbb
add to whatsnew
cwhanse Oct 16, 2020
a071471
edits from partial review
cwhanse Oct 28, 2020
4e4eeb5
remove unused function
cwhanse Oct 29, 2020
6ed9838
Merge branch 'master' of https://github.com/pvlib/pvlib-python into d…
cwhanse Nov 11, 2020
bf061dc
Merge branch 'master' of https://github.com/pvlib/pvlib-python into d…
cwhanse Dec 4, 2020
6e6af32
put test for equal time steps back
cwhanse Dec 4, 2020
a592d25
make times optional
cwhanse Dec 4, 2020
f2c5e8e
make times optional
cwhanse Dec 4, 2020
352212b
undo redundant raise, use total_seconds()
cwhanse Dec 4, 2020
74e1e54
formatting
cwhanse Dec 7, 2020
055a7c3
go back to seconds
cwhanse Dec 7, 2020
aa3b2ac
Merge branch 'master' of https://github.com/pvlib/pvlib-python into d…
cwhanse Dec 15, 2020
f7dadbb
remove array inputs
cwhanse Dec 15, 2020
282f374
Revert "remove array inputs"
cwhanse Dec 16, 2020
5e9e186
Merge branch 'master' of https://github.com/pvlib/pvlib-python into d…
cwhanse Dec 22, 2020
56950de
changes from review: drop tail, edit whatsnew, add comments to detect…
cwhanse Dec 22, 2020
5425ad3
revert change to detect_clearsky output
cwhanse Dec 22, 2020
098edeb
update function name
cwhanse Dec 22, 2020
47ccf0e
factor line length into helper for performance
cwhanse Dec 23, 2020
5927d73
fix test__calc_line_length_windowed
cwhanse Dec 23, 2020
55453d7
add benchmarks/detect_clearsky.py
wholmgren Dec 23, 2020
1d16a31
fix mistakes
wholmgren Dec 23, 2020
77d899d
Merge pull request #6 from wholmgren/1074asv
cwhanse Dec 23, 2020
f5ffda3
work around pd.rolling.apply, drop align options
cwhanse Dec 23, 2020
f0d9b39
Merge branch 'detect_clearsky' of https://github.com/cwhanse/pvlib-py…
cwhanse Dec 23, 2020
8702a1c
remove mistaken .values
cwhanse Dec 23, 2020
608c1d4
add test for new _windowed_stat function
cwhanse Dec 23, 2020
9387067
patch up test__calc_stats
cwhanse Dec 23, 2020
c3c1f9e
add mode kwarg for np.pad
cwhanse Dec 23, 2020
e041171
replace windowed stats calculations
cwhanse Dec 24, 2020
7c7b4b2
add tests for new windowed functions
cwhanse Dec 24, 2020
20130d8
uncomplicate tests, remove align options
cwhanse Dec 24, 2020
9d64abe
move common data to fixture
cwhanse Dec 24, 2020
0eda545
index correctly
cwhanse Dec 24, 2020
4eb70e1
correct cut/paste/delete errors
cwhanse Dec 24, 2020
983bd6f
proofread it next time
cwhanse Dec 24, 2020
bdee2ce
replace roller with numpy
cwhanse Dec 28, 2020
26e36a2
try this dimensioning
cwhanse Dec 28, 2020
05d8bc1
try numpy arrays instead
cwhanse Dec 28, 2020
5b4e441
remove unneeded conversion to Series
cwhanse Dec 28, 2020
e53572d
Revert "remove unneeded conversion to Series"
cwhanse Dec 28, 2020
ea82348
param ndays in benchmark
wholmgren Dec 28, 2020
970227d
Line formatting
cwhanse Dec 29, 2020
79a329c
More line formatting
cwhanse Dec 29, 2020
7af5f55
Remove unneeded declaration
cwhanse Dec 29, 2020
be1829b
fix test error, .all -> .any
cwhanse Jan 4, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
34 changes: 34 additions & 0 deletions benchmarks/benchmarks/detect_clearsky.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
"""
ASV benchmarks for detect clear sky function.
"""

import pandas as pd
from pvlib import clearsky, solarposition
import numpy as np


class DetectClear:
params = [1, 10, 100] # number of days
param_names = ['ndays']

def setup(self, ndays):
self.times = pd.date_range(start='20180601', freq='1min',
periods=1440*ndays)
self.lat = 35.1
self.lon = -106.6
self.solar_position = solarposition.get_solarposition(
self.times, self.lat, self.lon)
clearsky_df = clearsky.simplified_solis(
self.solar_position['apparent_elevation'])
self.clearsky = clearsky_df['ghi']
measured_dni = clearsky_df['dni'].where(
(self.times.hour % 2).astype(bool), 0)
cos_zen = np.cos(np.deg2rad(self.solar_position['apparent_zenith']))
self.measured = measured_dni * cos_zen + clearsky_df['dhi']
self.measured *= 0.98
self.window_length = 10

def time_detect_clearsky(self, ndays):
clearsky.detect_clearsky(
self.measured, self.clearsky, self.times, self.window_length
)
10 changes: 8 additions & 2 deletions docs/sphinx/source/whatsnew/v0.8.1.rst
Original file line number Diff line number Diff line change
Expand Up @@ -6,17 +6,23 @@ v0.8.1 (MONTH DAY YEAR)
Breaking changes
~~~~~~~~~~~~~~~~


Deprecations
~~~~~~~~~~~~
* ``pvlib.irradiance.liujordan`` is deprecated.
* ``pvlib.irradiance.liujordan`` is deprecated. :py:func:`pvlib.irradiance.campbell_norman`
replaces ``pvlib.irradiance.liujordan``.

Enhancements
~~~~~~~~~~~~
* Create :py:func:`~pvlib.pvsystem.PVSystem.fuentes_celltemp` and add ``temperature_model='fuentes'``
option to :py:class:`~pvlib.modelchain.ModelChain`. (:pull:`1042`) (:issue:`1073`)
* Added :py:func:`pvlib.temperature.ross` for cell temperature modeling using
only NOCT. (:pull:`1045`)
* :py:func:`pvlib.clearsky.detect_clearsky` now uses centered rolling windows
instead of left-aligned rolling windows. (:pull:`1074`)
* The 'times' and 'window_length` parameters are now optional kwargs in
:py:func:`pvlib.clearsky.detect_clearsky`. If omitted, 'times' is set
equal to the index of parameter 'measured', and 'window_length' is set to
10 minutes. (:pull:`1074`)
* Added :py:func:`pvlib.inverter.sandia_multi` and :py:func:`pvlib.inverter.pvwatts_multi`
for modeling inverters with multiple MPPTs (:issue:`457`, :pull:`1085`, :pull:`1106`)
* Added optional ``attributes`` parameter to :py:func:`pvlib.iotools.get_psm3`
Expand Down
237 changes: 180 additions & 57 deletions pvlib/clearsky.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,8 @@

import numpy as np
import pandas as pd
from scipy.optimize import minimize_scalar
from scipy.linalg import hankel

from pvlib import atmosphere, tools

Expand Down Expand Up @@ -597,7 +599,118 @@ def _calc_d(aod700, p):
return d


def detect_clearsky(measured, clearsky, times, window_length,
def _calc_stats(data, samples_per_window, sample_interval, H):
""" Calculates statistics for each window, used by Reno-style clear
sky detection functions. Does not return the line length statistic
which is provided by _calc_windowed_stat and _line_length

Parameters
----------
data : Series
samples_per_window : int
Number of data points in each window
sample_interval : float
Time in minutes in each sample interval
H : 2D ndarray
Hankel matrix defining the indices for each window.

Returns
-------
data_mean : Series
mean of data in each window
data_max : Series
maximum of data in each window
data_slope_nstd : Series
standard deviation of difference between data points in each window
data_slope : Series
difference between successive data points
"""

data_mean = data.values[H].mean(axis=0)
data_mean = _to_centered_series(data_mean, data.index, samples_per_window)
data_max = data.values[H].max(axis=0)
data_max = _to_centered_series(data_max, data.index, samples_per_window)
# shift to get forward difference, .diff() is backward difference instead
data_diff = data.diff().shift(-1)
data_slope = data_diff / sample_interval
data_slope_nstd = _slope_nstd_windowed(data, H, samples_per_window)
data_slope_nstd = data_slope_nstd

return data_mean, data_max, data_slope_nstd, data_slope


def _slope_nstd_windowed(data, H, samples_per_window):
with np.errstate(divide='ignore', invalid='ignore'):
raw = np.diff(data)
raw = raw[H[:-1, ]].std(ddof=1, axis=0) / data.values[H].mean(axis=0)
return _to_centered_series(raw, data.index, samples_per_window)


def _max_diff_windowed(data, H, samples_per_window):
raw = np.diff(data)
raw = np.abs(raw[H[:-1, ]]).max(axis=0)
return _to_centered_series(raw, data.index, samples_per_window)


def _line_length_windowed(data, H, samples_per_window,
sample_interval):
raw = np.sqrt(np.diff(data)**2. + sample_interval**2.)
raw = np.sum(raw[H[:-1, ]], axis=0)
return _to_centered_series(raw, data.index, samples_per_window)


def _to_centered_series(vals, idx, samples_per_window):
vals = np.pad(vals, ((0, len(idx) - len(vals)),), mode='constant',
constant_values=np.nan)
shift = samples_per_window // 2 # align = 'center' only
return pd.Series(index=idx, data=vals).shift(shift)


def _get_sample_intervals(times, win_length):
""" Calculates time interval and samples per window for Reno-style clear
sky detection functions
"""
deltas = np.diff(times.values) / np.timedelta64(1, '60s')

# determine if we can proceed
if times.inferred_freq and len(np.unique(deltas)) == 1:
sample_interval = times[1] - times[0]
sample_interval = sample_interval.seconds / 60 # in minutes
samples_per_window = int(win_length / sample_interval)
return sample_interval, samples_per_window
else:
raise NotImplementedError('algorithm does not yet support unequal '
'times. consider resampling your data.')


def _clear_sample_index(clear_windows, samples_per_window, align, H):
"""
Returns indices of clear samples in clear windows
"""
# H contains indices for each window, e.g. indices for the first window
# are in first column of H.
# clear_windows contains one boolean for each window and is aligned
# by 'align', default to center
# shift clear_windows.index to be aligned left (e.g. first value in the
# left-most position) to line up with the first column of H.

# commented if/else block for future align='left', 'right' capability
# if align == 'right':
# shift = 1 - samples_per_window
# elif align == 'center':
# shift = - (samples_per_window // 2)
# else:
# shift = 0
shift = -(samples_per_window // 2)
idx = clear_windows.shift(shift)
# drop rows at the end corresponding to windows past the end of data
idx = idx.drop(clear_windows.index[1 - samples_per_window:])
idx = idx.astype(bool) # shift changed type to object
clear_samples = np.unique(H[:, idx])
return clear_samples


def detect_clearsky(measured, clearsky, times=None, window_length=10,
mean_diff=75, max_diff=75,
lower_line_length=-5, upper_line_length=10,
var_diff=0.005, slope_dev=8, max_iterations=20,
Expand All @@ -623,24 +736,25 @@ def detect_clearsky(measured, clearsky, times, window_length,
Parameters
----------
measured : array or Series
Time series of measured values.
Time series of measured GHI. [W/m2]
clearsky : array or Series
Time series of the expected clearsky values.
times : DatetimeIndex
Times of measured and clearsky values.
window_length : int
Time series of the expected clearsky GHI. [W/m2]
times : DatetimeIndex or None, default None.
Times of measured and clearsky values. If None the index of measured
will be used.
window_length : int, default 10
Length of sliding time window in minutes. Must be greater than 2
periods.
mean_diff : float, default 75
Threshold value for agreement between mean values of measured
and clearsky in each interval, see Eq. 6 in [1].
and clearsky in each interval, see Eq. 6 in [1]. [W/m2]
max_diff : float, default 75
Threshold value for agreement between maxima of measured and
clearsky values in each interval, see Eq. 7 in [1].
clearsky values in each interval, see Eq. 7 in [1]. [W/m2]
lower_line_length : float, default -5
Lower limit of line length criterion from Eq. 8 in [1].
Criterion satisfied when
lower_line_length < line length difference < upper_line_length
Criterion satisfied when lower_line_length < line length difference
< upper_line_length.
upper_line_length : float, default 10
Upper limit of line length criterion from Eq. 8 in [1].
var_diff : float, default 0.005
Expand Down Expand Up @@ -671,6 +785,13 @@ def detect_clearsky(measured, clearsky, times, window_length,
detected clear_samples. Only provided if return_components is
True.

Raises
------
ValueError
If measured is not a Series and times is not provided
NotImplementedError
If timestamps are not equally spaced

References
----------
.. [1] Reno, M.J. and C.W. Hansen, "Identification of periods of clear
Expand All @@ -692,78 +813,81 @@ def detect_clearsky(measured, clearsky, times, window_length,
* parameters are controllable via keyword arguments
* option to return individual test components and clearsky scaling
parameter
* uses centered windows (Matlab function uses left-aligned windows)
"""

# calculate deltas in units of minutes (matches input window_length units)
deltas = np.diff(times.values) / np.timedelta64(1, '60s')
if times is None:
try:
times = measured.index
except AttributeError:
raise ValueError("times is required when measured is not a Series")

# determine the unique deltas and if we can proceed
unique_deltas = np.unique(deltas)
if len(unique_deltas) == 1:
sample_interval = unique_deltas[0]
# be polite about returning the same type as was input
ispandas = isinstance(measured, pd.Series)

# for internal use, need a Series
if not ispandas:
meas = pd.Series(measured, index=times)
else:
raise NotImplementedError('algorithm does not yet support unequal '
'times. consider resampling your data.')
meas = measured

intervals_per_window = int(window_length / sample_interval)
if not isinstance(clearsky, pd.Series):
clear = pd.Series(clearsky, index=times)
else:
clear = clearsky

# generate matrix of integers for creating windows with indexing
from scipy.linalg import hankel
H = hankel(np.arange(intervals_per_window), # noqa: N806
np.arange(intervals_per_window - 1, len(times)))
sample_interval, samples_per_window = _get_sample_intervals(times,
window_length)

# convert pandas input to numpy array, but save knowledge of input state
# so we can return a series if that's what was originally provided
ispandas = isinstance(measured, pd.Series)
measured = np.asarray(measured)
clearsky = np.asarray(clearsky)
# generate matrix of integers for creating windows with indexing
H = hankel(np.arange(samples_per_window),
np.arange(samples_per_window-1, len(times)))

# calculate measurement statistics
meas_mean = np.mean(measured[H], axis=0)
meas_max = np.max(measured[H], axis=0)
meas_diff = np.diff(measured[H], n=1, axis=0)
meas_slope = np.diff(measured[H], n=1, axis=0) / sample_interval
# matlab std function normalizes by N-1, so set ddof=1 here
meas_slope_nstd = np.std(meas_slope, axis=0, ddof=1) / meas_mean
meas_line_length = np.sum(np.sqrt(
meas_diff * meas_diff +
sample_interval * sample_interval), axis=0)
meas_mean, meas_max, meas_slope_nstd, meas_slope = _calc_stats(
meas, samples_per_window, sample_interval, H)
meas_line_length = _line_length_windowed(
meas, H, samples_per_window, sample_interval)

# calculate clear sky statistics
clear_mean = np.mean(clearsky[H], axis=0)
clear_max = np.max(clearsky[H], axis=0)
clear_diff = np.diff(clearsky[H], n=1, axis=0)
clear_slope = np.diff(clearsky[H], n=1, axis=0) / sample_interval

from scipy.optimize import minimize_scalar

clear_mean, clear_max, _, clear_slope = _calc_stats(
clear, samples_per_window, sample_interval, H)

# find a scaling factor for the clear sky time series that minimizes the
# RMSE between the clear times identified in the measured data and the
# scaled clear sky time series. Optimization to determine the scaling
# factor considers all identified clear times, which is different from [1]
# where the scaling factor was determined from clear times on days with
# at least 50% of the day being identified as clear.
alpha = 1
for iteration in range(max_iterations):
clear_line_length = np.sum(np.sqrt(
alpha * alpha * clear_diff * clear_diff +
sample_interval * sample_interval), axis=0)
scaled_clear = alpha * clear
clear_line_length = _line_length_windowed(
scaled_clear, H, samples_per_window, sample_interval)

line_diff = meas_line_length - clear_line_length

slope_max_diff = _max_diff_windowed(
meas - scaled_clear, H, samples_per_window)
# evaluate comparison criteria
c1 = np.abs(meas_mean - alpha*clear_mean) < mean_diff
c2 = np.abs(meas_max - alpha*clear_max) < max_diff
c3 = (line_diff > lower_line_length) & (line_diff < upper_line_length)
c4 = meas_slope_nstd < var_diff
c5 = np.max(np.abs(meas_slope -
alpha * clear_slope), axis=0) < slope_dev
c5 = slope_max_diff < slope_dev
c6 = (clear_mean != 0) & ~np.isnan(clear_mean)
clear_windows = c1 & c2 & c3 & c4 & c5 & c6

# create array to return
clear_samples = np.full_like(measured, False, dtype='bool')
clear_samples = np.full_like(meas, False, dtype='bool')
# find the samples contained in any window classified as clear
clear_samples[np.unique(H[:, clear_windows])] = True
idx = _clear_sample_index(clear_windows, samples_per_window, 'center',
H)
clear_samples[idx] = True

# find a new alpha
previous_alpha = alpha
clear_meas = measured[clear_samples]
clear_clear = clearsky[clear_samples]
clear_meas = meas[clear_samples]
clear_clear = clear[clear_samples]

def rmse(alpha):
return np.sqrt(np.mean((clear_meas - alpha*clear_clear)**2))
Expand All @@ -773,7 +897,7 @@ def rmse(alpha):
break
else:
import warnings
warnings.warn('failed to converge after %s iterations'
warnings.warn('rescaling failed to converge after %s iterations'
% max_iterations, RuntimeWarning)

# be polite about returning the same type as was input
Expand All @@ -794,8 +918,7 @@ def rmse(alpha):
components['max_diff'] = np.abs(meas_max - alpha * clear_max)
components['line_length'] = meas_line_length - clear_line_length
components['slope_nstd'] = meas_slope_nstd
components['slope_max'] = (np.max(
meas_slope - alpha * clear_slope, axis=0))
components['slope_max'] = slope_max_diff

return clear_samples, components, alpha
else:
Expand Down
Loading