Skip to content

Add BrightSun clear-sky detection method #1061

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
cwhanse opened this issue Sep 16, 2020 · 11 comments
Open

Add BrightSun clear-sky detection method #1061

cwhanse opened this issue Sep 16, 2020 · 11 comments

Comments

@cwhanse
Copy link
Member

cwhanse commented Sep 16, 2020

A recent paper describes improvements on pvlib.clearsky.detect_clearsky. I've got a port of their Matlab code to a python function detect_clearsky_brightsun, and trying to keep the interface common for the two detect_clearsky functions.

Although the BrightSun method uses the similar statistics to determine clearsky times, the limits aren't constants - they are functions of zenith. And the method needs additional parameters for the duration filters and the optimization of the clear-sky model. The duration filters are different in the ~hour after sunrise/before sunset than in midday.

I have the limit and control arguments coded as dicts and provide constant dicts with the default values (which needs to change, since we don't want mutable defaults).

For the solar position related inputs (zenith and sunrise/sunset times), several options:

  • pass in a pvlib.location.Location and calculate solar position and sunrise/set times internal to detect_clearsky_brightsun. This is what I have done in the draft function, it makes the code straightforward but at the expense of a less-than-explicit interface.
  • pass zenith, sunrise and sunset as separate arguments. This seems awkward and somewhat risky, since calculating sunrise and sunset times with pvlib.solarposition.rise_set_transit_xxx can have different behaviors (e.g., using _spa the first sunset time is after the first sunrise time, even if the data start at midday, _ephem has a kwarg that controls this behavior and returns the first sunset after the first data point).
  • pass zenith only and a kwarg for a zenith angle that defines sunrise and sunset, rather than asking for sunrise/sunset as input or using the solarposition functions.

Interested in feedback about this approach before I open a PR.

Current detect_clearsky function:

def detect_clearsky(measured, clearsky, times, window_length,
                    mean_diff=75, max_diff=75,
                    lower_line_length=-5, upper_line_length=10,
                    var_diff=0.005, slope_dev=8, max_iterations=20,
                    return_components=False):

Draft new function and dicts:

# constants used for BrightSun algorithm
# dict of functions that return limits for each criteria for BrightSun
BRIGHT_SUN_LIMITS = {
    'mean_diff': _mean_max_lim,
    'max_diff': _mean_max_lim,
    'line_length': _line_length_lim,
    'slope_nstd': lambda x: 0.4,
    'slope_diff': _slope_diff_lim}


# dict of parameters for daily clear-sky model optimization for BrightSun
BRIGHT_SUN_OPTIM = {
    'min_ghi': 30, 'min_pts_per_day': 60, 'alpha_limits': (0.7, 1.5),
    'maxiter': 20}


# dict of parameters for duration criteria for BrightSun
BRIGHT_SUN_FILTERS = {'long_window': 90, 'long_fraction': (90 - 80) / 90.,
                      'separation_window': 30,
                      'rise_set_period': 90, 'rise_set_window': 10,
                      'rise_set_fraction': (10 - 2) / 10}

def detect_clearsky_brightsun(measured, clearsky, times,
                              location,
                              window_length,
                              limits=BRIGHT_SUN_LIMITS,
                              optim=BRIGHT_SUN_OPTIM,
                              filters=BRIGHT_SUN_FILTERS,
                              return_components=False):

The values for the BRIGHT_SUN_LIMITS dict are functions:

def _mean_max_lim(x):
    # limit for criteria 1 and 2 of BrightSun algorithm
    return np.piecewise(
        x, condlist=[x < 20, (x >= 20) & (x < 30), (x >= 30) & (x < 90),
                     x >= 90],
        funclist=[0.25, lambda x: np.interp(x, xp=[20, 30], fp=[0.25, 0.125]),
                  lambda x: np.interp(x, xp=[20, 30], fp=[0.125, 0.5]), 0.5])


def _line_length_lim(x):
    # for criteria 3 of BrightSun algorithm
    # limit at zenith >= 30 is 0.5 per email with Jaime Bright and Matlab code
    # J. Bright states he believes the value of 2 in the 2020 paper is
    # incorrect
    return np.piecewise(x, condlist=[x < 30, x >= 30],
        funclist=[lambda x: np.interp(x, xp=[0, 30], fp=[7, 0.5]), 0.5])


def _slope_diff_lim(x):
    # for criteria 5 of BrightSun algorithm
    return np.piecewise(x, condlist=[x < 30, x >= 30],
        funclist=[lambda x: np.interp(x, xp=[0, 30], fp=[45, 15]), 15])
@wholmgren
Copy link
Member

I'd rather see explict kwargs than hide configuration in the dicts.

Do we need to expose this much configuration in the API? I suspect that most users that are willing to modify the BRIGHT_SUN_LIMITS dict are going to be willing to modify the source code instead.

Passing zenith seems straightforward, especially if we skip the sunrise/sunset definition kwargs.

-1 to passing Location objects. It's inconsistent with the rest of the pvlib function layer and promotes inefficient code.

@cwhanse
Copy link
Member Author

cwhanse commented Sep 16, 2020

I agree about passing zenith, and avoiding sunrise and sunset argument.

Hide the constant dicts and remove the kwargs? It might be nice to be able to turn off the duration filtering step.

@wholmgren
Copy link
Member

Hide the constant dicts and remove the kwargs? It might be nice to be able to turn off the duration filtering step.

I'm probably missing something, but this sounds like 1 kwarg (perhaps a boolean that controls several parameters) and the rest is hard coded within the function. So I don't see where private constant dicts are going to be helpful in this case but you'd know better than me at this point.

Are you working from the source code at https://github.com/JamieMBright/csd-library? Looks like you'll need to include a copy of the apache license.

@cwhanse
Copy link
Member Author

cwhanse commented Sep 16, 2020

one kwarg to control if duration filters are used or not. My preference would be to keep private dict constants as a way to bundle like parameters in case a user wants to modify them. To me, that's more convenient than having to find the parameters inside the function.

I'm working primarily from the paper, referring to the Matlab code to clarify details, so I suppose a case could be made that it's not a derivative work. Better safe than not, however. Any advice on how to include that lengthy license? Definitely do not want to add 200 lines of text to the docstring.

@wholmgren
Copy link
Member

Perhaps add a LICENSES directory like pandas https://github.com/pandas-dev/pandas/tree/master/LICENSES

@cwhanse
Copy link
Member Author

cwhanse commented Sep 16, 2020

That's not a bad idea. Are you sure that a native python implementation is a derivative work? The way I read that definition, I don't think it is:

For the purposes of this License, Derivative Works
shall not include works that remain separable from, or merely link (or bind by
name) to the interfaces of, the Work and Derivative Works thereof.

The Work is the Matlab code, which isn't used here at all. In effect the code is used as supplemental documentation for the paper.

@wholmgren
Copy link
Member

Are you sure that a native python implementation is a derivative work?

I am not. Maybe @mikofski or someone else knows better. Happy to trust your instincts on it. I was initially under the impression that you were more porting the code than re-implementing the paper.

@cwhanse
Copy link
Member Author

cwhanse commented Sep 17, 2020

An issue with using a zenith argument to determine sunrise and sunset - that forces the user to pad the data to include sunrise/sunset periods. What about latitude and longitude arguments instead? That's explicit and avoids asking for a pvlib.location.Location.

@wholmgren
Copy link
Member

Is this an issue because the filters are based on minutes from sunrise/sunset rather than a zenith angle?

The detect_clearsky algorithm currently requires regular intervals, so most user data probably already includes sunrise and sunset periods.

What about latitude and longitude arguments instead? That's explicit and avoids asking for a pvlib.location.Location.

I'd really like to avoid calculating solar position within the function. It's a relatively expensive computation and it's likely the user already has that data at hand.

Is there a requirement that the data is 1 minute or < 15 minute intervals?

@cwhanse
Copy link
Member Author

cwhanse commented Sep 17, 2020

Yes, the filters are defined by minutes from sunrise/sunset, and the algorithm is designed for minute data. No requirement that the data are regular, that's more a limitation of the python implementation. The developers anticipated processing many days of data but there's no inherent reason it couldn't be applied to a single day. If the user has trimmed data to e.g. GHI > 100, then the data won't include sunrise and sunset times. It seems awkward to require that the input span before sunrise/after sunset.

Do you think that e.g. pvlib.solarposition.sun_rise_set_transit_spa would be less expensive than calculating solar position? The interface could require the user to supply zenith, latitude and longitude, and the function can calculate sunrise/sunset times. I would like to avoid requiring the user to pass in sunrise/sunset times, although not strongly against it. I just think it's a fiddly calculation that the function can do as a convenience for the user.

@wholmgren
Copy link
Member

I'm switching between tabs and projects struggling with solar position.

I was under the impression that zenith is needed for other binning purposes, which is part of why I keep coming back to it.

If the user has trimmed data to e.g. GHI > 100, then the data won't include sunrise and sunset times.

I'd advise this user to perform the GHI trim after the clear sky filter. I don't think that's unreasonable.

Do you think that e.g. pvlib.solarposition.sun_rise_set_transit_spa would be less expensive than calculating solar position?

https://pvlib-benchmarker.github.io/pvlib-benchmarks/#summarylist?sort=0&dir=asc shows that sun_rise_set_transit_spa computes 100 days of sunrise/sunset data in tens of milliseconds and spa_python computes 100 days of 1 minute solar positions in about a second.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants