Skip to content

[TrendAnalysis] ad_hoc_filter index isn't checked for consistency with index of other filters #266

@kandersolar

Description

@kandersolar

Describe the bug
The TrendAnalysis filtering step implicitly assumes that the ad_hoc_filter has the same index as the other filter time series. In cases where the TrendAnalysis constructor interpolation changes the index of the input time series, which then propagates to the built-in filters, that assumption is violated and the analysis results can be affected.

To Reproduce

import pandas as pd
import rdtools

idx1 = pd.date_range('2017-01-01 00:00', '2018-01-01', freq='15min', closed='left')
idx2 = pd.date_range('2018-01-01 00:05', '2019-01-02', freq='15min', closed='left')
times = idx1.append(idx2)

df = pd.DataFrame({
    'poa_global': 1000,
    'power_ac': 1.0,
    'temperature_cell': 25,
}, index=times)

ta = rdtools.analysis_chains.TrendAnalysis(df.power_ac, df.poa_global, df.temperature_cell,
                                           interp_freq='15T')
# disable the quantile clipping filter, it's irrelevant and a nuisance here
ta.filter_params.pop('clip_filter')

# toggle this line on/off:
ta.filter_params['ad_hoc_filter'] = pd.Series(True, df.index)
ta.sensor_analysis()  # fails if above line is uncommented

The input time series switches halfway through from being on the minute intervals [0, 15, 30, 45] to [5, 20, 35, 50]. The automatic interpolation realigns the timestamps to all be on [0, 15, 30, 45]. However, the ad_hoc_filter still uses the mixed timestamps. In the filtering step, this means that some values from the other filters don't have a match in the ad_hoc_filter, so they get set to False. And likewise the values in the ad_hoc_filter don't have a match in the other filters, so those timestamps get set to False as well. That means the entire second year gets filtered out, despite the fact that all values in the ad_hoc_filter are True!

Expected behavior
Not 100% sure what the correct behavior here should be. If we're unwilling to try to automatically adjust the ad_hoc_filter to match the post-interpolation time series, which I think is wise, then we should at least alert the user that their ad_hoc_filter probably isn't being used the way they expect. Some ideas for that alert:

  • Check ad_hoc_filter.index.equals(self.pv_power.index), or something like that, and raise an error if the indexes aren't equal. The way to prevent such an error would be for the user to create the TrendAnalysis instance, then calculate their ad_hoc_filter using the interpolated TrendAnalysis attributes so that the indexes are the same when it's time to filter.
  • If the above option turns out to be too strict (for example maybe the interpolation creates some very minor difference like an extra nighttime value because of a DST transition), another option is to require all points that pass the other filters to have matches in the ad_hoc_filter, or in Mike's words "everything that stays in at the end of the day has to be explicitly represented in the ad hoc filter". That would allow some flexibility around nuisance issues for values that don't matter, but still be strict on the values that do.

Additional context
#263 might be relevant

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions