forecast module errors: Multiple feature collections cannot be written as a CF dataset #1596

Closed · John-Boik opened this issue Nov 18, 2022 · 1 comment · Fixed by #1766

@John-Boik
I'm new to NOAA/NCEP/NWS models, Siphon/THREDDS, etc., and pvlib, but am interested in the pvlib forecast module. I understand that it is deprecated. Indeed, the code at pvlib/forecast.py throws errors, as is evident from the current documentation (which contains error messages) and from running the code on my machine. Perhaps the issue is clear to experts here, but it took me some time to figure out what was happening. I offer a quick-and-dirty fix below, along with a comment/question.

The error is: Handler dispatch failed; nested exception is java.lang.AssertionError: Multiple feature collections cannot be written as a CF dataset. It occurs when fm.get_processed_data(latitude, longitude, start, end) is called. I'm using Python 3.8.10, pvlib 0.8.0, and pandas 1.4.3. I'm also using the GFS ForecastModel, but my fix below could be extended to the other models.

Apparently, the issue is that the current code tries to grab data for several types of variables at once, for example: u-component_of_wind_isobaric, Total_cloud_cover_entire_atmosphere_Mixed_intervals_Average, and Temperature_surface. But it seems that only one type of variable can be accessed in any one query. The variable type can be identified by the last word in the name: "isobaric", "Average", and "surface". The exception to this list is the variable Total_cloud_cover_convective_cloud, which I ignore as it does not seem to be used for power calculations with GFS data. The quick fix is to query each variable type separately, which I do below using a kind keyword. After the dataframes are constructed, I concatenate them.
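The grouping-by-suffix idea above can be sketched in a few lines (variable names taken from this issue; the grouping helper itself is hypothetical, not part of pvlib):

```python
# Group GFS variable names by their trailing token, which identifies the
# feature-collection type that must be queried separately from THREDDS.
variables = [
    'u-component_of_wind_isobaric',
    'v-component_of_wind_isobaric',
    'Total_cloud_cover_entire_atmosphere_Mixed_intervals_Average',
    'Temperature_surface',
]

groups = {}
for name in variables:
    kind = name.rsplit('_', 1)[-1]  # 'isobaric', 'Average', or 'surface'
    groups.setdefault(kind, []).append(name)
```

Each group can then be fetched in its own NCSS query, avoiding the mixed-feature-collection error.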

An issue/question is that given identical time spans, the three queries return dataframes with somewhat different time indexes. I don't know why this happens. In my example run, after concatenation, the forecast_data dataframe looks like this:

data before dropna, shape=(84, 9) 
                              temp_air  wind_speed        ghi        dni        dhi  total_clouds  low_clouds  mid_clouds  high_clouds
2022-11-18 00:30:00-07:00        NaN         NaN   0.000000   0.000000   0.000000    100.000000         0.0    3.700000        100.0
2022-11-18 02:00:00-07:00  14.460541    2.229132   0.000000   0.000000   0.000000    100.000000         0.0   48.299999        100.0
2022-11-18 05:00:00-07:00  13.531006    2.466797        NaN        NaN        NaN           NaN         NaN         NaN          NaN

I simply drop rows with NaNs. Does anyone have a better solution? Interpolation over the NaNs would be a possibility. Dropping NaNs reduces the number of rows from 84 to 28, which is quite a cut.
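As a sketch of the interpolation alternative (a toy stand-in for the concatenated forecast_data, not the real query output): pandas can fill interior gaps with time-weighted interpolation instead of dropping rows.

```python
import numpy as np
import pandas as pd

# Toy stand-in for forecast_data: series sampled on offset time grids,
# so the outer join leaves interior NaNs.
idx = pd.date_range('2022-11-18 00:00', periods=5, freq='90min',
                    tz='US/Mountain')
df = pd.DataFrame({'temp_air': [np.nan, 14.5, np.nan, 13.5, 13.0]},
                  index=idx)

# Time-weighted interpolation fills interior NaNs without discarding rows;
# leading/trailing NaNs are left alone (no extrapolation by default).
filled = df.interpolate(method='time')
```

This keeps all 84 rows, at the cost of inventing values between model time steps, so whether it is appropriate depends on how the data are used downstream.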

Below I give my edited version of the GFS class, and after that I give my script (based on forecast.py) which runs it.

class GFS(ForecastModel):
    _resolutions = ['Half', 'Quarter']
    _kind = ['isobaric', 'Average', 'surface']
    def __init__(self, resolution='quarter', set_type='best', kind='isobaric'):
        model_type = 'Forecast Model Data'
        self.kind = kind
        resolution = resolution.title()
        if resolution not in self._resolutions:
            raise ValueError(f'resolution must be in {self._resolutions}')
        if kind not in self._kind:
            raise ValueError(f'kind must be in {self._kind}')

        model = f'GFS {resolution} Degree Forecast'

        # isobaric variables will require a vert_level to prevent
        # excessive data downloads
        
        # split the variables into different groups, based on kind
        if kind == 'isobaric':
            self.variables = {
                'wind_speed_u': 'u-component_of_wind_isobaric',
                'wind_speed_v': 'v-component_of_wind_isobaric',
                }
                
            self.output_variables = [
                'wind_speed',
                ]
    
        elif kind == 'Average':
            self.variables = {
                'total_clouds': 'Total_cloud_cover_entire_atmosphere_Mixed_intervals_Average',
                'low_clouds': 'Low_cloud_cover_low_cloud_Mixed_intervals_Average',
                'mid_clouds': 'Medium_cloud_cover_middle_cloud_Mixed_intervals_Average',
                'high_clouds': 'High_cloud_cover_high_cloud_Mixed_intervals_Average',
                'boundary_clouds': 'Total_cloud_cover_boundary_layer_cloud_Mixed_intervals_Average',
                'ghi_raw': 'Downward_Short-Wave_Radiation_Flux_surface_Mixed_intervals_Average'
                }
            
            self.output_variables = [
                'ghi',
                'dni',
                'dhi',
                'total_clouds',
                'low_clouds',
                'mid_clouds',
                'high_clouds'
                ]

        elif kind == 'surface':
            self.variables = {
                'temp_air': 'Temperature_surface',
                'wind_speed_gust': 'Wind_speed_gust_surface',
                #'convect_clouds': 'Total_cloud_cover_convective_cloud',
                }
            
            self.output_variables = [
                'temp_air',
                ]

        super().__init__(model_type, model, set_type, vert_level=100000)
        

    def process_data(self, data, cloud_cover='total_clouds', **kwargs):
        """
        Defines the steps needed to convert raw forecast data
        into processed forecast data.

        Parameters
        ----------
        data: DataFrame
            Raw forecast data
        cloud_cover: str, default 'total_clouds'
            The type of cloud cover used to infer the irradiance.

        Returns
        -------
        data: DataFrame
            Processed forecast data.
        """
        data = super().process_data(data, **kwargs)
        
        if self.kind == 'Average':
            irrads = self.cloud_cover_to_irradiance(data[cloud_cover], **kwargs)
            data = data.join(irrads, how='outer')
        elif self.kind == "isobaric":    
            data['wind_speed'] = self.uv_to_speed(data)
        elif self.kind == "surface":
            data['temp_air'] = self.kelvin_to_celsius(data['temp_air'])
        return data[self.output_variables]

My code to run the example in forecast.py is below.

from pvlib import solarposition, irradiance, atmosphere, pvsystem, inverter, temperature
#from pvlib.forecast import GFS, NAM, NDFD, RAP, HRRR
from forecast_module import GFS, NAM, NDFD, RAP, HRRR  # my version of the forecast module
import datetime
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
import sys
import time
import pytz

# Choose a location.
# Tucson, AZ
latitude = 32.2
longitude = -110.9
tz = 'US/Mountain'
tzinfo = pytz.timezone(tz)
surface_tilt = 30
surface_azimuth = 180 # pvlib uses 0=North, 90=East, 180=South, 270=West convention
albedo = 0.2

start = pd.Timestamp(datetime.date.today(), tz=tz) # today's date
#dt = datetime.datetime(2020, 7, 7, 6, 0, tzinfo=tzinfo)
#start = pd.Timestamp(dt)
end = start + pd.Timedelta(days=7) # 7 days from today

# Define forecast model
data = {}
for kind in ['isobaric', 'Average', 'surface']:
    fm = GFS(kind=kind)
    data[kind] = {'fm': fm, 'data': fm.get_processed_data(latitude, longitude, start, end)}

forecast_data = pd.concat([data['surface']['data'], data['isobaric']['data'], data['Average']['data']], axis=1)
# The three kinds have different time indices, and so forecast_data has nans in some rows. Drop these.
print("\ndata before dropna, shape={} \n{}\n".format(forecast_data.shape, forecast_data.head()))
forecast_data.dropna(axis=0, how='any', inplace=True) 
print("\ndata after dropna, shape={} \n{}\n".format(forecast_data.shape, forecast_data.head()))


forecast_data['temp_air'].plot()
plt.title('temp_air');
plt.savefig("temp_air.png")
plt.close('all')

ghi = forecast_data['ghi']
ghi.plot()
plt.ylabel('Irradiance ($W m^{-2}$)')
plt.title('GHI');
plt.savefig("ghi.png")
plt.close('all')

# retrieve time and location parameters, for Average which contains dhi/dni data
time = forecast_data.index
a_point = data['Average']['fm'].location


solpos = a_point.get_solarposition(time, method='nrel_numba')
solpos.plot()
plt.title('solar position');
plt.savefig("solpos.png")
plt.close('all')


dni_extra = irradiance.get_extra_radiation(time)
dni_extra.plot()
plt.ylabel('Extraterrestrial radiation ($W m^{-2}$)')
plt.title('DNI Extra');
plt.savefig("dni_extra.png")
plt.close('all')


airmass = atmosphere.get_relative_airmass(solpos['apparent_zenith'])
airmass.plot()
plt.ylabel('Airmass')
plt.title('Airmass');
plt.savefig("Airmass.png")
plt.close('all')


poa_sky_diffuse = irradiance.haydavies(
    surface_tilt, 
    surface_azimuth, 
    forecast_data['dhi'], 
    forecast_data['dni'], 
    dni_extra,
    solpos['apparent_zenith'], 
    solpos['azimuth']
    )
poa_sky_diffuse.plot()
plt.ylabel('Irradiance ($W m^{-2}$)')
plt.title('POA Sky Diffuse');
plt.savefig("poa_sky_diffuse.png")
plt.close('all')


poa_ground_diffuse = irradiance.get_ground_diffuse(surface_tilt, ghi, albedo=albedo)
poa_ground_diffuse.plot()
plt.ylabel('Irradiance ($W m^{-2}$)')
plt.title('POA Ground Diffuse');
plt.savefig("poa_ground_diffuse.png")
plt.close('all')

aoi = irradiance.aoi(surface_tilt, surface_azimuth, solpos['apparent_zenith'], solpos['azimuth'])
aoi.plot()
plt.ylabel('Angle of incidence (deg)')
plt.title('Angle of Incidence');
plt.savefig("aoi.png")
plt.close('all')


poa_irrad = irradiance.poa_components(
    aoi, 
    forecast_data['dni'], 
    poa_sky_diffuse, 
    poa_ground_diffuse
    )
poa_irrad.plot()
plt.ylabel('Irradiance ($W m^{-2}$)')
plt.title('POA Irradiance');
plt.savefig("poa_irrad.png")
plt.close('all')


ambient_temperature = forecast_data['temp_air']  
wnd_spd = forecast_data['wind_speed'] 
thermal_params = temperature.TEMPERATURE_MODEL_PARAMETERS['sapm']['open_rack_glass_polymer']
pvtemp = temperature.sapm_cell(poa_irrad['poa_global'], ambient_temperature, wnd_spd, **thermal_params)
pvtemp.plot()
plt.ylabel('Temperature (C)');
plt.title('PV Temp');
plt.savefig("pvtemp.png")
plt.close('all')


sandia_modules = pvsystem.retrieve_sam('SandiaMod')
sandia_module = sandia_modules.Canadian_Solar_CS5P_220M___2009_


effective_irradiance = pvsystem.sapm_effective_irradiance(poa_irrad.poa_direct, poa_irrad.poa_diffuse, 
                                                          airmass, aoi, sandia_module)

sapm_out = pvsystem.sapm(effective_irradiance, pvtemp, sandia_module)
print("\nsapm_out= \n{}\n".format(sapm_out.head()))

sapm_out[['p_mp']].plot()
plt.ylabel('DC Power (W)');
plt.title('DC Power');
plt.savefig("dc_power.png")
plt.close('all')

sapm_inverters = pvsystem.retrieve_sam('sandiainverter')
sapm_inverter = sapm_inverters['ABB__MICRO_0_25_I_OUTD_US_208__208V_']

p_ac = inverter.sandia(sapm_out.v_mp, sapm_out.p_mp, sapm_inverter)
p_ac.plot()
plt.ylabel('AC Power (W)')
plt.ylim(0, None);
plt.title('AC Power');
plt.savefig("ac_power.png")
plt.close('all')

print("\np_ac describe= {}\n".format(p_ac.describe()))
p_ac.index.freq
p_ac[start:start+pd.Timedelta(days=2)].plot();
plt.title('AC Power, 2 Day');
plt.savefig("ac_power_2day.png")
plt.close('all')

# integrate power to find energy yield over the forecast period
energy = p_ac.sum() * 3
print("\nenergy= {}\n".format(energy))
# I think it would be better to integrate p_ac over time to get the energy.
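Agreed: p_ac.sum() * 3 only approximates Wh if every interval is exactly 3 hours, which the mismatched indexes above don't guarantee. A sketch of trapezoidal integration over the actual time index (toy series, hypothetical values):

```python
import numpy as np
import pandas as pd

# Hypothetical 3-hourly AC power forecast (W)
idx = pd.date_range('2022-11-18', periods=4, freq='3h', tz='US/Mountain')
p_ac = pd.Series([0.0, 100.0, 200.0, 0.0], index=idx)

# Trapezoidal rule over elapsed hours -> energy in Wh
hours = (p_ac.index - p_ac.index[0]).total_seconds() / 3600.0
dx = np.diff(hours)
mid = (p_ac.values[:-1] + p_ac.values[1:]) / 2.0
energy_wh = float((mid * dx).sum())
```

Because the spacing is read from the index itself, this stays correct even when the forecast steps are irregular.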
@kandersolar (Member)