forecast module errors: Multiple feature collections cannot be written as a CF dataset #1596

Closed · John-Boik opened this issue Nov 18, 2022 · 1 comment · Fixed by #1766

@John-Boik
I'm new to NOAA/NCEP/NWS models, Siphon/THREDDS, etc., and pvlib, but am interested in the pvlib forecast module. I understand that it is deprecated. Indeed, the code at pvlib/forecast.py throws errors, as is evident from the current documentation (which contains error messages) and from running the code on my machine. Perhaps the issue is clear to experts here, but it took me some time to figure out what was happening. I offer a quick-and-dirty fix below, along with a comment/question.

The error is: Handler dispatch failed; nested exception is java.lang.AssertionError: Multiple feature collections cannot be written as a CF dataset. It occurs when fm.get_processed_data(latitude, longitude, start, end) is called. I'm using Python 3.8.10, pvlib 0.8.0, and pandas 1.4.3. I'm also using the GFS ForecastModel, but my fix below could be extended to the other models.

Apparently, the issue is that the current code tries to grab data for several types of variables at once, for example: u-component_of_wind_isobaric, Total_cloud_cover_entire_atmosphere_Mixed_intervals_Average, and Temperature_surface. But it seems that only one type of variable can be accessed in any one query. The variable type can be identified by the last word in the name: "isobaric", "Average", and "surface". The exception to this list is the variable Total_cloud_cover_convective_cloud, which I ignore as it does not seem to be used for power calculations with GFS data. The quick fix is to query each variable type separately, which I do below using a kind keyword. After the dataframes are constructed, I concatenate them.
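The grouping-by-suffix idea above can be sketched in a few lines (variable names taken from this issue; the grouping helper itself is hypothetical, not part of pvlib):

```python
# Group GFS variable names by their trailing token, which identifies the
# feature-collection type that must be queried separately from THREDDS.
variables = [
    'u-component_of_wind_isobaric',
    'v-component_of_wind_isobaric',
    'Total_cloud_cover_entire_atmosphere_Mixed_intervals_Average',
    'Temperature_surface',
]

groups = {}
for name in variables:
    kind = name.rsplit('_', 1)[-1]  # 'isobaric', 'Average', or 'surface'
    groups.setdefault(kind, []).append(name)
```

Each group can then be fetched in its own NCSS query, avoiding the mixed-feature-collection error.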

An issue/question is that given identical time spans, the three queries return dataframes with somewhat different time indexes. I don't know why this happens. In my example run, after concatenation, the forecast_data dataframe looks like this:

data before dropna, shape=(84, 9) 
                              temp_air  wind_speed        ghi        dni        dhi  total_clouds  low_clouds  mid_clouds  high_clouds
2022-11-18 00:30:00-07:00        NaN         NaN   0.000000   0.000000   0.000000    100.000000         0.0    3.700000        100.0
2022-11-18 02:00:00-07:00  14.460541    2.229132   0.000000   0.000000   0.000000    100.000000         0.0   48.299999        100.0
2022-11-18 05:00:00-07:00  13.531006    2.466797        NaN        NaN        NaN           NaN         NaN         NaN          NaN

I simply drop rows with NaNs. Does anyone have a better solution? Interpolation over the NaNs would be a possibility. Dropping NaNs reduces the number of rows from 84 to 28, which is quite a cut.
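As a sketch of the interpolation alternative (a toy stand-in for the concatenated forecast_data, not the real query output): pandas can fill interior gaps with time-weighted interpolation instead of dropping rows.

```python
import numpy as np
import pandas as pd

# Toy stand-in for forecast_data: series sampled on offset time grids,
# so the outer join leaves interior NaNs.
idx = pd.date_range('2022-11-18 00:00', periods=5, freq='90min',
                    tz='US/Mountain')
df = pd.DataFrame({'temp_air': [np.nan, 14.5, np.nan, 13.5, 13.0]},
                  index=idx)

# Time-weighted interpolation fills interior NaNs without discarding rows;
# leading/trailing NaNs are left alone (no extrapolation by default).
filled = df.interpolate(method='time')
```

This keeps all 84 rows, at the cost of inventing values between model time steps, so whether it is appropriate depends on how the data are used downstream.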

Below I give my edited version of the GFS class, and after that I give my script (based on forecast.py) which runs it.

class GFS(ForecastModel):
    _resolutions = ['Half', 'Quarter']
    _kind = ['isobaric', 'Average', 'surface']
    def __init__(self, resolution='quarter', set_type='best', kind='isobaric'):
        model_type = 'Forecast Model Data'
        self.kind = kind
        resolution = resolution.title()
        if resolution not in self._resolutions:
            raise ValueError(f'resolution must be in {self._resolutions}')
        if kind not in self._kind:
            raise ValueError(f'kind must be in {self._kind}')

        model = f'GFS {resolution} Degree Forecast'

        # isobaric variables will require a vert_level to prevent
        # excessive data downloads
        
        # split the variables into different groups, based on kind
        if kind == 'isobaric':
            self.variables = {
                'wind_speed_u': 'u-component_of_wind_isobaric',
                'wind_speed_v': 'v-component_of_wind_isobaric',
                }
                
            self.output_variables = [
                'wind_speed',
                ]
    
        elif kind == 'Average':
            self.variables = {
                'total_clouds': 'Total_cloud_cover_entire_atmosphere_Mixed_intervals_Average',
                'low_clouds': 'Low_cloud_cover_low_cloud_Mixed_intervals_Average',
                'mid_clouds': 'Medium_cloud_cover_middle_cloud_Mixed_intervals_Average',
                'high_clouds': 'High_cloud_cover_high_cloud_Mixed_intervals_Average',
                'boundary_clouds': 'Total_cloud_cover_boundary_layer_cloud_Mixed_intervals_Average',
                'ghi_raw': 'Downward_Short-Wave_Radiation_Flux_surface_Mixed_intervals_Average'
                }
            
            self.output_variables = [
                'ghi',
                'dni',
                'dhi',
                'total_clouds',
                'low_clouds',
                'mid_clouds',
                'high_clouds'
                ]

        elif kind == 'surface':
            self.variables = {
                'temp_air': 'Temperature_surface',
                'wind_speed_gust': 'Wind_speed_gust_surface',
                #'convect_clouds': 'Total_cloud_cover_convective_cloud',
                }
            
            self.output_variables = [
                'temp_air',
                ]

        super().__init__(model_type, model, set_type, vert_level=100000)
        

    def process_data(self, data, cloud_cover='total_clouds', **kwargs):
        """
        Defines the steps needed to convert raw forecast data
        into processed forecast data.

        Parameters
        ----------
        data: DataFrame
            Raw forecast data
        cloud_cover: str, default 'total_clouds'
            The type of cloud cover used to infer the irradiance.

        Returns
        -------
        data: DataFrame
            Processed forecast data.
        """
        data = super().process_data(data, **kwargs)
        
        if self.kind == 'Average':
            irrads = self.cloud_cover_to_irradiance(data[cloud_cover], **kwargs)
            data = data.join(irrads, how='outer')
        elif self.kind == "isobaric":    
            data['wind_speed'] = self.uv_to_speed(data)
        elif self.kind == "surface":
            data['temp_air'] = self.kelvin_to_celsius(data['temp_air'])
        return data[self.output_variables]

My code to run the example in forecast.py is below.

from pvlib import solarposition, irradiance, atmosphere, pvsystem, inverter, temperature
#from pvlib.forecast import GFS, NAM, NDFD, RAP, HRRR
from forecast_module import GFS, NAM, NDFD, RAP, HRRR  # my version of the forecast module
import datetime
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
import sys
import time
import pytz

# Choose a location.
# Tucson, AZ
latitude = 32.2
longitude = -110.9
tz = 'US/Mountain'
tzinfo = pytz.timezone(tz)
surface_tilt = 30
surface_azimuth = 180 # pvlib uses 0=North, 90=East, 180=South, 270=West convention
albedo = 0.2

start = pd.Timestamp(datetime.date.today(), tz=tz) # today's date
#dt = datetime.datetime(2020, 7, 7, 6, 0, tzinfo=tzinfo)
#start = pd.Timestamp(dt)
end = start + pd.Timedelta(days=7) # 7 days from today

# Define forecast model
data = {}
for kind in ['isobaric', 'Average', 'surface']:
    fm = GFS(kind=kind)
    data[kind] = {'fm': fm, 'data': fm.get_processed_data(latitude, longitude, start, end)}

forecast_data = pd.concat([data['surface']['data'], data['isobaric']['data'], data['Average']['data']], axis=1)
# The three kinds have different time indices, and so forecast_data has nans in some rows. Drop these.
print("\ndata before dropna, shape={} \n{}\n".format(forecast_data.shape, forecast_data.head()))
forecast_data.dropna(axis=0, how='any', inplace=True) 
print("\ndata after dropna, shape={} \n{}\n".format(forecast_data.shape, forecast_data.head()))


forecast_data['temp_air'].plot()
plt.title('temp_air');
plt.savefig("temp_air.png")
plt.close('all')

ghi = forecast_data['ghi']
ghi.plot()
plt.ylabel('Irradiance ($W m^{-2}$)')
plt.title('GHI');
plt.savefig("ghi.png")
plt.close('all')

# retrieve time and location parameters, for Average which contains dhi/dni data
time = forecast_data.index
a_point = data['Average']['fm'].location


solpos = a_point.get_solarposition(time, method='nrel_numba')
solpos.plot()
plt.title('solar position');
plt.savefig("solpos.png")
plt.close('all')


dni_extra = irradiance.get_extra_radiation(time)
dni_extra.plot()
plt.ylabel('Extraterrestrial radiation ($W m^{-2}$)')
plt.title('DNI Extra');
plt.savefig("dni_extra.png")
plt.close('all')


airmass = atmosphere.get_relative_airmass(solpos['apparent_zenith'])
airmass.plot()
plt.ylabel('Airmass')
plt.title('Airmass');
plt.savefig("Airmass.png")
plt.close('all')


poa_sky_diffuse = irradiance.haydavies(
    surface_tilt, 
    surface_azimuth, 
    forecast_data['dhi'], 
    forecast_data['dni'], 
    dni_extra,
    solpos['apparent_zenith'], 
    solpos['azimuth']
    )
poa_sky_diffuse.plot()
plt.ylabel('Irradiance ($W m^{-2}$)')
plt.title('POA Sky Diffuse');
plt.savefig("poa_sky_diffuse.png")
plt.close('all')


poa_ground_diffuse = irradiance.get_ground_diffuse(surface_tilt, ghi, albedo=albedo)
poa_ground_diffuse.plot()
plt.ylabel('Irradiance ($W m^{-2}$)')
plt.title('POA Ground Diffuse');
plt.savefig("poa_ground_diffuse.png")
plt.close('all')

aoi = irradiance.aoi(surface_tilt, surface_azimuth, solpos['apparent_zenith'], solpos['azimuth'])
aoi.plot()
plt.ylabel('Angle of incidence (deg)')
plt.title('Angle of Incidence');
plt.savefig("aoi.png")
plt.close('all')


poa_irrad = irradiance.poa_components(
    aoi, 
    forecast_data['dni'], 
    poa_sky_diffuse, 
    poa_ground_diffuse
    )
poa_irrad.plot()
plt.ylabel('Irradiance ($W m^{-2}$)')
plt.title('POA Irradiance');
plt.savefig("poa_irrad.png")
plt.close('all')


ambient_temperature = forecast_data['temp_air']  
wnd_spd = forecast_data['wind_speed'] 
thermal_params = temperature.TEMPERATURE_MODEL_PARAMETERS['sapm']['open_rack_glass_polymer']
pvtemp = temperature.sapm_cell(poa_irrad['poa_global'], ambient_temperature, wnd_spd, **thermal_params)
pvtemp.plot()
plt.ylabel('Temperature (C)');
plt.title('PV Temp');
plt.savefig("pvtemp.png")
plt.close('all')


sandia_modules = pvsystem.retrieve_sam('SandiaMod')
sandia_module = sandia_modules.Canadian_Solar_CS5P_220M___2009_


effective_irradiance = pvsystem.sapm_effective_irradiance(poa_irrad.poa_direct, poa_irrad.poa_diffuse, 
                                                          airmass, aoi, sandia_module)

sapm_out = pvsystem.sapm(effective_irradiance, pvtemp, sandia_module)
print("\nsapm_out= \n{}\n".format(sapm_out.head()))

sapm_out[['p_mp']].plot()
plt.ylabel('DC Power (W)');
plt.title('DC Power');
plt.savefig("dc_power.png")
plt.close('all')

sapm_inverters = pvsystem.retrieve_sam('sandiainverter')
sapm_inverter = sapm_inverters['ABB__MICRO_0_25_I_OUTD_US_208__208V_']

p_ac = inverter.sandia(sapm_out.v_mp, sapm_out.p_mp, sapm_inverter)
p_ac.plot()
plt.ylabel('AC Power (W)')
plt.ylim(0, None);
plt.title('AC Power');
plt.savefig("ac_power.png")
plt.close('all')

print("\np_ac describe= {}\n".format(p_ac.describe()))
p_ac.index.freq
p_ac[start:start+pd.Timedelta(days=2)].plot();
plt.title('AC Power, 2 Day');
plt.savefig("ac_power_2day.png")
plt.close('all')

# integrate power to find energy yield over the forecast period
energy = p_ac.sum() * 3
print("\nenergy= {}\n".format(energy))
# I think it would be better to integrate p_ac over time to get the energy.
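Agreed: p_ac.sum() * 3 only approximates Wh if every interval is exactly 3 hours, which the mismatched indexes above don't guarantee. A sketch of trapezoidal integration over the actual time index (toy series, hypothetical values):

```python
import numpy as np
import pandas as pd

# Hypothetical 3-hourly AC power forecast (W)
idx = pd.date_range('2022-11-18', periods=4, freq='3h', tz='US/Mountain')
p_ac = pd.Series([0.0, 100.0, 200.0, 0.0], index=idx)

# Trapezoidal rule over elapsed hours -> energy in Wh
hours = (p_ac.index - p_ac.index[0]).total_seconds() / 3600.0
dx = np.diff(hours)
mid = (p_ac.values[:-1] + p_ac.values[1:]) / 2.0
energy_wh = float((mid * dx).sum())
```

Because the spacing is read from the index itself, this stays correct even when the forecast steps are irregular.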
@kandersolar (Member)