-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Add sodapro.get_cams_radiation #1175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@mikofski Thanks for the help in making a new pull request. The previous PR #1172 has now been closed. As previously mentioned: I've asked the developing team of the CAMS radiation and McClear services, and have been informed that there is not and will not be any demo API email/access. So if tests have to be implemented, we need to register a PVLIB email. Only a registered email address is necessary and no password will ever be public. The function includes some very convenient features, including the label, integrated, and map_variable arguments, making it very intuitive for Python users. I'd very much like a critical look at the function and feedback on how it can be improved. |
@mikofski As far as I can tell CAMS McClear and ECMWF MACC pvlib.iotools.ecmwf_macc return very different parameters. |
I'm not very familiar with the details of CAMS services, but it appears the major differences between this McClear data reader and the existing MACC reader are in spectral components of AOD. McClear returns AOD at 550 nm for a variety of particles, while MACC returns AOD at a variety of wavelengths. So I think they are complementary and both readers can exist in pvlib. The CAMS data license (available here) allows us to bundle a file for testing. How about one function for parsing data and one function for fetching data? The fetch function can call the parse function so a user only needs to make one function call. But separating the logic lets users load data in a different manner if desired (e.g. from disk or using threads). |
@wholmgren I will separate it into a get and read/parse function. Any preference in calling it read or parse? Is there a difference? The nice part about having it in one function is that you know if the file has 6 (verbose=False) or 23 columns (verbose=True). I suppose this can be solved by reading in the file, and then checking how many columns there are, and based on this assign the 6 or 23 column name header. The primary output of CAMS McClear is the time-series of broad-band clear sky irradiance. This clear-sky model can be used for a number of purposes, e.g. quality checks, etc., and has the major advantage over most clear-sky models in that it is based on data for the specific location and the specific time and not just seasonal trends. The ECMWF MACC reader only returns AOD for different wavelengths and water vapor, which has very different uses. |
I'm guessing parse. Check out #1155 has some recent discussion relevant to the benefits of splitting up fetching and parsing.
That could work, but you could also keep it simple and add that flag to the read/parse function too. Most people will use the higher level function for direct downloads + parsing and it's not a big deal to make users of the lower level parse function repeat a parameter. |
@wholmgren Ok, I will attempt to make a similar structure, with parse, read, and get. Or should we stick to just parse and get? |
I don't see that it's an issue to pass any of those arguments from one function to another or to expect that a user with local data already knows something about that data. But I don't want to over complicate it - iotools is supposed to be relatively easy to contribute to. It's really fine with me if you want to keep it all in one function. |
@wholmgren I have now created read and parse sub-functions similar to psm3. It is indeed nice that it can retrieve files, but also work with local files. Also, I found a way to parse the meta-data nicely and get the column names from the file itself. The documentation for the two new functions is not fully developed, but I will do this within the next few days. I'll also be working on making tests. Though you are welcome to look through / test the function now. |
And minor updates to the documentation
@wholmgren I have renamed the function to 'get_cams_radiation' as I have extended the functions to read/parse/get both CAMS Radiation and McClear as the two services follow the same format (API input is identical, with the exception of one string, specifying whether the users wants Radiation or McClear). I have been looking into developing tests, but I am very uncertain about which aspects need to be tested? Do we want to register a pvlib email and test the 'get' method by making a real API call? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking pretty good @AdamRJensen! Some notes below, and let's talk about the tests on our call later.
if (time_step == '1d') | (time_step == '1M'): | ||
data.index = pd.DatetimeIndex(data.index.date) | ||
# For monthly data with 'right' label, the index should be the last | ||
# date of the month and not the first date of the following month |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either way seems reasonable to me (and I personally prefer left-labeling to avoid questions like this), but I wonder what other people think. As a point of comparison, TMY3 data are right-labeled but without a sub-unit shift like this.
I also wonder if all this labeling code is worth it -- maybe we should just return the start observation time and be done with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we should just return the start observation time and be done with it.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recognize your concern and would perhaps agree if it the service didn't return monthly data.
My main argument is that Pandas labels minute and daily data by the left interval, however, monthly data is labelled right (i.e., last day of the month). The majority of users of monthly data would be required to manually change the index labels, before being able to merge with any other resample data in pandas.
Personally I think it adds a lot of functionality that makes the function a lot easier to use, and that it would be a shame nixing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW here are some one-liners for converting month-begin data to month-end:
In [20]: s = pd.Series([1,2,3], pd.date_range('2019-01-01', periods=3, freq='MS'))
In [21]: s.index
Out[21]: DatetimeIndex(['2019-01-01', '2019-02-01', '2019-03-01'], dtype='datetime64[ns]', freq='MS')
In [22]: s.resample('M').mean().index
Out[22]: DatetimeIndex(['2019-01-31', '2019-02-28', '2019-03-31'], dtype='datetime64[ns]', freq='M')
# or
In [23]: s.to_period('M').to_timestamp('M').index
Out[23]: DatetimeIndex(['2019-01-31', '2019-02-28', '2019-03-31'], dtype='datetime64[ns]', freq='M')
How about removing the label
parameter but adding a docstring example with (a better version of) the above snippet for the benefit of whatever subset of pvlib users want monthly insolation? E.g. this, which renders as this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice job with the tests @AdamRJensen.
if (time_step == '1d') | (time_step == '1M'): | ||
data.index = pd.DatetimeIndex(data.index.date) | ||
# For monthly data with 'right' label, the index should be the last | ||
# date of the month and not the first date of the following month |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we should just return the start observation time and be done with it.
+1
Co-authored-by: Kevin Anderson <[email protected]>
Co-authored-by: Kevin Anderson <[email protected]>
Also, includes a few minor changes addressing the review by @kanderso-nrel
Includes match keywords for tests that asserts warning and error messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @AdamRJensen. @kanderso-nrel I'll let you decide when to merge this and refrain from further review unless requested.
@wholmgren @kanderso-nrel I recommend that this first gets merged when a decision on #1245 is made. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay here, this looks great. Thanks @AdamRJensen! One more idea, what do you think about adding the new-to-pvlib bhi
acronym to the Variables and Symbols list?
Closes #xxxxdocs/sphinx/source/api.rst
for API changes.docs/sphinx/source/whatsnew
for all changes. Includes link to the GitHub Issue with:issue:`num`
or this Pull Request with:pull:`num`
. Includes contributor name and/or GitHub username (link with:ghuser:`user`
).Add a function to retrieve CAMS McClear clear-sky radiation as previously unsuccessfully attempted in #271, #274, #279. This is a new pull request as my previous attempt #1172 erroneously had a long/incorrect commit history.