Add MIDC reader #605

Merged
merged 15 commits into from
Oct 29, 2018

Contributor

@lboeman lboeman commented Oct 16, 2018

pvlib python pull request guidelines

Thank you for your contribution to pvlib python! You may delete all of these instructions except for the list below.

You may submit a pull request with your code at any stage of completion.

The following items must be addressed before the code can be merged. Please don't hesitate to ask for help if you're unsure of how to accomplish any of the items below:

  • Closes issue add NREL MIDC reader to iotools #601
  • I am familiar with the contributing guidelines.
  • Fully tested. Added and/or modified tests to ensure correct behavior for all reasonable inputs. Tests (usually) must pass on the TravisCI and Appveyor testing services.
  • Updates entries to docs/sphinx/source/api.rst for API changes.
  • Adds description and name entries in the appropriate docs/sphinx/source/whatsnew file for all changes.
  • Code quality and style is sufficient. Passes LGTM and SticklerCI checks.
  • New code is fully documented. Includes sphinx/numpydoc compliant docstrings and comments in the code where necessary.
  • Pull request is nearly complete and ready for detailed review.

Brief description of the problem and proposed solution (if not already fully described in the issue linked to above):

@@ -3,3 +3,4 @@
from pvlib.iotools.srml import read_srml # noqa: F401
from pvlib.iotools.srml import read_srml_month_from_solardat # noqa: F401
from pvlib.iotools.surfrad import read_surfrad # noqa: F401
from pvlib.iotools.midc import read_midc # noqa: F401

E261 at least two spaces before inline comment

@wholmgren wholmgren added this to the 0.6.1 milestone Oct 19, 2018
@wholmgren wholmgren added enhancement solarfx2 DOE SETO Solar Forecasting 2 / Solar Forecast Arbiter io labels Oct 19, 2018
@lboeman
Contributor Author

lboeman commented Oct 19, 2018

I'm revisiting the idea of providing a default mapping. Originally I was concerned that a mapping would make assumptions on the user's behalf.
These are some of the GHI labels across different sites; some sites provide measurements from multiple instruments.

Global PSP [W/m^2]
Global CMP22 [W/m^2]
Global LI-200 [W/m^2]
Global NRCS [W/m^2]

@wholmgren suggested that we either:

  • Map variables by their type and append their instrument/unit information to their label.
    e.g. Global PSP [W/m^2] -> ghi_PSP_[W/m^2]
    or
  • Provide a preferred field to map directly to a pvlib variable name
    e.g. Global PSP [W/m^2] -> ghi

@cwhanse do you have any thoughts on these approaches or ideas on instruments to favor?

@cwhanse
Member

cwhanse commented Oct 19, 2018

My reaction is to favor #1, append an instrument type code to the measurement code, e.g., ghi_psp, ghi_cmp22, ghi_li200, ghi_rc (for reference cell). I don't think we want to arbitrate any debates about the merits of various instruments. Better for us to help to maintain clarity, and let the user decide which of the instruments is mapped to ghi.

@lboeman
Contributor Author

lboeman commented Oct 19, 2018

Great, thanks for the feedback Cliff.

@wholmgren
Member

Thanks @cwhanse. @lboeman Do we know if units are always consistent? For example, all irradiance measurements are W/m^2, all temperature measurements are deg C, etc. If so, let's drop them from the column names but put a note in the documentation.

@lboeman
Contributor Author

lboeman commented Oct 19, 2018

@wholmgren There is at least one site that reports everything in Standard units: https://midcdmz.nrel.gov/apps/go2url.pl?site=SPMD
So I was planning to leave units in the column names of the returned dataframe.

I did find one more issue: sometimes there appear to be two names for the same variable at a site, as is the case with "Dew Point Temp [deg C]" and "Dewpoint Temp [deg C]" listed in the field API here: https://midcdmz.nrel.gov/apps/field_api.pl?NWTC
It looks like these two tags shouldn't appear in the same file, as I can only find one prior to 2001/08/24 and the other since then. It might be safest to add a second, suffixed mapping for fields like these, so that if two of them do appear together one isn't clobbered by the other.
Something like:
"Dew Point Temp": 'temp_dew'
"Dewpoint Temp": 'temp_dew_1'
Is that something we should be concerned with or should we map the newer values and leave the user to pass in a different mapping if they need to map older data?
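
The clobbering concern can be sketched with plain dictionaries rather than a DataFrame. The helper `find_collisions` and both mappings below are hypothetical illustrations, not code from this PR; full field labels are used as keys only to keep the example short.

```python
from collections import Counter


def find_collisions(columns, rename):
    """Return the target names that more than one source column maps to."""
    targets = Counter(rename.get(col, col) for col in columns)
    return sorted(name for name, count in targets.items() if count > 1)


columns = ['Dew Point Temp [deg C]', 'Dewpoint Temp [deg C]']

# Mapping both spellings to one name: the two columns end up under the same name.
shared = {'Dew Point Temp [deg C]': 'temp_dew',
          'Dewpoint Temp [deg C]': 'temp_dew'}

# Suffixed mapping, as proposed above: both columns keep distinct names.
suffixed = {'Dew Point Temp [deg C]': 'temp_dew',
            'Dewpoint Temp [deg C]': 'temp_dew_1'}

print(find_collisions(columns, shared))    # ['temp_dew']
print(find_collisions(columns, suffixed))  # []
```

A check like this only matters if both spellings ever show up in one file, which the field API dates suggest is unlikely.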

@wholmgren
Member

Too bad about the units inconsistency. I suppose that everyone that wants to use this function is going to need to inspect the sites on the MIDC page, though, so maybe it's reasonable to say check the MIDC page for units and be sure to convert before using with other pvlib functions. Either way is fine with me.

or should we map the newer values and leave the user to pass in a different mapping if they need to map older data?

this is ok with me

variable_map: dictionary
Dictionary for mapping MIDC field names to pvlib names. See variable
`VARIABLE_MAP` for default and Notes section below for a description of
its format.

W291 trailing whitespace

-----
Keys of the `variable_map` dictionary should include the first part
of a MIDC field name which indicates the variable being measured.

W293 blank line contains whitespace

of a MIDC field name which indicates the variable being measured.

e.g. 'Global PSP [W/m^2]' is entered as a key of 'Global'

W293 blank line contains whitespace

e.g. 'Global PSP [W/m^2]' is entered as a key of 'Global'

The 'PSP' indicating instrument is appended to the pvlib variable name
after mapping to differentiate measurements of the same variable. For a full

E501 line too long (80 > 79 characters)


The 'PSP' indicating instrument is appended to the pvlib variable name
after mapping to differentiate measurements of the same variable. For a full
list of pvlib variable names see the `Variable Style Rules <https://pvlib-python.readthedocs.io/en/latest/variables_style_rules.html>`_.

E501 line too long (140 > 79 characters)


References
----------
.. [1] National Renewable Energy Laboratory: Measurement and Instrumentation Data Center

E501 line too long (92 > 79 characters)

Member

@wholmgren wholmgren left a comment

This is mergeable except for a typo. It would be nice if it also included a function that helped users construct a url for requesting data. From the comment in the original issue, an example url is:

url = 'http://midcdmz.nrel.gov/apps/plot.pl?site=UAT&start=20101103&edy=2&emo=4&eyr=2018&year=2016&month=1&day=1&endyear=2017&endmonth=12&endday=31&time=23:59&inst=4&inst=5&inst=6&inst=7&inst=9&type=data&first=3&math=0&second=-1&value=0.0&user=0&axis=1'

partial implementation might look like

def construct_url(site, data_start, data_end, epoch_start):
    base_url = 'http://midcdmz.nrel.gov/apps/plot.pl?'
    args = {
        'site': site,
        # %m and %d are zero-padded month and day (%M would be minutes)
        'start': epoch_start.strftime('%Y%m%d'),
        'year': data_start.strftime('%Y'),
        # 'month': ..., and the remaining parameters are left elided here
    }
    args_str = '&'.join(['{}={}'.format(k, v) for k, v in args.items()])
    return base_url + args_str

I've only manually constructed urls for the Tucson site, so I'm not sure about the difficulty in doing it for all sites.
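
One way to sketch this with the standard library is `urllib.parse.urlencode`, whose `doseq=True` option expands a list into repeated `inst=` parameters. The function name `build_plot_url` and its signature are hypothetical. Only the parameters that are easy to read off the example URL are filled in, and whether the endpoint accepts a request without the remaining ones (`edy`, `emo`, `eyr`, `time`, `first`, and so on) is untested here.

```python
from datetime import date
from urllib.parse import urlencode


def build_plot_url(site, data_start, data_end, epoch_start, inst):
    """Hypothetical helper: assemble a plot.pl query from the example URL's fields."""
    base_url = 'http://midcdmz.nrel.gov/apps/plot.pl?'
    args = [
        ('site', site),
        ('start', epoch_start.strftime('%Y%m%d')),
        ('year', data_start.year),
        ('month', data_start.month),
        ('day', data_start.day),
        ('endyear', data_end.year),
        ('endmonth', data_end.month),
        ('endday', data_end.day),
        # A list value becomes one inst=... pair per element with doseq=True.
        ('inst', list(inst)),
        ('type', 'data'),
    ]
    return base_url + urlencode(args, doseq=True)


url = build_plot_url('UAT', date(2016, 1, 1), date(2017, 12, 31),
                     date(2010, 11, 3), inst=[4, 5])
print(url)
```

Passing the pairs as a list keeps the parameter order stable, which makes the resulting URL easy to compare against one copied from the web UI.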

Parameters
----------
variable_map: Dictionary
A dictionary for mapping MIDC field nameto pvlib name. See VARIABLE_MAP
Member

nameto -> name to

if field_name.startswith(midc_name):
    # extract the instrument and units field and then remove units
    instrument_units = field_name[len(midc_name):]
    instrument = instrument_units[:instrument_units.find('[') - 1]
Member

I think this logic assumes that units are always specified in this format and will produce the wrong result otherwise. Probably a safe assumption, but maybe add a comment.

Contributor Author

This is something I had considered and forgot to include in the implementation. Would it be preferable to handle the case where no units are provided, e.g. when instrument_units.find('[') returns -1?
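
A minimal sketch of how the missing-units case could be handled. The function name `map_field_name` and the underscore separator are assumptions for illustration, not the code merged in this PR. With the slice shown in the diff above, a label without `[` would make `find` return -1 and silently drop the last two characters of the instrument name.

```python
def map_field_name(variable_map, field_name):
    """Hypothetical helper: map a MIDC field label to a pvlib-style name."""
    for midc_name, pvlib_name in variable_map.items():
        if field_name.startswith(midc_name):
            remainder = field_name[len(midc_name):]
            # Strip the units only when a '[' is actually present.
            units_index = remainder.find('[')
            if units_index != -1:
                remainder = remainder[:units_index]
            instrument = remainder.strip().replace(' ', '_')
            if instrument:
                return pvlib_name + '_' + instrument
            return pvlib_name
    # No key matched: leave the label untouched.
    return field_name


variable_map = {'Global': 'ghi'}
print(map_field_name(variable_map, 'Global PSP [W/m^2]'))  # ghi_PSP
print(map_field_name(variable_map, 'Global PSP'))          # ghi_PSP
```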

@lboeman
Contributor Author

lboeman commented Oct 22, 2018

@wholmgren Building URLs programmatically for the plot.pl endpoint for more than one site seems problematic. Variables are requested with the inst parameter, which uses inconsistent integer identifiers across sites, and there doesn't seem to be a simple way of requesting all variables at a given site. Users would have to use the web UI to get the inst values of the variables for the site they are requesting.

We could request data from the raw data API, which has a simpler URL format and returns all variables. However, the raw data is not quality checked; it comes straight from the data loggers.

Member

@wholmgren wholmgren left a comment

looks good except for a couple of minor concerns

def test_read_midc_raw_data_from_nrel():
    start_ts = pd.Timestamp('20181018')
    end_ts = pd.Timestamp('20181019')
    midc.read_midc_raw_data_from_nrel('UAT', start_ts, end_ts)
Member

this should assert a few reasonable things about the returned data


test_dir = os.path.dirname(
    os.path.abspath(inspect.getfile(inspect.currentframe())))
midc_testfile = os.path.join(test_dir, '../data/midc_20181014.txt')
Member

perhaps we should add a raw data file as well to decouple the format test from the network test

    assert data.index[-1] == end


def test_midc_format_index_raw():
Member

doesn't this also need @network?

@lboeman
Contributor Author

lboeman commented Oct 23, 2018

I just ran into an issue with this: the reported timezone throws an error when calling DataFrame.tz_localize(), particularly with PST. I'll try to find a reasonable solution before this gets merged.

@wholmgren
Member

wholmgren commented Oct 23, 2018 via email

@lboeman
Contributor Author

lboeman commented Oct 24, 2018

@wholmgren I wrote a very simple mapper function, something along the lines of:

    def map_timezone(timezone):
        try:
            return tz_map[timezone]
        except KeyError:
            return timezone

Right now I'm only aware of PST and CST causing problems and I don't believe writing the ~10 lines of code to be an issue where it is needed. Perhaps if this is a more common problem it warrants a public utility function with some default mapping and maybe an iotools.util (or similar) module?

@wholmgren
Member

The map function seems like overkill but maybe that's just me. What about...

tz_raw = ...  # string from metadata
tz_map = {'PST': 'Etc/GMT+8', 'CST': 'Etc/GMT+6'}  # maybe put in iotools.util
timezone = tz_map.get(tz_raw, tz_raw)  # return tz_raw if not in mapping

tz_map could live inline in the function, at the module level, in a new iotools.util module, or pvlib.tools. I'm not sure what the right approach is for the long term. Maybe iotools.util, with an import into iotools? Does anyone else have an opinion?
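
For reference, a runnable sketch of the `dict.get` approach wrapped in a small function. The names `TZ_MAP` and `resolve_timezone` are hypothetical, and the mapping covers only the two abbreviations reported as problematic in this thread. Note that the `Etc/` zone names use inverted signs, so `Etc/GMT+8` is a fixed UTC-8 offset with no daylight saving shift.

```python
# Assumed mapping; only PST and CST are reported as problematic in this thread.
TZ_MAP = {'PST': 'Etc/GMT+8', 'CST': 'Etc/GMT+6'}


def resolve_timezone(tz_raw, tz_map=TZ_MAP):
    """Return the mapped zone name, or tz_raw unchanged if it has no entry."""
    return tz_map.get(tz_raw, tz_raw)


print(resolve_timezone('PST'))  # Etc/GMT+8
print(resolve_timezone('UTC'))  # UTC
```

Keeping the mapping as a keyword argument would let a caller pass a different table without touching the module-level default.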

@wholmgren
Member

I think this is ready to merge. Any final comments/concerns?

@cwhanse
Member

cwhanse commented Oct 29, 2018

Merge it

@wholmgren wholmgren merged commit c165e94 into pvlib:master Oct 29, 2018