Add functionality for reading EPW weather files #677
Adds EPW reader to iotools.tmy and includes files for testing
@roelloonen thanks, and welcome! @wholmgren should we put this function in a new |
Yes.
thanks for the good pull request.
pvlib/iotools/tmy.py
Outdated
if filename is None:
    try:
        filename = _interactive_load()
Does anyone use the interactive load for tmy files? I'd rather deprecate that feature in tmy than continue it here.
Anticipating that the answer will be no, I have omitted this option in the new read_epw function.
pvlib/iotools/tmy.py
Outdated
# Shift one hour back because EPW's usage of hour 24
# and dateutil's inability to handle that.
data["hour"] = data["hour"].apply(lambda x: x - 1)
The epw function in this comment did not apply a time shift. Which is correct?
Thanks. EPW data files use a 1-24 hour formulation similar to TMY3, so I mistakenly assumed this shift would be necessary. Apparently, pd.to_datetime was already handling things just fine, but in this update I had to add a correction to the last row so that all dates still belong to the same year. There must be a prettier solution for this, but it seems to work for the time being.
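The rollover that makes the last-row correction necessary can be reproduced without pandas. The following is a minimal sketch using only the standard library. `epw_timestamp` is a hypothetical helper, not part of pvlib, and it assumes that the hour column runs from 1 to 24 and that hour 24 should land on midnight of the next day.

```python
from datetime import datetime, timedelta


def epw_timestamp(year, month, day, hour):
    """Turn an hour-ending EPW row (hour in 1..24) into a datetime.

    Hypothetical helper for illustration only. Adding the hour as a
    timedelta lets hour 24 roll over to midnight of the next day.
    """
    return datetime(year, month, day) + timedelta(hours=hour)


coerce_year = 2000

# Naively coercing every row to the same year pushes the final row
# (Dec 31, hour 24) into the following year.
naive_last = epw_timestamp(coerce_year, 12, 31, 24)

# Writing coerce_year - 1 into the last row instead keeps its
# timestamp inside the coerced year, at the very start of it.
fixed_last = epw_timestamp(coerce_year - 1, 12, 31, 24)

print(naive_last)  # 2001-01-01 00:00:00
print(fixed_last)  # 2000-01-01 00:00:00
```

With this correction the last row of the file ends up as the first timestamp of the coerced year, which is presumably why a prettier solution was still wanted.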
includes a couple of other small fixes and updated testing files
Commit 35c7951 creates a new |
pvlib/iotools/epw.py
Outdated
'hour']]))

# Localize time series
data = data.tz_localize(int(meta['TZ'] * 3600))
@wholmgren I know this line is copied from iotools.tmy, but how does it work? I see that it does, but as far as I can tell, it's undocumented behavior of pandas.DataFrame.tz_localize.
I don't know how, but we have a bit of context here: https://pvlib-python.readthedocs.io/en/latest/timetimezones.html#fixed-offsets
data.tz_localize(pytz.FixedOffset(int(meta['TZ'] * 60))) would be better. I have a vague recollection that it did not work on an earlier version of pandas, but it is probably OK now. Something like: you could specify Timestamp(tz=...) but could not simply pass it to DataFrame.tz_localize.
Maybe a helper function to translate from timezone offset as a float to a pytz object. Suggest that we accept it in this PR, and open a new issue to improve here, in iotools.tmy and in readthedocs.
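A helper of that kind could look roughly like the sketch below. It is an assumption-laden illustration, not pvlib code: `offset_to_timezone` is a hypothetical name, and the standard library's `datetime.timezone` is used as a stand-in for `pytz.FixedOffset` so the example runs without extra dependencies. Note the units: `pytz.FixedOffset` takes minutes, while the integer currently passed to `tz_localize` is in seconds.

```python
from datetime import timedelta, timezone


def offset_to_timezone(utc_offset_hours):
    """Convert a UTC offset in hours (possibly fractional) to a tzinfo.

    Hypothetical helper; uses the standard library's fixed-offset
    timezone as a stand-in for pytz.FixedOffset.
    """
    # Round to whole minutes so offsets such as 5.5 or -3.5 are exact.
    minutes = int(round(utc_offset_hours * 60))
    return timezone(timedelta(minutes=minutes))


tz = offset_to_timezone(-7.0)
# -7 hours is -420 minutes (the FixedOffset unit) or -25200 seconds
# (the unit of the integer passed to tz_localize in the PR).
print(tz.utcoffset(None).total_seconds())  # -25200.0
```

Keeping the float-to-timezone conversion in one place would let both the EPW and TMY readers share it, which is the point of the suggestion above.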
Ok with me
On Fri, Mar 22, 2019 at 9:36 AM, Cliff Hansen commented on this pull request, in pvlib/iotools/epw.py:
> +
+ # We only have to skip 6 rows instead of 7 because we have already used
+ # the readline call above.
+ data = pd.read_csv(csvdata, skiprows=6, header=0, names=colnames)
+
+ # Change to single year if requested
+ if coerce_year is not None:
+ data["year"] = coerce_year
+ data['year'].iloc[-1] = coerce_year - 1
+
+ # Update index with correct date information
+ data = data.set_index(pd.to_datetime(data[['year', 'month', 'day',
+ 'hour']]))
+
+ # Localize time series
+ data = data.tz_localize(int(meta['TZ'] * 3600))
Maybe a helper function to translate from timezone offset as a float to a
pytz object. Suggest that we accept it in this PR, and open a new issue
to improve here, in iotools.tmy and in readthedocs.
Remove line that was only meant for debugging
pvlib/iotools/epw.py
Outdated
data['year'].iloc[-1] = coerce_year - 1

# Update index with correct date information
data = data.set_index(pd.to_datetime(data[['year', 'month', 'day',
This line causes problems on the python 2.7-min configuration. Have you checked that the index is correct in your debugging? It's not explicitly tested in any of the tests. https://travis-ci.org/pvlib/pvlib-python/jobs/510118505#L914
Apologies if this is a silly question, but I think I need a bit more explanation.
What would 'correctness' of the index mean?
Do you have any example of how this can be tested?
I only meant that the index is what you expected it to be. Here's a more formal example for the index expectation for the read_solrad test function.
Suggest creating the index separately, and localizing it, then setting the dataframe index.
E.g.,
idx = pd.DatetimeIndex(pd.to_datetime(data[['year', 'month', 'day', 'hour']]))
idx = idx.tz_localize(xxx)
data.index = idx
I managed to follow Cliff's suggestion for separately creating the index first. This seems to work fine.
But I am getting pretty much stuck with the error in the python 2.7-min configuration.
I have verified that the index gets created, and that it makes sense. The zero padding solution is not helping.
I think that the issue is that the index comes in the form of a series, which to_datetime does not like, apparently.
I have tried the solutions proposed here, but they are not working for me.
Any ideas or suggestions would be very welcome.
The problem is related to the time specification using hour = 24. I checked out your code and got it to work like this:
# create index that supplies correct date and time zone information
dts = data[['month', 'day']].astype(str).apply(lambda x: x.str.zfill(2))
hrs = (data['hour'] - 1).astype(str).str.zfill(2)
dtscat = data['year'].astype(str) + dts['month'] + dts['day'] + hrs
idx = pd.to_datetime(dtscat, format='%Y%m%d%H')
idx = idx.dt.tz_localize(int(meta['TZ'] * 3600))
data.index = idx
Consider adding a +1 hour timedelta to the index. In any case, we'll need to make sure the docstring correctly describes the index.
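The shift-then-parse idea, and the optional +1 hour correction, can be sketched with the standard library alone. `parse_epw_row` and its `hour_ending` flag are hypothetical names for illustration. The sketch relies on the fact that the `%H` directive only accepts 00 to 23, so a literal hour of 24 cannot be parsed directly.

```python
from datetime import datetime, timedelta


def parse_epw_row(year, month, day, hour, hour_ending=False):
    """Parse an EPW row whose hour runs from 1 to 24.

    Hypothetical helper. The hour is shifted back by one before parsing
    because '%H' only accepts 00-23.
    """
    # Zero-pad every field so the fixed-width format string applies.
    stamp = f"{year:04d}{month:02d}{day:02d}{hour - 1:02d}"
    parsed = datetime.strptime(stamp, "%Y%m%d%H")
    if hour_ending:
        # Undo the shift after parsing to restore hour-ending labels.
        parsed += timedelta(hours=1)
    return parsed


print(parse_epw_row(1999, 12, 31, 24))                    # 1999-12-31 23:00:00
print(parse_epw_row(1999, 12, 31, 24, hour_ending=True))  # 2000-01-01 00:00:00
```

Without the +1 hour step the index is hour-beginning and every timestamp stays in its original day and year. With it, the last row of the year moves to midnight on January 1 of the following year, so the docstring has to state which of the two the reader returns.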
@roelloonen , @cwhanse, @janinefreeman, @wholmgren : Hi all- I'm curious about the -1 hour offset in read_epw() in the current pvlib release (line 171). I think that this is a mistake - when I try to match modeled dni using ghi and dhi, I can only do this when I calculate solar position at (timestamp + 30 minutes), which is different from the TMY convention of calculating solar position at (timestamp - 30 minutes).
@cdeline The shift is because EPW files follow an hour-ending timestamp convention, and thus hours take values from 1 to 24. Hour values are shifted to hour-beginning to avoid changing the date for the hour=24 timestamp in the pandas index.
The docstring for read_epw describes hour-ending convention for all variables, which needs to be corrected since we've shifted to hour beginning.
I'm confused about the line number you cite: I see that shift at line 165 of read_epw. Is it numbered 171 in your IDE, or are we actually looking at different code?
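The two labelling conventions in this exchange point at the same instant once the label is taken into account. Below is a small sketch with a hypothetical `interval_midpoint` helper. It assumes hourly data and that solar position should be evaluated at the middle of the hour each row describes.

```python
from datetime import datetime, timedelta


def interval_midpoint(timestamp, label="beginning"):
    """Return the middle of the hour that a timestamp labels.

    Hypothetical helper. label="beginning" means the timestamp marks the
    start of the hour; label="ending" means it marks the end.
    """
    half_hour = timedelta(minutes=30)
    if label == "beginning":
        return timestamp + half_hour
    if label == "ending":
        return timestamp - half_hour
    raise ValueError("label must be 'beginning' or 'ending'")


# The EPW row for hour 12 covers 11:00-12:00.
shifted = datetime(1999, 6, 21, 11)   # index after the -1 hour shift
original = datetime(1999, 6, 21, 12)  # hour-ending label as in the file
print(interval_midpoint(shifted, "beginning"))  # 1999-06-21 11:30:00
print(interval_midpoint(original, "ending"))    # 1999-06-21 11:30:00
```

This matches the observation above: with the shifted index the midpoint is timestamp + 30 minutes, while an hour-ending index needs timestamp - 30 minutes.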
Cliff- I was referring to V0.6.3 which has an additional bit of importing at the top:
try:
    # python 2 compatibility
    from urllib2 import urlopen, Request
except ImportError:
    from urllib.request import urlopen, Request
Otherwise, I think it's identical to this 0.6.1 version here. Thanks for your discussion above.
This commit forces the data format and now pads a zero in front of the 'hour' column. I believe that this solves the Python 2.7 issue, but Travis testing needs to confirm this
Can you run Edited to add that you can ignore the too long lines in the docstring. Not worth the effort for those.
Updates code for index creation in epw.py.
Improves docstring for clarity.
Cleans trailing spaces, missing whitespace, etc. identified by pycodestyle in epw.py and test_epw.py.
Last hour of the year led to an error in the forced year check. It was because of a remnant of my quick-and-dirty approach which is no longer necessary using Will's solution.
One more minor change. Can you also update the whatsnew file with a note about the addition and your name and/or username?
Subject to making stickler happy
Looks good to me. @cwhanse the long lines in the documentation table are tricky. Sphinx simple table markup (like we're using here) doesn't allow for cells to continue to the next line. The more complicated table markup is a major pain. I recommend merging as is.
Stickler overruled. I restarted the failed travis build for py36.
thanks @roelloonen!
Hi guys, thank you for the work on this - I was about to get started on a pull request from my fork, but you beat me to it. :) One consideration: when comparing the header names for EPW and TMY3 file imports, is it useful to maintain consistency? Here are some examples of differences. Since EPW files don't contain headers, couldn't we at least adopt something like the existing TMY2 / TMY3 for consistency? Or have these EPW header names been chosen to match some other standard?
Crap, I'm 3 minutes too late. LOL
@cdeline my thought was "crap I was 3 minutes too early"!
We are following the (incomplete) standards documented here: https://pvlib-python.readthedocs.io/en/latest/variables_style_rules.html
OK. Is there a translator or option to have the TMY3 and TMY2 files read out into this consistent format? It would be nice to design my code around this to be functional whether I import a TMY2 / TMY3 / EPW file. Like another _recolumn function in tmy.py if you guys are moving to a different IO header convention?
There is not currently a translator or option but I'd support creating one. Unlikely to do it myself, though. These header conventions have been around for a long time, but the TMY readers have been around even longer and we never updated the old functions.
@cdeline are you asking if there is a 'standard' format to which to write TMY2/3/EPW data from pvlib?
Yeah, I guess that was my question - if you're moving to a new pvlib header convention for the .epw reader, it might be useful to have a way to cast the legacy TMY2 and TMY3 readers into the same header format by passing in a 'newheader' boolean or something. I suppose it's no big deal for me to just rename the dictionary keys of the new EPW output. I was just figuring that you would want all of your different readers in the .iotools module to have a consistent output instead of one returning 'temp_air' and another returning 'DryBulb' and whatnot.
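Renaming keys on the caller's side, as mentioned above, could be sketched like this. Nothing here is pvlib API: `rename_columns` and `TMY3_TO_PVLIB` are hypothetical names, the sample values are made up, and only the 'DryBulb' to 'temp_air' pair is taken from this thread. The other mapping entries are assumptions for the sake of the example.

```python
def rename_columns(record, mapping):
    """Return a copy of record with keys renamed according to mapping.

    Keys without an entry in mapping are passed through unchanged.
    """
    return {mapping.get(key, key): value for key, value in record.items()}


# Illustrative mapping only: 'DryBulb' -> 'temp_air' comes from the
# discussion above; the other entries are assumptions for the example.
TMY3_TO_PVLIB = {
    "DryBulb": "temp_air",
    "GHI": "ghi",
    "DNI": "dni",
}

# Made-up sample row keyed by legacy-style names.
row = {"GHI": 512.0, "DNI": 780.0, "DryBulb": 21.5, "Pressure": 1013}
print(rename_columns(row, TMY3_TO_PVLIB))
# {'ghi': 512.0, 'dni': 780.0, 'temp_air': 21.5, 'Pressure': 1013}
```

A mapping like this, applied behind an opt-in flag in the legacy readers, would keep old column names available for existing code while letting new code rely on a single convention.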
Adds an EPW reader to iotools.tmy and includes files for testing. Most of the code is adapted from the original TMY3 functions in iotools.tmy. Tested locally with both downloaded EPW files and web links.
Tried to edit and verify test_tmy.py. This seems to work fine but needs confirmation.
In general, I am fairly new to Python and a total novice in the use of GitHub. Tips and directions are therefore very much welcome.
docs/sphinx/source/api.rst for API changes.
docs/sphinx/source/whatsnew file for all changes.
Brief description of the problem and proposed solution (if not already fully described in the issue linked to above):
Adds EPW reader to iotools.tmy similar to TMY2 and TMY3. Both local files and web links are supported.