Skip to content

add test for hour_angle, vectorize #599

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Oct 5, 2018
Merged

Conversation

mikofski
Copy link
Member

@mikofski mikofski commented Oct 5, 2018

Signed-off-by: Mark Mikofski [email protected]

pvlib python pull request guidelines

Thank you for your contribution to pvlib python! You may delete all of these instructions except for the list below.

You may submit a pull request with your code at any stage of completion.

The following items must be addressed before the code can be merged. Please don't hesitate to ask for help if you're unsure of how to accomplish any of the items below:

  • Closes issue #xxxx
  • I am familiar with the contributing guidelines.
  • Fully tested. Added and/or modified tests to ensure correct behavior for all reasonable inputs. Tests (usually) must pass on the TravisCI and Appveyor testing services.
  • Updates entries to docs/sphinx/source/api.rst for API changes.
  • Adds description and name entries in the appropriate docs/sphinx/source/whatsnew file for all changes.
  • Code quality and style is sufficient. Passes LGTM and SticklerCI checks.
  • New code is fully documented. Includes sphinx/numpydoc compliant docstrings and comments in the code where necessary.
  • Pull request is nearly complete and ready for detailed review.

Brief description of the problem and proposed solution (if not already fully described in the issue linked to above):

* closes pvlib#598 vectorize to make it more efficient
* closes pvlib#597 add test

Signed-off-by: Mark Mikofski <[email protected]>
eot = np.array([-3.935172, -4.117227, -4.026295])
hours = solarposition.hour_angle(times, longitude, eot)
expected = (-70.682338, 70.72118825000001, 0.000801250)
#expected = (-70.699400,70.512721, 0.0) # can't quite get these, why?
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E265 block comment should start with '# '

* also converting times to int before subtracting works better for older
pandas versions which were not calculating the timedeltas correctly
* remove comment with missing space after hash, add FIXME that explains
why the expected values are slightly different than the SPA calculator
output
tz_info = times.tz
timezone = tz_info.utcoffset(times).total_seconds() / 3600.
hours = (times.astype(np.int64) - times.normalize().astype(np.int64)) / (
3600. * 1.e9)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E126 continuation line over-indented for hanging indent

)).total_seconds() / 3600. for t in times])
timezone = times.tz.utcoffset(times).total_seconds() / 3600.
tz_info = times.tz
timezone = tz_info.utcoffset(times).total_seconds() / 3600.
Copy link
Member

@wholmgren wholmgren Oct 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this the expected behavior?

In [15]: times = pd.DatetimeIndex(start='20180101', end='20190101', freq='1s', tz='America/Phoenix')

In [16]: timezone = times.tz.utcoffset(times).total_seconds() / 3600.

In [17]: timezone
Out[17]: -7.466666666666667

I am surprised because 1. I expected -7 and 2. times is a vector so I expected to get a result for each time (offset could change for DST).

Is this preferable?

In [27]: np.array((times.tz_localize(None).astype(int) - times.astype(int)) / 1e9 / 3600)
Out[27]: array([-7., -7., -7., ..., -7., -7., -7.])

The np.array wrapper ensures a known type instead of a version-dependent Index/array type.

Note that tz_localize(None) will always work if we bump min pandas to 0.15, as discussed elsewhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it's not what I would expect, I took that approach from this SO answer.

TL;DR: let's just use your approach instead

details
Something very weird is going on with this particular America/Phoenix timezone. PyTz is picking a historical timezone called LMT that was discontinued in 1883 because it is listed first in the transition info list.

>>> import pytz
>>> import datetime
>>> america_phoenix = pytz.timezone('America/Phoenix')
<DstTzInfo 'America/Phoenix' LMT-1 day, 16:32:00 STD>
>>> dt = datetime.datetime.now().replace(tzinfo=america_phoenix)
datetime.datetime(2018, 10, 5, 12, 58, 34, 427000,
                  tzinfo=<DstTzInfo 'America/Phoenix' LMT-1 day, 16:32:00 STD>)
>>> america_phoenix.utcoffset(dt)
datetime.timedelta(-1, 59520)

The source for DstTzInfo says that _utcoffset is returned for utcoffset(dt) if localized already, otherwise localize first then return the _utcoffset of the timezone.

>>> america_phoenix._utcoffset
datetime.timedelta(-1, 59520)

And if not localized, then _utcoffset is set to whatever is the first tz info in the transition info table. In this case that's "LMT" which was UTC -7:28:18 until 1883.

>>> america_phoenix._tzinfos
{(datetime.timedelta(-1, 59520),
  datetime.timedelta(0),
  'LMT'): <DstTzInfo 'America/Phoenix' LMT-1 day, 16:32:00 STD>,
 (datetime.timedelta(-1, 61200),
  datetime.timedelta(0),
  'MST'): <DstTzInfo 'America/Phoenix' MST-1 day, 17:00:00 STD>,
 (datetime.timedelta(-1, 64800),
  datetime.timedelta(0, 3600),
  'MDT'): <DstTzInfo 'America/Phoenix' MDT-1 day, 18:00:00 DST>,
 (datetime.timedelta(-1, 64800),
  datetime.timedelta(0, 3600),
  'MWT'): <DstTzInfo 'America/Phoenix' MWT-1 day, 18:00:00 DST>}

>>> america_phoenix._transition_info
[(datetime.timedelta(-1, 59520), datetime.timedelta(0), 'LMT'),
 (datetime.timedelta(-1, 61200), datetime.timedelta(0), 'MST'),
 (datetime.timedelta(-1, 64800), datetime.timedelta(0, 3600), 'MDT'),
 (datetime.timedelta(-1, 61200), datetime.timedelta(0), 'MST'),
 (datetime.timedelta(-1, 64800), datetime.timedelta(0, 3600), 'MDT'),
 (datetime.timedelta(-1, 61200), datetime.timedelta(0), 'MST'),
 (datetime.timedelta(-1, 64800), datetime.timedelta(0, 3600), 'MWT'),
 (datetime.timedelta(-1, 61200), datetime.timedelta(0), 'MST'),
 (datetime.timedelta(-1, 64800), datetime.timedelta(0, 3600), 'MWT'),
 (datetime.timedelta(-1, 61200), datetime.timedelta(0), 'MST'),
 (datetime.timedelta(-1, 64800), datetime.timedelta(0, 3600), 'MDT'),
 (datetime.timedelta(-1, 61200), datetime.timedelta(0), 'MST')]

If the datetime is localized, then PyTz does some smart stuff to figure out which transition info to use, based on the values in _utc_transition_times and returns the corresponding transition info and timezone info, and replaces tzinfo in the datetime.

>>> america_phoenix._utc_transition_times
[datetime.datetime(1, 1, 1, 0, 0),
 datetime.datetime(1901, 12, 13, 20, 45, 52),
 datetime.datetime(1918, 3, 31, 9, 0),
 datetime.datetime(1918, 10, 27, 8, 0),
 datetime.datetime(1919, 3, 30, 9, 0),
 datetime.datetime(1919, 10, 26, 8, 0),
 datetime.datetime(1942, 2, 9, 9, 0),
 datetime.datetime(1944, 1, 1, 6, 1),
 datetime.datetime(1944, 4, 1, 7, 1),
 datetime.datetime(1944, 10, 1, 6, 1),
 datetime.datetime(1967, 4, 30, 9, 0),
 datetime.datetime(1967, 10, 29, 8, 0)]

Since our datetime is 2018-10-05T12:58:34, that's the last index, and we get the the most recent Phoenix timezone from the transition info table.

(datetime.timedelta(-1, 61200), datetime.timedelta(0), 'MST')

which is the key using in the timezone info table and returns the correct timezone

(datetime.timedelta(-1, 61200), datetime.timedelta(0), 'MST'):
    <DstTzInfo 'America/Phoenix' MST-1 day, 17:00:00 STD>

When pandas makes a datetime index, I guess it doesn't care about tzinfo but it does do this with Timestamps which are the items in the datetime index.

DatetimeIndex has wrong timezone

>>> import pandas as pd
>>> times = pd.DatetimeIndex(start='20180101', end='20190101', freq='1s', tz='US/Arizona')
>>> times.tz
<DstTzInfo 'US/Arizona' LMT-1 day, 16:32:00 STD>
>>> times.tzinfo  # Alias for tz attribute (in PyTz)
<DstTzInfo 'US/Arizona' LMT-1 day, 16:32:00 STD>

Timestamp has correct timezone

>>> times[0]
Timestamp('2018-01-01 00:00:00-0700', tz='US/Arizona', offset='S')
>>> times[0].tz
<DstTzInfo 'US/Arizona' MST-1 day, 17:00:00 STD>
>>> times[0].tzinfo
<DstTzInfo 'US/Arizona' MST-1 day, 17:00:00 STD>

Also if to_pydatetime() is called it uses the correct timezone as well.

>>> times.to_pydatetime()
array([ datetime.datetime(2018, 1, 1, 0, 0, tzinfo=<DstTzInfo 'US/Arizona' MST-1 day, 17:00:00 STD>),
       datetime.datetime(2018, 1, 1, 0, 0, 1, tzinfo=<DstTzInfo 'US/Arizona' MST-1 day, 17:00:00 STD>),
       datetime.datetime(2018, 1, 1, 0, 0, 2, tzinfo=<DstTzInfo 'US/Arizona' MST-1 day, 17:00:00 STD>),
       ...,
       datetime.datetime(2018, 12, 31, 23, 59, 58, tzinfo=<DstTzInfo 'US/Arizona' MST-1 day, 17:00:00 STD>),
       datetime.datetime(2018, 12, 31, 23, 59, 59, tzinfo=<DstTzInfo 'US/Arizona' MST-1 day, 17:00:00 STD>),
       datetime.datetime(2019, 1, 1, 0, 0, tzinfo=<DstTzInfo 'US/Arizona' MST-1 day, 17:00:00 STD>)], dtype=object)

Anyway, not sure if this is a bug in pandas or not, but who cares? Now I have to go around and fix this everywhere I've used utcoffset(), bummer. ☹️

Copy link
Member Author

@mikofski mikofski Oct 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: it's only used 2 places:

Searching 160 files for "utcoffset"

C:\Users\mikm\Projects\pvlib-python\pvlib\clearsky.py:
  664          * automatically determines sample_interval
  665          * requires a reference clear sky series instead calculating one
  666:           from a user supplied location and UTCoffset
  667          * parameters are controllable via keyword arguments
  668          * option to return individual test components and clearsky scaling

C:\Users\mikm\Projects\pvlib-python\pvlib\solarposition.py:
  547          # pyephem drops timezone when converting to its internal datetime
  548          # format, so handle timezone explicitly here
  549:         obs.date = ephem.Date(thetime - thetime.utcoffset())
  550          sunrise.append(_ephem_to_timezone(rising(sun), time.tz))
  551          sunset.append(_ephem_to_timezone(setting(sun), time.tz))
  ...
 1329      """
 1330      tz_info = times.tz
 1331:     timezone = tz_info.utcoffset(times).total_seconds() / 3600.
 1332      hours = 1 / (3600. * 1.e9) * (
 1333          times.astype(np.int64) - times.normalize().astype(np.int64))

So utcoffset() is also a Python builtin which is overloaded by PyTz. The difference is that if utcoffset is called on a Python datetime then it will return the utcoffset of it's tz info called on itself. So in our example above we could have called:

>>> dt.utcoffset()
datetime.timedelta(-1, 59520)  # wrong timezone b/c localize() was not used!

Also FYI: there is an inefficient loop in rise_set_transit_ephem which uses slow to_pydatetime() and utcoffset(), but it's probably addressed in #588?

Other than that I think there's only 2 other instances of utcoffset this one in hour_angle and one in #583. I will fix both. Thanks for catching this!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: this is also an example of why we should never replace tzinfo manually, but always use the localize() method so that the correct timezone from the transition info table is used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into it. I've seen this kind of funny behavior with the AZ timezones before but it didn't occur to me that it could be a problem in this function. Some other common timezones have similar behavior (though I don't remember which).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, I don't think it's a problem with the function utcoffset(). Honestly, I think this was my mistake for using utcoffset() with a sequence of times like pandas.DatetimeIndex, because there could potentially be a different timezone for each item in the sequence, so it's ambiguous which timezone should be used. Therefore, I think the problem is that the tz (aka tzinfo) that's set in the DatetimeIndex is unreliable and ambiguous, or the problem could be with utcoffset which should be overloaded to return a list of timezones localized to each item in the sequence.

@wholmgren wholmgren added this to the 0.6.1 milestone Oct 5, 2018
@wholmgren wholmgren added the bug label Oct 5, 2018
* ... suggested by @wholmgren (thx!)
* utcoffset() is unpredictable when used with pandas datetime indices
* only predictable with Python datetime objects or pandas Timestamps
* instead replace tzinfo with None to get naive local times, and
calculate difference from tz-aware times to get timezones
* also use asarray wrapper at return to ensure consistency
* add comments to explain to future maintainers
# hours - timezone = times - times.normalized - (naive_times - times)
hrs_minus_tzs = 1 / (3600. * 1.e9) * (
2 * times.astype(np.int64) - times.normalize().astype(np.int64)
- naive_times.astype(np.int64))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

W503 line break before binary operator

@wholmgren
Copy link
Member

@mikofski I suggest that you take the upgrade to pandas 0.15 commits from #595 and drop your try/except block. You'll also need to make a minor change to pvsystem.py described here.

@wholmgren
Copy link
Member

or maybe we just leave master in a failing state until a few PRs go through.

@mikofski
Copy link
Member Author

mikofski commented Oct 5, 2018

okay, I removed the try-except block , which ha ha didn't work anyway, and we can just wait for pandas min-requirements to be updated, thanks!

@wholmgren
Copy link
Member

@mikofski thanks! I will go ahead and merge to get the PR logjam moving, but please update the whatsnew in #583 to include these commits.

@wholmgren wholmgren merged commit 5ee82b1 into pvlib:master Oct 5, 2018
@mikofski mikofski deleted the fix_hour_angles branch October 9, 2018 00:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants