specify user-agent in tmy3 remote request #494

Merged
merged 1 commit into from
Jun 26, 2018
2 changes: 2 additions & 0 deletions docs/sphinx/source/whatsnew/v0.6.0.rst
@@ -22,6 +22,8 @@ Bug fixes
   (:issue:`464`)
 * ModelChain.prepare_inputs failed to pass solar_position and airmass to
   Location.get_clearsky. Fixed. (:issue:`481`)
+* Add User-Agent specification to TMY3 remote requests to avoid rejection.
+  (:issue:`493`)


Documentation
31 changes: 22 additions & 9 deletions pvlib/tmy.py
@@ -7,9 +7,9 @@
 import dateutil
 import io
 try:
-    from urllib2 import urlopen
+    from urllib2 import urlopen, Request
 except ImportError:
-    from urllib.request import urlopen
+    from urllib.request import urlopen, Request

 import pandas as pd

@@ -164,14 +164,23 @@ def readtmy3(filename=None, coerce_year=None, recolumn=True):

     head = ['USAF', 'Name', 'State', 'TZ', 'latitude', 'longitude', 'altitude']

-    try:
-        csvdata = open(filename, 'r')
-    except IOError:
-        response = urlopen(filename)
+    if filename.startswith('http'):
+        request = Request(filename, headers={'User-Agent':
+            'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_5) '
Member
Does any user agent work here, or does it need to match the user's browser and platform? If the latter then we need to test this fix on multiple OS.

Member Author
No and no.

By default, Python's urllib sends a User-Agent of "Python-urllib/x.y", where x.y is the Python version (https://docs.python.org/3/howto/urllib2.html). The server no longer accepts this user agent. I am guessing that most user agents that identify themselves as scripts are now blocked.

The automated tests run on Linux and Windows (not macOS), and they pass on this PR (note that they currently fail on master). I don't think the server has any way of knowing that our script is pretending to be a browser unless it does something smarter (e.g. notices that the requests come from IPs associated with travis-ci and appveyor).

Member
Satisfies me.
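
The point made in this thread about urllib's default user agent can be checked without sending any request. The following is a minimal sketch for Python 3 only (the PR itself also supports `urllib2`); the URL and the shortened `'Mozilla/5.0'` string are placeholders chosen for illustration, not values from this PR.

```python
from urllib.request import Request, build_opener

# Hypothetical URL, used only to construct Request objects; nothing is fetched.
url = 'http://example.com/tmy3/sample.CSV'

default_request = Request(url)
custom_request = Request(url, headers={'User-Agent': 'Mozilla/5.0'})

# Request stores header names capitalized, so the key is 'User-agent'.
print(default_request.has_header('User-agent'))  # False
print(custom_request.get_header('User-agent'))   # Mozilla/5.0

# When a request has no User-Agent of its own, the opener supplies this default.
default_agent = dict(build_opener().addheaders)['User-agent']
print(default_agent)  # "Python-urllib/" followed by the Python version
```

A header set on the `Request` takes precedence over the opener's default, which is why passing `headers=` is enough to change what the server sees.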

+            'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 '
+            'Safari/537.36'})
+        response = urlopen(request)
         csvdata = io.StringIO(response.read().decode(errors='ignore'))
+    else:
+        # assume it's accessible via the file system
+        csvdata = open(filename, 'r')

+    # read in file metadata, advance buffer to second line
+    firstline = csvdata.readline()
+    if 'Request Rejected' in firstline:
+        raise IOError('Remote server rejected TMY file request')
+
-    # read in file metadata
-    meta = dict(zip(head, csvdata.readline().rstrip('\n').split(",")))
+    meta = dict(zip(head, firstline.rstrip('\n').split(",")))

     # convert metadata strings to numeric types
     meta['altitude'] = float(meta['altitude'])
@@ -180,8 +189,12 @@ def readtmy3(filename=None, coerce_year=None, recolumn=True):
     meta['TZ'] = float(meta['TZ'])
     meta['USAF'] = int(meta['USAF'])

+    # use pandas to read the csv file/stringio buffer
+    # header is actually the second line in file, but tell pandas to look for
+    # header information on the 1st line (0 indexing) because we've already
+    # advanced past the true first line with the readline call above.
     data = pd.read_csv(
-        filename, header=1,
+        csvdata, header=0,
         parse_dates={'datetime': ['Date (MM/DD/YYYY)', 'Time (HH:MM)']},
         date_parser=lambda *x: _parsedate(*x, year=coerce_year),
         index_col='datetime')
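
The change from `header=1` to `header=0` follows from reading the metadata line off the buffer first: whatever parses the buffer next starts at the column-header line. Below is a rough standard-library sketch of that idea. It uses `csv.DictReader` as a stand-in for `pd.read_csv(csvdata, header=0)`, and the helper name `split_tmy3` and the sample data are invented for illustration; neither is part of pvlib.

```python
import csv
import io

HEAD = ['USAF', 'Name', 'State', 'TZ', 'latitude', 'longitude', 'altitude']


def split_tmy3(csvdata):
    """Return (meta, rows) from a file-like object laid out like a TMY3 file.

    Hypothetical helper, not a pvlib function.
    """
    # Consume the metadata line; this advances the buffer to the second line.
    firstline = csvdata.readline()
    if 'Request Rejected' in firstline:
        raise IOError('Remote server rejected TMY file request')
    meta = dict(zip(HEAD, firstline.rstrip('\n').split(',')))

    # The buffer now starts at the column-header line, so the reader treats
    # it as its first line (the analogue of header=0 in pd.read_csv).
    rows = list(csv.DictReader(csvdata))
    return meta, rows


# Made-up sample data, not taken from a real TMY3 file.
sample = io.StringIO(
    '999999,Example Station,XX,-5.0,35.0,-80.0,100\n'
    'Date (MM/DD/YYYY),Time (HH:MM),GHI (W/m^2)\n'
    '01/01/1990,01:00,0\n'
    '01/01/1990,02:00,12\n'
)
meta, rows = split_tmy3(sample)
print(meta['Name'])            # Example Station
print(rows[1]['GHI (W/m^2)'])  # 12
```

Because the same buffer object is used for both steps, this works the same way for a local file handle and for a `StringIO` built from a remote response.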