Skip to content

Put SAM product renaming code in a separate function, simplify, add warning. #753

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Nov 27, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
49 changes: 32 additions & 17 deletions pvlib/pvsystem.py
Original file line number Diff line number Diff line change
Expand Up @@ -1647,26 +1647,41 @@ def retrieve_sam(name=None, path=None):
return _parse_raw_sam_df(csvdata)


def _normalize_sam_product_names(names):
'''
Replace special characters within the product names to make them more
suitable for use as Dataframe column names.
'''
# Contributed by Anton Driesse (@adriesse), PV Performance Labs. July, 2019

import warnings

BAD_CHARS = ' -.()[]:+/",'
GOOD_CHARS = '____________'

mapping = str.maketrans(BAD_CHARS, GOOD_CHARS)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice! this is new to me

names = pd.Series(data=names)
norm_names = names.str.translate(mapping)

n_duplicates = names.duplicated().sum()
if n_duplicates > 0:
warnings.warn('Original names contain %d duplicate(s).' % n_duplicates)

n_duplicates = norm_names.duplicated().sum()
if n_duplicates > 0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are either of these warnings triggered in the current file? I would have thought the DataFrame construction would fail if so.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got 6 duplicates after normalization in CECmod, and 12 before normalization in ADRInverter. It is possible to have duplicate values in DataFrame columns or index.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add a test if the function gets the thumbs up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I see correctly that there is no current test for retrieve_sam()?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@janinefreeman FYI. I recognize that both SAM and pvlib are victims of what appears in the CEC table. Currently, pvlib imports the module parameter file from SAM. Perhaps there's a way we can help reduce some of these formatting requirements upstream in SAM, and avoid the duplicates.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cwhanse Thanks for the heads up. We get frequent requests to fix info coming from the CEC, but we are trying very hard to avoid error-checking the database. But if this method could be ported into SAM, that could be a possibility. Yet another scenario where we'd all benefit from the oft-pitched crowd-sourced component database...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@janinefreeman I think this is a case of creating code-friendly row and column labels. If you can point me to the part of SAM code that reads the CEC's spreadsheet, or that writes the SAM module database, I'm happy to take a look.

warnings.warn('Normalized names contain %d duplicate(s).' % n_duplicates)

return norm_names.values


def _parse_raw_sam_df(csvdata):

df = pd.read_csv(csvdata, index_col=0, skiprows=[1, 2])
colnames = df.columns.values.tolist()
parsedcolnames = []
for cn in colnames:
parsedcolnames.append(cn.replace(' ', '_'))

df.columns = parsedcolnames

parsedindex = []
for index in df.index:
parsedindex.append(index.replace(' ', '_').replace('-', '_')
.replace('.', '_').replace('(', '_')
.replace(')', '_').replace('[', '_')
.replace(']', '_').replace(':', '_')
.replace('+', '_').replace('/', '_')
.replace('"', '_').replace(',', '_'))

df.index = parsedindex

df.columns = df.columns.str.replace(' ', '_')
df.index = _normalize_sam_product_names(df.index)
df = df.transpose()

if 'ADRCoefficients' in df.index:
ad_ce = 'ADRCoefficients'
# for each inverter, parses a string of coefficients like
Expand Down
22 changes: 22 additions & 0 deletions pvlib/test/test_pvsystem.py
Original file line number Diff line number Diff line change
Expand Up @@ -147,6 +147,28 @@ def test_PVSystem_physicaliam(mocker):
pvsystem.physicaliam.assert_called_once_with(thetas, **module_parameters)
assert iam < 1.

def test__normalize_sam_product_names():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two underscores in name, if that matters.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One from the test_ prefix and one from the function name.


BAD_NAMES = [' -.()[]:+/",', 'Module[1]']
NORM_NAMES = ['____________', 'Module_1_']

norm_names = pvsystem._normalize_sam_product_names(BAD_NAMES)
assert list(norm_names) == NORM_NAMES

BAD_NAMES = ['Module[1]', 'Module(1)']
NORM_NAMES = ['Module_1_', 'Module_1_']

with pytest.warns(UserWarning):
norm_names = pvsystem._normalize_sam_product_names(BAD_NAMES)
assert list(norm_names) == NORM_NAMES

BAD_NAMES = ['Module[1]', 'Module[1]']
NORM_NAMES = ['Module_1_', 'Module_1_']

with pytest.warns(UserWarning):
norm_names = pvsystem._normalize_sam_product_names(BAD_NAMES)
assert list(norm_names) == NORM_NAMES


# if this completes successfully we'll be able to do more tests below.
@pytest.fixture(scope="session")
Expand Down