-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Put SAM product renaming code in a separate function, simplify, add warning. #753
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1647,26 +1647,41 @@ def retrieve_sam(name=None, path=None): | |
return _parse_raw_sam_df(csvdata) | ||
|
||
|
||
def _normalize_sam_product_names(names): | ||
''' | ||
Replace special characters within the product names to make them more | ||
suitable for use as Dataframe column names. | ||
''' | ||
# Contributed by Anton Driesse (@adriesse), PV Performance Labs. July, 2019 | ||
|
||
import warnings | ||
|
||
BAD_CHARS = ' -.()[]:+/",' | ||
GOOD_CHARS = '____________' | ||
|
||
mapping = str.maketrans(BAD_CHARS, GOOD_CHARS) | ||
names = pd.Series(data=names) | ||
norm_names = names.str.translate(mapping) | ||
|
||
n_duplicates = names.duplicated().sum() | ||
if n_duplicates > 0: | ||
warnings.warn('Original names contain %d duplicate(s).' % n_duplicates) | ||
|
||
n_duplicates = norm_names.duplicated().sum() | ||
if n_duplicates > 0: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. are either of these warnings triggered in the current file? I would have thought the DataFrame construction would fail if so. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I got 6 duplicates after normalization in CECmod, and 12 before normalization in ADRInverter. It is possible to have duplicate values in DataFrame columns or index. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I can add a test if the function gets the thumbs up. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do I see correctly that there is no current test for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @janinefreeman FYI. I recognize that both SAM and pvlib are victims of what appears in the CEC table. Currently, pvlib imports the module parameter file from SAM. Perhaps there's a way we can help reduce some of these formatting requirements upstream in SAM, and avoid the duplicates. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @cwhanse Thanks for the heads up. We get frequent requests to fix info coming from the CEC, but we are trying very hard to avoid error-checking the database. But if this method could be ported into SAM, that could be a possibility. Yet another scenario where we'd all benefit from the oft-pitched crowd-sourced component database... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @janinefreeman I think this is a case of creating code-friendly row and column labels. If you can point me to the part of SAM code that reads the CEC's spreadsheet, or that writes the SAM module database, I'm happy to take a look. |
||
warnings.warn('Normalized names contain %d duplicate(s).' % n_duplicates) | ||
|
||
return norm_names.values | ||
|
||
|
||
def _parse_raw_sam_df(csvdata): | ||
|
||
df = pd.read_csv(csvdata, index_col=0, skiprows=[1, 2]) | ||
colnames = df.columns.values.tolist() | ||
parsedcolnames = [] | ||
for cn in colnames: | ||
parsedcolnames.append(cn.replace(' ', '_')) | ||
|
||
df.columns = parsedcolnames | ||
|
||
parsedindex = [] | ||
for index in df.index: | ||
parsedindex.append(index.replace(' ', '_').replace('-', '_') | ||
.replace('.', '_').replace('(', '_') | ||
.replace(')', '_').replace('[', '_') | ||
.replace(']', '_').replace(':', '_') | ||
.replace('+', '_').replace('/', '_') | ||
.replace('"', '_').replace(',', '_')) | ||
|
||
df.index = parsedindex | ||
|
||
df.columns = df.columns.str.replace(' ', '_') | ||
df.index = _normalize_sam_product_names(df.index) | ||
df = df.transpose() | ||
|
||
if 'ADRCoefficients' in df.index: | ||
ad_ce = 'ADRCoefficients' | ||
# for each inverter, parses a string of coefficients like | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -147,6 +147,28 @@ def test_PVSystem_physicaliam(mocker): | |
pvsystem.physicaliam.assert_called_once_with(thetas, **module_parameters) | ||
assert iam < 1. | ||
|
||
def test__normalize_sam_product_names(): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. two underscores in name, if that matters. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. One from the |
||
|
||
BAD_NAMES = [' -.()[]:+/",', 'Module[1]'] | ||
NORM_NAMES = ['____________', 'Module_1_'] | ||
|
||
norm_names = pvsystem._normalize_sam_product_names(BAD_NAMES) | ||
assert list(norm_names) == NORM_NAMES | ||
|
||
BAD_NAMES = ['Module[1]', 'Module(1)'] | ||
NORM_NAMES = ['Module_1_', 'Module_1_'] | ||
|
||
with pytest.warns(UserWarning): | ||
norm_names = pvsystem._normalize_sam_product_names(BAD_NAMES) | ||
assert list(norm_names) == NORM_NAMES | ||
|
||
BAD_NAMES = ['Module[1]', 'Module[1]'] | ||
NORM_NAMES = ['Module_1_', 'Module_1_'] | ||
|
||
with pytest.warns(UserWarning): | ||
norm_names = pvsystem._normalize_sam_product_names(BAD_NAMES) | ||
assert list(norm_names) == NORM_NAMES | ||
|
||
|
||
# if this completes successfully we'll be able to do more tests below. | ||
@pytest.fixture(scope="session") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice! this is new to me