-
Notifications
You must be signed in to change notification settings - Fork 16
Sensorization #568
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dv-package
Are you sure you want to change the base?
Sensorization #568
Conversation
Sorry, I meant to merge this into the |
Tagging @krivard |
Is this still under consideration, or are we backing off of sensorization entirely? |
We are trying to tweak the method of both the sensorization map and the inverse map, but I don't think we're backing off entirely. |
okay -- maybe convert to draft while you're tweaking, then it'll be easier to know when someone should review it |
DATE_COL = "ServiceDate" #"servicedate" | ||
GEO_COL = "PatCountyFIPS" #"patCountyFIPS" | ||
AGE_COL = "PatAgeGroup" #"patAgeGroup" | ||
HRR_COLS = ["Pat HRR Name", "Pat HRR ID"]#["patHRRname", "patHRRid"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what are these comments for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At some point, HSP changed the column names of the files they were sending us. I was testing the code on a recent drop (with lowercase column names). The code originally had the uppercase names, so I changed them back to pass the tests and to merge the PR.
@@ -140,7 +147,7 @@ def update_sensor( | |||
params = Weekday.get_params(data) if weekday else None | |||
|
|||
# handle explicitly if we need to use Jeffreys estimate for binomial proportions | |||
jeffreys = True if se else False | |||
jeffreys = se |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this var needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's used in sensor.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh i meant if it's going to be jeffreys = se, why not just use the
sevariable and not create/assign it to the
jeffreys` varrable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's fair. I'll change it
From Ryan:
To save memory, Ryan suggests we store the indicator values for the non-sensorized signal (indexed by time, location, and issue) and the location-specific and time-invariant coefficients (indexed by location and issue) on the server, instead of storing the non-sensorized signal as well as the sensorized signal. This is not currently supported, as I output the sensorized signal just like any other signal. Thoughts on whether this is worth implementing on the API side and on the indicator side? |
fips_pop = pd.read_csv("%s/fips_pop.csv" % (geo_folder),\ | ||
dtype={"fips":str,"pop":int}) | ||
if geo.lower() == "state": | ||
fips_state = pd.read_csv("%s/fips_state_table.csv" % (geo_folder),\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason geomapper util isnt used here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
Just to make sure I understand -- the sensorized signal will be in addition to the existing indicator, or will it replace it? I understand it may be less confusing on the map to only show the sensorized version, but we'd still like the API to provide the original indicator. |
This approach sounds similar to/compatible with cmu-delphi/delphi-epidata#239. In general I'm in favor of such an effort, but it's a much bigger lift than we can complete by the end of the month. Probably doable as a Q1 goal though. Who else should weigh in? |
geomapper = GeoMapper() | ||
fips_pop = pd.DataFrame({"fips":sorted(list(geomapper.get_geo_values("fips")))}) | ||
fips_pop = fips_pop[fips_pop.fips.str.slice(start=2) != "000"] | ||
fips_pop = geomapper.add_population_column(fips_pop,"fips",geocode_col="fips") | ||
if geo.lower() == "state": | ||
geo_weights = geomapper.replace_geocode( | ||
fips_pop,"fips","state_id",from_col="fips",new_col="geo",date_col=None | ||
) | ||
elif geo.lower() == "hrr": | ||
geo_weights = geomapper.replace_geocode( | ||
fips_pop,"fips","hrr",from_col="fips",new_col="geo",date_col=None | ||
) | ||
elif geo.lower() == "msa": | ||
geo_weights = geomapper.replace_geocode( | ||
fips_pop,"fips","msa",from_col="fips",new_col="geo",date_col=None | ||
) | ||
elif geo.lower() == "county": | ||
geo_weights = fips_pop.rename(columns={"fips":"geo"}) | ||
geo_weights = geo_weights.rename(columns={"population":"weight"}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this feels like something that could eventually live in the geomapper util to get populations for everything, but for now would recommend moving this into its own function that returns the weight DF.
|
||
|
||
@staticmethod | ||
def sensorize( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a lot going on here -- I'd split it up into a few different methods that get called depending on the various conditions
Performs sensorization for DV signal.
For each time_value t:Create a single linear mapping (f) from a global case rate to global DV, fit globally using the previous [j, k] daysFor each location i:Create linear mapping (g_i) from DV to cases, fit using just location i and the previous [j, k] daysAt time t and for location i, report f(g_i(DV(t))). In other words, pass DV(t) → g_i → fEDIT: After Addison's DAP, Ryan has decided that it would be better to perform sensorization statically, not using a sliding window for training. So the algorithm has been modified to: