Sensorization #568

rumackaaron · 2020-11-20T18:48:33Z

Performs sensorization for DV signal.

~~For each time_value t:~~
- ~~Create a single linear mapping (f) from a global case rate to global DV, fit globally using the previous [j, k] days~~
- ~~For each location i:~~
  - ~~Create linear mapping (g_i) from DV to cases, fit using just location i and the previous [j, k] days~~
  - ~~At time t and for location i, report f(g_i(DV(t))). In other words, pass DV(t) → g_i → f~~

EDIT: After Addison's DAP, Ryan has decided that it would be better to perform sensorization statically, not using a sliding window for training. So the algorithm has been modified to:

- Create a single linear mapping (f) from a global case rate to global DV, fit using the entire history
- For each location i:
  - Create linear mapping (g_i) from DV to cases, fit using just location i and the entire history
  - At time t and for location i, report f(g_i(DV(t))). In other words, pass DV(t) → g_i → f
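
For reviewers who want the static algorithm in one place, here is a minimal pure-Python sketch of the composition f(g_i(DV(t))). It is not the code in this PR. The names `fit_line`, `apply_line`, and `sensorize_static` are hypothetical, and it assumes that "global" means pooling every (location, time) pair into one regression and that both maps are fit with an intercept.

```python
from typing import Dict, List, Sequence, Tuple

Line = Tuple[float, float]  # (slope, intercept)


def fit_line(x: Sequence[float], y: Sequence[float]) -> Line:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def apply_line(line: Line, x: float) -> float:
    """Evaluate a fitted line at x."""
    slope, intercept = line
    return slope * x + intercept


def sensorize_static(
    dv: Dict[str, List[float]], cases: Dict[str, List[float]]
) -> Dict[str, List[float]]:
    """Map each location's DV history through g_i, then through the global f.

    dv and cases map a location id to its full history, aligned by index.
    """
    locations = sorted(dv)
    # Assumption: "global" is a pooled fit over all locations and times.
    pooled_cases = [c for loc in locations for c in cases[loc]]
    pooled_dv = [d for loc in locations for d in dv[loc]]
    f = fit_line(pooled_cases, pooled_dv)  # f: cases -> DV (global)

    sensorized = {}
    for loc in locations:
        g_i = fit_line(dv[loc], cases[loc])  # g_i: DV -> cases (local)
        sensorized[loc] = [apply_line(f, apply_line(g_i, d)) for d in dv[loc]]
    return sensorized
```

Under these assumptions, two locations with identical case histories but differently scaled DV values end up with the same sensorized values, which is the cross-location comparability the method is after.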

rumackaaron · 2020-11-20T18:51:11Z

Sorry, I meant to merge this into the dv-package branch. Is that still possible or do I need to create a new PR?

rumackaaron · 2020-11-20T19:15:22Z

Tagging @krivard

krivard · 2020-11-30T22:18:46Z

Is this still under consideration, or are we backing off of sensorization entirely?

rumackaaron · 2020-12-01T22:55:39Z

We are trying to tweak the method of both the sensorization map and the inverse map, but I don't think we're backing off entirely.

krivard · 2020-12-02T21:40:16Z

okay -- maybe convert to draft while you're tweaking, then it'll be easier to know when someone should review it

chinandrew · 2021-01-12T16:08:05Z

doctor_visits/delphi_doctor_visits/config.py

+    DATE_COL = "ServiceDate" #"servicedate"
+    GEO_COL = "PatCountyFIPS" #"patCountyFIPS"
+    AGE_COL = "PatAgeGroup" #"patAgeGroup"
+    HRR_COLS = ["Pat HRR Name", "Pat HRR ID"]#["patHRRname", "patHRRid"]


what are these comments for?

At some point, HSP changed the column names of the files they were sending us. I was testing the code on a recent drop (with lowercase column names). The code originally had the uppercase names, so I changed them back to pass the tests and to merge the PR.

chinandrew · 2021-01-12T16:12:30Z

doctor_visits/delphi_doctor_visits/update_sensor.py

@@ -140,7 +147,7 @@ def update_sensor(
    params = Weekday.get_params(data) if weekday else None

    # handle explicitly if we need to use Jeffreys estimate for binomial proportions
-    jeffreys = True if se else False
+    jeffreys = se


is this var needed?

Yes, it's used in sensor.py

oh i meant if it's going to be `jeffreys = se`, why not just use the `se` variable and not create/assign it to the `jeffreys` variable?

Yeah that's fair. I'll change it
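
One caveat on this change, sketched below: `True if se else False` coerces any value to a strict bool, while using `se` directly passes the original object through. The two only behave identically if `se` is already a bool, which is an assumption about the caller and not something this thread confirms. The function names are hypothetical.

```python
def jeffreys_flag_old(se):
    """Original form: always returns True or False."""
    return True if se else False


def jeffreys_flag_coerced(se):
    """Equivalent, more idiomatic coercion to a strict bool."""
    return bool(se)


def jeffreys_flag_passthrough(se):
    """Using se directly: returns whatever object was passed in."""
    return se
```

If `se` comes from parsed params and could be, say, a non-empty string or `None`, `bool(se)` keeps the old behavior; otherwise using `se` directly is fine.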

rumackaaron · 2021-01-12T16:15:08Z

From Ryan:

Update: my current view is that the best way to sensorize at time t is to use all available history for the sensor regression model. Then use this fitted model to define a new sensor at time t, and also redefine the sensor at all times 1,...,t-1. This will give us the best of both worlds: improved comparisons across geos, and stable comparisons across time. It can be accomplished using the "as of" parameter in the COVIDcast API, as I explain in my comment here. See also Addison's DAP.

Suggestion: Aaron first implements this "offline" to make sure that it truly works as expected. Then Aaron + engineering work to make this possible when storing a sensor in the API, through the "as of" parameter. Doing this efficiently, without using up a ton of memory going forward, requires us to do some dynamic but simple calculations on the API side.

To save memory, Ryan suggests we store the indicator values for the non-sensorized signal (indexed by time, location, and issue) and the location-specific and time-invariant coefficients (indexed by location and issue) on the server, instead of storing the non-sensorized signal as well as the sensorized signal.

This is not currently supported, as I output the sensorized signal just like any other signal. Thoughts on whether this is worth implementing on the API side and on the indicator side?
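
To make the storage proposal concrete, here is a rough sketch of what the server-side calculation could look like. Everything in it is hypothetical: it is not an existing API feature. It assumes the composed map f∘g_i is stored as one (slope, intercept) pair per location and issue, that dates are ISO strings, and that "as of" means "use the latest issue on or before that date".

```python
from typing import Dict, Iterable, Optional, Tuple

RawKey = Tuple[str, str, str]  # (time_value, location, issue)
CoefKey = Tuple[str, str]      # (location, issue)


def latest_issue(issues: Iterable[str], as_of: str) -> Optional[str]:
    """Most recent issue on or before as_of (ISO dates sort lexicographically)."""
    eligible = [issue for issue in issues if issue <= as_of]
    return max(eligible) if eligible else None


def sensorized_as_of(
    raw: Dict[RawKey, float],
    coefs: Dict[CoefKey, Tuple[float, float]],
    time_value: str,
    location: str,
    as_of: str,
) -> Optional[float]:
    """Compute the sensorized value on request instead of storing it."""
    raw_issue = latest_issue(
        (i for (t, loc, i) in raw if t == time_value and loc == location), as_of
    )
    coef_issue = latest_issue((i for (loc, i) in coefs if loc == location), as_of)
    if raw_issue is None or coef_issue is None:
        return None  # nothing had been issued yet as of that date
    slope, intercept = coefs[(location, coef_issue)]
    return slope * raw[(time_value, location, raw_issue)] + intercept
```

With this layout the sensorized history is never stored: each new issue adds one coefficient pair per location, and older sensorized values are recomputed from the raw signal when queried.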

chinandrew · 2021-01-12T16:22:34Z

doctor_visits/delphi_doctor_visits/update_sensor.py

+        fips_pop = pd.read_csv("%s/fips_pop.csv" % (geo_folder),\
+            dtype={"fips":str,"pop":int})
+        if geo.lower() == "state":
+            fips_state = pd.read_csv("%s/fips_state_table.csv" % (geo_folder),\


any reason geomapper util isn't used here?

mariajahja · 2021-01-12T17:23:41Z

> This is not currently supported, as I output the sensorized signal just like any other signal. Thoughts on whether this is worth implementing on the API side and on the indicator side?

Just to make sure I understand -- the sensorized signal will be in addition to the existing indicator, or will it replace it? I understand it may be less confusing on the map to only show the sensorized version, but we'd still like the API to provide the original indicator.

krivard · 2021-01-12T18:02:56Z

> Thoughts on whether this is worth implementing on the API side and on the indicator side?

This approach sounds similar to/compatible with cmu-delphi/delphi-epidata#239. In general I'm in favor of such an effort, but it's a much bigger lift than we can complete by the end of the month. Probably doable as a Q1 goal though. Who else should weigh in?


chinandrew · 2021-01-14T15:24:05Z

doctor_visits/delphi_doctor_visits/update_sensor.py

+        geomapper = GeoMapper()
+        fips_pop = pd.DataFrame({"fips":sorted(list(geomapper.get_geo_values("fips")))})
+        fips_pop = fips_pop[fips_pop.fips.str.slice(start=2) != "000"]
+        fips_pop = geomapper.add_population_column(fips_pop,"fips",geocode_col="fips")
+        if geo.lower() == "state":
+            geo_weights = geomapper.replace_geocode(
+                fips_pop,"fips","state_id",from_col="fips",new_col="geo",date_col=None
+            )
+        elif geo.lower() == "hrr":
+            geo_weights = geomapper.replace_geocode(
+                fips_pop,"fips","hrr",from_col="fips",new_col="geo",date_col=None
+            )
+        elif geo.lower() == "msa":
+            geo_weights = geomapper.replace_geocode(
+                fips_pop,"fips","msa",from_col="fips",new_col="geo",date_col=None
+            )
+        elif geo.lower() == "county":
+            geo_weights = fips_pop.rename(columns={"fips":"geo"})
+        geo_weights = geo_weights.rename(columns={"population":"weight"})


this feels like something that could eventually live in the geomapper util to get populations for everything, but for now would recommend moving this into its own function that returns the weight DF.
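
A possible shape for that helper, as a sketch only: the `replace_geocode` call and its arguments are copied from the diff above, while the function name `make_geo_weights`, the `GEO_TARGETS` table, and the `ValueError` for unknown geos are assumptions added here. Building `fips_pop` is left to the caller, exactly as in the diff.

```python
# Geo resolution -> target geocode passed to replace_geocode (from the diff).
GEO_TARGETS = {"state": "state_id", "hrr": "hrr", "msa": "msa"}


def make_geo_weights(geomapper, fips_pop, geo):
    """Return a frame with columns ("geo", "weight") for the requested geo.

    geomapper: a GeoMapper-like object exposing replace_geocode.
    fips_pop:  frame with "fips" and "population" columns.
    """
    geo = geo.lower()
    if geo == "county":
        # County weights are the FIPS populations themselves.
        weights = fips_pop.rename(columns={"fips": "geo"})
    elif geo in GEO_TARGETS:
        weights = geomapper.replace_geocode(
            fips_pop, "fips", GEO_TARGETS[geo],
            from_col="fips", new_col="geo", date_col=None,
        )
    else:
        # Assumption: fail loudly instead of leaving geo_weights undefined.
        raise ValueError(f"Unsupported geo resolution: {geo!r}")
    return weights.rename(columns={"population": "weight"})
```

The lookup table replaces the three near-identical `elif` branches, so adding a geo resolution becomes a one-line change.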

chinandrew · 2021-01-14T15:35:10Z

doctor_visits/delphi_doctor_visits/sensorize.py

+
+
+    @staticmethod
+    def sensorize(


There's a lot going on here -- I'd split it up into a few different methods that get called depending on the various conditions

rumackaaron added 6 commits November 17, 2020 10:54

Initial sensorization commit

72995f7

Column names from config

e95df8b

Faster code, fix global regression fit

422f61f

Option to fit without intercept

24435c6

Sensorize tests

5f0d9e4

Fix linting

df73cfc

rumackaaron changed the base branch from main to dv-package November 20, 2020 18:51

Intercept fit method based on params

96066f9

tildechris mentioned this pull request Nov 30, 2020

Add Change Healthcare signals to the map cmu-delphi/www-covidcast#660

Closed

rumackaaron marked this pull request as draft December 2, 2020 21:56

rumackaaron added 7 commits January 11, 2021 19:03

Config for static sensorization

ad8d81c

Column name bug

50afd0e

Static sensorization

0a12976

Remove debug statements

7997fa8

Revert column names to old style

babd1e8

Get geo values from different df

c2e3b6f

Test static sensorization

e3a8886

rumackaaron marked this pull request as ready for review January 12, 2021 15:31

Merge branch 'dv-package' into sensorization

d353e78

krivard requested review from mariajahja and chinandrew January 12, 2021 15:50

Update geo_maps merge

a97d344

chinandrew reviewed Jan 12, 2021


Use se variable for Jeffreys

02a9361

rumackaaron added 2 commits January 12, 2021 18:44

Update delphi utils

2b2352b

Refactor to use geomapper

f831b0a

chinandrew reviewed Jan 13, 2021

_delphi_utils_python/.pylintrc (resolved)

Write sensorization coefs to csv

e899740

chinandrew reviewed Jan 14, 2021
