Commit 2834e3a

Merge pull request #1 from cmu-delphi/main
Sync fork
2 parents 12d26a8 + 1ce310a commit 2834e3a

50 files changed

+2126
-494
lines changed

deploy.json

Lines changed: 16 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -210,8 +210,22 @@
210210
"// acquisition - covid_hosp",
211211
{
212212
"type": "move",
213-
"src": "src/acquisition/covid_hosp/",
214-
"dst": "[[package]]/acquisition/covid_hosp/",
213+
"src": "src/acquisition/covid_hosp/common/",
214+
"dst": "[[package]]/acquisition/covid_hosp/common/",
215+
"match": "^.*\\.(py)$",
216+
"add-header-comment": true
217+
},
218+
{
219+
"type": "move",
220+
"src": "src/acquisition/covid_hosp/facility/",
221+
"dst": "[[package]]/acquisition/covid_hosp/facility/",
222+
"match": "^.*\\.(py)$",
223+
"add-header-comment": true
224+
},
225+
{
226+
"type": "move",
227+
"src": "src/acquisition/covid_hosp/state_timeseries/",
228+
"dst": "[[package]]/acquisition/covid_hosp/state_timeseries/",
215229
"match": "^.*\\.(py)$",
216230
"add-header-comment": true
217231
},
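The `match` fields above all use the same pattern. As a minimal sketch, assuming the deploy tool applies this regular expression to each file name under `src` (the hunk does not show how the tool evaluates it), the following shows which names it would select. The helper name `selected` is hypothetical.

```python
import re

# Pattern copied from the "match" fields above, after JSON unescaping
# ("\\." in JSON is the regex "\.").
MATCH = re.compile(r"^.*\.(py)$")


def selected(filenames):
    """Return the file names the pattern matches (hypothetical helper)."""
    return [name for name in filenames if MATCH.search(name)]
```

Under that assumption, only names ending in exactly `.py` are moved; `README.md` and compiled `.pyc` files are left behind.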

docs/api/covidcast-signals/chng.md

Lines changed: 47 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -26,6 +26,8 @@ commercial purposes.
2626
| --- | --- |
2727
| `smoothed_outpatient_covid` | Estimated percentage of outpatient doctor visits with confirmed COVID-19, based on Change Healthcare claims data that has been de-identified in accordance with HIPAA privacy regulations, smoothed in time using a Gaussian linear smoother |
2828
| `smoothed_adj_outpatient_covid` | Same, but with systematic day-of-week effects removed; see [details below](#day-of-week-adjustment) |
29+
| `smoothed_outpatient_cli` | Estimated percentage of outpatient doctor visits primarily about COVID-related symptoms, based on Change Healthcare claims data that has been de-identified in accordance with HIPAA privacy regulations, smoothed in time using a Gaussian linear smoother |
30+
| `smoothed_adj_outpatient_cli` | Same, but with systematic day-of-week effects removed; see [details below](#day-of-week-adjustment) |
2931

3032
## Table of contents
3133
{: .no_toc .text-delta}
@@ -64,15 +66,38 @@ not necessarily indicative of a true increase of COVID-19 in a location.
6466

6567
## Qualifying Conditions
6668

67-
We receive data on the following two categories of counts:
69+
We receive data on the following six categories of counts:
6870

6971
- Denominator: Daily count of all unique outpatient visits.
7072
- Covid: Daily count of all unique visits with primary ICD-10 code in any of:
7173
{U07.1, B97.21, or B97.29}.
74+
- COVID-like: Daily count of all unique outpatient visits with primary ICD-10 code
75+
of any of: {U07.1, U07.2, B97.29, J12.81, Z03.818, B34.2, J12.89}.
76+
- Flu-like: Daily count of all unique outpatient visits with primary ICD-10 code
77+
of any of: {J22, B34.9}. The occurrence of these codes in an area is
78+
correlated with that area's historical influenza activity, but these are
79+
diagnostic codes not specific to influenza and can appear in COVID-19 cases.
80+
- Mixed: Daily count of all unique outpatient visits with primary ICD-10 code of
81+
any of: {Z20.828, J12.9}. The occurrence of these codes in an area is
82+
correlated with a blend of that area's COVID-19 confirmed case counts and
83+
influenza behavior, and these are not diagnostic codes specific to either disease.
84+
- Flu: Daily count of all unique outpatient visits with primary ICD-10 code of
85+
any of: {J09\*, J10\*, J11\*}. The asterisk `*` indicates inclusion of all
86+
subcodes. This set of codes is assigned to influenza viruses.
87+
88+
For the COVID signal, we consider only the *Denominator* and *Covid* counts.
89+
90+
For the CLI signal, if a patient has multiple visits on the same date (and hence
91+
multiple primary ICD-10 codes), then we count only one of them, chosen in descending
92+
order: *Flu*, *COVID-like*, *Flu-like*, *Mixed*. This ordering tries to account for
93+
the most definitive confirmation, e.g. the codes assigned to *Flu* are only used
94+
for confirmed influenza cases, which are unrelated to the COVID-19 coronavirus.
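
The priority rule described above can be sketched as follows. This is an illustration only, not the pipeline's implementation: the function name `category_to_count` and the representation of a patient's same-day visits as a collection of category labels are assumptions.

```python
# Priority order taken from the text: Flu, COVID-like, Flu-like, Mixed.
PRIORITY = ["Flu", "COVID-like", "Flu-like", "Mixed"]


def category_to_count(visit_categories):
    """Pick the single category counted for one patient on one date.

    `visit_categories` holds the category labels of all of that patient's
    visits on the date (hypothetical input format). Returns None when no
    visit falls into any of the four categories.
    """
    for category in PRIORITY:
        if category in visit_categories:
            return category
    return None
```

For example, a patient with one *Mixed* visit and one *COVID-like* visit on the same date contributes only to the *COVID-like* count.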
7295

7396
## Estimation
7497

75-
### COVID-Like Illness
98+
### COVID Illness
99+
100+
The following estimation method is used for the `*_outpatient_covid` signals.
76101

77102
For a fixed location $$i$$ and time $$t$$, let $$Y_{it}$$
78103
denote the Covid counts and let $$N_{it}$$ be the
@@ -83,6 +108,22 @@ $$
83108
\hat p_{it} = 100 \cdot \frac{Y_{it}}{N_{it}}
84109
$$
85110

111+
### COVID-Like Illness
112+
113+
The following estimation method is used for the `*_outpatient_cli` signals.
114+
115+
For a fixed location $$i$$ and time $$t$$, let $$Y_{it}^{\text{Covid-like}}$$,
116+
$$Y_{it}^{\text{Flu-like}}$$, $$Y_{it}^{\text{Mixed}}$$, $$Y_{it}^{\text{Flu}}$$
117+
denote the correspondingly named ICD-filtered counts and let $$N_{it}$$ be the
118+
total count of visits (the *Denominator*). Our estimate of the CLI percentage is
119+
given by
120+
121+
$$
122+
\hat p_{it} = 100 \cdot \frac{Y_{it}^{\text{Covid-like}} +
123+
\left((Y_{it}^{\text{Flu-like}} + Y_{it}^{\text{Mixed}}) -
124+
Y_{it}^{\text{Flu}}\right)}{N_{it}}
125+
$$
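
As a worked illustration of the formula above, the following minimal sketch computes the CLI percentage from the four filtered counts and the denominator. The function and parameter names are hypothetical, and the zero-denominator check is an added assumption; the source does not say how such cases are handled.

```python
def cli_percentage(covid_like, flu_like, mixed, flu, denominator):
    """Estimate the CLI percentage for one location and day.

    Mirrors the formula above:
    100 * (covid_like + ((flu_like + mixed) - flu)) / denominator.
    """
    if denominator <= 0:
        # Assumption: a day with no visits has no defined estimate.
        raise ValueError("denominator must be positive")
    numerator = covid_like + ((flu_like + mixed) - flu)
    return 100 * numerator / denominator
```

With 30 COVID-like, 10 flu-like, 5 mixed, and 5 flu visits out of 1000 total, the numerator is 40 and the estimate is 4.0 percent.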
126+
86127
### Day-of-Week Adjustment
87128

88129
The fraction of visits due to COVID-19 is dependent on the day of the week. On
@@ -131,6 +172,10 @@ $$\dot{Y}_{it} = Y_{it} / \alpha_{wd(t)}.$$
131172
We then use these adjusted counts to estimate the COVID-19 percentage as described
132173
above.
133174

175+
For the CLI indicator, we apply the same method to the numerator $$Y_{it} =
176+
Y_{it}^{\text{Covid-like}} + \left((Y_{it}^{\text{Flu-like}} +
177+
Y_{it}^{\text{Mixed}}) - Y_{it}^{\text{Flu}}\right).$$
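
A minimal sketch of this step, assuming the day-of-week factors $$\alpha_{wd(t)}$$ have already been estimated and are supplied as a mapping from weekday index (Monday = 0) to factor. The names `adjusted_cli_count` and `alpha` are hypothetical, and the factor values in any call are placeholders, not fitted estimates.

```python
from datetime import date


def adjusted_cli_count(covid_like, flu_like, mixed, flu, day, alpha):
    """Divide the CLI numerator by the factor for the day's weekday.

    `alpha` maps date.weekday() values (Monday = 0 ... Sunday = 6) to
    previously estimated day-of-week factors (assumed input).
    """
    numerator = covid_like + ((flu_like + mixed) - flu)
    return numerator / alpha[day.weekday()]
```

The adjusted count then replaces the raw numerator when the CLI percentage is computed.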
178+
134179
### Backwards Padding
135180

136181
To help with the reporting delay, we perform the following simple

docs/api/covidcast-signals/fb-survey.md

Lines changed: 34 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -30,6 +30,8 @@ described in the sections below:
3030
traveling, and activities outside the home
3131
3. [Testing indicators](#testing-indicators) based on respondent reporting of
3232
their COVID test results
33+
4. [Mental health indicators](#mental-health-indicators), based on self-reports
34+
of anxiety, depression, isolation, and worry about COVID
3335

3436
## Table of contents
3537
{: .no_toc .text-delta}
@@ -277,12 +279,17 @@ data in the estimation procedures described above.
277279

278280
## Behavior Indicators
279281

280-
| Signal | Description | Survey Item |
281-
| --- | --- | --- |
282-
| `smoothed_wearing_mask` | Estimated percentage of people who wore a mask most or all of the time while in public in the past 5 days; those not in public in the past 5 days are not counted. | C14 |
283-
284-
These indicators are based on questions in Wave 4 of the survey, introduced on
285-
September 8, 2020.
282+
| Signal | Description | Survey Item | Introduced |
283+
| --- | --- | --- | --- |
284+
| `smoothed_wearing_mask` | Estimated percentage of people who wore a mask for most or all of the time while in public in the past 5 days; those not in public in the past 5 days are not counted. | C14 | Wave 4, Sept 8, 2020 |
285+
| `smoothed_others_masked` | Estimated percentage of respondents who say that most or all *other* people wear masks, when they are in public and social distancing is not possible | C16 | Wave 5, Nov 24, 2020 |
286+
| `smoothed_travel_outside_state_5d` | Estimated percentage of respondents who report traveling outside their state in the past 5 days | C6 | Wave 1 |
287+
| `smoothed_work_outside_home_1d` | Estimated percentage of respondents who worked or went to school outside their home in the past 24 hours | C13 | Wave 4, Sept 8, 2020 |
288+
| `smoothed_shop_1d` | Estimated percentage of respondents who went to a "market, grocery store, or pharmacy" in the past 24 hours | C13 | Wave 4, Sept 8, 2020 |
289+
| `smoothed_restaurant_1d` | Estimated percentage of respondents who went to a "bar, restaurant, or cafe" in the past 24 hours | C13 | Wave 4, Sept 8, 2020 |
290+
| `smoothed_spent_time_1d` | Estimated percentage of respondents who "spent time with someone who isn't currently staying with you" in the past 24 hours | C13 | Wave 4, Sept 8, 2020 |
291+
| `smoothed_large_event_1d` | Estimated percentage of respondents who "attended an event with more than 10 people" in the past 24 hours | C13 | Wave 4, Sept 8, 2020 |
292+
| `smoothed_public_transit_1d` | Estimated percentage of respondents who "used public transit" in the past 24 hours | C13 | Wave 4, Sept 8, 2020 |
286293

287294
Weighted versions of these signals, using the [survey weighting described
288295
below](#survey-weighting) to be more representative of state demographics, are
@@ -307,6 +314,27 @@ also available. These have names beginning `smoothed_w`, such as
307314
`smoothed_wtested_14d`.
308315

309316

317+
## Mental Health Indicators
318+
319+
| Signal | Description | Survey Item |
320+
| --- | --- | --- |
321+
| `smoothed_anxious_5d` | Estimated percentage of respondents who reported feeling "nervous, anxious, or on edge" for most or all of the past 5 days | C8 |
322+
| `smoothed_depressed_5d` | Estimated percentage of respondents who reported feeling depressed for most or all of the past 5 days | C8 |
323+
| `smoothed_felt_isolated_5d` | Estimated percentage of respondents who reported feeling "isolated from others" for most or all of the past 5 days | C8 |
324+
| `smoothed_worried_become_ill` | Estimated percentage of respondents who reported feeling very or somewhat worried that "you or someone in your immediate family might become seriously ill from COVID-19" | C9 |
325+
| `smoothed_worried_finances` | Estimated percentage of respondents who report being very or somewhat worried about their "household's finances for the next month" | C15 |
326+
327+
Some of these questions were present in the earliest waves of the survey, but
328+
only in Wave 4 did respondents consent to our use of aggregate data to
329+
study other impacts of COVID, such as mental health. Hence, these aggregates only
330+
include respondents to Wave 4 and later waves, beginning September 8, 2020.
331+
332+
Weighted versions of these signals, using the [survey weighting described
333+
below](#survey-weighting) to be more representative of state demographics, are
334+
also available. These have names beginning `smoothed_w`, such as
335+
`smoothed_wdepressed_5d`.
336+
337+
310338
## Survey Weighting
311339

312340
Notice that the estimates defined in the previous sections are calculated with

docs/api/covidcast-signals/hospital-admissions.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -21,10 +21,10 @@ COVID-associated diagnosis code in a given location, on a given day.
2121

2222
| Signal | Description |
2323
| --- | --- |
24-
| `smoothed_covid19` | Estimated percentage of new hospital admissions with COVID-associated diagnoses, based on electronic medical record and claims data from health system partners, smoothed in time using a Gaussian linear smoother. _This signal is no longer updated as of 1 October, 2020._ |
25-
| `smoothed_adj_covid19` | Same as `smoothed_covid19`, but with systematic day-of-week effects removed using [the same mechanism as in `doctor-visits`](doctor-visits.md#day-of-week-adjustment). _This signal is no longer updated as of 1 October, 2020._ |
2624
| `smoothed_covid19_from_claims` | Estimated percentage of new hospital admissions with COVID-associated diagnoses, based on claims data from health system partners, smoothed in time using a Gaussian linear smoother |
2725
| `smoothed_adj_covid19_from_claims` | Same as `smoothed_covid19_from_claims`, but with systematic day-of-week effects removed using [the same mechanism as in `doctor-visits`](doctor-visits.md#day-of-week-adjustment) |
26+
| `smoothed_covid19` | Estimated percentage of new hospital admissions with COVID-associated diagnoses, based on electronic medical record and claims data from health system partners, smoothed in time using a Gaussian linear smoother. _This signal is no longer updated as of 1 October, 2020._ |
27+
| `smoothed_adj_covid19` | Same as `smoothed_covid19`, but with systematic day-of-week effects removed using [the same mechanism as in `doctor-visits`](doctor-visits.md#day-of-week-adjustment). _This signal is no longer updated as of 1 October, 2020._ |
2828

2929
## Table of contents
3030
{: .no_toc .text-delta}
Lines changed: 101 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,101 @@
1+
---
2+
title: NCHS Mortality Data
3+
parent: Data Sources and Signals
4+
grand_parent: COVIDcast Epidata API
5+
---
6+
7+
# NCHS Mortality Data
8+
{: .no_toc}
9+
10+
* **Source name:** `nchs-mortality`
11+
* **First issued:** Epiweek 50 2020 (6-12 December 2020)
12+
* **Number of data revisions since 19 May 2020:** 0
13+
* **Date of last change:** Never
14+
* **Available for:** state (see [geography coding docs](../covidcast_geography.md))
15+
* **License:** [NCHS Data Use Agreement](https://www.cdc.gov/nchs/data_access/restrictions.htm)
16+
17+
This data source of national provisional death counts is based on death
18+
certificate data received and coded by the National Center for Health Statistics
19+
[(NCHS)](https://www.cdc.gov/nchs/nvss/vsrr/COVID19/index.htm).
20+
21+
| Signal | Description |
22+
| --- | --- |
23+
| `deaths_covid_incidence_num` | Number of weekly new deaths with confirmed or presumed COVID-19 |
24+
| `deaths_covid_incidence_prop` | Number of weekly new deaths with confirmed or presumed COVID-19, per 100,000 population |
25+
| `deaths_allcause_incidence_num` | Number of weekly new deaths from all causes |
26+
| `deaths_allcause_incidence_prop` | Number of weekly new deaths from all causes, per 100,000 population |
27+
| `deaths_flu_incidence_num` | Number of weekly new deaths involving Influenza and at least one of (Pneumonia, COVID-19)|
28+
| `deaths_flu_incidence_prop` | Number of weekly new deaths involving Influenza and at least one of (Pneumonia, COVID-19), per 100,000 population |
29+
| `deaths_pneumonia_notflu_incidence_num` | Number of weekly new deaths involving Pneumonia, excluding Influenza deaths |
30+
| `deaths_pneumonia_notflu_incidence_prop` | Number of weekly new deaths involving Pneumonia, excluding Influenza deaths, per 100,000 population |
31+
| `deaths_covid_and_pneumonia_notflu_incidence_num`| Number of weekly new deaths involving COVID-19 and Pneumonia, excluding Influenza |
32+
| `deaths_covid_and_pneumonia_notflu_incidence_prop`| Number of weekly new deaths involving COVID-19 and Pneumonia, excluding Influenza, per 100,000 population |
33+
|`deaths_pneumonia_or_flu_or_covid_incidence_num`| Number of weekly new deaths involving Pneumonia, Influenza, or COVID-19|
34+
|`deaths_pneumonia_or_flu_or_covid_incidence_prop`| Number of weekly new deaths involving Pneumonia, Influenza, or COVID-19, per 100,000 population|
35+
|`deaths_percent_of_expected`| Number of weekly new deaths from all causes in 2020, expressed as a percentage of the average number across the same week in 2017–2019|
36+
37+
These signals are taken directly from [Table
38+
1](https://www.cdc.gov/nchs/nvss/vsrr/COVID19/index.htm) without
39+
changes. National provisional death counts include deaths occurring within the
40+
50 states and the District of Columbia that have been received and coded as of
41+
the date specified during a given time period. The deaths are classified based
42+
on a new ICD-10 code. (Note that the classification is based on all the codes on
43+
the death certificate, not just the primary cause of death.) The codes that are
44+
considered for each signal are described in detail
45+
[here](https://github.com/cmu-delphi/covidcast-indicators/blob/main/nchs_mortality/DETAILS.md#metrics-level-1-m1). We
46+
export the state-level data as-is in a weekly format.
47+
48+
## Table of contents
49+
{: .no_toc .text-delta}
50+
51+
1. TOC
52+
{:toc}
53+
54+
## Geographical Exceptions
55+
56+
New York City is listed as its own region in the NCHS Mortality data, but
57+
we don't consider NYC separately. The death counts for NYC are included in New
58+
York State in our reports.
59+
60+
## Report Using Epiweeks
61+
62+
We report the NCHS Mortality data in a weekly format (`time_type=week` \&
63+
`time_value=\{YYYYWW\}`, where `YYYYWW` refers to an epiweek). The CDC defines
64+
the [epiweek](https://wwwn.cdc.gov/nndss/document/MMWR_Week_overview.pdf) as
65+
seven days, from Sunday to Saturday. We check the week-ending dates provided in
66+
the NCHS mortality data and use the Python package
67+
[epiweeks](https://pypi.org/project/epiweeks/) to convert them into epiweek
68+
format.
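
To make the conversion concrete, here is a standard-library sketch of the MMWR rule (weeks run Sunday to Saturday, and week 1 is the first week with at least four days in the calendar year). It is not the `epiweeks` package and not the pipeline's code; the function name `epiweek_time_value` is hypothetical.

```python
from datetime import date, timedelta


def epiweek_time_value(week_ending):
    """Convert a date to an integer YYYYWW epiweek value (MMWR rule)."""
    # Sunday that starts the week containing the given date.
    days_since_sunday = (week_ending.weekday() + 1) % 7
    week_start = week_ending - timedelta(days=days_since_sunday)
    # A Sunday-Saturday week has at least four days in the calendar year
    # of its Wednesday, so that year is the epiweek's year.
    year = (week_start + timedelta(days=3)).year
    # Week 1 is the week that contains January 4.
    jan4 = date(year, 1, 4)
    first_week_start = jan4 - timedelta(days=(jan4.weekday() + 1) % 7)
    week = (week_start - first_week_start).days // 7 + 1
    return year * 100 + week
```

For the week ending 12 December 2020 this gives `202050`, which agrees with the "Epiweek 50 2020 (6-12 December 2020)" entry at the top of this page.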
69+
70+
## Missingness
71+
72+
NCHS suppresses some data to protect individual privacy and avoid publishing
73+
low-confidence figures. This includes data for jurisdictions where counts are
74+
between 1 and 9, and data for weeks where the counts are less than 50% of the
75+
expected number, since these provisional counts are highly incomplete and
76+
potentially misleading.
77+
78+
## Lag and Backfill
79+
80+
There is a lag in time between when the death occurred and when the death
81+
certificate is completed, submitted to NCHS, and processed for reporting
82+
purposes. The death counts for earlier weeks are continually revised and may
83+
increase or decrease as new and updated death certificate data are received from
84+
the states by NCHS. This delay can range from 1 to 8 weeks or even more.
85+
Some states report deaths on a daily basis, while other states report deaths weekly
86+
or monthly. State vital record reporting may also be affected or delayed by
87+
COVID-19 related response activities, which make death counts not comparable
88+
across states. We check for updates reported by NCHS every weekday but will
89+
report the signals weekly (on Monday).
90+
91+
## Source and Licensing
92+
93+
This data was originally published by the National Center for Health Statistics,
94+
and is made available here as a convenience to the forecasting community under
95+
the terms of the original license. The NCHS places restrictions on how this
96+
dataset may be used: you may not attempt to identify any individual included in
97+
the data, whether by itself or through linking to other
98+
individually-identifiable data; you may only use the dataset for statistical
99+
reporting and analysis. The full text of the [NCHS Data Use
100+
Agreement](https://www.cdc.gov/nchs/data_access/restrictions.htm) is available
101+
from their website.

docs/index.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -16,7 +16,7 @@ group](https://delphi.cmu.edu/). The Epidata API includes:
1616
other epidemics tracked by Delphi through a variety of data streams.
1717

1818
The Delphi group is extremely grateful to Pedrito Maynard-Zhang for all his
19-
help with the Epidata API [documentation](api/index.md).
19+
help with the Epidata API [documentation](api/README.md).
2020

2121
Developers interested in modifying or extending this project are directed to
2222
the [Epidata API Development Guide](epidata_development.md).

docs/symptom-survey/collaboration-revision.md

Lines changed: 16 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -12,7 +12,22 @@ pandemic. We conduct revisions in collaboration with data users, fellow
1212
researchers, and public health officials, to ensure the survey data best serves
1313
public health and research goals.
1414

15-
## Survey Revisions
15+
### Current Data Use Survey
16+
17+
To allow us to better collaborate with our academic and non-profit research
18+
partners, we ask that you complete [this short
19+
checklist](http://cmu.ca1.qualtrics.com/jfe/form/SV_dnSQYuQZDkQhJ3f) of the
20+
questions currently on the CMU Delphi US symptom survey. This inquiry only needs
21+
to be completed by one member of your project team. You can complete it again at
22+
any time there are variables from this survey you'd like to add to your project.
23+
24+
This will allow us to update you on any upcoming plans for revisions or changes
25+
to the current variables, prioritize the questions that are currently in use,
26+
alleviate some response burden by eliminating unused questions, and allow us to
27+
connect with current data users regarding their research interests and areas of
28+
expertise.
29+
30+
### Proposing Revisions
1631

1732
If there is a revision or question you would like us to consider, please fill
1833
out [this form requesting details about your
