Commit b3baeb9: update README

1 parent: 230e0f4

6 files changed: +944 additions, -11 deletions

README.md

Lines changed: 17 additions & 11 deletions

````diff
@@ -2,28 +2,34 @@
 
 ⚠️ This is a work in progress.
 
-_`pyrenew-flu-light` is an instantiation of an [Epidemia](https://imperialcollegelondon.github.io/epidemia/) influenza forecasting model in [PyRenew](https://github.com/CDCgov/PyRenew)._
+_`pyrenew-flu-light` is an influenza forecasting model (originally written in `R`, using the package [Epidemia](https://imperialcollegelondon.github.io/epidemia/), under the name `cfaepim`) that has been re-instantiated via [PyRenew](https://github.com/CDCgov/PyRenew)._
 
-NOTE: Presently, this `pyrenew-flu-light` cannot be installed and used with current NHSN, as its author is validating it on historical influenza data.
+NOTE: Presently, `pyrenew-flu-light` cannot be installed and used with current NHSN data.
 
-Run command typically used:
+## Usage
+
+PFL (`pyrenew-flu-light`) can be used in two modes: "active" and "historical". The "active" mode uses respiratory incidence data sourced from available API frameworks, whereas the "historical" mode relies on saved dataset snapshots (currently, these snapshots are from the 2023-24 NHSN data and were generated by `cfaepim` code).
+
+Before running PFL, you will likely need to install `poetry` and create a virtual environment. First run `pipx install poetry` (see the [Poetry](https://python-poetry.org/docs/) website for more information); then, after cloning (`git clone https://github.com/CDCgov/pyrenew-flu-light`), create the environment by running `poetry install` followed by `poetry shell` in the PFL folder.
+
+After `poetry shell` has been run, the "historical" mode is used (from within the folder `pyrenew_flu_light`) via the `run.py` file, which creates an _experiment_. Each experiment generates forecasts based on selected jurisdictions and a configuration file. The configuration files for the "historical" mode are derived from those used in `cfaepim`; the reader will rarely need to modify configuration files in the "historical" mode. To run:
 
 ```
-poetry run python tut_epim_port_msr.py --reporting_date 2024-01-20 --regions NY --historical --forecast
+poetry run python run.py --reporting_date 2024-01-20 --regions NY,AL,CA --historical --forecast --exp_name experiment_01
 
-python3 tut_epim_port_msr.py --reporting_date 2024-01-20 --regions NY --historical --forecast
+poetry run python run.py --reporting_date 2024-03-30 --regions not:NY,AL,CA --historical --forecast --exp_name experiment_02
 ```
 
 ## ...Contained Within This Repository
 
 
-- [x] An Apache 2.0 license
-- [x] An issue template (taken from PyRenew)
 - [ ] A pull request template
-- [x] A `pre-commit` schema
-- [x] A `.gitignore` file (taken from PyRenew)
-- [ ] A website
-- [ ] Formal contribution language (hybrid, from `cdcent`)
+- [ ] A website.
+- [x] An Apache 2.0 license
+- [x] An issue template (taken from PyRenew).
+- [x] A `pre-commit` schema.
+- [x] A `.gitignore` file (taken from PyRenew).
+- [x] Formal contribution language (hybrid, from `cdcent`)
 
 
 ## CDCGov & Disclaimers, Notices, And Code Of Conduct
````
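The second run command above uses a `not:` prefix to invert the jurisdiction selection (fit everything except `NY,AL,CA`). A minimal sketch of how such a `--regions` value might be parsed, assuming a hypothetical helper and a truncated jurisdiction list (this is illustrative, not the repository's actual implementation):

```python
# Hypothetical parser for a --regions value such as "NY,AL,CA" or
# "not:NY,AL,CA" (exclusion). Names and the region list are assumptions.

ALL_REGIONS = ["NY", "AL", "CA", "TX", "WA"]  # truncated example list


def select_regions(arg: str, all_regions: list[str]) -> list[str]:
    """Return the jurisdictions selected by a --regions argument."""
    if arg.startswith("not:"):
        # "not:" inverts the selection: keep every region NOT listed
        excluded = set(arg[len("not:"):].split(","))
        return [r for r in all_regions if r not in excluded]
    # otherwise the argument is a plain comma-separated include list
    return arg.split(",")


print(select_regions("NY,AL,CA", ALL_REGIONS))      # ['NY', 'AL', 'CA']
print(select_regions("not:NY,AL,CA", ALL_REGIONS))  # ['TX', 'WA']
```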

assets/ref/fit.R

Lines changed: 256 additions & 0 deletions (new file)

```r
#!/usr/bin/env Rscript

#' fit forecast models for a given report date
#'
#' @param report_date report date for which
#' to run the analysis
#' @param output_parent_directory report will
#' be saved in a subdirectory named after the report date,
#' but within this parent directory. Defaults to creating
#' and/or using a directory named `"output"` within the
#' current working directory for this purpose.
#' @param data_cutoff_date Only use data through
#' the given date. If `NULL`, use all
#' available data. Default `NULL`.
#' @param locations Only fit these locations.
#' If `NULL`, use all available locations.
#' Default `NULL`.
#' @param param_path Where to look for a parameter
#' file. Defaults to a file named `"params.toml"`
#' within a directory named `"data"` within the
#' current working directory.
#' @param location_data_path Where to look for a FluSight
#' `locations.csv` containing locations to fit and their
#' populations. Defaults to a file named `"locations.csv"`
#' within a directory named `"data"` within the
#' current working directory.
#' @param healthdata_api_key_id API key ID for authenticating
#' to the HealthData.gov SODA API. Not required, but polite.
#' Default `NULL`.
#' @param healthdata_api_key_secret Corresponding
#' API key secret for authenticating
#' to the HealthData.gov SODA API. Not required, but polite.
#' Default `NULL`.
#' @param overwrite_params Overwrite an existing
#' archived parameter file if it exists?
#' Boolean, default `FALSE`. If `FALSE`
#' and an archived parameter file already
#' exists, the pipeline will error out.
#' @return `TRUE` on success.
fit <- function(report_date,
                output_parent_directory = "output",
                data_cutoff_date = NULL,
                locations = NULL,
                param_path = fs::path("data", "params.toml"),
                location_data_path = fs::path("data", "locations.csv"),
                healthdata_api_key_id = NULL,
                healthdata_api_key_secret = NULL,
                overwrite_params = FALSE) {
  cli::cli_inform("Using working directory {fs::path_wd()}")

  report_outdir <- fs::path(
    output_parent_directory,
    report_date
  )

  fs::dir_create(report_outdir)

  data_save_path <- fs::path(
    report_outdir,
    paste0(report_date, "_clean_data", ".tsv")
  )

  param_save_path <- fs::path(
    report_outdir,
    paste0(report_date, "_config", ".toml")
  )

  cli::cli_inform("Reading in run parameters from {param_path}")
  params <- RcppTOML::parseTOML(param_path)

  cli::cli_inform("Archiving parameters at {param_save_path}")
  fs::file_copy(param_path,
    param_save_path,
    overwrite = overwrite_params
  )

  cli::cli_inform("Pulling and cleaning data")
  clean_data <- cfaepim::get_data(
    params$first_fitting_date,
    location_data_path,
    api_key_id = healthdata_api_key_id,
    api_key_secret = healthdata_api_key_secret,
    recency_effect_length = params$recency_effect_length
  )

  for (loc in unique(clean_data$location)) {
    loc_start_date <- params$location_specific_start_dates[[loc]]
    loc_cutoff_date <- params$location_specific_cutoff_dates[[loc]]

    if (!is.null(loc_start_date)) {
      cli::cli_inform(paste0(
        "Using custom start date {loc_start_date} ",
        "for location {loc}"
      ))
      clean_data <- clean_data |>
        dplyr::filter(location != !!loc | date >= !!loc_start_date)
    }

    if (!is.null(loc_cutoff_date)) {
      cli::cli_inform(paste0(
        "Using custom cutoff date {loc_cutoff_date} ",
        "for location {loc}"
      ))
      clean_data <- clean_data |>
        dplyr::filter(location != !!loc | date <= !!loc_cutoff_date)
    }
  }

  if (!is.null(data_cutoff_date)) {
    clean_data <- clean_data |>
      dplyr::filter(date <= data_cutoff_date)
  }

  unobserved_dates <- params$location_specific_excluded_dates |>
    stack() |>
    tibble::as_tibble() |>
    dplyr::mutate(
      date = as.Date(values),
      location = ind,
      nonobservation_period = TRUE
    ) |>
    dplyr::select(
      date,
      location,
      nonobservation_period
    )

  clean_data <- clean_data |>
    dplyr::left_join(
      unobserved_dates,
      by = c("location", "date")
    ) |>
    dplyr::mutate(
      nonobservation_period =
        tidyr::replace_na(
          nonobservation_period,
          FALSE
        )
    )

  cli::cli_inform("Archiving cleaned data at {data_save_path}")
  readr::write_tsv(clean_data, data_save_path)

  if (!is.null(locations)) {
    loc_vec <- as.character(locations)
  } else {
    loc_vec <- clean_data |>
      dplyr::distinct(location) |>
      dplyr::pull()
  }
  names(loc_vec) <- loc_vec

  cli::cli_alert("Fitting the following locations: {loc_vec}")

  cli::cli_alert("Setting up models")
  fitting_args <- lapply(loc_vec,
    cfaepim::build_state_light_model,
    clean_data = clean_data,
    params = params,
    adapt_delta = params$mcmc$adapt_delta,
    max_treedepth = params$mcmc$max_treedepth,
    n_chains = params$mcmc$n_chains,
    n_warmup = params$mcmc$n_warmup,
    n_iter = params$mcmc$n_iter
  )

  cli::cli_alert("{length(fitting_args)} models to fit")
  cli::cli_alert("Starting model fit at {Sys.time()}")

  raw_results <- cfaepim::fit_future(
    fitting_args,
    save_results = TRUE,
    overwrite_existing = FALSE,
    save_dir = report_outdir,
    save_filename_pattern = paste0("_", report_date, "_epim_results")
  )

  print(raw_results[[1]])

  cli::cli_alert("Model fit finished at {Sys.time()}")

  return(TRUE)
}

argv_parser <- argparser::arg_parser(
  paste0(
    "Run Epidemia forecast analysis ",
    "for a given report date"
  )
) |>
  argparser::add_argument(
    "report_date",
    help = "Date for which to generate a forecast report"
  ) |>
  argparser::add_argument(
    "--data-cutoff",
    help = "Only use data up to this date for forecasting"
  ) |>
  argparser::add_argument(
    "--locations",
    help = "Only fit to these locations"
  ) |>
  argparser::add_argument(
    "--outdir",
    help = paste0(
      "Write forecast output to a timestamped ",
      "subdirectory of this directory"
    ),
    default = "output"
  ) |>
  argparser::add_argument(
    "--params",
    help = "Path to parameter file",
    default = "data/params.toml"
  ) |>
  argparser::add_argument(
    "--overwrite-params",
    help = "Overwrite an existing archived parameter file?",
    default = FALSE
  )

argv <- argparser::parse_args(argv_parser)

n_cores_use <- parallel::detectCores() - 1
future::plan(future::multicore, workers = n_cores_use)

if (is.na(argv$data_cutoff)) {
  argv$data_cutoff <- NULL
}
if (is.na(argv$locations)) {
  argv$locations <- NULL
} else {
  argv$locations <- unlist(strsplit(
    argv$locations,
    " "
  ))
}

## hack to make argparser slightly more system-agnostic
if (argv$params == "data/params.toml") {
  argv$params <- fs::path("data", "params.toml")
}

api_creds <- cfaepim::get_api_credentials()

fit(
  argv$report_date,
  argv$outdir,
  data_cutoff_date = argv$data_cutoff,
  locations = argv$locations,
  param_path = argv$params,
  healthdata_api_key_id = api_creds$id,
  healthdata_api_key_secret = api_creds$key,
  overwrite_params = argv$overwrite_params
)
```
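The non-observation step in `fit.R` stacks the per-location excluded dates from the config, left-joins them onto the cleaned data by `(location, date)`, and replaces the resulting `NA`s with `FALSE`. A pure-Python sketch of that same logic (illustrative data and names, not the repository's code):

```python
# Sketch of fit.R's excluded-dates flagging: a row gets
# nonobservation_period = True iff its (location, date) pair appears in
# the location-specific excluded-dates config. Example data is invented.

# analogous to params$location_specific_excluded_dates in fit.R
excluded = {
    "NY": ["2024-01-06"],
    "AL": ["2024-01-06", "2024-01-13"],
}
# flatten the config into a set of (location, date) pairs,
# mirroring the stack() |> mutate() step
excluded_pairs = {(loc, d) for loc, dates in excluded.items() for d in dates}

rows = [
    {"location": "NY", "date": "2024-01-06"},
    {"location": "NY", "date": "2024-01-13"},
    {"location": "AL", "date": "2024-01-13"},
]
for row in rows:
    # left join on (location, date), then replace_na(..., FALSE):
    # membership in the set plays both roles at once
    row["nonobservation_period"] = (
        (row["location"], row["date"]) in excluded_pairs
    )

print([r["nonobservation_period"] for r in rows])  # [True, False, True]
```

A set of pairs gives the same result as the R left-join-then-`replace_na` pattern because unmatched rows simply test `False` on membership, with no intermediate `NA` state to clean up.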
