R/calc_refrates.R
calc_refrates.Rd
Calculate age-, sex-, cohort-, region-specific incidence rates from a cohort
calc_refrates(
df,
dattype = NULL,
count_var,
refpop_df,
calc_totals = FALSE,
fill_sites = "no",
region_var = NULL,
age_var = NULL,
sex_var = NULL,
year_var = NULL,
race_var = NULL,
site_var = NULL,
quiet = FALSE
)
dataframe in long format
can be "zfkd" or "seer" or NULL. Will set default variable names if dattype is "seer" or "zfkd". Default is NULL.
variable to be counted as observed case. Should be 1 for case to be counted.
df where reference population data is defined. Only required if option futime = "refpop" is chosen. It is assumed that refpop_df has the columns "region" for region, "sex" for biological sex, "age" for age-groups (can be single ages or 5-year brackets), "year" for time period (can be single year or 5-year brackets), "population_pyar" for person-years at risk in the respective age/sex/year cohort. refpop_df must use the same category coding of age, sex, region, year and site as age_var, sex_var, region_var, year_var and site_var.
option to calculate totals for all age-groups, all sexes, all years, all races, all sites. Default is FALSE.
option to fill sites missing from the observed data with an incidence rate of 0. The coding system used must be specified. Can be one of: "no" for not filling missing sites, "icd2d" for ICD-O-3 2-digit (C00-C80), "icd3d" for ICD-O-3 3-digit, "icd10gm2d" for ICD-10-GM 2-digit (C00-C97), "sitewho" for Site SEER WHO coding (no. 1-89 categories), "sitewho_b" for Site SEER WHO B recoding (no. 1-111 categories), "sitewho_epi" for Site SEER WHO coding with additional sums, "sitewhogen" for Site WHO coding with fewer categories to make it compatible with international rates, "sitewho_num" for numeric coding of Site SEER WHO coding (no. 1-89 categories), "sitewho_b_num" for numeric coding of Site SEER WHO B recoding (no. 1-111 categories), "sitewhogen_num" for numeric international rates, or c("manual", char_vector) for manually defined sites. Default is "no".
variable in df that contains information on region where case was incident. Default is set if dattype is given.
variable in df that contains information on age-group. Default is set if dattype is given.
variable in df that contains information on sex. Default is set if dattype is given.
variable in df that contains information on year or year-period when case was incident. Default is set if dattype is given.
optional argument, used if rates should be calculated stratified by race. To use this option, provide the name of the variable in df that contains race information. If race_var is provided, refpop_df needs to contain the variable "race".
variable in df that contains information on ICD code of case diagnosis. Cases are usually the second cancers. Default is set if dattype is given.
If TRUE, warnings and messages will be suppressed. Default is FALSE.
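To make the expected inputs concrete, the following base-R sketch builds a toy case table and a toy reference population with the column names described above, then computes a crude rate by hand. It is not the package's implementation: the values, the object names `cases`, `refpop`, `counts` and `rates`, and the scaling per 100,000 person-years are assumptions made for illustration only.

```r
# Toy case-level data in long format (hypothetical values)
cases <- data.frame(
  region  = c("A", "A", "A", "B"),
  sex     = c("Female", "Female", "Male", "Female"),
  age     = rep("50 - 54", 4),
  year    = rep("1990 - 1994", 4),
  t_site  = rep("C14", 4),
  is_case = c(1, 1, 1, 1),   # 1 = counted as observed case
  stringsAsFactors = FALSE
)

# Toy reference population using the same category coding as `cases`
refpop <- data.frame(
  region          = c("A", "A", "B"),
  sex             = c("Female", "Male", "Female"),
  age             = rep("50 - 54", 3),
  year            = rep("1990 - 1994", 3),
  population_pyar = c(400000, 500000, 250000),
  stringsAsFactors = FALSE
)

# Sum cases within each region/sex/age/year/site stratum
counts <- aggregate(is_case ~ region + sex + age + year + t_site,
                    data = cases, FUN = sum)
names(counts)[names(counts) == "is_case"] <- "incidence_cases"

# Attach person-years at risk and compute a crude rate
# (assumption: expressed per 100,000 person-years)
rates <- merge(counts, refpop,
               by = c("region", "sex", "age", "year"), all.x = TRUE)
rates$incidence_crude_rate <-
  rates$incidence_cases / rates$population_pyar * 1e5

print(rates)
```

The join only works when both tables code region, sex, age and year identically, which is why refpop_df must share the category coding of df.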
df
#load sample data
data("us_second_cancer")
data("population_us")
us_second_cancer %>%
#create variable to indicate to be counted as case
dplyr::mutate(is_case = 1) %>%
#calculate refrates - warning: these are not realistic numbers, just showing functionality
calc_refrates(dattype = "seer", count_var = "is_case", refpop_df = population_us,
region_var = "registry", age_var = "fc_agegroup", sex_var = "sex",
site_var = "t_site_icd")
#> [INFO Reference Population Missing] For some strata no population can be found.
#> ℹ 144 strata have no reference population in `refpop_df`
#> - Solution could be to add these strata to `refpop_df`.
#> ! Check attribute `problems_missing_refpop_strata` of results to see what strata are affected.
#>
#> # A tidytable: 6,181 × 9
#> t_site region year sex age incidence_cases incidence_crude_rate
#> <chr> <fct> <chr> <fct> <chr> <dbl> <dbl>
#> 1 C14 SEER Reg 01 - … 1990… Fema… 00 -… 1 0.145
#> 2 C14 SEER Reg 01 - … 1990… Fema… 25 -… 1 0.118
#> 3 C14 SEER Reg 01 - … 1990… Fema… 35 -… 2 0.231
#> 4 C14 SEER Reg 01 - … 1990… Fema… 45 -… 3 0.486
#> 5 C14 SEER Reg 01 - … 1990… Fema… 50 -… 1 0.208
#> 6 C14 SEER Reg 01 - … 1990… Fema… 60 -… 2 0.518
#> 7 C14 SEER Reg 01 - … 1990… Fema… 70 -… 1 0.303
#> 8 C14 SEER Reg 01 - … 1990… Fema… 75 -… 1 0.381
#> 9 C14 SEER Reg 01 - … 1990… Male 00 -… 1 0.138
#> 10 C14 SEER Reg 01 - … 1990… Male 25 -… 1 0.113
#> # ℹ 6,171 more rows
#> # ℹ 2 more variables: population_pyar <dbl>, population_n_per_year <dbl>
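The message in the output above reports strata for which no reference population was found and points to the attribute `problems_missing_refpop_strata`. If the result is stored in an object (here hypothetically called `result`), that attribute should be readable with base R's `attr(result, "problems_missing_refpop_strata")`. Independently of the package, the sketch below shows one way to check in advance which strata of a case table have no match in a reference population. The data and the names `obs`, `pop`, `key_cols`, `merged` and `missing_strata` are hypothetical, and the check does not claim to reproduce the content of the attribute.

```r
# Observed strata (hypothetical); the "85+" stratum has no population entry
obs <- data.frame(
  region = c("A", "A", "B"),
  sex    = c("Female", "Male", "Female"),
  age    = c("50 - 54", "50 - 54", "85+"),
  year   = rep("1990 - 1994", 3),
  stringsAsFactors = FALSE
)

# Reference population (hypothetical) covering only two of the strata
pop <- data.frame(
  region          = c("A", "A"),
  sex             = c("Female", "Male"),
  age             = c("50 - 54", "50 - 54"),
  year            = rep("1990 - 1994", 2),
  population_pyar = c(400000, 500000),
  stringsAsFactors = FALSE
)

key_cols <- c("region", "sex", "age", "year")

# Left join: strata without a match get NA in population_pyar
merged <- merge(obs, pop, by = key_cols, all.x = TRUE)
missing_strata <- merged[is.na(merged$population_pyar), key_cols]

print(missing_strata)
```

Strata listed in `missing_strata` would either need to be added to the reference population or recoded so that the categories match.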