Calculate age-standardized incidence rates
asir(
df,
dattype = NULL,
std_pop = "ESP2013",
truncate_std_pop = FALSE,
futime_src = "refpop",
summarize_groups = "none",
count_var,
stdpop_df = standard_population,
refpop_df = population,
region_var = NULL,
age_var = NULL,
sex_var = NULL,
year_var = NULL,
site_var = NULL,
futime_var = NULL,
pyar_var = NULL,
alpha = 0.05
)
df: Data frame in wide format.
dattype: Can be "zfkd", "seer" or NULL. If dattype is "seer" or "zfkd", default variable names are set accordingly. Default is NULL.
std_pop: Can be one of "ESP2013", "ESP1976", "WHO1960" or "WHO2000". Default is "ESP2013".
truncate_std_pop: If TRUE, the standard population is truncated for all age groups that do not occur in df. Default is FALSE.
futime_src: Can be either "refpop" or "cohort". Default is "refpop".
summarize_groups: Option to define how stratified groups are summarized. Default is "none". To summarize variables into one group, choose from region_var, sex_var and year_var. Define multiple summarize variables with summarize_groups = c("region", "sex", "year").
count_var: Variable to be counted as observed case. Should be 1 for each case to be counted.
stdpop_df: Data frame in which the standard population is defined. It is assumed that stdpop_df has the columns "sex" for biological sex, "age" for age groups, "standard_pop" for the name of the standard population (e.g. "European Standard Population 2013") and "population_n" for the size of the standard population age group. stdpop_df must use the same category coding of age and sex as age_var and sex_var.
refpop_df: Data frame in which the reference population is defined. Only required if futime_src = "refpop" is chosen. It is assumed that refpop_df has the columns "region" for region, "sex" for biological sex, "age" for age groups (can be single ages or 5-year brackets), "year" for time period (can be single years or 5-year brackets) and "population_pyar" for person-years at risk in the respective age/sex/year cohort. refpop_df must use the same category coding of age, sex, region, year and site as age_var, sex_var, region_var, year_var and site_var.
region_var: Variable in df that contains the region where the case was incident. Default is set if dattype is given.
age_var: Variable in df that contains the age group. Default is set if dattype is given.
sex_var: Variable in df that contains the biological sex. Default is set if dattype is given.
year_var: Variable in df that contains the year or year period when the case was incident. Default is set if dattype is given.
site_var: Variable in df that contains the ICD code of the case diagnosis. Default is set if dattype is given.
futime_var: Variable in df that contains the follow-up time per person (in years) in the cohort (can only be used with futime_src = "cohort"). Default is set if dattype is given.
pyar_var: Variable in refpop_df that contains the person-years at risk in the reference population (can only be used with futime_src = "refpop"). Default is set if dattype is given.
alpha: Significance level for confidence interval calculations. Default is alpha = 0.05, which gives 95 percent confidence intervals.
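The column layout expected for stdpop_df and refpop_df can be sketched with two toy tables. Only the column names are taken from the argument descriptions; the object names (toy_stdpop, toy_refpop, ages_in_df) and all values are made up for illustration and are not the package's data. The last step shows one possible reading of truncate_std_pop = TRUE, namely dropping standard-population age groups that do not occur in df. That reading is an assumption, not the package's actual code.

```r
# Toy standard population with the columns described for stdpop_df.
# All values are invented for illustration only.
toy_stdpop <- data.frame(
  sex          = rep(c("Female", "Male"), each = 3),
  age          = rep(c("00 - 04", "05 - 09", "10 - 14"), times = 2),
  standard_pop = "European Standard Population 2013",
  population_n = c(5000, 5500, 5500, 5000, 5500, 5500),
  stringsAsFactors = FALSE
)

# Toy reference population with the columns described for refpop_df.
# "Region A" and the person-years are hypothetical.
toy_refpop <- data.frame(
  region          = "Region A",
  sex             = rep(c("Female", "Male"), each = 3),
  age             = rep(c("00 - 04", "05 - 09", "10 - 14"), times = 2),
  year            = "1990 - 1994",
  population_pyar = c(120000, 125000, 118000, 126000, 131000, 124000),
  stringsAsFactors = FALSE
)

# Age groups that occur in a hypothetical case data set
ages_in_df <- c("00 - 04", "10 - 14")

# Assumed meaning of truncation: keep only the standard-population rows
# whose age group also occurs in the case data
toy_stdpop_trunc <- toy_stdpop[toy_stdpop$age %in% ages_in_df, ]
nrow(toy_stdpop_trunc)
```

Because age and sex in both tables must use the same category coding as age_var and sex_var, a quick check such as all(ages_in_df %in% toy_stdpop$age) before calling asir() can catch mismatched labels early.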
Returns a data frame (df).
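As a rough illustration of what an age-standardized incidence rate and its confidence interval represent, the sketch below applies textbook direct standardization to invented counts. It is not the package's implementation: the helper name direct_asir, the scaling per 100,000 person-years and the normal-approximation interval are assumptions made for this example. The output of asir() also contains gamma-based limits (columns such as asir_lci_gam), which are not reproduced here.

```r
# Direct age standardization (general textbook form, hypothetical helper):
#   ASIR = sum_i w_i * (observed_i / pyar_i), with weights w_i summing to 1
direct_asir <- function(observed, pyar, std_weight, alpha = 0.05, per = 1e5) {
  stopifnot(length(observed) == length(pyar),
            length(pyar) == length(std_weight),
            all(pyar > 0))
  w    <- std_weight / sum(std_weight)  # normalized standard-population weights
  rate <- observed / pyar               # age-specific crude rates
  asir <- sum(w * rate)                 # weighted sum of age-specific rates
  # variance under the assumption of Poisson-distributed counts per age group
  var_asir <- sum(w^2 * observed / pyar^2)
  # alpha = 0.05 corresponds to a two-sided 95 percent interval
  z <- stats::qnorm(1 - alpha / 2)
  c(asir = asir * per,
    lci  = max(0, asir - z * sqrt(var_asir)) * per,
    uci  = (asir + z * sqrt(var_asir)) * per)
}

# Invented example: three age groups
res <- direct_asir(observed   = c(2, 10, 30),
                   pyar       = c(100000, 80000, 50000),
                   std_weight = c(40, 35, 25))
res["asir"]  # 20.175 per 100,000 person-years
```

A smaller alpha widens the interval, for example alpha = 0.01 yields 99 percent limits from the same counts.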
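# attach msSPChelpR (sample data sets) and magrittr (the %>% pipe used below)
library(msSPChelpR)
library(magrittr)
# load sample data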
data("us_second_cancer")
data("standard_population")
data("population_us")
# make wide data as this is the required format
usdata_wide <- us_second_cancer %>%
  # only use a sample
  dplyr::filter(as.numeric(fake_id) < 200000) %>%
  msSPChelpR::reshape_wide_tidyr(case_id_var = "fake_id",
                                 time_id_var = "SEQ_NUM", timevar_max = 2)
#> Long dataset had too many cases per patient. Wide dataset is limited to 2 cases per id as defined in timevar_max option.
# create count variable: 1 where t_site_icd.2 is NA, 0 otherwise
usdata_wide <- usdata_wide %>%
  dplyr::mutate(count_spc = dplyr::case_when(is.na(t_site_icd.2) ~ 1,
                                             TRUE ~ 0))
# remove cases for which no reference population exists
usdata_wide <- usdata_wide %>%
  dplyr::filter(t_yeardiag.2 %in% c("1990 - 1994", "1995 - 1999", "2000 - 2004",
                                    "2005 - 2009", "2010 - 2014"))
# now we can run the function
msSPChelpR::asir(usdata_wide,
                 dattype = "seer",
                 std_pop = "ESP2013",
                 truncate_std_pop = FALSE,
                 futime_src = "refpop",
                 summarize_groups = "none",
                 count_var = "count_spc",
                 refpop_df = population_us,
                 region_var = "registry.1",
                 age_var = "fc_agegroup.1",
                 sex_var = "sex.1",
                 year_var = "t_yeardiag.2",
                 site_var = "t_site_icd.2",
                 pyar_var = "population_pyar")
#> Using person-years at risk [PYAR] from reference population as pyears for calculating incidence rates.
#> Be careful, in this calculation it is assumed that all included regions have collected data for the full time period: 1990 to 2010
#> If you have included registries with differing times, please check this assumption by looking at groups with 0 incidence and specify option 'inclusion_restrictions' if needed.
#> The following regions, age groups, years, sexes and ICD codes are considered: SEER Reg 01 - San Francisco-Oakland SMSA, SEER Reg 02 - Connecticut, SEER Reg 20 - Detroit (Metropolitan), SEER Reg 21 - Hawaii 1995 - 1999, 2000 - 2004, 2005 - 2009, 2010 - 2014, 1990 - 1994 00 - 04, 05 - 09, 10 - 14, 15 - 19, 20 - 24, 25 - 29, 30 - 34, 35 - 39, 40 - 44, 45 - 49, 50 - 54, 55 - 59, 60 - 64, 65 - 69, 70 - 74, 75 - 79, 80 - 84, 85 - 120 Female, Male C18, C34, C44, C50, C54, C64, C80, C14
#> # A tibble: 320 × 17
#> age region sex year t_site asir observed pyar abs_ir abs_ir_lci
#> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Age standa… SEER … Fema… 1990… C14 0 0 1.92e7 0 0
#> 2 Age standa… SEER … Fema… 1990… C18 0 0 1.92e7 0 0
#> 3 Age standa… SEER … Fema… 1990… C34 0 0 1.92e7 0 0
#> 4 Age standa… SEER … Fema… 1990… C44 0 0 1.92e7 0 0
#> 5 Age standa… SEER … Fema… 1990… C50 0 0 1.92e7 0 0
#> 6 Age standa… SEER … Fema… 1990… C54 0 0 1.92e7 0 0
#> 7 Age standa… SEER … Fema… 1990… C64 0 0 1.92e7 0 0
#> 8 Age standa… SEER … Fema… 1990… C80 0 0 1.92e7 0 0
#> 9 Age standa… SEER … Fema… 1995… C14 0 0 2.01e7 0 0
#> 10 Age standa… SEER … Fema… 1995… C18 0 0 2.01e7 0 0
#> # ℹ 310 more rows
#> # ℹ 7 more variables: abs_ir_uci <dbl>, asir_copy <dbl>, asir_lci <dbl>,
#> # asir_lci_gam <dbl>, asir_uci <dbl>, asir_uci_gam <dbl>, asir_e6 <dbl>