Calculate age-standardized incidence rates

asir(
  df,
  dattype = NULL,
  std_pop = "ESP2013",
  truncate_std_pop = FALSE,
  futime_src = "refpop",
  summarize_groups = "none",
  count_var,
  stdpop_df = standard_population,
  refpop_df = population,
  region_var = NULL,
  age_var = NULL,
  sex_var = NULL,
  year_var = NULL,
  site_var = NULL,
  futime_var = NULL,
  pyar_var = NULL,
  alpha = 0.05
)

Arguments

df

Data frame in wide format, e.g. as produced by msSPChelpR::reshape_wide_tidyr().

dattype

Can be "zfkd", "seer" or NULL. If dattype is "seer" or "zfkd", default variable names are set. Default is NULL.

std_pop

Can be one of "ESP2013", "ESP1976", "WHO1960" or "WHO2000". Default is "ESP2013".

truncate_std_pop

If TRUE, the standard population is truncated, i.e. all age groups that do not occur in df are dropped. Default is FALSE.
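The direct standardization behind an age-standardized rate can be sketched in a few lines of base R. The helper direct_asir() below is hypothetical and not part of msSPChelpR: it weights age-specific rates by standard-population sizes and, when truncation is requested, drops the age groups flagged as absent from df and renormalizes the remaining weights. Expressing rates per 100,000 (the per argument) is also an assumption made for this example.

```r
# Hypothetical helper (not part of msSPChelpR): directly age-standardized rate
# observed   - observed cases per age group
# pyar       - person-years at risk per age group
# std_weight - standard population size per age group (like population_n)
# in_df      - TRUE for age groups that occur in df
direct_asir <- function(observed, pyar, std_weight,
                        in_df = rep(TRUE, length(observed)),
                        truncate = FALSE, per = 1e5) {
  rate <- observed / pyar                          # age-specific crude rates
  keep <- if (truncate) in_df else rep(TRUE, length(rate))
  w <- std_weight[keep] / sum(std_weight[keep])    # weights summing to 1
  sum(w * rate[keep]) * per
}

obs  <- c(2, 5, 10)
pyar <- c(1000, 1000, 1000)
wts  <- c(50, 30, 20)
direct_asir(obs, pyar, wts)                             # 450
direct_asir(obs, pyar, wts, in_df = c(FALSE, TRUE, TRUE),
            truncate = TRUE)                            # 700
```

Dropping the first age group changes the result because its weight is redistributed over the remaining groups, which is why truncation matters when df covers only part of the age range.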

futime_src

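Source of the person-years at risk. Can be either "refpop" (person-years taken from the reference population in refpop_df, see pyar_var) or "cohort" (follow-up time of each person in df, see futime_var). Default is "refpop".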

summarize_groups

Option to define which stratification variables are summarized into one group. Default is "none". You can choose from "region", "sex" and "year" (referring to region_var, sex_var and year_var). To summarize several variables, use e.g. summarize_groups = c("region", "sex", "year").
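One way to picture summarizing a group: observed cases and person-years are pooled across the levels of the summarized variable within each remaining stratum. The sketch below pools over sex with base R aggregate(); the data are invented, age groups are omitted for brevity, and it is an assumption that asir() pools counts and person-years in this way.

```r
# Toy strata (invented values): observed cases and person-years by sex and year
strata <- data.frame(
  sex      = c("Female", "Male", "Female", "Male"),
  year     = c("1990 - 1994", "1990 - 1994", "1995 - 1999", "1995 - 1999"),
  observed = c(3, 5, 4, 6),
  pyar     = c(1000, 1200, 1100, 1300)
)

# Summarizing "sex": sum counts and person-years over sex within each year
by_year <- aggregate(cbind(observed, pyar) ~ year, data = strata, FUN = sum)
by_year$rate_per_100k <- by_year$observed / by_year$pyar * 1e5
by_year
```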

count_var

Variable in df that marks observed cases. Should be 1 for each case to be counted.

stdpop_df

Data frame where the standard population is defined. It is assumed that stdpop_df has the columns "sex" for biological sex, "age" for age groups, "standard_pop" for the name of the standard population (e.g. "European Standard Population 2013") and "population_n" for the size of the standard population age group. stdpop_df must use the same category coding of age and sex as age_var and sex_var.
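A minimal data frame in the documented stdpop_df layout might look as follows. The population_n values and the standard_pop name are invented for illustration and do not reproduce any real standard; the per-sex weight column is an extra, hypothetical step showing how population sizes translate into standardization weights.

```r
# Toy stdpop_df with the documented columns (values are made up)
my_stdpop <- data.frame(
  sex          = rep(c("Female", "Male"), each = 3),
  age          = rep(c("00 - 04", "05 - 09", "10 - 14"), times = 2),
  standard_pop = "Toy Standard Population",
  population_n = rep(c(100, 200, 300), times = 2)
)

# Hypothetical: convert sizes to weights that sum to 1 within each sex
my_stdpop$weight <- ave(my_stdpop$population_n, my_stdpop$sex,
                        FUN = function(x) x / sum(x))
my_stdpop
```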

refpop_df

Data frame where the reference population data is defined. Only required if futime_src = "refpop" is chosen. It is assumed that refpop_df has the columns "region" for region, "sex" for biological sex, "age" for age groups (single ages or 5-year brackets), "year" for time period (single year or 5-year brackets) and "population_pyar" for person-years at risk in the respective age/sex/year cohort. refpop_df must use the same category coding of age, sex, region, year and site as age_var, sex_var, region_var, year_var and site_var.
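Because refpop_df must use the same category coding as df, a quick check before calling asir() can catch mismatches early. The sketch below builds a toy refpop_df with the documented columns (all values invented) and uses a hypothetical helper, check_coding(), which is not part of msSPChelpR, to list codes that appear in the case data but not in the reference population.

```r
# Hypothetical helper: codes used in df that have no match in refpop_df
check_coding <- function(df_values, ref_values) {
  setdiff(unique(df_values), unique(ref_values))
}

# Toy refpop_df with the documented columns (values are made up)
my_refpop <- data.frame(
  region          = "Region A",
  sex             = rep(c("Female", "Male"), each = 2),
  age             = rep(c("00 - 04", "05 - 09"), times = 2),
  year            = "1990 - 1994",
  population_pyar = c(1000, 1100, 1050, 1150)
)

# "5 - 9" is coded differently from "05 - 09" and would not match
case_ages <- c("00 - 04", "05 - 09", "5 - 9")
check_coding(case_ages, my_refpop$age)   # "5 - 9"
```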

region_var

Variable in df that contains the region in which the case was incident. Default is set if dattype is given.

age_var

Variable in df that contains the age group. Default is set if dattype is given.

sex_var

Variable in df that contains biological sex. Default is set if dattype is given.

year_var

Variable in df that contains the year or year period in which the case was incident. Default is set if dattype is given.

site_var

Variable in df that contains the ICD code of the case diagnosis. Default is set if dattype is given.

futime_var

Variable in df that contains the follow-up time per person (in years) in the cohort (can only be used with futime_src = "cohort"). Default is set if dattype is given.

pyar_var

Variable in refpop_df that contains the person-years at risk in the reference population (can only be used with futime_src = "refpop"). Default is set if dattype is given.
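With futime_src = "cohort", person-years come from the follow-up time of each person in df rather than from refpop_df. A minimal sketch, assuming follow-up time is simply summed per stratum; the data and the column name futime are hypothetical, and this is not taken from the asir() source.

```r
# Hypothetical cohort data: follow-up time (years) per person
cohort <- data.frame(
  sex    = c("Female", "Female", "Male"),
  age    = c("60 - 64", "60 - 64", "60 - 64"),
  futime = c(2.5, 4.0, 1.5)
)

# futime_src = "cohort": person-years per stratum = summed follow-up time
pyar_cohort <- aggregate(futime ~ sex + age, data = cohort, FUN = sum)
pyar_cohort
```

With futime_src = "refpop", the corresponding numbers would instead be read from the pyar_var column of refpop_df for each stratum.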

alpha

Significance level for confidence interval calculations. Default is alpha = 0.05, which gives 95 percent confidence intervals.
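To illustrate how alpha maps to a confidence level, the sketch below computes exact (Garwood) Poisson limits for a crude rate. This is a common textbook method shown for illustration only; it is not necessarily the method asir() uses for its confidence limit columns, and poisson_rate_ci() is a hypothetical helper.

```r
# Exact (Garwood) Poisson confidence limits for a crude rate.
# Illustrative only; not necessarily the method used inside asir().
poisson_rate_ci <- function(observed, pyar, alpha = 0.05, per = 1e5) {
  lower <- if (observed == 0) 0 else qchisq(alpha / 2, 2 * observed) / 2
  upper <- qchisq(1 - alpha / 2, 2 * (observed + 1)) / 2
  c(rate = observed / pyar, lci = lower / pyar, uci = upper / pyar) * per
}

# 10 cases in 100,000 person-years; alpha = 0.05 gives a 95% interval
# (rate 10, limits roughly 4.80 and 18.39 per 100,000)
poisson_rate_ci(10, 1e5)

# a smaller alpha widens the interval (99% interval)
poisson_rate_ci(10, 1e5, alpha = 0.01)
```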

Value

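A data frame (tibble) with one row per stratum (region, sex, year and site), containing the age-standardized incidence rate (asir) with confidence limits, the number of observed cases (observed), the person-years at risk (pyar) and the absolute incidence rate (abs_ir) with confidence limits.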

Examples

# load packages and sample data
library(msSPChelpR)
library(dplyr)  # provides the %>% pipe used below
data("us_second_cancer")
data("standard_population")
data("population_us")

# reshape to wide format, as required by asir()
usdata_wide <- us_second_cancer %>%
  # only use a sample of patients
  dplyr::filter(as.numeric(fake_id) < 200000) %>%
  msSPChelpR::reshape_wide_tidyr(case_id_var = "fake_id",
                                 time_id_var = "SEQ_NUM", timevar_max = 2)
#> Long dataset had too many cases per patient. Wide dataset is limited to  2  cases per id as defined in timevar_max option.

# create count variable: 1 if a second primary cancer was diagnosed, else 0
usdata_wide <- usdata_wide %>%
  dplyr::mutate(count_spc = dplyr::case_when(!is.na(t_site_icd.2) ~ 1,
                                             TRUE ~ 0))

# remove cases for which no reference population exists
usdata_wide <- usdata_wide %>%
  dplyr::filter(t_yeardiag.2 %in% c("1990 - 1994", "1995 - 1999", "2000 - 2004",
                                    "2005 - 2009", "2010 - 2014"))

# now we can run the function
msSPChelpR::asir(usdata_wide,
      dattype = "seer",
      std_pop = "ESP2013",
      truncate_std_pop = FALSE,
      futime_src = "refpop",
      summarize_groups = "none",
      count_var = "count_spc",
      refpop_df = population_us,
      region_var = "registry.1", 
      age_var = "fc_agegroup.1",
      sex_var = "sex.1",
      year_var = "t_yeardiag.2", 
      site_var = "t_site_icd.2",
      pyar_var = "population_pyar")
#> Using person-years at risk [PYAR] from reference population as pyears for calculating incidence rates.
#> Be careful, in this calculation it is assumed that all included regions have collected data for the full time period: 1990 to 2010
#>                        If you have included registries with differing times, please check this assumption by looking at groups with 0 incidence and specify option 'inclusion_restrictions' if needed.
#> The following regions, age groups, years, sexes and ICD codes are considered:  SEER Reg 01 - San Francisco-Oakland SMSA, SEER Reg 02 - Connecticut, SEER Reg 20 - Detroit (Metropolitan), SEER Reg 21 - Hawaii 1995 - 1999, 2000 - 2004, 2005 - 2009, 2010 - 2014, 1990 - 1994 00 - 04, 05 - 09, 10 - 14, 15 - 19, 20 - 24, 25 - 29, 30 - 34, 35 - 39, 40 - 44, 45 - 49, 50 - 54, 55 - 59, 60 - 64, 65 - 69, 70 - 74, 75 - 79, 80 - 84, 85 - 120 Female, Male C18, C34, C44, C50, C54, C64, C80, C14
#> # A tibble: 320 × 17
#>    age         region sex   year  t_site  asir observed   pyar abs_ir abs_ir_lci
#>    <chr>       <chr>  <chr> <chr> <chr>  <dbl>    <dbl>  <dbl>  <dbl>      <dbl>
#>  1 Age standa… SEER … Fema… 1990… C14        0        0 1.92e7      0          0
#>  2 Age standa… SEER … Fema… 1990… C18        0        0 1.92e7      0          0
#>  3 Age standa… SEER … Fema… 1990… C34        0        0 1.92e7      0          0
#>  4 Age standa… SEER … Fema… 1990… C44        0        0 1.92e7      0          0
#>  5 Age standa… SEER … Fema… 1990… C50        0        0 1.92e7      0          0
#>  6 Age standa… SEER … Fema… 1990… C54        0        0 1.92e7      0          0
#>  7 Age standa… SEER … Fema… 1990… C64        0        0 1.92e7      0          0
#>  8 Age standa… SEER … Fema… 1990… C80        0        0 1.92e7      0          0
#>  9 Age standa… SEER … Fema… 1995… C14        0        0 2.01e7      0          0
#> 10 Age standa… SEER … Fema… 1995… C18        0        0 2.01e7      0          0
#> # ℹ 310 more rows
#> # ℹ 7 more variables: abs_ir_uci <dbl>, asir_copy <dbl>, asir_lci <dbl>,
#> #   asir_lci_gam <dbl>, asir_uci <dbl>, asir_uci_gam <dbl>, asir_e6 <dbl>