Calculate standardized incidence ratios with custom grouping variables stratified by follow-up time

sir_byfutime(
  df,
  dattype = NULL,
  ybreak_vars = "none",
  xbreak_var = "none",
  futime_breaks = c(0, 0.5, 1, 5, 10, Inf),
  count_var,
  refrates_df = rates,
  calc_total_row = TRUE,
  calc_total_fu = TRUE,
  region_var = NULL,
  age_var = NULL,
  sex_var = NULL,
  year_var = NULL,
  race_var = NULL,
  site_var = NULL,
  futime_var = NULL,
  expect_missing_refstrata_df = NULL,
  alpha = 0.05,
  quiet = FALSE
)

Arguments

df

dataframe in wide format

dattype

can be "zfkd" or "seer" or NULL. Will set default variable names if dattype is "seer" or "zfkd". Default is NULL.

ybreak_vars

variables from df by which SIRs should be stratified in result df. Multiple variables will result in appended rows in result df. Careful: do not chose any variables that are dependent on occurrence of count_var (e.g. Histology of second cancer). If y_break_vars = "none", no stratification is performed. Default is "none".

xbreak_var

One variable from df by which SIRs should be stratified as a second dimension in result df. This variable will be added as a second stratification dimension to ybreak_vars and all variables will be calculated for subpopulations of x and y combinations. Careful: do not chose any variables that are dependent on occurrence of count_var (e.g. Year of second cancer). If y_break_vars = "none", no stratification is performed. Default is "none".

futime_breaks

vector that indicates split points for follow-up time groups (in years) that will be used as xbreak_var. Default is c(0, .5, 1, 5, 10, Inf) that will result in 5 groups (up to 6 months, 6-12 months, 1-5 years, 5-10 years, 10+ years). If you don't want to split by follow-up time, use futime_breaks = "none".

count_var

variable to be counted as observed case. Cases are usually the second cancers. Should be 1 for case to be counted.

refrates_df

df where reference rate from general population are defined. It is assumed that refrates_df has the columns "region" for region, "sex" for biological sex, "age" for age-groups (can be single ages or 5-year brackets), "year" for time period (can be single year or 5-year brackets), "incidence_crude_rate" for incidence rate in the respective age/sex/year cohort.The variable "race" is additionally required if the option "race_var" is used. refrates_df must use the same category coding of age, sex, region, year and t_site as age_var, sex_var, region_var, year_var and site_var.

calc_total_row

option to calculate a row of totals. Can be either FALSE for not adding such a row or TRUE for adding it at the first row. Default is TRUE.

calc_total_fu

option to calculate totals for follow-up time. Can be either FALSE for not adding such a column or TRUE for adding. Default is TRUE.

region_var

variable in df that contains information on region where case was incident. Default is set if dattype is given.

age_var

variable in df that contains information on age-group. Default is set if dattype is given.

sex_var

variable in df that contains information on sex. Default is set if dattype is given.

year_var

variable in df that contains information on year or year-period when case was incident. Default is set if dattype is given.

race_var

optional argument, if SIR should be calculated stratified by race. If you want to use this option, provide variable name of df that contains race information. If race_var is provided refrates_df needs to contain the variable "race".

site_var

variable in df that contains information on site or subsite (e.g. ICD code, SEER site code or others that matches t_site in refrates_df) of case diagnosis. Cases are usually the second cancers. Default is set if dattype is given.

futime_var

variable in df that contains follow-up time per person between date of first cancer and any of death, date of event (case), end of FU date (in years; whatever event comes first). Default is set if dattype is given.

expect_missing_refstrata_df

optional argument, if strata with missing refrates are expected, because incidence rates of value 0 are not explicit, but missing from refrates_df. It is assumed that expect_missing_refstrata_df is a data.frame has the columns "region" for region, "sex" for biological sex, "age" for age-groups (can be single ages or 5-year brackets), "year" for time period (can be single year or 5-year brackets), and "t_site" for The variable "race" is additionally required if the option "race_var" is used. refrates_df must use the same category coding of age, sex, region, year and t_site as age_var, sex_var, region_var, year_var and site_var.

alpha

significance level for confidence interval calculations. Default is alpha = 0.05 which will give 95 percent confidence intervals.

quiet

If TRUE, warnings and messages will be suppressed. Default is FALSE.

Examples

#There are various preparation steps required, before you can run this function.
#Please refer to the Introduction vignette to see how to prepare your data
if (FALSE) {
usdata_wide %>%
  sir_byfutime(
        dattype = "seer",
        ybreak_vars = c("race.1", "t_dco.1"),
        xbreak_var = "none",
        futime_breaks = c(0, 1/12, 2/12, 1, 5, 10, Inf),
        count_var = "count_spc",
        refrates_df = us_refrates_icd2,
        calc_total_row = TRUE,
        calc_total_fu = TRUE,
        region_var = "registry.1",
        age_var = "fc_agegroup.1",
        sex_var = "sex.1",
        year_var = "t_yeardiag.1",
        site_var = "t_site_icd.1", #using grouping by second cancer incidence
        futime_var = "p_futimeyrs",
        alpha = 0.05)
        }