Calculate age-, sex-, cohort-, region-specific incidence rates from a cohort

calc_refrates(
  df,
  dattype = NULL,
  count_var,
  refpop_df,
  calc_totals = FALSE,
  fill_sites = "no",
  region_var = NULL,
  age_var = NULL,
  sex_var = NULL,
  year_var = NULL,
  race_var = NULL,
  site_var = NULL,
  quiet = FALSE
)

Arguments

df

data frame in long format

dattype

One of "zfkd", "seer" or NULL. If dattype is "seer" or "zfkd", default variable names are set for that data type. Default is NULL.

count_var

variable in df that flags an observed case. A row is counted as a case only if this variable equals 1.

refpop_df

data frame containing the reference population data. It is assumed that refpop_df has the columns "region" for region, "sex" for biological sex, "age" for age groups (single ages or 5-year brackets), "year" for time period (single year or 5-year brackets), and "population_pyar" for person-years at risk in the respective age/sex/year cohort. refpop_df must use the same category coding of age, sex, region, year and site as age_var, sex_var, region_var, year_var and site_var.
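To illustrate why the category coding in refpop_df has to match the coding in df, here is a minimal base-R sketch of the underlying idea: case counts per stratum are joined to person-years at risk on the shared stratum columns, and a crude rate is computed. This is not the package's implementation. All values are hypothetical, and the scaling to rates per 100,000 person-years is an assumption inferred from the magnitude of incidence_crude_rate in the example output, not something this page states.

```r
#hypothetical minimal reference population in the layout described for refpop_df
refpop_df <- data.frame(
  region = c("A", "A", "B"),
  sex = c("Female", "Male", "Female"),
  age = c("50 - 54", "50 - 54", "50 - 54"),
  year = c("1990 - 1994", "1990 - 1994", "1990 - 1994"),
  population_pyar = c(500000, 480000, 250000)
)

#hypothetical case counts per stratum, coded exactly like refpop_df
cases <- data.frame(
  region = c("A", "B"),
  sex = c("Female", "Female"),
  age = c("50 - 54", "50 - 54"),
  year = c("1990 - 1994", "1990 - 1994"),
  incidence_cases = c(3, 2)
)

#join on the shared stratum columns; strata with population but no cases get 0 cases
rates <- merge(refpop_df, cases, by = c("region", "sex", "age", "year"), all.x = TRUE)
rates$incidence_cases[is.na(rates$incidence_cases)] <- 0

#assumed scaling: crude rate per 100,000 person-years at risk
rates$incidence_crude_rate <- rates$incidence_cases / rates$population_pyar * 1e5
print(rates)
```

If a stratum were coded differently in the two tables (for example "50-54" instead of "50 - 54"), the join would find no match and that stratum would have no reference population.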

calc_totals

option to additionally calculate totals across all age groups, all sexes, all years, all races and all sites. Default is FALSE.
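A total rate across strata is usually obtained by pooling cases and person-years before dividing, rather than by averaging the stratum-specific rates. The base-R sketch below shows that distinction for a total over age groups. It is an assumption that calc_totals works this way; this page does not describe the computation. The data and the per-100,000 scaling are hypothetical.

```r
#hypothetical age-specific strata for one region/sex/year
strata <- data.frame(
  age = c("50 - 54", "55 - 59", "60 - 64"),
  incidence_cases = c(2, 4, 6),
  population_pyar = c(200000, 200000, 100000)
)

#pool cases and person-years over all age groups, then compute one crude rate
total_cases <- sum(strata$incidence_cases)
total_pyar <- sum(strata$population_pyar)
total_rate <- total_cases / total_pyar * 1e5

#for comparison: the unweighted mean of age-specific rates gives a different value
mean_of_rates <- mean(strata$incidence_cases / strata$population_pyar * 1e5)
```

Here the pooled rate is 2.4, while the mean of the age-specific rates (1, 2 and 6) is 3, because the oldest group contributes fewer person-years.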

fill_sites

option to fill sites that are missing from the observed cases with an incidence rate of 0. The value defines the coding system used and can be one of:
- "no": do not fill missing sites
- "icd2d": ICD-O-3 2-digit (C00-C80)
- "icd3d": ICD-O-3 3-digit
- "icd10gm2d": ICD-10-GM 2-digit (C00-C97)
- "sitewho": Site SEER WHO coding (nos. 1-89)
- "sitewho_b": Site SEER WHO B recoding (nos. 1-111)
- "sitewho_epi": Site SEER WHO coding with additional sums
- "sitewhogen": Site WHO coding with fewer categories, for compatibility with international rates
- "sitewho_num": numeric coding of Site SEER WHO coding (nos. 1-89)
- "sitewho_b_num": numeric coding of Site SEER WHO B recoding (nos. 1-111)
- "sitewhogen_num": numeric coding of "sitewhogen" (international rates)
- c("manual", char_vector): sites defined manually in char_vector
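Filling missing sites means that every site of the chosen coding system appears in the result, with zero cases where none were observed. The base-R sketch below illustrates that idea for a manually defined site list, comparable to what c("manual", char_vector) describes. It is not the package's implementation; the observed table and the site vector are hypothetical.

```r
#hypothetical observed sites and case counts
observed <- data.frame(t_site = c("C14", "C50"), incidence_cases = c(3, 5))

#hypothetical complete site list, as would be given in char_vector
all_sites <- c("C14", "C34", "C50")

#keep every site from the full list; sites without observed cases get 0
filled <- merge(data.frame(t_site = all_sites), observed, by = "t_site", all.x = TRUE)
filled$incidence_cases[is.na(filled$incidence_cases)] <- 0
print(filled)
```

After filling, C34 is present with 0 cases, so downstream comparisons can rely on the same set of sites in every stratum.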

region_var

variable in df that contains information on region where case was incident. Default is set if dattype is given.

age_var

variable in df that contains information on age-group. Default is set if dattype is given.

sex_var

variable in df that contains information on sex. Default is set if dattype is given.

year_var

variable in df that contains information on year or year-period when case was incident. Default is set if dattype is given.

race_var

optional. To calculate rates stratified by race, provide the name of the variable in df that contains race information. If race_var is provided, refpop_df must contain the variable "race".

site_var

variable in df that contains the ICD code of the case diagnosis. Cases are usually second cancers. Default is set if dattype is given.

quiet

If TRUE, warnings and messages will be suppressed. Default is FALSE.

Value

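df with one row per stratum (site, region, year, sex, age and, if given, race) containing the observed case count (incidence_cases), the crude incidence rate (incidence_crude_rate) and the reference population (population_pyar, population_n_per_year). Strata without a matching reference population are listed in the attribute problems_missing_refpop_strata.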

Examples

#attach dplyr for the %>% pipe
library(dplyr)

#load sample data
data("us_second_cancer")
data("population_us")

us_second_cancer %>%
  #create indicator variable so that every row is counted as a case
  dplyr::mutate(is_case = 1) %>%
  #calculate refrates - warning: these are not realistic numbers, just showing functionality
  calc_refrates(dattype = "seer", count_var = "is_case", refpop_df = population_us,
                region_var = "registry", age_var = "fc_agegroup", sex_var = "sex",
                site_var = "t_site_icd")
#> [INFO Reference Population Missing] For some strata no population can be found.
#>  144 strata have no reference population in `refpop_df`
#>  - Solution could be to add these strata to `refpop_df`.
#> ! Check attribute `problems_missing_refpop_strata` of results to see what strata are affected.
#>  
#> # A tidytable: 6,181 × 9
#>    t_site region          year  sex   age   incidence_cases incidence_crude_rate
#>    <chr>  <fct>           <chr> <fct> <chr>           <dbl>                <dbl>
#>  1 C14    SEER Reg 01 - … 1990… Fema… 00 -…               1                0.145
#>  2 C14    SEER Reg 01 - … 1990… Fema… 25 -…               1                0.118
#>  3 C14    SEER Reg 01 - … 1990… Fema… 35 -…               2                0.231
#>  4 C14    SEER Reg 01 - … 1990… Fema… 45 -…               3                0.486
#>  5 C14    SEER Reg 01 - … 1990… Fema… 50 -…               1                0.208
#>  6 C14    SEER Reg 01 - … 1990… Fema… 60 -…               2                0.518
#>  7 C14    SEER Reg 01 - … 1990… Fema… 70 -…               1                0.303
#>  8 C14    SEER Reg 01 - … 1990… Fema… 75 -…               1                0.381
#>  9 C14    SEER Reg 01 - … 1990… Male  00 -…               1                0.138
#> 10 C14    SEER Reg 01 - … 1990… Male  25 -…               1                0.113
#> # ℹ 6,171 more rows
#> # ℹ 2 more variables: population_pyar <dbl>, population_n_per_year <dbl>