New Features

  • new function histgroup_iarc() creates a variable for the groups of malignant neoplasms considered to be histologically ‘different’ for the purpose of defining multiple tumors, based on ICD-O-3 (see #100)
  • some functions gain new quiet argument to suppress rlang::warn() and rlang::inform() messages. You can use this when you have checked your results for correctness and want to reduce message output, but keep the progress bars.
  • asir(): add the World Standard Population 2000-2025, available via option std_pop = "WHO2000", as described here: https://seer.cancer.gov/stdpopulations/world.who.html
  • sir_byfutime() gains new argument expect_missing_refstrata_df. You can define another dataframe that contains strata expected to be missing from refrates_df (because they are not explicitly coded with incidence = 0). This can be helpful if refrates_df has a lot of strata and strata with 0 incidence have been removed to save storage space. Internally, the rows of expect_missing_refstrata_df will be appended to refrates_df. This reduces the number of lines reported in attribute problems_missing_ref_strata. Default setting is expect_missing_refstrata_df = NULL.
  • sample data set data("us_second_cancer") gains new variable t_hist on histology, i.e. the ICD-O-3 code for tumor morphology (4 digits)
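As a rough illustration of what expect_missing_refstrata_df does according to the description above, the following base R sketch appends zero-incidence strata to a reference rates table and then checks which required strata are still missing. It does not call the package. The column names (t_site, sex, age, incidence_cases), the helper key() and all values are assumptions made up for this example.

```r
# Hypothetical reference rates; zero-incidence strata were dropped to save space
refrates_df <- data.frame(
  t_site = c("C50", "C50", "C61"),
  sex    = c("Female", "Female", "Male"),
  age    = c("60 - 64", "65 - 69", "60 - 64"),
  incidence_cases = c(120, 150, 200),
  stringsAsFactors = FALSE
)

# Strata known to have no cases (hypothetical), so their absence is expected
expect_missing_refstrata_df <- data.frame(
  t_site = "C61",
  sex    = "Female",
  age    = "60 - 64",
  stringsAsFactors = FALSE
)

# Append the expected-missing strata with an explicit incidence of 0
expect_missing_refstrata_df$incidence_cases <- 0
refrates_full <- rbind(refrates_df, expect_missing_refstrata_df)

# Strata that occur in the (hypothetical) cohort and need a reference rate
needed <- data.frame(
  t_site = c("C50", "C61", "C61"),
  sex    = c("Female", "Female", "Male"),
  age    = c("60 - 64", "60 - 64", "65 - 69"),
  stringsAsFactors = FALSE
)

# Build one key per stratum and keep only the unexpected gaps
key <- function(d) paste(d$t_site, d$sex, d$age, sep = "|")
still_missing <- needed[!key(needed) %in% key(refrates_full), ]
print(still_missing)
```

In this sketch only the stratum that is neither in the reference rates nor declared as expected to be missing remains in still_missing, which mirrors the shorter problems_missing_ref_strata report mentioned above.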

Breaking Changes

  • no breaking changes in this version

Bug fixes

Internal

  • replace progress bars by cli
  • deprecate verb.() syntax from tidytable (Closes #94)

New Features

  • new function calc_refrates() to calculate age-, sex-, region-, year-specific reference rates from a long-format dataframe of cancer cases. Incident cases are counted per stratum and then matched with a reference population. The resulting reference rates dataframe can be used directly with the sir_byfutime() function.
  • functions gain new default dattype = NULL and thus are more flexible to take other source data types (Closes #73)
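The idea behind reference rates (count incident cases per stratum, then match them with a reference population) can be sketched in base R as follows. This is not the implementation of calc_refrates(). The data, the column names (n, population_pyar, incidence_crude_rate) and the scaling per 100,000 are assumptions for illustration only.

```r
# Hypothetical long-format case data: one row per incident tumor
cases <- data.frame(
  sex  = c("Male", "Male", "Female", "Female", "Female"),
  age  = c("60 - 64", "60 - 64", "60 - 64", "65 - 69", "65 - 69"),
  year = "2010",
  stringsAsFactors = FALSE
)

# Hypothetical reference population with person-years per stratum
population <- data.frame(
  sex  = c("Male", "Female", "Male", "Female"),
  age  = c("60 - 64", "60 - 64", "65 - 69", "65 - 69"),
  year = "2010",
  population_pyar = c(50000, 40000, 30000, 20000),
  stringsAsFactors = FALSE
)

# Count incident cases per stratum
cases$n <- 1L
counts <- aggregate(n ~ sex + age + year, data = cases, FUN = sum)

# Match counts with the population; strata without cases get a count of 0
refrates <- merge(population, counts, by = c("sex", "age", "year"), all.x = TRUE)
refrates$n[is.na(refrates$n)] <- 0L

# Crude rate per 100,000 person-years (scaling chosen for this example)
refrates$incidence_crude_rate <- refrates$n / refrates$population_pyar * 1e5
print(refrates)
```

Keeping all population strata in the merge (all.x = TRUE) makes zero-incidence strata explicit instead of silently dropping them.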

Breaking Changes

  • functions asir, calc_futime*, calc_refrates, ir_crosstab_byfutime, pat_status*, renumber_time_id*, and sir_byfutime are now set to dattype = NULL by default. If you relied on the automatic variable naming feature, you need to add dattype = "seer" or dattype = "zfkd" to your function call.
  • fix typo in attribute names: attributes are now correctly named problems_missing_count_strata and problems_missing_fu_strata (Closes #80)

Bug fixes

  • sir_byfutime():
    • attributes with notes and problems are now correctly saved to results_df

Internal

  • deprecated functions from tidytable package have been replaced (Closes #71 and #74)

New Features

  • new function sir_ratio() and related sir_ratio_lci() and sir_ratio_uci() to calculate ratio of two SIRs/SMRs to get relative risk and confidence limits for this ratio.
  • tidytable variant of reshape_long function, i.e. reshape_long_tt() ⇒ the _tt variants usually have smaller memory use than tidyverse and data.table variants. Execution time is usually much faster than tidyverse and comparable to or a little slower than the data.table variant.
  • summarize_sir_results():
    • add ability to summarize by different site_var than the one used in sir_byfutime()
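To make the idea of a ratio of two SIRs concrete, here is a minimal base R sketch. It uses a simple Wald-type interval on the log scale, which is a swapped-in approximation and not necessarily the method used by sir_ratio(), sir_ratio_lci() and sir_ratio_uci(). The function names ratio_of_sirs() and ratio_of_sirs_ci() and the counts are hypothetical.

```r
# Ratio of two SIRs from observed (o) and expected (e) counts
ratio_of_sirs <- function(o1, e1, o2, e2) {
  (o1 / e1) / (o2 / e2)
}

# Approximate confidence limits on the log scale (Wald-type interval);
# assumes o1 > 0 and o2 > 0 and treats the expected counts as fixed
ratio_of_sirs_ci <- function(o1, e1, o2, e2, alpha = 0.05) {
  est    <- ratio_of_sirs(o1, e1, o2, e2)
  se_log <- sqrt(1 / o1 + 1 / o2)
  z      <- qnorm(1 - alpha / 2)
  c(estimate = est,
    lci = exp(log(est) - z * se_log),
    uci = exp(log(est) + z * se_log))
}

# Hypothetical example: SIR 1 = 30/20 = 1.5, SIR 2 = 15/20 = 0.75
res <- ratio_of_sirs_ci(o1 = 30, e1 = 20, o2 = 15, e2 = 20)
print(round(res, 2))
```

With these made-up counts the point estimate of the ratio is 2. For small observed counts an exact method is generally preferable to this approximation.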

Bug fixes

  • summarize_sir_results():
    • PYARs are now correctly calculated when using summarize_site == TRUE. Previously the results incorrectly counted each site multiple times. (Closes #62)
  • pat_status():
    • update default values for dattype = "zfkd"

Internal

  • add R-CMD-Check to github actions

New Features

  • new sample data set for standard populations ⇒ data("standard_population")
  • new sample data set for us population ⇒ data("population_us") (Closes #58)

Bug fixes

Other changes

  • add examples to function documentation (Closes #56)
  • remove “R” from package title (Closes #57)
  • update package description (Closes #54)
  • update introduction vignette vignette("introduction")

New Features

Breaking Changes

  • sir_byfutime():
    • options add_total_row and add_total_fu are replaced by calc_total_row and calc_total_fu. These are logical parameters now. The positioning of total rows and columns is now handled completely by the summarize_sir_results() function. There, total rows can be placed at the top or bottom and total columns on the left or right.
    • option expcount_src and the related parameters stdpop_df, refpop_df, std_pop, truncate_std_pop and pyar_var have been removed. Function sir_byfutime() now only calculates expected counts based on reference rates, not within the cohort of the dataset. To calculate expected counts based on the cohort, a new function create_refrates will be added in the future. (#41)
    • option collapse_ci has been removed and added to summarize_sir_results() instead.
    • option name for tumor site variable changed from icdcat_var to site_var
    • option name for age/age group variable changed from agegroup_var to age_var
    • in total, the parameters expcount_src, futime_src, stdpop_df, refpop_df, std_pop, truncate_std_pop, pyar_var, icdcat_var, collapse_ci have been removed to simplify the function ⇒ make sure you remove these arguments from your sir_byfutime() function calls.
  • sir():
    • is superseded by sir_byfutime(). To migrate your former sir() calls, you can simply use sir_byfutime(..., futime_breaks = "none"), which will yield the same results.
  • summarize_sir_results():
    • option name for tumor site variable changed from summarize_icdcat to summarize_site
  • reshape_long_tidyr():
    • option var_selection is deprecated. Please select variables before running the reshape_long_* functions.
  • asir():
    • option name for age/age group variable changed from agegroup_var to age_var
    • option name for tumor site variable changed from icdcat_var to site_var
  • pat_status(), pat_status_tt(), vital_status(), and vital_status_tt():
    • Capitalized default variable labelling.
    • This might break code that relied on using the labels coming out of these functions in later filter or mutate functions.
  • ir_crosstab_byfutime():
    • option futime_breaks now uses breaks in years instead of months as previously.
    • default futime_var is now follow-up time in years
  • now requires dplyr version 1.0.0
  • now requires tidytable package
  • the default option name for tumor site variable changed from icdcat_var to site_var. This requires a manual update of function calls to sir_byfutime() and asir() if the option is specified.
  • the default variable name for tumor site in all functions has been changed from t_icdcat to t_site. So the reference data frames used will need to have a t_site column.
  • the data.table variants of functions (renumber_time_id_dt(), pat_status_dt(), reshape_long_dt(), reshape_wide_dt(), vital_status_dt()) have been removed for simplicity. Please use the tidytable variants instead, i.e. reshape_wide_tt(), renumber_time_id_tt(), pat_status_tt(), vital_status_tt(), calc_futime_tt(). They give the same data.table output and the same performance.
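For the change of futime_breaks in ir_crosstab_byfutime() from months to years, existing break vectors need to be divided by 12. The base R sketch below shows that conversion and bins a few follow-up times with cut(). The values are hypothetical, and how the package bins follow-up time internally is not shown here.

```r
# Follow-up breaks as formerly given in months (hypothetical old setting)
breaks_months <- c(0, 6, 12, 60, 120, Inf)

# The same breaks expressed in years
breaks_years <- breaks_months / 12

# Hypothetical follow-up times in years, binned with the converted breaks;
# right = FALSE makes the intervals closed on the left, e.g. [0.5, 1)
futime_years <- c(0.2, 0.75, 3, 7.5, 12)
futime_cat <- cut(futime_years, breaks = breaks_years, right = FALSE)
print(table(futime_cat))
```

The follow-up time variable itself must then also be in years, in line with the new default for futime_var.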

Bug Fixes

  • implement new reliable routine to split df when reshape_wide() with option chunks is used. Closes #1.
  • Sorting of columns in wide datasets by reshape_wide_tidyr() and reshape_wide_tt() is now preserved. Closes #31.
  • ensure sorting in renumber_time_id() and make sure that new_time_id_var is returned as integer.
  • fix bug in pat_status_*(., check = TRUE) option
  • improve internal tests in sir_byfutime() so that PYARs do not get lost before running summary function
  • sir_byfutime() now also gives correct results if range of futime_breaks is not 0-Inf but smaller

New Features

Bug Fixes

New Features

  • new faster version of reshape_long based on data.table
  • start new vignette on workflow from filtered long dataset to follow-up times vignette("patstatus_futime")

Bug Fixes

  • implement new tidyselect routine using tidyselect::all_of for vector-based variable selection
  • implement correct referencing in vital_status_dt and pat_status_dt
  • add exports from data.table
  • update documentation for sir and sir_byfutime functions
  • make reshape_long function work

New Features

  • new faster version of vital_status function using data.table
  • new faster version of pat_status function using data.table

New Features

  • new faster version of reshape_wide_dt function based on data.table and without problematic slices done by reshape_wide
  • new faster version of renumber_time_id function based on data.table

New Features

  • new function renumber_time_id

Bug Fixes

  • add check to revert status_var to numeric in case it was created with option as_labelled_factor
  • fix label bug in life_var_new
  • add option as_labelled_factor to vital_status function
  • fix newly introduced error in vital_status function
  • fix error in vital_status function by replacing sjlabelled::get_label function
  • fix error in pat_status and vital_status functions due to change in sjlabelled package
  • rebuild description file and manual
  • remove nest_legacy functions and use new tidyr syntax, close #19
  • make summarize_sir_results function work without break variables
  • for function sir_byfutime ⇒ make option add_total_row work, even if option ybreak_vars = "none"
  • make the use of time_id_var and case_id_var consistent across reshape functions
  • Fixed issue in Namespace
  • Added a NEWS.md file to track changes to the package.
  • add option futime_breaks = "none" to sir_byfutime function
  • includes a new function to calculate crude (absolute) incidence rates and tabulate them by any number of grouping variables; it can be used as a Table 1 for publications ⇒ The function is called msSPChelpR::ir_crosstab
  • includes a new function, msSPChelpR::sir_byfutime, to calculate SIRs (standardized incidence ratios) by whatever strata you desire (unlimited ybreak_vars; one xbreak_var) and with customized breaks for follow-up times (default is: up to 6 months, .5-1 year, 1-5 years, 5-10 years, >10 years) ⇒ attention: it only makes sense to stratify results (ybreak_vars or xbreak_var) by variables measured at baseline, not by variables that depend on the occurrence of an SPC ⇒ depending on the number of stratification variables you use, this function may return a very long results data.frame, so please use it together with the new function msSPChelpR::summarize_sir_results
  • includes a new function to summarize results dataframes from SIR calculations
  • New reshape functions that are faster and are using less memory
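For readers new to SIRs, the following base R sketch shows the basic calculation (observed cases divided by expected cases, where expected cases are person-years multiplied by reference rates). The confidence limits use the exact Poisson interval via the chi-square distribution, which is a common choice and not necessarily what sir_byfutime() does. All data and column names are made up for illustration.

```r
# Hypothetical strata: person-years at risk, reference rates and observed cases
strata <- data.frame(
  age      = c("60 - 64", "65 - 69"),
  pyar     = c(10000, 5000),
  ref_rate = c(200, 400),      # cases per 100,000 person-years
  observed = c(30, 25),
  stringsAsFactors = FALSE
)

# Expected cases per stratum = person-years * reference rate
strata$expected <- strata$pyar * strata$ref_rate / 1e5

# SIR over all strata: total observed divided by total expected
O <- sum(strata$observed)
E <- sum(strata$expected)
sir <- O / E

# Exact Poisson confidence limits for the observed count, divided by E
alpha <- 0.05
sir_lci <- qchisq(alpha / 2, df = 2 * O) / 2 / E
sir_uci <- qchisq(1 - alpha / 2, df = 2 * (O + 1)) / 2 / E

print(round(c(sir = sir, lci = sir_lci, uci = sir_uci), 2))
```

Here 55 observed cases are compared with 40 expected cases, giving an SIR of 1.375. Stratifying by follow-up time repeats the same calculation within each follow-up interval.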