Changelog

New Features

new function histgroup_iarc() to create a variable for groups of malignant neoplasms considered histologically ‘different’ for the purpose of defining multiple tumors, based on ICD-O-3 (see #100)
some functions gain new quiet argument to suppress rlang::warn() and rlang::inform() messages. You can use this when you have checked your results for correctness and want to reduce message output, but keep the progress bars.
asir(): add World Standard Population 2000-2025, selected with option std_pop = "WHO2000", as described here: https://seer.cancer.gov/stdpopulations/world.who.html
sir_byfutime() gains new argument expect_missing_refstrata_df. You can define another dataframe that contains strata expected to be missing from refrates_df (because they are not explicitly coded with incidence = 0). This can be helpful if refrates_df has a lot of strata and strata with 0 incidence have been removed to save storage space. Internally, the rows of expect_missing_refstrata_df will be appended to refrates_df. This reduces the number of lines reported in attribute problems_missing_ref_strata. Default setting is expect_missing_refstrata_df = NULL.
sample data set for data("us_second_cancer") gains new variable t_hist on histology, i.e. ICD-O-3-Code on tumor morphology (4 digits)
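
The effect of expect_missing_refstrata_df described above can be illustrated with a minimal base R sketch. It does not reproduce the package internals: the column names (t_site, sex, incidence_crude_rate) and the helper key() are assumptions made for illustration only.

```r
# Hypothetical reference rates; column names are illustrative assumptions.
refrates_df <- data.frame(
  t_site = c("Lung", "Lung"),
  sex = c("Male", "Female"),
  incidence_crude_rate = c(60.2, 45.1)
)

# Strata known to have zero incidence that were dropped to save space.
expect_missing_refstrata_df <- data.frame(
  t_site = "Prostate",
  sex = "Female"
)

# Append the expected-missing strata with an explicit rate of 0.
expect_missing_refstrata_df$incidence_crude_rate <- 0
refrates_full <- rbind(refrates_df, expect_missing_refstrata_df)

# Strata that occur in the (hypothetical) cohort.
cohort_strata <- data.frame(
  t_site = c("Lung", "Prostate", "Breast"),
  sex = c("Male", "Female", "Male")
)

# Hypothetical helper: build one matching key per stratum.
key <- function(d) paste(d$t_site, d$sex, sep = "|")

# Only strata absent from the combined table remain as problems.
problems_missing <- cohort_strata[!key(cohort_strata) %in% key(refrates_full), ]
print(problems_missing)
```

In this sketch only the Breast/Male stratum is still reported, because Prostate/Female was declared as expected to be missing.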

Breaking Changes

no breaking changes in this version

Bug fixes

make calc_refrates() more robust for missing race_var (Closes #89)
fix bug in calc_refrates() using calc_totals == TRUE (Closes #90)
fix bug in calc_refrates() using numeric versions of fill_sites (Closes #92)
fix bug in asir() that throws error for variable not needed (Closes #95)

Internal

replace progress bars by cli
deprecate verb.() syntax from tidytable (Closes #94)

New Features

new function calc_refrates() to calculate age-, sex-, region-, year-specific reference rates from a long format dataframe with cancer cases that are counted for incident cases and then matched with a reference population. The resulting reference rates dataframe can directly be used with sir_byfutime() function.
functions gain new default dattype = NULL and thus are more flexible to take other source data types (Closes #73)
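
The idea behind reference rates (count incident cases per stratum, then match them with a reference population) can be sketched in base R as follows. This is not the implementation of calc_refrates(): all column names, the toy data and the rate scale per 100,000 are assumptions.

```r
# Hypothetical long-format case data: one row per incident cancer case.
cases <- data.frame(
  sex = c("Male", "Male", "Female", "Male"),
  age = c("60-64", "60-64", "60-64", "65-69"),
  year = "2010"
)

# Hypothetical reference population with person-years per stratum.
population <- data.frame(
  sex = c("Male", "Female", "Male", "Female"),
  age = c("60-64", "60-64", "65-69", "65-69"),
  year = "2010",
  population_pyar = c(50000, 40000, 20000, 30000)
)

# Count incident cases per sex, age and year stratum.
cases$n <- 1
counts <- aggregate(n ~ sex + age + year, data = cases, FUN = sum)

# Match counts with the population; strata without cases get a count of 0.
refrates <- merge(population, counts, by = c("sex", "age", "year"), all.x = TRUE)
refrates$n[is.na(refrates$n)] <- 0

# Stratum-specific rate per 100,000 person-years.
refrates$rate_per_100k <- refrates$n / refrates$population_pyar * 1e5
print(refrates)
```

Keeping all population strata (all.x = TRUE) makes strata with zero incidence explicit instead of dropping them silently.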

Breaking Changes

functions asir, calc_futime*, calc_refrates, ir_crosstab_byfutime, pat_status*, renumber_time_id*, and sir_byfutime are now set to dattype = NULL by default. If you relied on the automatic variable naming feature, you need to add dattype = "seer" or dattype = "zfkd" to your function call.
fix typo in attribute names: attributes are now correctly named problems_missing_count_strata and problems_missing_fu_strata (Closes #80)

Bug fixes

sir_byfutime():
- attributes with notes and problems are now correctly saved to results_df

Internal

deprecated functions from tidytable package have been replaced (Closes #71 and #74)

New Features

new function sir_ratio() and related sir_ratio_lci() and sir_ratio_uci() to calculate ratio of two SIRs/SMRs to get relative risk and confidence limits for this ratio.
tidytable variant of reshape_long function, i.e. reshape_long_tt() ⇒ the _tt variants usually have smaller memory use than tidyverse and data.table variants. Execution time is usually much faster than tidyverse and comparable to or a little slower than the data.table variant.
summarize_sir_results():
- add ability to summarize by different site_var than the one used in sir_byfutime()
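
The ratio of two SIRs mentioned above can be illustrated with a short base R sketch. It uses a simple Wald interval on the log scale as a stand-in and is not necessarily the method implemented in sir_ratio(), sir_ratio_lci() and sir_ratio_uci(). The function name sir_ratio_sketch and its arguments are hypothetical.

```r
# Hypothetical helper, NOT the package's sir_ratio().
# o1, o2: observed counts; e1, e2: expected counts of the two groups.
sir_ratio_sketch <- function(o1, e1, o2, e2, alpha = 0.05) {
  # Ratio of the two standardized incidence ratios.
  ratio <- (o1 / e1) / (o2 / e2)
  # Approximate standard error of log(ratio), treating expected counts as fixed.
  se_log <- sqrt(1 / o1 + 1 / o2)
  z <- qnorm(1 - alpha / 2)
  c(
    ratio = ratio,
    lci = ratio * exp(-z * se_log),
    uci = ratio * exp(z * se_log)
  )
}

res <- sir_ratio_sketch(o1 = 40, e1 = 20, o2 = 30, e2 = 30)
print(res)
```

The approximation is only reasonable for moderately large observed counts. For small counts an exact method is preferable.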

Bug fixes

summarize_sir_results():
- PYARs are now correctly calculated when using summarize_site == TRUE. Previously the results incorrectly counted each site multiple times. (Closes #62)
pat_status():
- update default values for dattype = "zfkd"

Internal

add R-CMD-Check to github actions

New Features

new sample data set for standard populations ⇒ data("standard_population")
new sample data set for us population ⇒ data("population_us") (Closes #58)

Bug fixes

sir_byfutime(): change output of integer columns to numeric to fix bug in summarize_sir_results() (Closes #59)

Other changes

add examples to function documentation (Closes #56)
remove “R” from package title (Closes #57)
update package description (Closes #54)
update introduction vignette vignette("introduction")

New Features

tidytable variants of functions, i.e. reshape_wide_tt(), renumber_time_id_tt(), pat_status_tt(), vital_status_tt(), calc_futime_tt() ⇒ the _tt variants usually have smaller memory use than tidyverse and data.table variants. Execution time is usually much faster than tidyverse and comparable to or a little slower than the data.table variant.
sir_byfutime():
- is much faster using tidytable package
- gained the option race_var to optionally stratify SIR calculations by race.
summarize_sir_results():
- new function that increases functionality in summarizing results from sir_byfutime() function
- new option to define custom site_var_name
new package website https://marianschmidt.github.io/msSPChelpR
new sample datasets included in the package to demonstrate examples (#36)

Breaking Changes

sir_byfutime():
- options add_total_row and add_total_fu are replaced by calc_total_row and calc_total_fu. These are logical parameters now. The positioning of total rows and columns is now handled completely by the summarize_sir_results() function. There, total rows can be set to top or bottom and total columns to left or right.
- option expcount_src, including related parameters stdpop_df, refpop_df, std_pop, truncate_std_pop and pyar_var, has been removed. Function sir_byfutime() now only calculates expected counts based on reference rates, not within the cohort of the dataset. To calculate expected counts based on the cohort, a new function create_refrates will be added in the future. (#41)
- option collapse_ci has been removed and added to summarize_sir_results() instead.
- option name for tumor site variable changed from icdcat_var to site_var
- option name for age/age group variable changed from agegroup_var to age_var
- in total, the parameters expcount_src, futime_src, stdpop_df, refpop_df, std_pop, truncate_std_pop, pyar_var, icdcat_var, collapse_ci have been removed to simplify the function ⇒ make sure you remove these arguments from your sir_byfutime() function calls.
sir():
- is superseded by sir_byfutime(). To migrate your former sir() calls, you can simply use sir_byfutime(, futime_breaks = "none"), which will yield the same results.
summarize_sir_results():
- option name for tumor site variable changed from summarize_icdcat to summarize_site
reshape_long_tidyr():
- option var_selection is deprecated. Please select variables before running the reshape_long_* functions.
asir():
- option name for age/age group variable changed from agegroup_var to age_var
- option name for tumor site variable changed from icdcat_var to site_var
pat_status(), pat_status_tt(), vital_status(), and vital_status_tt():
- Capitalized default variable labelling.
- This might break code that relied on using the labels coming out of these functions in later filter or mutate functions.
ir_crosstab_byfutime():
- option futime_breaks now uses breaks in years instead of months as previously.
- default futime_var is now follow-up time in years
now requires dplyr version 1.0.0
now requires tidytable package
the default option name for tumor site variable changed from icdcat_var to site_var. This needs a manual update of function calls of sir_byfutime() and asir(), if the option is specified.
the default variable name for tumor site in all functions has been changed from t_icdcat to t_site. So the reference data frames used will need to have a t_site column.
the data.table variants of functions (renumber_time_id_dt(), pat_status_dt(), reshape_long_dt(), reshape_wide_dt(), vital_status_dt()) have been removed for simplicity, please use tidytable variants, i.e. reshape_wide_tt(), renumber_time_id_tt(), pat_status_tt(), vital_status_tt(), calc_futime_tt(), instead. They will give the same data.table output and same performance.
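
Two of the breaking changes above (the t_icdcat to t_site rename and follow-up breaks given in years instead of months) may require small adjustments to existing data and scripts. A minimal base R sketch, assuming a hypothetical reference data frame and assuming that breaks are held in a plain numeric vector:

```r
# Hypothetical reference data frame that still uses the old column name.
refrates_old <- data.frame(
  t_icdcat = c("Lung", "Breast"),
  rate = c(60, 120)
)

# Rename the tumor site column to the new default name.
names(refrates_old)[names(refrates_old) == "t_icdcat"] <- "t_site"

# Follow-up breaks previously defined in months ...
futime_breaks_months <- c(0, 6, 12, 60, 120, Inf)

# ... converted to years.
futime_breaks_years <- futime_breaks_months / 12
print(futime_breaks_years)
```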

Bug Fixes

implement new reliable routine to split df when reshape_wide() with option chunks is used. Closes #1.
Sorting of columns in wide datasets by reshape_wide_tidyr() and reshape_wide_tt() is now preserved. Closes #31.
ensure sorting in renumber_time_id() and make sure that new_time_id_var is returned as integer.
fix bug in pat_status_*(., check = TRUE) option
improve internal tests in sir_byfutime() so that PYARs do not get lost before running summary function
sir_byfutime() now also gives correct results if range of futime_breaks is not 0-Inf but smaller

New Features

add timevar_max option to renumber_time_id() function; use sorting by date of diagnosis instead of old time_id_var
various improvements to reshape_wide_tidyr() function
various improvements to reshape_wide_dt() function which is much faster now and uses data.table::dcast instead of stats::reshape now
various improvements to pat_status() and pat_status_dt() functions
option summarize_icdcat in summarize_sir_results() is now functional
update vignette vignette("introduction")

Bug Fixes

fix incomplete check for required variables in pat_status() and pat_status_dt() functions
fix error in check for required variables in renumber_time_id() that broke functions
fix bug in check for end of FU time in pat_status() and calc_futime()
implement new tidyselect routine using tidyselect::all_of in summarize_sir_results()

New Features

new faster version of reshape_long based on data.table
start new vignette on workflow from filtered long dataset to follow-up times vignette("patstatus_futime")

Bug Fixes

implement new tidyselect routine using tidyselect::all_of for vector-based variable selection
implement correct referencing in vital_status_dt and pat_status_dt
add exports from data.table
update documentation for sir and sir_byfutime functions
make reshape_long function work

New Features

new faster version of vital_status function using data.table
new faster version of pat_status function using data.table

New Features

new faster version of reshape_wide_dt function based on data.table and without problematic slices done by reshape_wide
new faster version of renumber_time_id function based on data.table

New Features

new function renumber_time_id

Bug Fixes

add check to revert status_var to numeric in case it was created with option as_labelled_factor
fix label bug in life_var_new

add option as_labelled_factor to vital_status function
fix newly introduced error in vital_status function

fix error in vital_status function by replacing sjlabelled::get_label function

fix error in pat_status and vital_status functions due to change in sjlabelled package

rebuild description file and manual

remove nest_legacy functions and use new tidyr syntax, close #19

make summarize_sir_results function work without break variables

for function sir_byfutime ⇒ make option add_total_row work, even if option ybreak_vars = "none"

Make use of time_id_var and case_id_var coherent across reshape functions

Fixed issue in Namespace

Added a NEWS.md file to track changes to the package.

add option futime_breaks = "none" to sir_byfutime function

includes a new function to calculate crude (absolute) incidence rates and tabulate them by any number of grouping variables; it can be used as a Table 1 for publications ⇒ The function is called msSPChelpR::ir_crosstab
includes a new function to calculate SIRs (standardized incidence ratios) by whatever strata you desire (unlimited ybreak_vars; one xbreak_var) and additionally by customized breaks for follow-up times (default is: 0 to 6 months, .5-1 year, 1-5 years, 5-10 years, >10 years) ⇒ attention: it only makes sense to stratify results (ybreak_vars or xbreak_var) by variables measured at baseline and not by variables that depend on the occurrence of an SPC ⇒ function msSPChelpR::sir_byfutime ⇒ depending on the number of stratification variables you are using, this function may result in a very long results data.frame, so please use it together with the new function msSPChelpR::summarize_sir_results
includes a new function to summarize results dataframes from SIR calculations
New reshape functions that are faster and are using less memory
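
As a rough illustration of the SIR calculation by follow-up time described above, the following base R sketch groups follow-up times into the default intervals and computes one SIR with an exact Poisson confidence interval. It is not the code of sir_byfutime(): the function name sir_sketch, the toy numbers and the choice of confidence interval are assumptions.

```r
# Hypothetical helper: SIR = observed / expected with an exact Poisson CI.
sir_sketch <- function(observed, expected, alpha = 0.05) {
  lci <- qchisq(alpha / 2, df = 2 * observed) / 2 / expected
  uci <- qchisq(1 - alpha / 2, df = 2 * (observed + 1)) / 2 / expected
  c(sir = observed / expected, lci = lci, uci = uci)
}

# Group follow-up times (in years) into the default intervals:
# 0 to 6 months, .5-1 year, 1-5 years, 5-10 years, >10 years.
futime_years <- c(0.2, 0.8, 3, 7, 12)
futime_cat <- cut(
  futime_years,
  breaks = c(0, 0.5, 1, 5, 10, Inf),
  right = FALSE
)
print(table(futime_cat))

# SIR for one stratum with 30 observed and 20 expected cases.
res <- sir_sketch(observed = 30, expected = 20)
print(res)
```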

msSPChelpR (development version)

msSPChelpR 0.9.1 - 2024-01-23

New Features

Breaking Changes

Bug fixes

Internal

msSPChelpR 0.9.0 - 2022-06-10

New Features

Breaking Changes

Bug fixes

Internal

msSPChelpR 0.8.7 - 2021-07-01

New Features

Bug fixes

Internal

msSPChelpR 0.8.6 - 2020-11-04

New Features

Bug fixes

Other changes

msSPChelpR 0.8.5 - 2020-09-28

New Features

Breaking Changes

Bug Fixes

msSPChelpR 0.8.4 - 2020-05-21

New Features

Bug Fixes

msSPChelpR 0.8.3

New Features

Bug Fixes

msSPChelpR 0.8.2

msSPChelpR 0.8.1

New Features

msSPChelpR 0.8.0

New Features

msSPChelpR 0.7.4

New Features

msSPChelpR 0.7.3

Bug Fixes

msSPChelpR 0.7.2

msSPChelpR 0.7.1

msSPChelpR 0.7.0

msSPChelpR 0.6.10

msSPChelpR 0.6.9

msSPChelpR 0.6.8

msSPChelpR 0.6.7

msSPChelpR 0.6.6

msSPChelpR 0.6.5

msSPChelpR 0.6.4

msSPChelpR 0.6.3

major changes in msSPChelpR 0.6.0