Renumber the time ID per case (i.e. the tumor sequence)

renumber_time_id(
  df,
  new_time_id_var,
  dattype = NULL,
  case_id_var = NULL,
  time_id_var = NULL,
  diagdat_var = NULL,
  timevar_max = Inf
)

Arguments

df

Data frame containing the case ID and time ID variables.

new_time_id_var

Name of the newly calculated variable for time_id. Required.

dattype

Can be "zfkd", "seer" or NULL. If dattype is "seer" or "zfkd", default variable names for that data type are set. Default is NULL.

case_id_var

String with the name of the ID variable that identifies the same patient. E.g. case_id_var="PUBCSNUM" for SEER data.

time_id_var

String with the name of the variable that indicates the sequence of diagnoses per patient. E.g. time_id_var="SEQ_NUM" for SEER data.

diagdat_var

String with the name of the variable that indicates the date of diagnosis per event. E.g. diagdat_var="t_datediag" for SEER data.

timevar_max

Numeric; default Inf. Maximum number of tumors per case ID. All tumors with a number greater than timevar_max will be deleted.

Value

df, with the renumbered time ID added as the variable named in new_time_id_var
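
To illustrate what renumbering a time ID per case means, here is a minimal sketch in plain dplyr. It is not the package's implementation, only an assumed equivalent of the behavior described above: sort within each case, number the rows consecutively, and drop tumors beyond a maximum. The toy data and the names `toy`, `id`, `seq`, `diag` and `new_seq` are hypothetical.

```r
library(dplyr)

# Hypothetical toy data: patient "A" has gaps in the original sequence numbers
toy <- data.frame(
  id   = c("A", "A", "A", "B", "B"),
  seq  = c(1L, 3L, 5L, 2L, 4L),
  diag = as.Date(c("2000-01-15", "2004-06-15", "2010-03-15",
                   "1999-02-15", "2001-09-15")),
  stringsAsFactors = FALSE
)

max_tumors <- 2  # plays the role of timevar_max in this sketch

renumbered <- toy %>%
  # order tumors within each patient by original sequence, then diagnosis date
  arrange(id, seq, diag) %>%
  group_by(id) %>%
  # consecutive numbering 1, 2, 3, ... per patient
  mutate(new_seq = row_number()) %>%
  ungroup() %>%
  # drop tumors numbered beyond the allowed maximum
  filter(new_seq <= max_tumors)

print(renumbered)
```

In this sketch the gaps in `seq` disappear in `new_seq`, and the third tumor of patient "A" is removed by the final filter.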

Examples


library(dplyr)
library(msSPChelpR)

data(us_second_cancer)
us_second_cancer %>%
  # select only the first 10000 rows so the example runs faster
  dplyr::slice(1:10000) %>%
  msSPChelpR::renumber_time_id(new_time_id_var = "t_tumid",
                               dattype = "seer",
                               case_id_var = "fake_id")
#> # A tibble: 10,000 × 17
#>    fake_id SEQ_NUM registry   sex   race  datebirth  t_datediag t_site_icd t_dco
#>    <chr>     <int> <chr>      <chr> <chr> <date>     <date>     <chr>      <chr>
#>  1 100004        1 SEER Reg … Male  White 1926-01-01 1992-07-15 C50        hist…
#>  2 100004        2 SEER Reg … Male  White 1926-01-01 2004-01-15 C54        hist…
#>  3 100004        3 SEER Reg … Male  White 1926-01-01 2006-06-15 C34        hist…
#>  4 100004        4 SEER Reg … Male  White 1926-01-01 2018-06-15 C14        DCO …
#>  5 100034        1 SEER Reg … Male  White 1979-01-01 2000-06-15 C50        hist…
#>  6 100037        1 SEER Reg … Fema… White 1938-01-01 1996-01-15 C54        hist…
#>  7 100038        1 SEER Reg … Male  White 1989-01-01 1991-04-15 C50        hist…
#>  8 100038        2 SEER Reg … Male  White 1989-01-01 2000-03-15 C80        hist…
#>  9 100039        1 SEER Reg … Fema… White 1946-01-01 2003-08-15 C50        hist…
#> 10 100039        2 SEER Reg … Fema… White 1946-01-01 2011-04-15 C34        hist…
#> # ℹ 9,990 more rows
#> # ℹ 8 more variables: t_hist <int>, fc_age <int>, datedeath <date>,
#> #   p_alive <chr>, p_dodmin <date>, fc_agegroup <chr>, t_yeardiag <chr>,
#> #   t_tumid <int>
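
As a further sketch, the variable names can also be passed explicitly instead of relying on dattype, together with a finite timevar_max. This call uses only the arguments documented above. The column names `SEQ_NUM` and `t_datediag` are taken from the printed example data, and the choice of `timevar_max = 2` is an arbitrary illustration. Based on the argument description, this would be expected to keep at most two tumors per patient, but no output is shown here because it has not been verified.

```r
library(dplyr)
library(msSPChelpR)

data(us_second_cancer)

# Explicit variable names instead of dattype defaults (illustrative choice)
us_second_cancer %>%
  dplyr::slice(1:10000) %>%
  msSPChelpR::renumber_time_id(new_time_id_var = "t_tumid",
                               case_id_var = "fake_id",
                               time_id_var = "SEQ_NUM",
                               diagdat_var = "t_datediag",
                               timevar_max = 2)
```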