Reshape dataset to wide format - tidyr version

reshape_wide_tidyr(
  df,
  case_id_var,
  time_id_var,
  timevar_max = 6,
  datsize = Inf
)

Arguments

df

Data frame in long format, with one row per diagnosis (tumor).

case_id_var

String with the name of the ID variable that identifies the same patient, e.g. case_id_var = "PUBCSNUM" for SEER data.

time_id_var

String with the name of the variable that numbers the diagnoses within each patient, e.g. time_id_var = "SEQ_NUM" for SEER data.

timevar_max

Numeric; default 6. Maximum number of cases per ID. All tumors with a time_id_var value greater than timevar_max are deleted before reshaping.
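A minimal sketch of the filtering step described for timevar_max, assuming tumors are dropped when their time_id_var value exceeds timevar_max. The helper `drop_extra_tumors()` and the toy data are hypothetical and are not part of msSPChelpR.

```r
# Hypothetical helper: keep only tumors whose time_id_var value is
# at most timevar_max. which() also drops rows where time_id_var is NA.
drop_extra_tumors <- function(df, time_id_var, timevar_max = 6) {
  df[which(df[[time_id_var]] <= timevar_max), , drop = FALSE]
}

# Toy long data: patient "A" has three tumors, patient "B" has one
long <- data.frame(
  fake_id    = c("A", "A", "A", "B"),
  SEQ_NUM    = c(1, 2, 3, 1),
  t_site_icd = c("C50", "C18", "C44", "C54")
)

# With timevar_max = 2, the third tumor of patient "A" is removed
res <- drop_extra_tumors(long, "SEQ_NUM", timevar_max = 2)
res
```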

datsize

Number of rows to take from df. This parameter is mainly intended for testing. The default, Inf, processes the full df.
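A short sketch of how a row limit with an infinite default can behave, assuming datsize selects the first rows of df. The helper `take_rows()` is hypothetical and only illustrates why Inf means "use every row".

```r
# Hypothetical helper: keep the first `datsize` rows of df.
# min() caps datsize at nrow(df), so the default Inf keeps every row.
take_rows <- function(df, datsize = Inf) {
  n <- min(datsize, nrow(df))
  df[seq_len(n), , drop = FALSE]
}

toy <- data.frame(id = 1:5)
nrow(take_rows(toy, 3))    # 3 rows
nrow(take_rows(toy))       # 5 rows, because datsize = Inf
nrow(take_rows(toy, 10))   # 5 rows, capped at nrow(toy)
```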

Value

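Data frame in wide format with one row per case_id_var. The variables of each diagnosis are suffixed with its time_id_var value (e.g. t_site_icd.1, t_site_icd.2).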
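The wide layout described above can be reproduced on toy data with `tidyr::pivot_wider()`. This is a sketch of the general long-to-wide technique, not the package's own implementation; the toy columns are made up, and `names_vary = "slowest"` (tidyr >= 1.2.0) is chosen so that columns are grouped by diagnosis number, mirroring the column order in the example output.

```r
library(tidyr)

# Toy long data: patient 100004 has two diagnoses, 100034 has one
long <- data.frame(
  fake_id    = c("100004", "100004", "100034"),
  SEQ_NUM    = c(1, 2, 1),
  t_site_icd = c("C50", "C18", "C50"),
  t_yeardiag = c("1992", "2001", "2000")
)

# One row per fake_id; each value column becomes <name>.<SEQ_NUM>.
# Patients without a second diagnosis get NA in the .2 columns.
wide <- pivot_wider(
  long,
  id_cols     = fake_id,
  names_from  = SEQ_NUM,
  values_from = c(t_site_icd, t_yeardiag),
  names_glue  = "{.value}.{SEQ_NUM}",
  names_vary  = "slowest"
)

wide
```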

Examples


data(us_second_cancer)

msSPChelpR::reshape_wide_tidyr(us_second_cancer,
                               case_id_var = "fake_id",
                               time_id_var = "SEQ_NUM",
                               timevar_max = 2,
                               datsize = 10000)
#> Long dataset had too many cases per patient. Wide dataset is limited to  2  cases per id as defined in timevar_max option.
#> # A tibble: 6,003 × 29
#>    fake_id registry.1 sex.1 race.1 datebirth.1 t_datediag.1 t_site_icd.1 t_dco.1
#>    <chr>   <chr>      <chr> <chr>  <date>      <date>       <chr>        <chr>  
#>  1 100004  SEER Reg … Male  White  1926-01-01  1992-07-15   C50          histol…
#>  2 100034  SEER Reg … Male  White  1979-01-01  2000-06-15   C50          histol…
#>  3 100037  SEER Reg … Fema… White  1938-01-01  1996-01-15   C54          histol…
#>  4 100038  SEER Reg … Male  White  1989-01-01  1991-04-15   C50          histol…
#>  5 100039  SEER Reg … Fema… White  1946-01-01  2003-08-15   C50          histol…
#>  6 100047  SEER Reg … Fema… White  1927-01-01  1998-04-15   C50          histol…
#>  7 100057  SEER Reg … Male  Black  1961-01-01  2010-04-15   C18          histol…
#>  8 100060  SEER Reg … Fema… White  1947-01-01  2003-08-15   C50          histol…
#>  9 100063  SEER Reg … Fema… Black  1938-01-01  1995-12-15   C50          histol…
#> 10 100073  SEER Reg … Male  White  1960-01-01  1993-11-15   C44          histol…
#> # ℹ 5,993 more rows
#> # ℹ 21 more variables: t_hist.1 <int>, fc_age.1 <int>, datedeath.1 <date>,
#> #   p_alive.1 <chr>, p_dodmin.1 <date>, fc_agegroup.1 <chr>,
#> #   t_yeardiag.1 <chr>, registry.2 <chr>, sex.2 <chr>, race.2 <chr>,
#> #   datebirth.2 <date>, t_datediag.2 <date>, t_site_icd.2 <chr>, t_dco.2 <chr>,
#> #   t_hist.2 <int>, fc_age.2 <int>, datedeath.2 <date>, p_alive.2 <chr>,
#> #   p_dodmin.2 <date>, fc_agegroup.2 <chr>, t_yeardiag.2 <chr>