Reshape dataset to long format - tidyr version

reshape_long_tidyr(wide_df, case_id_var, time_id_var, datsize = Inf)

Arguments

wide_df

Data frame in wide format.

case_id_var

String with the name of the ID variable that identifies the same patient. E.g. case_id_var="PUBCSNUM" for SEER data.

time_id_var

String with the name of the variable that indicates the diagnosis per patient. E.g. time_id_var="SEQ_NUM" for SEER data.

datsize

Number of rows to be taken from wide_df. This parameter is mainly for testing. Default is Inf so that wide_df is fully processed.

Value

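long_df, the dataset reshaped to long format with one row per patient and diagnosis (returned as a tibble in the example below).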
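
For orientation, a wide-to-long reshape of this kind can be sketched with tidyr::pivot_longer() on a toy data frame. This is a minimal illustration, not the package's implementation: the toy data, the object names wide_toy and long_toy, and the assumption that wide columns are named <variable>.<sequence number> (e.g. t_site_icd.1) are all hypothetical.

```r
library(tidyr)

# Hypothetical toy data in wide format: one row per patient,
# time-varying columns suffixed with ".<sequence number>" (assumed naming).
wide_toy <- data.frame(
  fake_id      = c("A", "B"),
  sex          = c("Male", "Female"),
  t_site_icd.1 = c("C50", "C54"),
  t_site_icd.2 = c("C34", NA),
  t_yeardiag.1 = c(1992, 1996),
  t_yeardiag.2 = c(2004, NA),
  stringsAsFactors = FALSE
)

long_toy <- pivot_longer(
  wide_toy,
  cols            = matches("\\.[0-9]+$"),          # only the suffixed columns
  names_to        = c(".value", "SEQ_NUM"),         # prefix -> column, suffix -> SEQ_NUM
  names_pattern   = "^(.*)\\.([0-9]+)$",
  names_transform = list(SEQ_NUM = as.integer),
  values_drop_na  = TRUE                            # drop diagnoses that do not exist
)

long_toy
```

Patient A contributes two rows (SEQ_NUM 1 and 2) and patient B one row, because B's second diagnosis is missing in every time-varying column and is therefore dropped.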

Examples


data(us_second_cancer)

#prep step - reshape wide a sample of 10000 rows from us_second_cancer
usdata_wide_sample <- msSPChelpR::reshape_wide(us_second_cancer,
                         case_id_var = "fake_id", 
                         time_id_var = "SEQ_NUM", 
                         timevar_max = 2,
                         datsize = 10000)
#> Long dataset had too many cases per patient. Wide dataset is limited to  2  cases per id as defined in timevar_max option.

#now we can reshape long again
msSPChelpR::reshape_long_tidyr(usdata_wide_sample,
                         case_id_var = "fake_id", 
                         time_id_var = "SEQ_NUM")
#> # A tibble: 8,746 × 16
#>    fake_id SEQ_NUM registry   sex   race  datebirth  t_datediag t_site_icd t_dco
#>    <chr>     <dbl> <chr>      <chr> <chr> <date>     <date>     <chr>      <chr>
#>  1 100004        1 SEER Reg … Male  White 1926-01-01 1992-07-15 C50        hist…
#>  2 100004        2 SEER Reg … Male  White 1926-01-01 2004-01-15 C54        hist…
#>  3 100034        1 SEER Reg … Male  White 1979-01-01 2000-06-15 C50        hist…
#>  4 100037        1 SEER Reg … Fema… White 1938-01-01 1996-01-15 C54        hist…
#>  5 100038        1 SEER Reg … Male  White 1989-01-01 1991-04-15 C50        hist…
#>  6 100038        2 SEER Reg … Male  White 1989-01-01 2000-03-15 C80        hist…
#>  7 100039        1 SEER Reg … Fema… White 1946-01-01 2003-08-15 C50        hist…
#>  8 100039        2 SEER Reg … Fema… White 1946-01-01 2011-04-15 C34        hist…
#>  9 100047        1 SEER Reg … Fema… White 1927-01-01 1998-04-15 C50        hist…
#> 10 100047        2 SEER Reg … Fema… White 1927-01-01 2003-10-15 C14        hist…
#> # ℹ 8,736 more rows
#> # ℹ 7 more variables: t_hist <int>, fc_age <int>, datedeath <date>,
#> #   p_alive <chr>, p_dodmin <date>, fc_agegroup <chr>, t_yeardiag <chr>