Reshape dataset to long format - tidyr version
reshape_long_tidyr(wide_df, case_id_var, time_id_var, datsize = Inf)
wide_df: Data frame in wide format.
case_id_var: String with the name of the ID variable that identifies the same patient. E.g. case_id_var="PUBCSNUM" for SEER data.
time_id_var: String with the name of the variable that indexes diagnoses within a patient. E.g. time_id_var="SEQ_NUM" for SEER data.
datsize: Number of rows to be taken from wide_df. This parameter is mainly for testing. Default is Inf, so that wide_df is fully processed.
Returns long_df, the dataset in long format with one row per case.
library(msSPChelpR)
data(us_second_cancer)

# Prep step: reshape a sample of 10000 rows from us_second_cancer to wide format
usdata_wide_sample <- msSPChelpR::reshape_wide(us_second_cancer,
                                               case_id_var = "fake_id",
                                               time_id_var = "SEQ_NUM",
                                               timevar_max = 2,
                                               datsize = 10000)
#> Long dataset had too many cases per patient. Wide dataset is limited to 2 cases per id as defined in timevar_max option.

# Now we can reshape to long format again
msSPChelpR::reshape_long_tidyr(usdata_wide_sample,
                               case_id_var = "fake_id",
                               time_id_var = "SEQ_NUM")
#> # A tibble: 8,746 × 16
#> fake_id SEQ_NUM registry sex race datebirth t_datediag t_site_icd t_dco
#> <chr> <dbl> <chr> <chr> <chr> <date> <date> <chr> <chr>
#> 1 100004 1 SEER Reg … Male White 1926-01-01 1992-07-15 C50 hist…
#> 2 100004 2 SEER Reg … Male White 1926-01-01 2004-01-15 C54 hist…
#> 3 100034 1 SEER Reg … Male White 1979-01-01 2000-06-15 C50 hist…
#> 4 100037 1 SEER Reg … Fema… White 1938-01-01 1996-01-15 C54 hist…
#> 5 100038 1 SEER Reg … Male White 1989-01-01 1991-04-15 C50 hist…
#> 6 100038 2 SEER Reg … Male White 1989-01-01 2000-03-15 C80 hist…
#> 7 100039 1 SEER Reg … Fema… White 1946-01-01 2003-08-15 C50 hist…
#> 8 100039 2 SEER Reg … Fema… White 1946-01-01 2011-04-15 C34 hist…
#> 9 100047 1 SEER Reg … Fema… White 1927-01-01 1998-04-15 C50 hist…
#> 10 100047 2 SEER Reg … Fema… White 1927-01-01 2003-10-15 C14 hist…
#> # ℹ 8,736 more rows
#> # ℹ 7 more variables: t_hist <int>, fc_age <int>, datedeath <date>,
#> # p_alive <chr>, p_dodmin <date>, fc_agegroup <chr>, t_yeardiag <chr>
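
To illustrate the kind of wide-to-long reshape shown above, here is a minimal sketch that calls tidyr::pivot_longer directly on a toy data frame. It is not the package's actual implementation. The toy data, the object names toy_wide and toy_long, and the "<variable>.<sequence number>" column naming are assumptions made for illustration only.

```r
library(tidyr)

# Hypothetical wide data: one row per patient, one column per variable and tumor number.
# The "<variable>.<sequence number>" naming scheme is an assumption of this sketch.
toy_wide <- data.frame(
  fake_id      = c("A", "B"),
  t_site_icd.1 = c("C50", "C54"),
  t_site_icd.2 = c("C34", NA),
  t_yeardiag.1 = c(1992, 1996),
  t_yeardiag.2 = c(2004, NA),
  stringsAsFactors = FALSE
)

# Split each column name at the dot: the first part becomes a value column
# (via the special ".value" name), the second part goes into SEQ_NUM.
toy_long <- pivot_longer(
  toy_wide,
  cols = -fake_id,
  names_to = c(".value", "SEQ_NUM"),
  names_sep = "\\.",
  values_drop_na = TRUE  # drop tumor slots where every value is missing
)

# The sequence number is extracted from column names as character; convert it.
toy_long$SEQ_NUM <- as.numeric(toy_long$SEQ_NUM)

print(toy_long)
```

In this sketch patient "B" has no second tumor, so the all-NA slot is dropped and "B" contributes a single row. This matches the pattern in the example output above, where some patients appear only with SEQ_NUM 1.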