需要从宽到长转换问题的回答

需要从宽到长转换

回答此问题可获得 20 贡献值，回答如果被采纳可获得 50 分。

<a href="https://i.stack.imgur.com/lBNGw.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/lBNGw.png" alt="enter image description here"/></a> 你好，我有一个数据集，在A列中有唯一的Id变量，然后对每个病人进行后续的肾脏扫描。这是一个csv文件，我想重塑它的长格式使用R如果可能的话。每个参与者都可以进行1-17次肾脏扫描。你知道吗 还有一些Id被列为“否”，表示不接收扫描。我想把它改造成类似这样的东西 <a href="https://i.stack.imgur.com/XRfmM.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/XRfmM.png" alt="enter image description here"/></a> 我知道以前关于这个问题每年组织，我有来自参与者的扫描，在年日期格式yyyy-mm-dd发生多次 请参阅下面的数据 <pre><code>structure(list(id = c(1010001, 1010002, 1010004, 1010005, 1010006, 1010007), `GFR Scans?` = c("Yes", "Yes", "Yes", "Yes", "Yes", "No"), `1. Date of renal scan:` = structure(c(1133913600, 1196812800, 1237334400, 1124150400, 1192060800, NA), class = c("POSIXct", "POSIXt"), tzone = "UTC"), `1. Type of renal scan:` = c("DTPA", "DTPA", "DTPA", "DTPA", "DTPA", NA), `1. GFR mL/1.73 sq.m` = c(18, 13, 68, 117, 46, NA), `1. Pre/Post tx?` = c("Pre", "Pre", "Post", "Post", "Pre", NA), `2. Date of renal scan:` = structure(c(1146528000, 1214524800, NA, 1151366400, 1245974400, NA), class = c("POSIXct", "POSIXt"), tzone = "UTC"), `2. Type of renal scan:` = c("DTPA", "DTPA", NA, "DTPA", "DTPA", NA), `2. GFR mL/1.73 sq.m` = c(86, 110, NA, 148, 123, NA), `2. Pre/Post tx?` = c("Post", "Post", NA, "Post", "Post", NA), `3. Date of renal scan:` = structure(c(NA, 1219104000, NA, 1184025600, NA, NA), class = c("POSIXct", "POSIXt" ), tzone = "UTC"), `3. Type of renal scan:` = c(NA, "DTPA", NA, "DTPA", NA, NA), `3. GFR mL/1.73 sq.m` = c(NA, 92, NA, 166, NA, NA), `3. Pre/Post tx?` = c(NA, "Post", NA, "Post", NA, NA), `4. Date of renal scan:` = structure(c(NA, 1242691200, NA, 1213660800, NA, NA), class = c("POSIXct", "POSIXt" ), tzone = "UTC"), `4. Type of renal scan:` = c(NA, "DTPA", NA, "DTPA", NA, NA), `4. GFR mL/1.73 sq.m` = c(NA, 36, NA, 171, NA, NA), `4. Pre/Post tx?` = c(NA, "Post", NA, "Post", NA, NA), `5. Date of renal scan:` = structure(c(NA, NA, NA, 1288656000, NA, NA), class = c("POSIXct", "POSIXt"), tzone = "UTC"), `5. Type of renal scan:` = c(NA, NA, NA, "DTPA", NA, NA), `5. GFR mL/1.73 sq.m` = c(NA, NA, NA, 105, NA, NA), `5. Pre/Post tx?` = c(NA, NA, NA, "Post", NA, NA), `6. Date of renal scan:` = structure(c(NA, NA, NA, 1323129600, NA, NA), class = c("POSIXct", "POSIXt" ), tzone = "UTC"), `6. Type of renal scan:` = c(NA, NA, NA, "DTPA", NA, NA), `6. GFR mL/1.73 sq.m` = c(NA, NA, NA, 103, NA, NA), `6. Pre/Post tx?` = c(NA, NA, NA, "Post", NA, NA ), `7. Date of renal scan:` = structure(c(NA, NA, NA, 1355184000, NA, NA), class = c("POSIXct", "POSIXt"), tzone = "UTC"), `7. Type of renal scan:` = c(NA, NA, NA, "DTPA", NA, NA), `7. GFR mL/1.73 sq.m` = c(NA, NA, NA, 98, NA, NA), `7. Pre/Post tx?` = c(NA, NA, NA, "Post", NA, NA), `8. Date of renal scan:` = c(NA, NA, NA, NA, NA, NA), `8. Type of renal scan:` = c(NA, NA, NA, NA, NA, NA), `8. GFR mL/1.73 sq.m` = c(NA, NA, NA, NA, NA, NA), `8. Pre/Post tx?` = c(NA, NA, NA, NA, NA, NA), `9. Date of renal scan:` = c(NA, NA, NA, NA, NA, NA), `9. Type of renal scan:` = c(NA, NA, NA, NA, NA, NA), `9. GFR mL/1.73 sq.m` = c(NA, NA, NA, NA, NA, NA), `9. Pre/Post tx?` = c(NA, NA, NA, NA, NA, NA), `10. Date of renal scan:` = c(NA, NA, NA, NA, NA, NA), `10. Type of renal scan:` = c(NA, NA, NA, NA, NA, NA), `10. GFR mL/1.73 sq.m` = c(NA, NA, NA, NA, NA, NA), `10. Pre/Post tx?` = c(NA, NA, NA, NA, NA, NA), `11. Date of renal scan:` = c(NA, NA, NA, NA, NA, NA), `11. Type of renal scan:` = c(NA, NA, NA, NA, NA, NA), `11. GFR mL/1.73 sq.m` = c(NA, NA, NA, NA, NA, NA), `11. Pre/Post tx?` = c(NA, NA, NA, NA, NA, NA ), `12. Date of renal scan:` = c(NA, NA, NA, NA, NA, NA), `12. Type of renal scan:` = c(NA, NA, NA, NA, NA, NA), `12. GFR mL/1.73 sq.m` = c(NA, NA, NA, NA, NA, NA), `12. Pre/Post tx?` = c(NA, NA, NA, NA, NA, NA), `13. Date of renal scan:` = c(NA, NA, NA, NA, NA, NA), `13. Type of renal scan:` = c(NA, NA, NA, NA, NA, NA ), `13. GFR mL/1.73 sq.m` = c(NA, NA, NA, NA, NA, NA), `13. Pre/Post tx?` = c(NA, NA, NA, NA, NA, NA), `14. Date of renal scan:` = c(NA, NA, NA, NA, NA, NA), `14. Type of renal scan:` = c(NA, NA, NA, NA, NA, NA), `14. GFR mL/1.73 sq.m` = c(NA, NA, NA, NA, NA, NA), `14. Pre/Post tx?` = c(NA, NA, NA, NA, NA, NA), `15. Date of renal scan:` = c(NA, NA, NA, NA, NA, NA), `15. Type of renal scan:` = c(NA, NA, NA, NA, NA, NA), `15. GFR mL/1.73 sq.m` = c(NA, NA, NA, NA, NA, NA), `15. Pre/Post tx?` = c(NA, NA, NA, NA, NA, NA), `16. Date of renal scan:` = c(NA, NA, NA, NA, NA, NA), `16. Type of renal scan:` = c(NA, NA, NA, NA, NA, NA), `16. GFR mL/1.73 sq.m` = c(NA, NA, NA, NA, NA, NA), `16. Pre/Post tx?` = c(NA, NA, NA, NA, NA, NA ), `17. Date of renal scan:` = c(NA, NA, NA, NA, NA, NA), `17. Type of renal scan:` = c(NA, NA, NA, NA, NA, NA), `17. GFR mL/1.73 sq.m` = c(NA, NA, NA, NA, NA, NA), `17. Pre/Post tx?` = c(NA, NA, NA, NA, NA, NA)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame")) </code></pre> 第一个图像是原始的宽格式，第二个图像是我想得到的。关于这个问题，其他从宽到长的答案都没有帮助我，因为我涉及多个专栏。你知道吗 例如，id 1010001已经进行了两次扫描，我需要一个接一个地列出，而不是放在另一个旁边（参见图2）。你知道吗 我非常感谢你的帮助。你知道吗

0 条评论
分类：Python问答

默认排序时间排序

1 个回答

匿名 1天前

　擅长：python、mysql、java

这个问题以前被问过好几次，例如<a href="https://stackoverflow.com/questions/12466493/reshaping-multiple-sets-of-measurement-columns-wide-format-into-single-columns">Reshaping multiple sets of measurement columns (wide format) into single columns (long format)</a>。一种可能的方法是使用<code>data.table</code>的<code>melt()</code>函数，它能够同时重塑多个值列。你知道吗 然而，这里有一个额外的困难，它证明了一个答案本身，IMHO。列名有时会包含多余的空白，为了使列具有一致的命名模式，需要事先删除这些空白。你知道吗 <pre><code>names(df1) </code></pre> <blockquote> <pre><code> [1] "id" "GFR Scans?" "1. Date of renal scan:" "1. Type of renal scan:" [5] "1. GFR mL/1.73 sq.m" "1. Pre/Post tx?" "2. Date of renal scan:" "2. Type of renal scan:" [9] "2. GFR mL/1.73 sq.m" "2. Pre/Post tx?" "3. Date of renal scan:" "3. Type of renal scan:" [13] "3. GFR mL/1.73 sq.m" "3. Pre/Post tx?" "4. Date of renal scan:" "4. Type of renal scan:" [17] "4. GFR mL/1.73 sq.m" "4. Pre/Post tx?" "5. Date of renal scan:" "5. Type of renal scan:" [21] "5. GFR mL/1.73 sq.m" "5. Pre/Post tx?" "6. Date of renal scan:" "6. Type of renal scan:" [25] "6. GFR mL/1.73 sq.m" "6. Pre/Post tx?" "7. Date of renal scan:" "7. Type of renal scan:" [29] "7. GFR mL/1.73 sq.m" "7. Pre/Post tx?" "8. Date of renal scan:" "8. Type of renal scan:" [33] "8. GFR mL/1.73 sq.m" "8. Pre/Post tx?" "9. Date of renal scan:" "9. Type of renal scan:" [37] "9. GFR mL/1.73 sq.m" "9. Pre/Post tx?" "10. Date of renal scan:" "10. Type of renal scan:" [41] "10. GFR mL/1.73 sq.m" "10. Pre/Post tx?" "11. Date of renal scan:" "11. Type of renal scan:" [45] "11. GFR mL/1.73 sq.m" "11. Pre/Post tx?" "12. Date of renal scan:" "12. Type of renal scan:" [49] "12. GFR mL/1.73 sq.m" "12. Pre/Post tx?" "13. Date of renal scan:" "13. Type of renal scan:" [53] "13. GFR mL/1.73 sq.m" "13. Pre/Post tx?" "14. Date of renal scan:" "14. Type of renal scan:" [57] "14. GFR mL/1.73 sq.m" "14. Pre/Post tx?" "15. Date of renal scan:" "15. Type of renal scan:" [61] "15. GFR mL/1.73 sq.m" "15. Pre/Post tx?" "16. Date of renal scan:" "16. Type of renal scan:" [65] "16. GFR mL/1.73 sq.m" "16. Pre/Post tx?" "17. Date of renal scan:" "17. Type of renal scan:" </code></pre> </blockquote> <pre><code>library(data.table) library(magrittr) # clean up column names: remove surplus whitespace setDT(df1) %>% setnames(names(.) %>% stringr::str_replace_all("\\s+", " ")) # get name pattern for subsequent melt cols <- names(df1)[3:6] %>% stringr::str_replace("1. ", "") # reshape multiple columns from wide to long long <- melt(df1, measure.vars = patterns(cols), value.name = cols, na.rm = TRUE)[ # recreate lost POSIXct attribute , `Date of renal scan:` := lubridate::as_datetime(`Date of renal scan:`)][] long </code></pre> <blockquote> <pre><code> id GFR Scans? variable Date of renal scan: Type of renal scan: GFR mL/1.73 sq.m Pre/Post tx? 1: 1010001 Yes 1 2005-12-07 DTPA 18 Pre 2: 1010002 Yes 1 2007-12-05 DTPA 13 Pre 3: 1010004 Yes 1 2009-03-18 DTPA 68 Post 4: 1010005 Yes 1 2005-08-16 DTPA 117 Post 5: 1010006 Yes 1 2007-10-11 DTPA 46 Pre 6: 1010001 Yes 2 2006-05-02 DTPA 86 Post 7: 1010002 Yes 2 2008-06-27 DTPA 110 Post 8: 1010005 Yes 2 2006-06-27 DTPA 148 Post 9: 1010006 Yes 2 2009-06-26 DTPA 123 Post 10: 1010002 Yes 3 2008-08-19 DTPA 92 Post 11: 1010005 Yes 3 2007-07-10 DTPA 166 Post 12: 1010002 Yes 4 2009-05-19 DTPA 36 Post 13: 1010005 Yes 4 2008-06-17 DTPA 171 Post 14: 1010005 Yes 5 2010-11-02 DTPA 105 Post 15: 1010005 Yes 6 2011-12-06 DTPA 103 Post 16: 1010005 Yes 7 2012-12-11 DTPA 98 Post </code></pre> </blockquote> 在对<code>melt()</code>的调用中，我们可以将参数<code>na.rm = FALSE</code>设置为保留all数据： <blockquote> <pre><code> id GFR Scans? variable Date of renal scan: Type of renal scan: GFR mL/1.73 sq.m Pre/Post tx? 1: 1010001 Yes 1 2005-12-07 DTPA 18 Pre 2: 1010002 Yes 1 2007-12-05 DTPA 13 Pre 3: 1010004 Yes 1 2009-03-18 DTPA 68 Post 4: 1010005 Yes 1 2005-08-16 DTPA 117 Post 5: 1010006 Yes 1 2007-10-11 DTPA 46 Pre - 98: 1010002 Yes 17 <NA> <NA> NA <NA> 99: 1010004 Yes 17 <NA> <NA> NA <NA> 100: 1010005 Yes 17 <NA> <NA> NA <NA> 101: 1010006 Yes 17 <NA> <NA> NA <NA> 102: 1010007 No 17 <NA> <NA> NA <NA> </code></pre> </blockquote>

需要从宽到长转换

1 个回答

相关Python问题