<p>这个问题以前被问过好几次,例如<a href="https://stackoverflow.com/questions/12466493/reshaping-multiple-sets-of-measurement-columns-wide-format-into-single-columns">Reshaping multiple sets of measurement columns (wide format) into single columns (long format)</a>。一种可能的方法是使用<code>data.table</code>的<code>melt()</code>函数,它能够同时重塑多个值列。你知道吗</p>
<p>然而,这里有一个额外的困难,它证明了一个答案本身,IMHO。
列名有时会包含多余的空白,为了使列具有一致的命名模式,需要事先删除这些空白。你知道吗</p>
<pre><code>names(df1)
</code></pre>
<blockquote>
<pre><code> [1] "id" "GFR Scans?" "1. Date of renal scan:" "1. Type of renal scan:"
[5] "1. GFR mL/1.73 sq.m" "1. Pre/Post tx?" "2. Date of renal scan:" "2. Type of renal scan:"
[9] "2. GFR mL/1.73 sq.m" "2. Pre/Post tx?" "3. Date of renal scan:" "3. Type of renal scan:"
[13] "3. GFR mL/1.73 sq.m" "3. Pre/Post tx?" "4. Date of renal scan:" "4. Type of renal scan:"
[17] "4. GFR mL/1.73 sq.m" "4. Pre/Post tx?" "5. Date of renal scan:" "5. Type of renal scan:"
[21] "5. GFR mL/1.73 sq.m" "5. Pre/Post tx?" "6. Date of renal scan:" "6. Type of renal scan:"
[25] "6. GFR mL/1.73 sq.m" "6. Pre/Post tx?" "7. Date of renal scan:" "7. Type of renal scan:"
[29] "7. GFR mL/1.73 sq.m" "7. Pre/Post tx?" "8. Date of renal scan:" "8. Type of renal scan:"
[33] "8. GFR mL/1.73 sq.m" "8. Pre/Post tx?" "9. Date of renal scan:" "9. Type of renal scan:"
[37] "9. GFR mL/1.73 sq.m" "9. Pre/Post tx?" "10. Date of renal scan:" "10. Type of renal scan:"
[41] "10. GFR mL/1.73 sq.m" "10. Pre/Post tx?" "11. Date of renal scan:" "11. Type of renal scan:"
[45] "11. GFR mL/1.73 sq.m" "11. Pre/Post tx?" "12. Date of renal scan:" "12. Type of renal scan:"
[49] "12. GFR mL/1.73 sq.m" "12. Pre/Post tx?" "13. Date of renal scan:" "13. Type of renal scan:"
[53] "13. GFR mL/1.73 sq.m" "13. Pre/Post tx?" "14. Date of renal scan:" "14. Type of renal scan:"
[57] "14. GFR mL/1.73 sq.m" "14. Pre/Post tx?" "15. Date of renal scan:" "15. Type of renal scan:"
[61] "15. GFR mL/1.73 sq.m" "15. Pre/Post tx?" "16. Date of renal scan:" "16. Type of renal scan:"
[65] "16. GFR mL/1.73 sq.m" "16. Pre/Post tx?" "17. Date of renal scan:" "17. Type of renal scan:"
</code></pre>
</blockquote>
<pre><code>library(data.table)
library(magrittr)
# clean up column names: remove surplus whitespace
setDT(df1) %>% setnames(names(.) %>% stringr::str_replace_all("\\s+", " "))
# get name pattern for subsequent melt
cols <- names(df1)[3:6] %>% stringr::str_replace("1. ", "")
# reshape multiple columns from wide to long
long <- melt(df1, measure.vars = patterns(cols), value.name = cols, na.rm = TRUE)[
# recreate lost POSIXct attribute
, `Date of renal scan:` := lubridate::as_datetime(`Date of renal scan:`)][]
long
</code></pre>
<blockquote>
<pre><code> id GFR Scans? variable Date of renal scan: Type of renal scan: GFR mL/1.73 sq.m Pre/Post tx?
1: 1010001 Yes 1 2005-12-07 DTPA 18 Pre
2: 1010002 Yes 1 2007-12-05 DTPA 13 Pre
3: 1010004 Yes 1 2009-03-18 DTPA 68 Post
4: 1010005 Yes 1 2005-08-16 DTPA 117 Post
5: 1010006 Yes 1 2007-10-11 DTPA 46 Pre
6: 1010001 Yes 2 2006-05-02 DTPA 86 Post
7: 1010002 Yes 2 2008-06-27 DTPA 110 Post
8: 1010005 Yes 2 2006-06-27 DTPA 148 Post
9: 1010006 Yes 2 2009-06-26 DTPA 123 Post
10: 1010002 Yes 3 2008-08-19 DTPA 92 Post
11: 1010005 Yes 3 2007-07-10 DTPA 166 Post
12: 1010002 Yes 4 2009-05-19 DTPA 36 Post
13: 1010005 Yes 4 2008-06-17 DTPA 171 Post
14: 1010005 Yes 5 2010-11-02 DTPA 105 Post
15: 1010005 Yes 6 2011-12-06 DTPA 103 Post
16: 1010005 Yes 7 2012-12-11 DTPA 98 Post
</code></pre>
</blockquote>
<p>在对<code>melt()</code>的调用中,我们可以将参数<code>na.rm = FALSE</code>设置为保留<em>all</em>数据:</p>
<blockquote>
<pre><code> id GFR Scans? variable Date of renal scan: Type of renal scan: GFR mL/1.73 sq.m Pre/Post tx?
1: 1010001 Yes 1 2005-12-07 DTPA 18 Pre
2: 1010002 Yes 1 2007-12-05 DTPA 13 Pre
3: 1010004 Yes 1 2009-03-18 DTPA 68 Post
4: 1010005 Yes 1 2005-08-16 DTPA 117 Post
5: 1010006 Yes 1 2007-10-11 DTPA 46 Pre
-
98: 1010002 Yes 17 <NA> <NA> NA <NA>
99: 1010004 Yes 17 <NA> <NA> NA <NA>
100: 1010005 Yes 17 <NA> <NA> NA <NA>
101: 1010006 Yes 17 <NA> <NA> NA <NA>
102: 1010007 No 17 <NA> <NA> NA <NA>
</code></pre>
</blockquote>