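<p>Here is one way to do it in R. It requires that every block of data has the same fields (same names, in the same order) and that blocks are separated by blank lines. I suspect there are simpler ways to do this, perhaps using <code>plyr</code>?</p>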
<p>Read in some data. You can point <code>readLines</code> at a text file instead.</p>
<pre><code>dat <- readLines(textConnection('product/productId: B000GKXY3
product/title: Nun Chuck
product/price: 17.99
review/userId: ADX8VLDUOL7BG
review/profileName: M. Gingras

product/productId: B000GKXY34
product/title: Nun Chuck
product/price: 17.99
review/userId: A3NM6P6BIWTIAE
review/profileName: Maria Carpenter

product/productId: B000GKXY35
product/title: Nun Chuck
product/price: 17.99
review/userId: A3NM6P6BIWTIAF
review/profileName: Someone Else'))
# Identify blocks of data (assuming blank line indicates a new block)
# and split to list L.
L <- split(dat, rep(seq_along(diff(c(0, which(dat==''), length(dat)))),
diff(c(0, which(dat==''), length(dat)))))
# Remove empty elements.
L <- lapply(L, function(x) x[x != ''])
# rbind to a matrix
M <- do.call(rbind, L)
# Extract column names
nm <- sub(':.*$', '', M[1, ])
# Remove column names from matrix elements (strip only up to the first
# colon, so values that themselves contain ': ' are left intact)
M <- sub('^[^:]*: *', '', M)
# Add column names attribute
colnames(M) <- nm
M
  product/productId product/title product/price review/userId    review/profileName
1 "B000GKXY3"       "Nun Chuck"   "17.99"       "ADX8VLDUOL7BG"  "M. Gingras"
2 "B000GKXY34"      "Nun Chuck"   "17.99"       "A3NM6P6BIWTIAE" "Maria Carpenter"
3 "B000GKXY35"      "Nun Chuck"   "17.99"       "A3NM6P6BIWTIAF" "Someone Else"
</code></pre>
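<p>The same steps work when the records live in a file. What follows is a minimal sketch, not a definitive version: the temporary file stands in for a real path, the two-field records are made up for illustration, and grouping with <code>cumsum</code> is a swapped-in alternative to the <code>diff</code>-based split above.</p>
<pre><code># Hypothetical input file; a temporary file stands in for a real path.
path <- tempfile(fileext = '.txt')
writeLines(c('product/productId: B000GKXY3',
             'product/price: 17.99',
             '',
             'product/productId: B000GKXY34',
             'product/price: 17.99'), path)

dat <- readLines(path)

# Every blank line bumps the counter, so all lines of one block
# share the same group id.
grp <- cumsum(dat == '') + 1

# Drop the blank separator lines, then split into one element per block.
keep <- dat != ''
L <- split(dat[keep], grp[keep])

# Same steps as before: rbind, pull out the field names, strip them.
M <- do.call(rbind, L)
nm <- sub(':.*$', '', M[1, ])
M <- sub('^[^:]*: *', '', M)
colnames(M) <- nm
M
</code></pre>
<p>Filtering out the blank lines before calling <code>split</code> means that trailing or repeated blank lines do not produce empty blocks.</p>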
<p>You can then easily coerce this to a <code>data.frame</code> and make <code>product/price</code> numeric, if that floats your boat.</p>
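<p>A minimal sketch of that coercion, assuming the matrix has the shape shown above. <code>M</code> is rebuilt by hand here so the snippet runs on its own, and <code>df</code> is just an illustrative name.</p>
<pre><code># Stand-in for the character matrix M produced by the parsing step.
M <- matrix(c('B000GKXY3',  'Nun Chuck', '17.99', 'ADX8VLDUOL7BG',  'M. Gingras',
              'B000GKXY34', 'Nun Chuck', '17.99', 'A3NM6P6BIWTIAE', 'Maria Carpenter',
              'B000GKXY35', 'Nun Chuck', '17.99', 'A3NM6P6BIWTIAF', 'Someone Else'),
            nrow = 3, byrow = TRUE,
            dimnames = list(c('1', '2', '3'),
                            c('product/productId', 'product/title',
                              'product/price', 'review/userId',
                              'review/profileName')))

# Coerce to a data.frame, keeping the text columns as character.
df <- as.data.frame(M, stringsAsFactors = FALSE)

# Convert the price column from character to numeric.
df[['product/price']] <- as.numeric(df[['product/price']])

str(df)
</code></pre>
<p>Because the column names contain a slash, they need <code>[[ ]]</code> or backticks when accessed, for example <code>df$`product/price`</code>.</p>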