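<p>Following the discussion in the comments on my previous answer, here is another approach that does not require knowing the number of columns in advance. In a situation like this it is always worth transforming the data into a known shape first, and here that can only be a long, thin one. Once the data is in that shape, you can group it by customer id and perform whatever operations you need:</p>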
<pre><code>library(tidyverse)

data_set %>%
  # add a unique identifier for the original shape:
  mutate(customer = 1:n()) %>%
  # turn into a long thin format:
  gather(variable, value, -customer) %>%
  # remove any NA values from the unbalanced table (and the empty column at the beginning of the original)
  filter(!is.na(value)) %>%
  # extract the numbers from the original column names:
  mutate(column_set = as.numeric(str_extract(variable, "[0-9]+")),
         variable = gsub("[0-9]+", "", variable)) %>%
  # limit the data to the first "Normal" column set for each customer
  group_by(customer) %>%
  mutate(best_column_set = min(column_set[value == "Normal"])) %>%
  filter(column_set == best_column_set) %>%
  # drop the columns we don't need and return to wide format:
  select(-column_set, -best_column_set) %>%
  spread(variable, value) %>%
  # convert from characters back to numbers
  mutate(Basket = as.numeric(Basket),
         Amount = as.numeric(Amount))
</code></pre>
<p>This returns:</p>
<pre><code># A tibble: 2 x 4
# Groups:   customer [2]
  customer Amount Basket Type
     <int>  <dbl>  <dbl> <chr>
1        1      4     45 Normal
2        2      4     98 Normal
</code></pre>
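<p>This approach depends on some structure in how the original columns are labelled: for example, that they are numbered by column set and that the column names contain no other digits.</p>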
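<p>To make that naming assumption concrete, here is a minimal sketch of what the two string operations do. The column names below are hypothetical, chosen only to match the <code>Type</code>, <code>Basket</code> and <code>Amount</code> columns in the output above; your real names may differ.</p>
<pre><code>library(stringr)

# Hypothetical column names following the pattern this answer relies on
cols <- c("Type1", "Basket1", "Amount1", "Type2", "Basket2", "Amount2")

# The first run of digits in each name is taken as the column-set number
set_number <- as.numeric(str_extract(cols, "[0-9]+"))

# Removing the digits leaves the variable name
base_name <- gsub("[0-9]+", "", cols)

# A (hypothetical) name containing other digits breaks the assumption:
str_extract("Top10Basket2", "[0-9]+")   # "10", not "2"
gsub("[0-9]+", "", "Top10Basket2")      # "TopBasket"
</code></pre>
<p>If your names do contain other digits, anchoring the pattern to the end of the name (for example <code>"[0-9]+$"</code>) would probably be a safer choice, assuming the set number always comes last.</p>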