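Column manipulation

3 answers

User

#1 · edited 2024-09-28 22:20:33

I would strongly advise against thinking in terms of shifting values left or right. That does not make good use of R's data frame objects, in which each column should be treated as a coherent unit. So rather than moving and deleting cells, I think you should add new columns on the right according to the logic you need and then (if required) drop all the original columns. One way to do this is to create new amount, basket and type columns and discard the rest at the end: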

library(dplyr)
data_set <- data_set %>%
  mutate(
    basket_n = case_when(
      # If Type1 is Normal we use its basket:
      Type1 == "Normal" ~ Basket1,
      # If not, then see if Type2 is normal and we can use that (and so on):
      Type2 == "Normal" ~ Basket2
    ),
    amount_n = case_when(
      Type1 == "Normal" ~ Amount1,
      Type2 == "Normal" ~ Amount2
    ),
    type_n = "Normal"
  ) %>%
  select(type_n, basket_n, amount_n)
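To make the behaviour of `case_when()` concrete, here is a small sketch run on made-up data. The sample values and the third customer (who has no "Normal" set) are assumptions for illustration, not from the question. It also shows one possible variation: leaving `type_n` as `NA` when no set matched, instead of hard-coding "Normal".

```r
library(dplyr)

# Hypothetical sample data: two column sets per customer
data_set <- tibble(
  Type1   = c("Normal", "Special", "Special"),
  Basket1 = c(45, 12, 7),
  Amount1 = c(4, 7, 1),
  Type2   = c("Special", "Normal", "Special"),
  Basket2 = c(30, 98, 8),
  Amount2 = c(9, 4, 2)
)

result <- data_set %>%
  mutate(
    # case_when() returns NA when none of the conditions is TRUE
    basket_n = case_when(
      Type1 == "Normal" ~ Basket1,
      Type2 == "Normal" ~ Basket2
    ),
    amount_n = case_when(
      Type1 == "Normal" ~ Amount1,
      Type2 == "Normal" ~ Amount2
    ),
    # Variation (an assumption): only label rows that actually had a Normal set
    type_n = if_else(Type1 == "Normal" | Type2 == "Normal",
                     "Normal", NA_character_)
  ) %>%
  select(type_n, basket_n, amount_n)

print(result)
```

The first two customers pick up the basket and amount of their first "Normal" set; the third gets `NA` in all three new columns.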

User

#2 · edited 2024-09-28 22:20:33

See my update at the end, based on the new example you added.

#I have a data set in the form of a tibble
data_set              <- as.data.frame(matrix(nrow=8))
data_set$column1_set1 <- c(1,1,1,1,1,1,1,1)
data_set$column2_set1 <- c(1,1,1,1,0,1,1,1)
data_set$column3_set1 <- c(1,1,1,1,1,1,1,1)

data_set$column1_set2 <- c(1,1,1,1,1,1,1,1)
data_set$column2_set2 <- c(1,1,1,1,1,1,1,1)
data_set$column3_set2 <- c(1,1,1,1,1,1,1,1)
data_set$V1           <- NULL

# as.tibble() is deprecated; as_tibble() is the current name
data_set <- tibble::as_tibble(data_set)

# In each row I have to check columns in sets of 3. 
#   i.e if one of the columns value=0 I have to delete all three columns 
#   and evaluate the next 3 columns.

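You can do this:

cn <- colnames(data_set)

for (i in seq(1, length(cn), 3)) {
  # Select by name: column positions shift once earlier sets have been dropped
  set_cols <- cn[i:(i + 2)]
  # With 0/1 values, a column sum below nrow means the set contains a 0
  if (any(colSums(data_set[, set_cols]) < nrow(data_set))) {
    data_set <- data_set[, !colnames(data_set) %in% set_cols]
  }
}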
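As an alternative to the loop, the same set-wise check can be sketched without modifying the data frame step by step. This is only an illustration in base R, assuming (as in the question) that the columns come in consecutive groups of three; the names `set_id`, `has_zero` and `drop_set` are made up for this example.

```r
# Same example data as in the question, built as a plain data frame
data_set <- data.frame(
  column1_set1 = rep(1, 8),
  column2_set1 = c(1, 1, 1, 1, 0, 1, 1, 1),
  column3_set1 = rep(1, 8),
  column1_set2 = rep(1, 8),
  column2_set2 = rep(1, 8),
  column3_set2 = rep(1, 8)
)

# Assign each column to a set of three by position: 1 1 1 2 2 2 ...
set_id <- ceiling(seq_along(data_set) / 3)

# TRUE for every column that contains at least one 0
has_zero <- vapply(data_set, function(col) any(col == 0), logical(1))

# A whole set is dropped if any of its columns contains a 0
drop_set <- tapply(has_zero, set_id, any)

# Map the per-set decision back to the columns and keep the rest
keep_col <- !unname(drop_set[as.character(set_id)])
result <- data_set[, keep_col, drop = FALSE]

print(colnames(result))
```

Because the keep/drop decision is computed up front, there is no risk of column positions shifting while columns are being removed.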

In the new example we have some non-numeric columns. The only change needed is to check first whether they are numeric.

cn <- colnames(data_set)

for (i in seq(1, length(cn), 3)) {
  set_cols <- cn[i:(i + 2)]
  # Only the numeric columns of this set take part in the zero check
  num_cols <- set_cols[vapply(data_set[set_cols], is.numeric, logical(1))]

  if (length(num_cols) > 0 &&
      any(colSums(data_set[num_cols]) < nrow(data_set))) {
    # Drop the whole set, including its non-numeric columns
    data_set <- data_set[, !colnames(data_set) %in% set_cols]
  }
}

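User

#3 · edited 2024-09-28 22:20:33

Following the discussion in the comments on my previous answer, here is another approach, which avoids having to know the number of columns in advance. In a situation like this it is always worth converting the data into a known shape first, and here that can only be a long, thin one. Once the data is in that shape, you can group it by customer id and perform whatever operations you need: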

library(tidyverse)

data_set %>%

  # add a unique identifier for the original shape:
  mutate(customer = 1:n()) %>%

  # turn into a long thin format:
  gather(variable, value, -customer) %>%

  # remove any NA values from the unbalanced table (and the empty column at the beginning of the original)
  filter(!is.na(value)) %>%

  # extract the numbers from the original column names:
  mutate(column_set = as.numeric(str_extract(variable, "[0-9]+")),
         variable = gsub("[0-9]+", "", variable)) %>%

  # limit the data to the first "normal" column set for each customer
  group_by(customer) %>%
  mutate(best_column_set = min(column_set[value == "Normal"])) %>%
  filter(column_set == best_column_set) %>%

  # drop the columns we don't need and return to wide format:
  select(-column_set, -best_column_set) %>%
  spread(variable, value) %>%

  # convert from characters back to numbers
  mutate(Basket = as.numeric(Basket),
         Amount = as.numeric(Amount))

This returns:

# A tibble: 2 x 4
# Groups:   customer [2]
  customer Amount Basket Type  
     <int>  <dbl>  <dbl> <chr> 
1        1      4     45 Normal
2        2      4     98 Normal

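This approach depends on how the original columns are labelled; for example, it assumes they are numbered by column set and that the column names contain no other digits.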
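For readers on a recent tidyr, the same long-format idea can be sketched with `pivot_longer()` and its `.value` sentinel, which keeps Type, Basket and Amount as separate, correctly typed columns so no conversion back to numbers is needed. The sample data below is hypothetical, chosen only to be consistent with the output shown above, and the column names are assumed to follow the `Type1`, `Basket1`, `Amount1` pattern.

```r
library(dplyr)
library(tidyr)

# Hypothetical sample data: two column sets per customer
data_set <- tibble(
  Type1   = c("Normal", "Special"),
  Basket1 = c(45, 12),
  Amount1 = c(4, 7),
  Type2   = c("Special", "Normal"),
  Basket2 = c(30, 98),
  Amount2 = c(9, 4)
)

result <- data_set %>%
  # add a unique identifier for each original row
  mutate(customer = row_number()) %>%
  # split names like "Basket2" into a variable part and a set number;
  # ".value" keeps Type/Basket/Amount as separate columns
  pivot_longer(
    cols = -customer,
    names_to = c(".value", "column_set"),
    names_pattern = "([A-Za-z]+)([0-9]+)"
  ) %>%
  mutate(column_set = as.integer(column_set)) %>%
  # keep only the first "Normal" set for each customer
  filter(Type == "Normal") %>%
  group_by(customer) %>%
  slice_min(column_set, n = 1) %>%
  ungroup() %>%
  select(customer, Amount, Basket, Type)

print(result)
```

The same caveat applies: the regular expression relies on each column name being letters followed by a single set number.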
