Column manipulation

Posted 2024-09-28 22:20:33


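I'm fairly new to R and hoping someone can point me in the right direction. I have a data set in tibble format and need to go through it row by row. In each row I have to check the columns in sets of three: if one of the columns in a set has the value 0, I have to delete all three columns and evaluate the next three.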

data_set <- as.data.frame(matrix(nrow=2))
data_set$Basket1<- c(45,35)
data_set$Type1 <- c("Normal","Premium")
data_set$Amount1 <- c(4,5)

data_set$Basket2 <- c(4,98)
data_set$Type2 <- c("Normal","Normal")
data_set$Amount2 <- c(0,4)

#when Type is "Premium" I want to remove the values for 
#Basket1,Type1,Amount1 
#and shift the next 3 cells to the left

3 answers

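I'd strongly recommend not thinking in terms of shifting values left or right. That doesn't make good use of R's data frame objects, in which each column should be treated as a coherent whole. So rather than moving and deleting cells, add new columns on the right according to the logic you need and then, if required, drop all the original columns. One way to do this is to create new amount, basket and type columns and discard the rest at the end: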

library(dplyr)
data_set <- data_set %>%
  mutate(
    basket_n = case_when(
      # If Type1 is Normal we use its basket:
      Type1 == "Normal" ~ Basket1,
      # If not, then see if Type2 is normal and we can use that (and so on):
      Type2 == "Normal" ~ Basket2
    ),
    amount_n = case_when(
      Type1 == "Normal" ~ Amount1,
      Type2 == "Normal" ~ Amount2
    ),
    type_n = "Normal"
  ) %>%
  select(type_n, basket_n, amount_n)
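
For readers without dplyr, the same two-branch logic can be sketched in base R. This is only an illustration under assumptions: the data frame is built directly with `data.frame()` (so there is no placeholder `V1` column), and `use_first`, `use_second` and `result` are names introduced here, not part of the answer above.

```r
# The question's example data, built directly as a data frame.
data_set <- data.frame(
  Basket1 = c(45, 35), Type1 = c("Normal", "Premium"), Amount1 = c(4, 5),
  Basket2 = c(4, 98),  Type2 = c("Normal", "Normal"),  Amount2 = c(0, 4),
  stringsAsFactors = FALSE
)

# Use set 1 where Type1 is "Normal"; otherwise use set 2 where Type2 is
# "Normal"; rows matching neither get NA, as with case_when().
use_first  <- data_set$Type1 == "Normal"
use_second <- !use_first & data_set$Type2 == "Normal"

result <- data.frame(
  type_n   = "Normal",
  basket_n = ifelse(use_first, data_set$Basket1,
                    ifelse(use_second, data_set$Basket2, NA_real_)),
  amount_n = ifelse(use_first, data_set$Amount1,
                    ifelse(use_second, data_set$Amount2, NA_real_))
)
print(result)
```

Like the dplyr version, this hard-codes two column sets, so it would need another nested branch for each additional set.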

See my update at the end, based on the new example you added.

#I have a data set in the form of a tibble
data_set              <- as.data.frame(matrix(nrow=8))
data_set$column1_set1 <- c(1,1,1,1,1,1,1,1)
data_set$column2_set1 <- c(1,1,1,1,0,1,1,1)
data_set$column3_set1 <- c(1,1,1,1,1,1,1,1)

data_set$column1_set2 <- c(1,1,1,1,1,1,1,1)
data_set$column2_set2 <- c(1,1,1,1,1,1,1,1)
data_set$column3_set2 <- c(1,1,1,1,1,1,1,1)
data_set$V1           <- NULL

data_set <- tibble::as_tibble(data_set)  # as.tibble() is deprecated

# In each row I have to check columns in sets of 3. 
#   i.e if one of the columns value=0 I have to delete all three columns 
#   and evaluate the next 3 columns. 

You can do it like this:

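cn <- colnames(data_set)

# Assumes the columns come in complete groups of three.
for (i in seq(1, length(cn), 3)) {
  cols <- cn[i:(i + 2)]
  # Select by name, not by position: positions shift once a group is dropped.
  if (any(data_set[, cols, drop = FALSE] == 0, na.rm = TRUE)) {
    data_set <- data_set[, !colnames(data_set) %in% cols, drop = FALSE]
  }
}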
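
The same group-wise rule can also be sketched without a loop, which avoids modifying `data_set` while iterating over it. This is a minimal base R sketch, assuming the columns come in complete groups of three in positional order; `group_id`, `has_zero`, `drop_col` and `kept` are illustrative names.

```r
# The 8-row example, built directly as a data frame.
data_set <- data.frame(
  column1_set1 = rep(1, 8),
  column2_set1 = c(1, 1, 1, 1, 0, 1, 1, 1),
  column3_set1 = rep(1, 8),
  column1_set2 = rep(1, 8),
  column2_set2 = rep(1, 8),
  column3_set2 = rep(1, 8)
)

# Assign each column to a group of three by position: 0 0 0 1 1 1 ...
group_id <- (seq_along(data_set) - 1) %/% 3

# TRUE for each column that contains at least one zero.
has_zero <- vapply(data_set, function(col) any(col == 0), logical(1))

# Spread the flag to every column of a group that has a zero anywhere.
drop_col <- unname(as.logical(ave(has_zero, group_id, FUN = any)))

kept <- data_set[, !drop_col, drop = FALSE]
print(colnames(kept))
```

Here the zero in `column2_set1` removes the whole first set, leaving only the three `_set2` columns.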

In the new example we have some non-numeric columns. The only change needed is to check first whether each column is numeric.

cn <- colnames(data_set)

# Assumes the columns come in complete groups of three.
for (i in seq(1, length(cn), 3)) {
  cols <- cn[i:(i + 2)]

  # Only the numeric columns of the group take part in the zero check.
  num_cols <- cols[vapply(data_set[cols], is.numeric, logical(1))]

  # If any numeric column in the group contains a 0, drop the whole group.
  if (length(num_cols) > 0 &&
      any(data_set[, num_cols, drop = FALSE] == 0, na.rm = TRUE)) {
    data_set <- data_set[, !colnames(data_set) %in% cols, drop = FALSE]
  }
}

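Following the discussion in the comments on my earlier answer, here is another approach that copes with not knowing the number of columns in advance. In cases like this it is always worth first converting the data to a known shape, which here can only be long and thin. Once it is in that shape, you can group it by customer id and perform whatever operations you need: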

library(tidyverse)

data_set %>%

  # add a unique identifier for the original shape:
  mutate(customer = 1:n()) %>%

  # turn into a long thin format:
  gather(variable, value, -customer) %>%

  # remove any NA values from the unbalanced table (and the empty column at the beginning of the original)
  filter(!is.na(value)) %>%

  # extract the numbers from the original column names:
  mutate(column_set = as.numeric(str_extract(variable, "[0-9]+")),
         variable = gsub("[0-9]+", "", variable)) %>%

  # limit the data to the first "normal" column set for each customer
  group_by(customer) %>%
  mutate(best_column_set = min(column_set[value == "Normal"])) %>%
  filter(column_set == best_column_set) %>%

  # drop the columns we don't need and return to wide format:
  select(-column_set, -best_column_set) %>%
  spread(variable, value) %>%

  # convert from characters back to numbers
  mutate(Basket = as.numeric(Basket),
         Amount = as.numeric(Amount))

This returns:

# A tibble: 2 x 4
# Groups:   customer [2]
  customer Amount Basket Type  
     <int>  <dbl>  <dbl> <chr> 
1        1      4     45 Normal
2        2      4     98 Normal

This approach depends on how the original columns are named: they must be numbered by column set, and the column names must not contain any other digits.
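
To make that naming dependency concrete, here is a small base R sketch of the name-splitting step. It uses `regmatches()`/`regexpr()` as a stand-in for `str_extract()`, and the column name `Basket2019_1` is a hypothetical example, not something from the question.

```r
# Column names as in the question.
col_names <- c("Basket1", "Type1", "Amount1", "Basket2", "Type2", "Amount2")

# The first run of digits becomes the set number; all digits are then
# stripped to leave the variable name.
column_set <- as.numeric(regmatches(col_names, regexpr("[0-9]+", col_names)))
variable   <- gsub("[0-9]+", "", col_names)

print(data.frame(col_names, column_set, variable))

# A name with extra digits breaks the split: only the first run of digits
# is read as the set number, while gsub() removes every digit.
bad     <- "Basket2019_1"   # hypothetical column name
bad_set <- regmatches(bad, regexpr("[0-9]+", bad))
bad_var <- gsub("[0-9]+", "", bad)
print(c(bad_set, bad_var))
```

With well-formed names each column maps cleanly to a set number and a variable; with the hypothetical name, the set number is misread and the variable name is mangled, so such columns would need renaming first.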
