在列中添加逗号分隔的值

2024-09-25 00:23:57 发布

您现在位置:Python中文网/ 问答频道 /正文

嗨,我有一个像这样的文件格式(TSV)

Name  type    Age     Weight       Height 
Xxx   M    12,34,23  50,30,60,70   4,5,6,5.5 
Yxx   F    21,14,32  40,50,20,40   3,4,5,5.5

我想将年龄、体重和身高中的所有值相加,并在其后添加一列,然后再添加一些百分比,比如总身高/总体重(awk'$0=$0“\t”(NR==1?“百分比”:$8/$7)”。我有大量的数据集,无法使用excel

像这样

Name  type    Age     Weight       Height     Total_Age Total_Weight Total_Height Percentage
Xxx   M    12,34,23  50,30,60,70   4,5,6,5.5   69        210         20.5          0.097            
Yxx   F    21,14,32  40,50,20,40   3,4,5,5.5   67        150         17.5          0.11 

Tags: nameagetsvtypenrxxxtotal百分比
3条回答

在每个Unix设备上的任何shell中使用任何awk,并且没有在每个记录中创建新字段(这是低效的,因为它会导致awk在每次更改字段时重新构建记录),也没有更新输入记录(这是低效的,因为它会导致awk在每次更改记录时将记录重新拆分为字段)并设计用于以任意顺序处理任意数量的值输入列:

$ cat tst.awk
BEGIN { FS=OFS="\t" }
{ printf "%s%s", $0, OFS }
NR==1 {
    for (i=3; i<=NF; i++) {
        printf "Total_%s%s", $i, OFS
        tags[i] = $i
    }
    print "Percentage"
    next
}
{
    delete tot
    for (i=3; i<=NF; i++) {
        tag = tags[i]
        n = split($i,vals,",")
        for (j in vals) {
            tot[tag] += vals[j]
        }
        printf "%s%s", tot[tag], OFS
    }
    printf "%0.3f%s", (tot["Weight"] ? tot["Height"] / tot["Weight"] : 0), ORS
}

$ awk -f tst.awk file
Name    type    Age     Weight  Height  Total_Age       Total_Weight    Total_Height    Percentage
Xxx     M       12,34,23        50,30,60,70     4,5,6,5.5       69      210     20.5    0.098
Yxx     F       21,14,32        40,50,20,40     3,4,5,5.5       67      150     17.5    0.117

$ awk -f tst.awk file | column -t
Name  type  Age       Weight       Height     Total_Age  Total_Weight  Total_Height  Percentage
Xxx   M     12,34,23  50,30,60,70  4,5,6,5.5  69         210           20.5          0.098
Yxx   F     21,14,32  40,50,20,40  3,4,5,5.5  67         150           17.5          0.117

为了展示上述方法的功能优势,假设您需要添加更多的值,如ShoeSize和/或重新排列列的顺序,例如:

$ column -t file
Name  type  ShoeSize  Height     Age       Weight
Xxx   M     12,8,10   4,5,6,5.5  12,34,23  50,30,60,70
Yxx   F     9,7,8     3,4,5,5.5  21,14,32  40,50,20,40

现在运行上述脚本,注意为每个原始列添加了Total_列,但仍然在末尾添加了相同的Percentage列的高度/重量:

$ awk -f tst.awk file | column -t
Name  type  ShoeSize  Height     Age       Weight       Total_ShoeSize  Total_Height  Total_Age  Total_Weight  Percentage
Xxx   M     12,8,10   4,5,6,5.5  12,34,23  50,30,60,70  30              20.5          69         210           0.098
Yxx   F     9,7,8     3,4,5,5.5  21,14,32  40,50,20,40  24              17.5          67         150           0.117

对于所显示的示例,请尝试以下代码

awk '
FNR==1{
  print $0,"Total_Age Total_Weight Total_Height Percentage"
  next
}
FNR>1{
  totAge=totWeight=totHeight=0
  split($3,tmp,",")
  for(i in tmp){
    totAge+=tmp[i]
  }
  split($4,tmp,",")
  for(i in tmp){
    totWeight+=tmp[i]
  }
  split($5,tmp,",")
  for(i in tmp){
    totHeight+=tmp[i]
  }
  $(NF+1)=totAge
  $(NF+1)=totWeight
  $(NF+1)=totHeight
  $(NF+1)=$(NF-1)==0?"N/A":$NF/$(NF-1)
}
1' Input_file | column -t

添加上述awk代码的略短版本:

awk '
BEGIN{OFS="\t"}
FNR==1{
  print $0,"Total_Age Total_Weight Total_Height Percentage"
  next
}
FNR>1{
  totAge=totWeight=totHeight=0
  split($3,tmp,",")
  for(i in tmp){
    totAge+=tmp[i]
  }
  split($4,tmp,",")
  for(i in tmp){
    totWeight+=tmp[i]
  }
  split($5,tmp,",")
  for(i in tmp){
    totHeight+=tmp[i]
  }
  $(NF+1)=totAge OFS totWeight OFS totHeight
  $0=$0
  $(NF+1)=( $(NF-1)==0 ? "N/A" : $NF/$(NF-1) )
}
1' Input_file | column -t

解释:简单的解释是,取第三、第四和第五列的总和,并将它们分配到行的最后一列。根据OP的要求,相应地添加列值,该列值具有最后一列和最后第二列的除法值。使用column -t使其在输出上看起来更好

如果必须多次执行相同的操作,还可以使用函数对数组值求和(假设值是以逗号分隔的数字)

重用来自RavinderSingh13的部分答案,并向Ed Morton表示衷心感谢,花时间提供大量反馈以改进代码:

awk '
function arraySum(field,      sum,arr,i) {
  split(field,arr,",")
  for (i in arr) sum += arr[i]
  return sum
}
FNR==1{
  print $0, "Total_Age", "Total_Weight", "Total_Height", "Percentage"
  next
}
NR > 1 {
  sumWeight = arraySum($4)
  sumHeight = arraySum($5)
  print $0, arraySum($3), sumWeight, sumHeight, (sumWeight ? sumHeight/sumWeight : 0)
}' file | column -t

输出

Name  type  Age       Weight       Height     Total_Age  Total_Weight  Total_Height  Percentage
Xxx   M     12,34,23  50,30,60,70  4,5,6,5.5  69         210           20.5          0.097619
Yxx   F     21,14,32  40,50,20,40  3,4,5,5.5  67         150           17.5          0.116667

相关问题 更多 >