How do I compare two files of random numbers that are not in sequential order?

Posted 2024-10-02 12:27:13


There are two files, compare1.txt and compare2.txt, containing random numbers in no particular order.

cat compare1.txt

57
11
13
3
889
014
91

cat compare2.txt

003
889
13
14
57
12
90

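Goal

  1. Output a list of all numbers that are in compare1 but not in compare2, and vice versa.

  2. If a number has leading zeros, ignore them when comparing (basically, the absolute values of the numbers must differ for them to count as a mismatch). Example: 3 should be treated as matching 003, 014 as matching 14, 008 as matching 8, and so on.

Note: a match does not have to be on the same line. A number on the first line of compare1 should count as matched even if the same number appears on a line other than the first in compare2.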

Expected output

90
91
12
11

PS (I don't necessarily need the exact order shown in the expected output; these 4 numbers in any order are fine.)

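What I tried

Obviously I did not expect to get the second condition right; I tried to satisfy only the first condition, but still could not get the correct result. I tried these commands: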

grep -Fxv -f compare1.txt compare2.txt && grep -Fxv -f compare2.txt compare1.txt
cat compare1.txt compare2.txt | sort |uniq

Edit: a Python solution is also fine.


3 answers

Given these two files, in Python you can use the symmetric difference of two sets:

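f1 = "compare1.txt"         # paths of the two input files
f2 = "compare2.txt"

with open(f1) as f:         # read the first file into a set
    s1 = {int(e) for e in f}

with open(f2) as f:         # read the second file into a set
    s2 = {int(e) for e in f}

print(s2 ^ s1)              # symmetric difference of those two sets
# {11, 12, 90, 91} (the order in which a set prints may vary)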

This can be further simplified to:

with open("compare1.txt") as fa, open("compare2.txt") as fb:
    print({int(e) for e in fa} ^ {int(e) for e in fb})

More about Python sets can be found in the documentation.
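The snippets above call int() on every line, which would raise ValueError on a blank line (for example a trailing empty line). The following is a minimal sketch of the same symmetric-difference idea that skips blank lines. The helper name parse_numbers and the in-memory sample lists are assumptions made for illustration; with real files you would pass the open file objects instead.

```python
def parse_numbers(lines):
    """Return the set of integers found in an iterable of text lines.

    int() drops leading zeros ("014" -> 14); blank lines are skipped.
    """
    return {int(line) for line in lines if line.strip()}


# Sample data copied from the question; with real files, pass open file objects.
compare1 = ["57", "11", "13", "3", "889", "014", "91"]
compare2 = ["003", "889", "13", "14", "57", "12", "90"]

# Numbers present in exactly one of the two inputs.
unmatched = parse_numbers(compare1) ^ parse_numbers(compare2)
print(sorted(unmatched))  # [11, 12, 90, 91]
```

Sorting the result before printing gives a stable order, since sets themselves are unordered.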

Could you please try the following, written and tested with the shown samples in GNU awk:

awk '
{
  $0=$0+0
}
FNR==NR{
  a[$0]
  next
}
($0 in a){
  b[$0]
  next
}
{ print }
END{
  for(j in a){
    if(!(j in b)){ print j }
  }
}
'  compare1.txt compare2.txt

Explanation: a detailed, line-by-line explanation of the above.

awk '                                ##Starting awk program from here.
{
  $0=$0+0                            ##Adding 0 will remove extra zeros from current line,considering that your file doesn't have float values.
}
FNR==NR{                             ##Checking condition FNR==NR which will be TRUE when 1st Input_file is being read.
  a[$0]                              ##Creating array a with index of current line here.
  next                               ##next will skip all further statements from here.
}
($0 in a){                           ##Checking condition if current line is present in a then do following.
  b[$0]                              ##Creating array b with index of current line.
  next                               ##next will skip all further statements from here.
}
{ print }                            ##Printing current line from 2nd Input_file here.
END{                                 ##Starting END block of this code from here.
  for(j in a){                       ##Traversing through array a here.
    if(!(j in b)){ print j }         ##Checking condition if current index value is NOT present in b then print that index.
  }
}
'  compare1.txt compare2.txt         ##Mentioning Input_file names here.
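For readers more comfortable with Python, the logic of the awk program above can be sketched roughly as follows. This is an approximation, not a line-for-line port: the function name unmatched_in_order is hypothetical, blank lines are skipped (awk would turn them into 0), and leftover numbers from the first file are reported in file order, whereas awk's for-in traversal order is unspecified.

```python
def unmatched_in_order(first_lines, second_lines):
    """Return numbers that appear in only one of the two inputs.

    Unmatched numbers from the second input come first (in input order),
    followed by numbers from the first input that were never matched.
    """
    # Like awk's array a: unique numbers of the first input, order preserved.
    first = dict.fromkeys(int(s) for s in first_lines if s.strip())
    matched = set()  # like awk's array b
    result = []
    for s in second_lines:
        if not s.strip():
            continue
        n = int(s)  # int() drops leading zeros, like $0+0 in awk
        if n in first:
            matched.add(n)
        else:
            result.append(n)
    # Equivalent of the END block: entries of a that never appeared in b.
    result.extend(n for n in first if n not in matched)
    return result


compare1 = ["57", "11", "13", "3", "889", "014", "91"]
compare2 = ["003", "889", "13", "14", "57", "12", "90"]
print(unmatched_in_order(compare1, compare2))  # [12, 90, 11, 91]
```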

Here is how to do what you want using awk:

$ awk '{$0+=0} NR==FNR{a[$0];next} !($0 in a)' compare1.txt compare2.txt
12
90

$ awk '{$0+=0} NR==FNR{a[$0];next} !($0 in a)' compare2.txt compare1.txt
11
91

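But this is the job that comm exists to do, and with it you can get all the differences and the common lines at once. In the following output, col1 holds numbers found only in compare1.txt, col2 holds numbers found only in compare2.txt, and col3 holds numbers common to both files: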

$ comm <(awk '{print $0+0}' compare1.txt | sort) <(awk '{print $0+0}' compare2.txt | sort)
11
    12
        13
        14
        3
        57
        889
    90
91

Or get each result separately:

$ comm -23 <(awk '{print $0+0}' compare1.txt | sort) <(awk '{print $0+0}' compare2.txt | sort)
11
91

$ comm -13 <(awk '{print $0+0}' compare1.txt | sort) <(awk '{print $0+0}' compare2.txt | sort)
12
90

$ comm -12 <(awk '{print $0+0}' compare1.txt | sort) <(awk '{print $0+0}' compare2.txt | sort)
13
14
3
57
889
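Since the question also allows Python, the three comm columns map onto set difference and intersection. The sketch below assumes a hypothetical helper named comm_like and uses the sample data from the question; note that it sorts numerically, while the sort | comm pipeline above orders the lines lexically (which is why 3 appears after 14 there).

```python
def comm_like(first_lines, second_lines):
    """Return (only_in_first, only_in_second, common) as sorted lists."""
    s1 = {int(s) for s in first_lines if s.strip()}
    s2 = {int(s) for s in second_lines if s.strip()}
    # s1 - s2 ~ comm -23, s2 - s1 ~ comm -13, s1 & s2 ~ comm -12
    return sorted(s1 - s2), sorted(s2 - s1), sorted(s1 & s2)


compare1 = ["57", "11", "13", "3", "889", "014", "91"]
compare2 = ["003", "889", "13", "14", "57", "12", "90"]

only_first, only_second, common = comm_like(compare1, compare2)
print(only_first)   # [11, 91]
print(only_second)  # [12, 90]
print(common)       # [3, 13, 14, 57, 889]
```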
