Removing duplicate lines from a file based on a pattern

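User

#1 · edited 2024-09-29 00:22:38

Something like this will remove duplicate lines across the whole file. Only lines matching SPEC are deduplicated; every other line is printed unchanged.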

#!/usr/bin/perl

use warnings;
use strict;

my %seen; 

while ( <> ) {
  next if ( m/SPEC/ and $seen{$_}++ );
  print;
}

If you want to be more specific and key on the spectrum value, for example:

next if ( m/spectrum=(\d+)/ and $seen{$1}++ );
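For readers who prefer Python, the same "seen" lookup can be sketched roughly as follows. This is only an illustration and not part of the Perl answer: the function name dedupe_by_spectrum and the sample lines are made up, and it assumes the lines of interest carry a spectrum=<digits> field.

```python
import re

SPECTRUM = re.compile(r"spectrum=(\d+)")

def dedupe_by_spectrum(lines):
    """Yield lines, skipping any whose spectrum value has already appeared."""
    seen = set()
    for line in lines:
        match = SPECTRUM.search(line)
        if match:
            key = match.group(1)
            if key in seen:
                continue  # duplicate spectrum value: drop this line
            seen.add(key)
        yield line  # lines without a spectrum field always pass through

# Hypothetical sample input, invented for this sketch.
sample = [
    "SPEC file=a spectrum=1",
    "SPEC file=b spectrum=2",
    "SPEC file=c spectrum=1",
    "==Cluster==",
]
result = list(dedupe_by_spectrum(sample))
print(result)
```

As in the Perl version, the key is the captured number rather than the whole line, so two lines that differ elsewhere but share a spectrum value count as duplicates.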

To split the output by cluster you can do something very similar; all it takes is:

  if ( $line =~ m/==Cluster==/ ) { 
     open ( $output, ">", "temp".$count++ ); 
     select $output;
  }

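This sets the default destination of print to $output (you will also need to declare $output outside the loop).

You should also:

Enable use strict; and use warnings;
Avoid reading the whole of <> into $_; it is unnecessary. If you really have to read everything at once, $block = do { local $/; <> }; is the better way to do it, followed by $block =~ m/regex/
Use lexical filehandles: open ( my $output, '>', 'filename' ) or die $!;
Check the return code when opening a file (or die $! is usually sufficient).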

So the whole thing would look like this:

#!/usr/bin/perl

use warnings;
use strict;

my %seen; 
my $count = 0; 
my $output; 

while (  <> ) {
  next if ( m/spectrum=(\d+)/ and $seen{$1}++ );
  if ( m/==Cluster==/ ) { 
     open ( $output, ">", "temp".$count++ ) or die $!; 
     select $output;
  }
  print;
}
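For comparison, here is a rough Python sketch of the same split-and-dedupe loop. It is an illustration under assumptions, not a translation the answer itself provides: split_clusters, its out_dir and default parameters, and the sample data are all hypothetical. Like the Perl script, it tracks spectrum values across the whole input, not per cluster.

```python
import os
import re
import sys
import tempfile

SPECTRUM = re.compile(r"spectrum=(\d+)")

def split_clusters(lines, out_dir, default=sys.stdout):
    """Write each ==Cluster== block to its own temp<N> file in out_dir,
    skipping lines whose spectrum value was already seen in the input."""
    seen = set()
    count = 0
    out = default  # lines before the first marker go to the default stream
    opened = []
    try:
        for line in lines:
            match = SPECTRUM.search(line)
            if match:
                if match.group(1) in seen:
                    continue
                seen.add(match.group(1))
            if "==Cluster==" in line:
                # Start a new output file for every cluster marker.
                out = open(os.path.join(out_dir, f"temp{count}"), "w")
                opened.append(out)
                count += 1
            out.write(line)
    finally:
        for handle in opened:
            handle.close()
    return count

# Hypothetical sample input, written to a throwaway directory.
out_dir = tempfile.mkdtemp()
sample = [
    "==Cluster==\n",
    "SPEC spectrum=1\n",
    "SPEC spectrum=1\n",
    "==Cluster==\n",
    "SPEC spectrum=2\n",
    "SPEC spectrum=1\n",
]
n_files = split_clusters(sample, out_dir)
with open(os.path.join(out_dir, "temp0")) as f:
    first = f.read()
with open(os.path.join(out_dir, "temp1")) as f:
    second = f.read()
print(n_files, repr(first), repr(second))
```

Passing the output stream around explicitly stands in for Perl's select, which changes the default print destination globally.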

User

#2 · edited 2024-09-29 00:22:38

If the duplicate lines are consecutive, you can use the following Perl one-liner:

perl -ani.back -e 'next if defined($p) && $_ eq $p;$p=$_;print' file.txt

The file is edited in place, and the original is kept as a backup with the .back extension (file.txt.back).

User

#3 · edited 2024-09-29 00:22:38

You can also use this Python script, in which I use groupby from the itertools module.

I assume your input file is named f_input.txt and the output file is named new_file.txt.

from itertools import groupby

# Read every line, stripping trailing whitespace (including the newline).
with open("f_input.txt", "r") as src:
    lines = [line.rstrip() for line in src]

# groupby collapses each run of identical consecutive lines into one key,
# so only adjacent duplicates are removed.
with open("new_file.txt", "w") as dst:
    for line, _ in groupby(lines):
        # Text mode translates "\n" to the platform line ending on write.
        dst.write(line + "\n")

The output file new_file.txt should then resemble the desired output.
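One caveat worth spelling out: groupby only merges duplicates that sit next to each other, which is the same limitation as the one-liner in #2. The small sketch below, using a made-up list of lines, contrasts that with a whole-file dedupe via dict.fromkeys, which keeps the first occurrence of each value in order.

```python
from itertools import groupby

# Made-up input with both adjacent and non-adjacent duplicates.
lines = ["a", "a", "b", "a", "b", "b"]

# groupby only merges runs of equal neighbours.
adjacent_only = [key for key, _ in groupby(lines)]

# dict.fromkeys keeps the first occurrence of every value, in order.
whole_file = list(dict.fromkeys(lines))

print(adjacent_only)  # ['a', 'b', 'a', 'b']
print(whole_file)     # ['a', 'b']
```

If the duplicates in your file can be separated by other lines, the seen-hash approach from #1 (or dict.fromkeys here) is the one to reach for.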
