Removing duplicate lines in a file based on a pattern


I have been trying to find a good way to do this, but unfortunately I have not found one. I am working with files in the following format:

<blockquote>
=Cluster=
SPEC PRD000681;PRIDE_Exp_Complete_Ac_22491.xml;spectrum=1074 true
SPEC PRD000681;PRIDE_Exp_Complete_Ac_22498.xml;spectrum=2950 true
=Cluster=
SPEC PRD000681;PRIDE_Exp_Complete_Ac_22498.xml;spectrum=1876 true
SPEC PRD000681;PRIDE_Exp_Complete_Ac_22498.xml;spectrum=3479 true
SPEC PRD000681;PRIDE_Exp_Complete_Ac_22498.xml;spectrum=3785 true
=Cluster=
SPEC PRD000681;PRIDE_Exp_Complete_Ac_22493.xml;spectrum=473 true
SPEC PRD000681;PRIDE_Exp_Complete_Ac_22493.xml;spectrum=473 true
</blockquote>

As you can see, every SPEC line is different except in the last cluster, where the number after the string spectrum is repeated. What I want to do is take each block of information between the pattern <code>=Cluster=</code> and check whether it contains lines with a repeated spectrum value. If several lines are duplicated, all but one of them should be removed.

The output file should look like this:

<blockquote>
=Cluster=
SPEC PRD000681;PRIDE_Exp_Complete_Ac_22491.xml;spectrum=1074 true
SPEC PRD000681;PRIDE_Exp_Complete_Ac_22498.xml;spectrum=2950 true
=Cluster=
SPEC PRD000681;PRIDE_Exp_Complete_Ac_22498.xml;spectrum=1876 true
SPEC PRD000681;PRIDE_Exp_Complete_Ac_22498.xml;spectrum=3479 true
SPEC PRD000681;PRIDE_Exp_Complete_Ac_22498.xml;spectrum=3785 true
=Cluster=
SPEC PRD000681;PRIDE_Exp_Complete_Ac_22493.xml;spectrum=473 true
</blockquote>

I use the following to split the file on the pattern, but I do not know how to check whether a spectrum is repeated.

<pre><code>#!/usr/bin/perl
# Slurp the whole input at once instead of reading line by line.
undef $/;
$_ = <>;
$n = 0;
# Split before each "=Cluster=" marker (lookahead keeps the marker in the block).
for $match (split(/(?==Cluster=)/)) {
    # Write each block to its own file: temp1, temp2, ...
    open(O, '>temp' . ++$n);
    print O $match;
    close(O);
}
</code></pre>

PS: I use Perl because it is easier for me, but I also understand Python.
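Since Python is also acceptable, here is a minimal sketch of one possible approach, not a definitive solution. It assumes that a "duplicate" means the same number after `spectrum=` appearing more than once within a single `=Cluster=` block, and that the first occurrence is the one to keep. The names `dedupe_clusters`, `dedupe_file` and `SPECTRUM_RE` are made up for this illustration. Instead of splitting the file into temporary files, it streams through the lines and resets a set of seen spectrum values at every `=Cluster=` marker.

```python
import re

# Assumption: the spectrum value is the run of digits after "spectrum=".
SPECTRUM_RE = re.compile(r"spectrum=(\d+)")


def dedupe_clusters(lines):
    """Yield lines, skipping any line whose spectrum value was already
    seen in the current =Cluster= block (the first occurrence is kept)."""
    seen = set()
    for line in lines:
        if line.strip() == "=Cluster=":
            seen = set()  # a new block starts: forget earlier spectra
            yield line
            continue
        match = SPECTRUM_RE.search(line)
        if match:
            key = match.group(1)
            if key in seen:
                continue  # repeated spectrum within this block: drop it
            seen.add(key)
        yield line  # lines without a spectrum value pass through unchanged


def dedupe_file(src_path, dst_path):
    """Copy src_path to dst_path, dropping repeated spectra per cluster."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(dedupe_clusters(src))
```

Calling `dedupe_file("input.txt", "output.txt")` (both file names are placeholders) would write the cleaned file. If two lines should count as duplicates only when the whole line is identical, including the file name, using `line.strip()` as the key instead of `match.group(1)` would be the corresponding change.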

