Is there a way to remove duplicate strings within a string according to a pattern?


I am working with a file in the following format:

<pre><code>=Cluster=
SPEC PRD000681;PRIDE_Exp_Complete_Ac_22491.xml;spectrum=1074 true
SPEC PRD000681;PRIDE_Exp_Complete_Ac_22498.xml;spectrum=2950 true
=Cluster=
SPEC PRD000681;PRIDE_Exp_Complete_Ac_22498.xml;spectrum=1876 true
SPEC PRD000681;PRIDE_Exp_Complete_Ac_22498.xml;spectrum=3479 true
SPEC PRD000681;PRIDE_Exp_Complete_Ac_22498.xml;spectrum=3785 true
SPEC PRD000681;PRIDE_Exp_Complete_Ac_22498.xml;spectrum=3785 true
=Cluster=
SPEC PRD000681;PRIDE_Exp_Complete_Ac_22493.xml;spectrum=473 true
SPEC PRD000681;PRIDE_Exp_Complete_Ac_22493.xml;spectrum=473 true
</code></pre>

As you can see, every SPEC line is different, except for two cases where the spectrum number is repeated. What I want to do is take each block of information between the <code>=Cluster=</code> patterns and check whether any lines in it have a repeated spectrum value. If several lines are duplicates, all but one of them should be removed.

The output file should look like this:

<pre><code>=Cluster=
SPEC PRD000681;PRIDE_Exp_Complete_Ac_22491.xml;spectrum=1074 true
SPEC PRD000681;PRIDE_Exp_Complete_Ac_22498.xml;spectrum=2950 true
=Cluster=
SPEC PRD000681;PRIDE_Exp_Complete_Ac_22498.xml;spectrum=1876 true
SPEC PRD000681;PRIDE_Exp_Complete_Ac_22498.xml;spectrum=3479 true
SPEC PRD000681;PRIDE_Exp_Complete_Ac_22498.xml;spectrum=3785 true
=Cluster=
SPEC PRD000681;PRIDE_Exp_Complete_Ac_22493.xml;spectrum=473 true
</code></pre>

I am using <code>groupby</code> from the itertools module. My input file is called f_input.txt and the output file is called new_file.txt. However, this script also removes the word SPEC, and I don't know what to change to stop it from doing that.

<pre><code>from itertools import groupby

data = (k.rstrip().split("=Cluster=") for k in open("f_input.txt", 'r'))
final = list(k for k,_ in groupby(list(data)))

with open("new_file.txt", 'a') as f:
    for k in final:
        if k == ['','']:
            f.write("=Cluster=\n")
        elif k == ['']:
            f.write("\n\n")
        else:
            f.write("{}\n".join(k))
</code></pre>

Edit: new condition. Sometimes part of the line may change, for example:

<pre><code>=Cluster=
SPEC PRD000681;PRIDE_Exp_Complete_Ac_22498.xml;spectrum=1876 true
SPEC PRD000681;PRIDE_Exp_Complete_Ac_22498.xml;spectrum=3479 true
SPEC PRD000681;PRIDE_Exp_Complete_Ac_22498.xml;spectrum=3785 true
SPEC PRD000682;PRIDE_Exp_Complete_Ac_22498.xml;spectrum=3785 true
</code></pre>

As you can see, the PRD number in the last line is different. One solution would be to check the spectrum number and remove lines whose spectrum value is repeated.

This would be the desired result:

<pre><code>=Cluster=
SPEC PRD000681;PRIDE_Exp_Complete_Ac_22498.xml;spectrum=1876 true
SPEC PRD000681;PRIDE_Exp_Complete_Ac_22498.xml;spectrum=3479 true
SPEC PRD000681;PRIDE_Exp_Complete_Ac_22498.xml;spectrum=3785 true
</code></pre>
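One possible way to approach this, offered as a sketch and not a definitive answer: `itertools.groupby` only merges items that are identical and consecutive, so it cannot catch lines that differ in the PRD part but share a spectrum number. An alternative is to remember the spectrum numbers seen since the last `=Cluster=` line and skip any line whose number has already appeared. The sketch assumes each record sits on its own line and that the key always looks like `spectrum=<digits>`; the names `SPECTRUM_RE`, `dedupe_clusters` and `dedupe_file` are illustrative and do not come from the question.

```python
import re

# Assumed shape of the key; matches e.g. "spectrum=3785".
SPECTRUM_RE = re.compile(r"spectrum=(\d+)")


def dedupe_clusters(lines):
    """Yield lines, keeping only the first line per spectrum number in each cluster."""
    seen = set()
    for line in lines:
        if line.strip() == "=Cluster=":
            seen = set()  # a new block starts: forget earlier spectrum numbers
            yield line
            continue
        match = SPECTRUM_RE.search(line)
        if match is None:
            yield line  # blank or unrecognised lines pass through unchanged
            continue
        key = match.group(1)
        if key not in seen:
            seen.add(key)
            yield line


def dedupe_file(in_path, out_path):
    """Hypothetical helper: stream in_path through dedupe_clusters into out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(dedupe_clusters(src))
```

With the file names from the question, this would be called as `dedupe_file("f_input.txt", "new_file.txt")`. The output file is opened with `"w"` here, not `"a"`, so that running the script again does not append a second copy of the result.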

