<p>Treat it as a CSV file with a custom delimiter:</p>
<pre><code>>>> import csv
>>> import collections
>>> with open('in.txt', newline='') as in_file:
...     reader = csv.reader(in_file, delimiter='|')
...     data = list(reader)  # exhaust the reader and convert it to a list of rows
...     # the data is now loaded as a two-dimensional list; find the duplicates
...     dup_values = [x for x, y in collections.Counter([r[1] for r in data]).items() if y > 1]
...     for r in data:
...         if r[1] in dup_values:
...             print(r)
...
['M1.HpyFXIII.dna', 'CCATC']
['M2.HpyFXIII.dna', 'CCATC']
</code></pre>
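<p>The same idea can be sketched as a reusable function rather than an interactive session. This is only a minimal sketch: the name <code>find_duplicate_rows</code>, its <code>column</code> parameter, and the third sample row are assumptions made for illustration, not part of the original answer. It skips rows that are too short to have the requested column, and it looks counts up directly in the <code>Counter</code> instead of scanning a list of duplicate values for every row.</p>

```python
import collections
import csv


def find_duplicate_rows(lines, delimiter='|', column=1):
    """Return the rows whose value in `column` occurs more than once.

    `lines` is any iterable of strings, such as an open text file.
    The function name and parameters are hypothetical, for illustration only.
    """
    # Parse every line; skip rows too short to contain the requested column.
    rows = [row for row in csv.reader(lines, delimiter=delimiter)
            if len(row) > column]
    # Count how often each value appears in the chosen column.
    counts = collections.Counter(row[column] for row in rows)
    # Keep only rows whose value was seen more than once, in input order.
    return [row for row in rows if counts[row[column]] > 1]


if __name__ == '__main__':
    # The first two rows mirror the output shown above;
    # the third row is made up so that there is a non-duplicate.
    sample = [
        'M1.HpyFXIII.dna|CCATC',
        'M2.HpyFXIII.dna|CCATC',
        'M3.Example.dna|GGATC',
    ]
    for row in find_duplicate_rows(sample):
        print(row)
```

<p>To run it against a file, pass the open file object: <code>with open('in.txt', newline='') as f: find_duplicate_rows(f)</code>.</p>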