<p>Try the following:</p>
<pre><code>import csv
from collections import defaultdict

# Map each pc and rd value to the list of line numbers it occurs on,
# e.g. {10: [1, 45, 79]}
pc_elements = defaultdict(list)
rd_elements = defaultdict(list)

# `file` is the path to the TSV; on Python 2, open it with mode 'rb' instead
with open(file, newline='') as f:
    csvin = csv.reader(f, delimiter='\t')
    for line_number, row in enumerate(csvin):
        try:
            # Convert both columns before storing, so a bad row is skipped entirely
            pc, rd = int(row[0]), int(row[1])
        except (ValueError, IndexError):
            print("Not a number")
            print(row)
            continue
        pc_elements[pc].append(line_number)
        rd_elements[rd].append(line_number)

# On Python 2, use iteritems() instead of items()
for pc, indexes in pc_elements.items():
    print("pc {0} appears {1} times. First on row {2}, last on row {3}".format(
        pc,
        len(indexes),
        indexes[0],
        indexes[-1]
    ))
</code></pre>
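<p>This works by building a dictionary while reading the <code>TSV</code>, keyed by the <code>pc</code> value, with a list of occurrences as the value. Dictionary keys are unique by nature, so there is no need for a <code>set</code>, and the <code>list</code> values are only used to hold the rows each key appears on.</p>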
<p>Example:</p>
^{pr2}$
<p>This will output:</p>
<pre><code>pc 8 appears 3 times. First on row 3, last on row 13
pc 10 appears 4 times. First on row 4, last on row 101
</code></pre>
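<p>As a self-contained sketch of the same idea, the snippet below indexes one column of a small in-memory TSV. The sample data and the <code>index_column</code> helper are made up for illustration and are not part of the original file or code; the header row is skipped simply because it is not a number.</p>

```python
import csv
import io
from collections import defaultdict


def index_column(rows, column):
    """Map each integer value in `column` to the row numbers it occurs on."""
    positions = defaultdict(list)
    for line_number, row in enumerate(rows):
        try:
            positions[int(row[column])].append(line_number)
        except (ValueError, IndexError):
            # Skip headers, blank lines and other non-numeric rows.
            continue
    return positions


# Hypothetical sample data: a header row followed by pc/rd pairs.
sample = "pc\trd\n8\t1\n10\t2\n8\t3\n10\t1\n"
reader = csv.reader(io.StringIO(sample), delimiter="\t")
pc_positions = index_column(reader, 0)

for pc, rows in pc_positions.items():
    print("pc {0} appears {1} times. First on row {2}, last on row {3}".format(
        pc, len(rows), rows[0], rows[-1]))
```

<p>Calling <code>index_column(reader, 1)</code> on a fresh reader would index the <code>rd</code> column the same way, which avoids keeping two parallel dictionaries in the loop.</p>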