<p>Here is one approach.</p>
<pre><code>import string

filename = 'testfile'
exclude = set(string.punctuation)  # characters to replace with spaces

with open(filename, 'r') as f:
    lines = f.readlines()

final_list = []
unique_list = []  # names already seen in the current document
for line in lines:
    if 'DOCID' in line:
        unique_list = []  # reset at the start of every DOCID block
        final_list.append(line)
    else:
        # Replace punctuation with spaces so the name is the second token
        currentline = ''.join(ch if ch not in exclude else ' ' for ch in line)
        city = currentline.split()[1]
        if city not in unique_list:
            unique_list.append(city)
            final_list.append(line)

for line in final_list:
    print(line, end='')  # each line already ends with a newline
</code></pre>
<p>Output:</p>
<pre><code>3210 <DOCID>GH950102-000003<DOCID>/O
3243 Australia/LOCATION
3360 England/LOCATION
3414 India/LOCATION
3474 Melbourne/LOCATION
3526 >Zimbabwe<TOPONYM>/O
3551 >Glasgow<TOPONYM>/O
3568 <DOCID>GH950102-000004<DOCID>/O
3739 Hampden/LOCATION
3838 Ibrox/LOCATION
3861 Neerday/LOCATION
4161 Fir Park/LOCATION
4229 Park<TOPONYM>/O
4244 >Midfield<TOPONYM>/O
4249 >Glasgow<TOPONYM>/O
4251 <DOCID>GH950102-000005<DOCID>/O
4535 Edinburgh/LOCATION
4840 Road<TOPONYM>/O
4850 >Glasgow<TOPONYM>/O
</code></pre>
<p>Note: <code>testfile</code> is a text file containing the input text. The code can be optimized further if needed.</p>
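<p>As a rough sketch of one possible optimization (not the only way to do it): a <code>set</code> gives constant-time membership checks instead of scanning a list, and <code>str.translate</code> replaces punctuation in one pass. The function name <code>dedupe_per_doc</code> is hypothetical, and skipping lines with fewer than two tokens is an assumption about how blank or malformed lines should be handled.</p>

```python
import string

# Translation table mapping every punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def dedupe_per_doc(lines):
    """Keep DOCID lines plus the first occurrence of each name per document.

    `dedupe_per_doc` is a hypothetical helper name, not part of any library.
    """
    kept = []
    seen = set()  # names already seen in the current document
    for line in lines:
        if "DOCID" in line:
            seen = set()  # a new document starts: forget earlier names
            kept.append(line)
            continue
        tokens = line.translate(_PUNCT_TO_SPACE).split()
        if len(tokens) < 2:
            continue  # assumption: silently skip blank or malformed lines
        name = tokens[1]  # second token, same rule as the loop above
        if name not in seen:
            seen.add(name)
            kept.append(line)
    return kept
```

<p>To use it on the file, pass the open file object, for example <code>dedupe_per_doc(open('testfile'))</code>, and print each returned line with <code>end=''</code>. Like the original loop, it compares only the first word after the number, so multi-word names such as "Fir Park" are keyed on "Fir".</p>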