<p>Here is one possible implementation. <code>parse_file</code> uses the following variables:</p>
<ul>
<li><p><code>this_info</code>: a dictionary holding the information extracted from the current line</p>
</li>
<li><p><code>previous_info</code>: the <code>this_info</code> from the previous iteration</p>
</li>
<li><p><code>start_info</code>: the <code>this_info</code> from the most recent line that started a new operon ID</p>
</li>
</ul>
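<p>The desired output format is not entirely clear, so adjust the main program (at the end) to write the extracted fields in whatever form you choose.</p>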
<pre><code>def parse_file(input_file):
    """
    Read an .opr file and return a list of dictionaries,
    one per operon ID.
    """
    results = []
    start_info = previous_info = {}
    with open(input_file) as f:
        next(f, None)  # skip the header line (no error on an empty file)
        for line in f:
            bits = line.split()
            if not bits:
                continue  # skip blank lines
            # dictionary containing the information extracted
            # from this particular line
            this_info = {'operon_id': int(bits[0]),
                         'start': int(bits[3]),
                         'end': int(bits[4]),
                         'strand': bits[5]}
            if not previous_info:
                # first data line of the file
                start_info = this_info
            elif previous_info['operon_id'] != this_info['operon_id']:
                # this is the first line with a NEW operon ID,
                # so add the result for the previous operon ID,
                # whose end line was the PREVIOUS line
                _add_result(results, start_info, previous_info)
                start_info = this_info  # start line for this ID
            # sanity check: the strand should be the same
            # for every line of a given operon ID
            if start_info['strand'] != this_info['strand']:
                print("warning, strand info inconsistent")
            previous_info = this_info  # ready for next iteration
    if previous_info:
        # add the last operon ID (skipped if there were no data lines)
        _add_result(results, start_info, previous_info)
    return results


def _add_result(results, start_info, end_info):
    """
    Append to results a dictionary based on the start line's info,
    but with the end line's info used for the 'end' field.
    """
    info = start_info.copy()
    info['end'] = end_info['end']
    results.append(info)


# main program: write out some info for each operon ID
for result in parse_file('operonmap.opr'):
    print(result['operon_id'],
          result['start'],
          result['end'],
          result['strand'])
</code></pre>
<p>This gives:</p>
<pre><code>1132034 2052 4997 +
1132035 5123 9818 +
1132036 11421 11692 -
1132037 14089 14877 +
</code></pre>
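<p>Since the output format is left open, here is a minimal sketch of one alternative to the <code>print</code> loop: writing the results as CSV with the standard <code>csv</code> module. The helper name <code>write_results</code>, the <code>FIELDS</code> constant and the choice of CSV are assumptions for illustration, not part of the original answer. The sample rows are copied from the output above so that the snippet runs on its own; in practice you would pass the list returned by <code>parse_file</code>.</p>

```python
import csv
import sys

# Column order for the output; these are the keys of the result dictionaries.
FIELDS = ["operon_id", "start", "end", "strand"]


def write_results(results, out):
    """Write one CSV row per operon to an open text file-like object.

    Hypothetical helper: 'results' is a list of dictionaries shaped like
    the ones returned by parse_file.
    """
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(results)


# Sample data copied from the output shown above; replace with
# parse_file('operonmap.opr') when the real file is available.
sample = [
    {"operon_id": 1132034, "start": 2052, "end": 4997, "strand": "+"},
    {"operon_id": 1132035, "start": 5123, "end": 9818, "strand": "+"},
    {"operon_id": 1132036, "start": 11421, "end": 11692, "strand": "-"},
    {"operon_id": 1132037, "start": 14089, "end": 14877, "strand": "+"},
]

write_results(sample, sys.stdout)
```

<p>To write to a file instead of standard output, open it with <code>open(path, 'w', newline='')</code> and pass the file object as <code>out</code>.</p>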