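<p>This approach is a little different: it first finds the job elements in the list, then processes one "chunk" at a time.</p>
<p>The information for each job chunk is collected in the <code>row</code> list. When the chunk ends, <code>row</code> is joined into a single string and appended to the <code>rows</code> list.</p>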
<pre><code>import re
l = ['Job Id: 49361.tyrone-cluster', 'resources_used.cput = 14:32:14', 'resources_used.mem = 13955852kb', 'resources_used.vmem = 14199016kb', 'resources_used.walltime = 05:23:02', 'job_state = R', 'queue = qp32', 'Job Id: 49362.tyrone-cluster', 'job_state = Q', 'queue = batch', 'comment = Not Running: Queue not an execution queue.', 'Job Id: 49395.tyrone-cluster', 'resources_used.cput = 31:20:32', 'resources_used.mem = 19179712kb', 'resources_used.vmem = 158305072kb', 'resources_used.walltime = 01:57:34', 'job_state = R', 'queue = idqueue', 'Job Id: 49396.tyrone-cluster', 'resources_used.cput = 46:26:45', 'resources_used.mem = 5347092kb', 'resources_used.vmem = 7588024kb', 'resources_used.walltime = 01:44:50', 'job_state = R', 'queue = qp32', 'Job Id: 49408.tyrone-cluster', 'job_state = Q', 'queue = qp32']
# Indices of every 'Job Id' line, plus a sentinel past the end of the list.
# (len(l) would suffice; slicing tolerates the overshoot.)
job_elements = [i for (i, e) in enumerate(l) if re.match(r'Job Id: (\d+)', e)] + [len(l) + 1]
rows = []
# Pair each job's start index with the start index of the next job.
for (s, e) in zip(job_elements[:-1], job_elements[1:]):
    row = []
    for line in l[s:e]:
        # 'Job Id: 49361.tyrone-cluster' contributes '49361'
        mat = re.match(r'Job Id: (\d+)', line)
        if mat:
            row.append(mat.group(1).strip())
            continue
        # 'key = value' contributes 'value'
        mat = re.match(r'.* = (.*)', line)
        if mat:
            row.append(mat.group(1).strip())
            continue
    rows.append(' '.join(row))
# Print output:
for r in rows:
    print(r)
# Or write to file:
with open('output.txt', 'w') as f:
    for r in rows:
        f.write(r)     # You could write these two lines as f.write(r + '\n')
        f.write('\n')  # if you didn't care about creating a string unnecessarily
</code></pre>
<p>Output:</p>
<pre>
49361 14:32:14 13955852kb 14199016kb 05:23:02 R qp32
49362 Q batch Not Running: Queue not an execution queue.
49395 31:20:32 19179712kb 158305072kb 01:57:34 R idqueue
49396 46:26:45 5347092kb 7588024kb 01:44:50 R qp32
49408 Q qp32
</pre>
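<p>For reference, <code>(s,e) in zip(job_elements[:-1], job_elements[1:])</code> generates the following tuples, which are the start (inclusive) and end (exclusive) indices of each "Job Id" block in the original list. The final end index, 29, is one past the list's length of 28; this is harmless because slicing stops at the end of the list.</p>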
<pre>
( 0, 7)
( 7, 11)
(11, 18)
(18, 25)
(25, 29)
</pre>
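<p>The pairing trick can be tried in isolation on a smaller list. This is only a minimal sketch: <code>block_bounds</code> and <code>sample</code> are hypothetical names that are not part of the answer above, and the sketch uses <code>len(lines)</code> as the sentinel so the last end index is exact.</p>

```python
def block_bounds(lines, prefix='Job Id:'):
    """Return (start, end) index pairs, one per block that begins with `prefix`.

    Hypothetical helper for illustration only.
    """
    starts = [i for i, line in enumerate(lines) if line.startswith(prefix)]
    # Append the list length so the last block has an end index too.
    edges = starts + [len(lines)]
    # Pair each start with the next start: (edges[0], edges[1]), (edges[1], edges[2]), ...
    return list(zip(edges[:-1], edges[1:]))


sample = ['Job Id: 1.x', 'job_state = R', 'queue = a', 'Job Id: 2.x', 'job_state = Q']
bounds = block_bounds(sample)
print(bounds)  # [(0, 3), (3, 5)]

# Each pair slices out exactly one job block.
blocks = [sample[s:e] for (s, e) in bounds]
```

<p>Because the end index is exclusive, each pair can be passed straight to a slice, and adjacent blocks never overlap.</p>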