<p>Here is what I came up with; adapt it to your needs:</p>
<pre><code>import re
m = """01-someText151645.txt,Wed Feb 1 16:15:18 2012,1328112918.57801-HalfMeg151646.txt,Wed Feb 1 16:15:18 2012,1328112918.578"""
print(m)
# Prefix every match of two digits and a dash with a newline
addNewLineBefore = lambda matchObject: "\n" + matchObject.group(0)
print(re.sub(r'\d{2}-', addNewLineBefore, m))
</code></pre>
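<p>Of course, this assumes that <code>\d{2}-</code> only ever matches at the start of a line. If it can also occur inside a line, for example in a file name, please mention it and I will edit this answer to handle that.</p>
<p><strong>Edit:</strong> If you don't want to read the whole file into memory, you can process it through a buffer:</p>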
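<p>If <code>\d{2}-</code> can show up inside a line, one possible workaround is to insert the newline only where a record ends and the next one begins. This is a minimal sketch, not a tested solution for your data: it assumes every record ends with a dot and exactly three digits, and the names <code>RECORD_BOUNDARY</code> and <code>add_newlines</code> are made up for illustration.</p>

```python
import re

# Assumption: a record ends with a dot plus exactly three digits, and the
# next record starts immediately after it with two digits and a dash.
# Both parts are zero-width lookarounds, so no characters are consumed.
RECORD_BOUNDARY = re.compile(r"(?<=\.\d{3})(?=\d{2}-)")

def add_newlines(text):
    """Insert a newline at every record boundary found in text."""
    return RECORD_BOUNDARY.sub("\n", text)

# The "12-" inside the first file name is not treated as a line start,
# because it is not preceded by a dot and three digits.
sample = ("01-some12-Text.txt,Wed Feb 1 16:15:18 2012,1328112918.578"
          "02-HalfMeg.txt,Wed Feb 1 16:15:18 2012,1328112918.578")
print(add_newlines(sample))
```

<p>The trade-off is that this relies on the timestamp format at the end of each record instead of on the prefix alone, so it breaks if the fractional part is not always three digits.</p>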
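<pre><code>import re

# One record: two digits and a dash, the middle, then a dot and three digits.
# Equivalent to re.compile(r'(\d{2}-.+?\.\d{3})')
oneLine = re.compile(r"""(
    \d{2}-      # the beginning of the line
    .+?         # the middle of the line
    \.\d{3}     # the dot and three digits at the end
)""", re.X)

with open("infile", "r") as infile, open("outfile", "w") as outfile:
    leftover = ""  # partial record carried over from the previous chunk
    while True:
        chunk = infile.read(6000)  # adjust this to suit
        if not chunk:
            break
        # split() with a capturing group keeps the matched records;
        # drop the empty strings it leaves between them
        pieces = [p for p in oneLine.split(leftover + chunk) if p]
        leftover = pieces.pop()  # the last piece may be cut off mid-record
        outfile.writelines(p + "\n" for p in pieces)
    if leftover:
        outfile.write(leftover + "\n")
</code></pre>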
<p>Don't use this if you need error checking or efficiency. From what I understand, this is a very controlled, informal environment.</p>