Regular expression for two digits and a hyphen

User

#1 · edited 2024-10-17 06:16:28

Here, try this:

([0-9]{2}-[a-zA-Z]{5,}[0-9]{5,}\.txt){1,}

This matches the filename format closely, though not strictly. You can adapt it to your needs.

Split on it, then separate the files accordingly.
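One way to "split on it" is sketched below: find where each filename match starts and slice the text between those offsets. The sample string and the helper name `split_records` are assumptions for illustration, and the outer `(...){1,}` group is dropped because only the match positions are needed.

```python
import re

# The filename pattern from the answer above, without the outer repeated group.
FILENAME = re.compile(r"[0-9]{2}-[a-zA-Z]{5,}[0-9]{5,}\.txt")


def split_records(contents):
    """Slice contents at the start of every filename match (hypothetical helper)."""
    starts = [m.start() for m in FILENAME.finditer(contents)]
    ends = starts[1:] + [len(contents)]
    return [contents[s:e] for s, e in zip(starts, ends)]


# Assumed sample: two records run together on a single line.
sample = (
    "01-someText151645.txt,Wed Feb 1 16:15:18 2012,1328112918.578"
    "01-HalfMeg151646.txt,Wed Feb 1 16:15:18 2012,1328112918.578"
)
records = split_records(sample)
for record in records:
    print(record)
```

Anything that precedes the first filename match is ignored by this sketch, which is fine only if the data always begins with a filename.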

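User

#2 · edited 2024-10-17 06:16:28

If your file is small enough to be read into memory all at once, you can simply split it on a lookahead regex (splitting on a zero-width match requires Python 3.7 or later)

re.split(r'(?=\d\d-)', contents)

or insert newlines where they belong

re.sub(r'(?=\d\d-)', "\n", contents)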
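A small runnable sketch of both variants follows. The `contents` value is an assumed sample, and the variable names are illustrative only.

```python
import re

# Assumed sample contents: two records run together on one line.
contents = (
    "01-someText151645.txt,Wed Feb 1 16:15:18 2012,1328112918.578"
    "01-HalfMeg151646.txt,Wed Feb 1 16:15:18 2012,1328112918.578"
)

# Zero-width lookahead: matches the position just before "two digits, hyphen".
boundary = re.compile(r"(?=\d\d-)")

# Splitting on an empty match needs Python 3.7+. The match at offset 0
# leaves an empty first element, so empty strings are dropped.
records = [part for part in boundary.split(contents) if part]

# Alternative: insert a newline at each boundary, then strip the leading one.
one_per_line = boundary.sub("\n", contents).lstrip("\n")

print(records)
print(one_per_line)
```

Because the lookahead consumes no characters, the two digits and the hyphen stay attached to the record that follows them.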

User

#3 · edited 2024-10-17 06:16:28

Here is what I came up with; adapt it as needed:

import re

m = """01-someText151645.txt,Wed Feb 1 16:15:18 2012,1328112918.57801-HalfMeg151646.txt,Wed Feb 1 16:15:18 2012,1328112918.578"""

print(m)

addNewLineBefore = lambda matchObject: "\n" + matchObject.group(0)

print(re.sub(r'\d{2}-', addNewLineBefore, m))

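Of course, this assumes that a \d{2}- match occurs only at the start of a line. If it can also appear within a line, for example inside a filename, mention it and I will edit this answer to accommodate that.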
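If two digits and a hyphen can also occur inside a filename, one possible tightening is to require that the boundary directly follows a dot and three digits. This is only a sketch under the assumption that every record ends that way (as the sample timestamps do); the names `BOUNDARY` and `insert_newlines` and the sample filename are made up for illustration.

```python
import re

# Boundary = position preceded by ".ddd" (end of a timestamp) and
# followed by "dd-" (start of the next filename). Both parts are zero-width.
BOUNDARY = re.compile(r"(?<=\.\d{3})(?=\d{2}-)")


def insert_newlines(text):
    """Put each record on its own line (hypothetical helper)."""
    return BOUNDARY.sub("\n", text)


# Assumed sample: the first filename contains "22-", which must not split.
sample = (
    "01-run22-a151645.txt,Wed Feb 1 16:15:18 2012,1328112918.578"
    "02-HalfMeg151646.txt,Wed Feb 1 16:15:18 2012,1328112918.578"
)
lines = insert_newlines(sample).split("\n")
for line in lines:
    print(line)
```

The lookbehind has a fixed width, which Python's `re` module requires, and it also keeps a newline from being inserted before the very first record.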

Edit: if you don't want to read the whole file into memory, you can use a buffer:

import re

oneLine = re.compile(r"""(
        \d{2}-  # the beginning of the line
        .+?     # the middle of the line
        \.\d{3} # the dot and three digits at the end
)""", re.X)

with open("infile", "r") as infile, open("outfile", "w") as outfile:
    buffer = infile.read(6000)  # adjust this to suit
    while buffer:
        # newbuffer = re.split(r'(\d{2}-.+?\.\d{3})', buffer)  # I'll use the commented re object above
        newbuffer = oneLine.split(buffer)
        newbuffer = filter(None, newbuffer)  # drop the empty strings between records
        outfile.write("\n".join(newbuffer))
        # note: a record that straddles two reads is not handled specially
        buffer = infile.read(6000)  # read the next chunk; an empty string ends the loop

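If error checking and efficiency are required, this should not be used. As far as I understand, this is a very controlled, informal environment.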
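For a slightly more careful variant, the sketch below keeps the unfinished tail of each chunk and prepends it to the next one, so a record cut in half by a read is still written whole. It assumes every record ends with a dot and exactly three digits; the function name `split_stream` and its parameters are hypothetical, and in-memory streams stand in for real files.

```python
import io
import re

# Same record shape as above: "dd-", anything, then ".ddd".
RECORD = re.compile(r"\d{2}-.+?\.\d{3}")


def split_stream(src, dst, chunk_size=6000):
    """Copy src to dst with one record per line (hypothetical helper)."""
    pending = ""
    # iter() with a sentinel keeps calling read() until it returns "".
    for chunk in iter(lambda: src.read(chunk_size), ""):
        pending += chunk
        end = 0
        for match in RECORD.finditer(pending):
            dst.write(match.group(0) + "\n")
            end = match.end()
        # Keep whatever follows the last complete record for the next read.
        pending = pending[end:]
    if pending:
        # Leftover text that never formed a complete record.
        dst.write(pending + "\n")


sample = (
    "01-someText151645.txt,Wed Feb 1 16:15:18 2012,1328112918.578"
    "01-HalfMeg151646.txt,Wed Feb 1 16:15:18 2012,1328112918.578"
)
out = io.StringIO()
split_stream(io.StringIO(sample), out, chunk_size=7)  # tiny chunks force splits
result = out.getvalue().splitlines()
print(result)
```

Text that sits between two complete records without matching the pattern is skipped by this sketch, so it is still not a substitute for real validation.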
