Answers to the question "Slow Python parsing script"

Slow Python parsing script

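I wrote a simple Python script that is supposed to scan a file line by line and match several different regular expressions to reformat the data. It works like this:

<pre><code>with open(file) as f:
    for line in f:
        line = line.rstrip('\n')
        # `or` short-circuits: evaluation stops at the first parser that returns True
        parseA(line, anOutPutFile) or parseB(line, anOutPutFile) or parseC(line, anOutPutFile) or parseD(line, anOutPutFile)
</code></pre>

Each line can be an A, B, C, or D line, or none of them (most lines match A, the second most common is B, and so on). Here is an example of a parseX function: [code block missing]

I was hoping that short-circuiting with the `or` operator would help, but the script is still very slow on large files (for example, files of about 1 GB). Is there anything obvious and simple that is very inefficient and that I could start changing? For example, should I use re.compile? (The docs say that recently used regexps are cached, and I only have a handful.)

Thanks

Edit, based on the comments below:

I first changed the code to use join, and then to use re.compile. Neither change seems to have sped it up. On a 50,000-line test file it takes about 93 seconds, which is the same time it took before the changes. There are 5 regular expressions, each with 8 to 12 groups. I changed the code to:

<pre><code>import re

# `bla` is defined elsewhere in the script.
# Raw strings (r'...') keep backslashes such as \d and \( intact.
regexA = re.compile('.*0' + bla + ' A ' + r'.*\((\d+)\) (\d+) (\w+) (\d+)@(.+) .* (.*) .* .* foo=far fox=(.*) test .*')
regexB = ...  # similar (omitted)
regexC = re.compile('.*0' + bla + ' C ' + r'.*\((\d+)\) (\d+) (\w+) (\d+)@(.+) foo=(\d+) foo2=(\d+) foo3=(\d+)@(.+) (\w+) .* (.*) .* .* foo4=val foo5=(.*) val2 .*')
regexD = ...  # similar (omitted)
regexE = ...  # similar (omitted)
# Two of the regexes are included in full to give an idea of what they look like.

# An example of one of the parse functions, for regexA:
def parseA(line, anOutputFile):
    m = regexA.match(line)
    if m:
        out = ''.join(['A', ',', m.group(1), ',', m.group(2)])  # ...and so on for the remaining groups
        anOutputFile.write(out)
        return True
    else:
        return False
</code></pre>

Maybe joining a list is not what you meant? Compiling the 5 regexps at the top level did not help either.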
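Below is a minimal sketch of one way to restructure the chained `or` calls. It assumes that every record type contains a cheap literal marker that a plain substring test can check before any regex runs. In the question, each pattern requires the literal text `'0' + bla + ' A '` (assuming `bla` has no regex metacharacters), so a marker like that could serve as the prefilter. The markers ' A ' and ' B ', the hypothetical `PARSERS` table, and the short patterns are illustrative stand-ins, not the real regexes. Provided the marker occurs only once per line, `re.search` without the leading `.*` accepts the same lines and captures the same groups as `re.match` with it, and it skips the backtracking over that greedy prefix.

```python
import io
import re

# Hypothetical stand-in patterns; the question's real ones have 8 to 12 groups.
# Each entry: (output tag, literal marker that must appear in the line, compiled regex).
PARSERS = [
    ("A", " A ", re.compile(r" A \((\d+)\) (\d+) (\w+)")),
    ("B", " B ", re.compile(r" B \((\d+)\) (\w+)")),
]


def parse_file(infile, outfile):
    """Write one comma-separated record per matching line; return how many were written."""
    written = 0
    for line in infile:
        line = line.rstrip("\n")
        for tag, marker, regex in PARSERS:
            # Cheap substring test first: skip the regex entirely when the marker is absent.
            if marker not in line:
                continue
            m = regex.search(line)
            if m:
                outfile.write(",".join((tag,) + m.groups()) + "\n")
                written += 1
                break  # first match wins, like the chained `or` in the question
    return written


sample = io.StringIO("x A (12) 34 foo\ny B (5) bar\nz C nothing here\n")
out = io.StringIO()
count = parse_file(sample, out)
print(count)  # 2
print(out.getvalue(), end="")
```

The table keeps the "most common type first" ordering from the question, so it preserves the benefit of short-circuiting while avoiding most regex calls on lines that cannot match.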

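The patterns in the question put several `.*` pieces between space-separated fields. On a line that almost matches but fails near the end, the engine can try many ways of dividing the text among those pieces before it gives up. A common mitigation is to use `\S*` or `\S+` (one space-free token) for each single field, sketched below under the assumption that none of these fields contains a space. `LOOSE` is a shortened, hypothetical copy of `regexA` with `bla` replaced by the literal `X`. `TIGHT` is the same pattern with bounded tokens and no leading `.*`, used with `re.search`. The sample line is made up.

```python
import re
import timeit

# Shortened, hypothetical copy of the question's regexA (bla replaced by "X").
LOOSE = re.compile(
    r".*0X A .*\((\d+)\) (\d+) (\w+) (\d+)@(.+) .* (.*) .* .* foo=far fox=(.*) test .*"
)
# Same fields, but each single field is \S* or \S+ (no spaces), and the leading .*
# is dropped in favour of re.search. Assumes no field contains a space.
TIGHT = re.compile(
    r"0X A \S*\((\d+)\) (\d+) (\w+) (\d+)@(\S+) \S* (\S*) \S* \S* foo=far fox=(\S*) test "
)

good = "pre 0X A id(12) 34 name 56@host a val c d foo=far fox=zzz test tail"
bad = good.replace(" test ", " nope ")  # close to a match, but fails at the end

print(LOOSE.match(good).groups())  # ('12', '34', 'name', '56', 'host', 'val', 'zzz')
print(TIGHT.search(good).groups())  # same tuple
print(LOOSE.match(bad), TIGHT.search(bad))  # None None

# Compare on real lines; the gap depends on line length and the number of fields.
print("loose:", timeit.timeit(lambda: LOOSE.match(bad), number=10_000))
print("tight:", timeit.timeit(lambda: TIGHT.search(bad), number=10_000))
```

If a field can contain spaces, bounding it this way changes which lines match and what each group captures. Check the extracted groups against the original pattern on real data before switching.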
1 Answer

Anonymous, 1 day ago