Optimizing regular expressions and file reading for large files in Python

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   19.345   19.345 <string>:1(<module>)
        1    7.275    7.275   19.345   19.345 get_candidate2.py:12(foo)
  3331494    2.239    0.000   10.772    0.000 re.py:139(search)
  3331496    4.314    0.000    5.293    0.000 re.py:226(_compile)
      7/2    0.000    0.000    0.000    0.000 sre_compile.py:32(_compile)
   ......
  3331507    0.632    0.000    0.632    0.000 {method 'get' of 'dict' objects}
  3331260    0.560    0.000    0.560    0.000 {method 'group' of '_sre.SRE_Match' objects}
        2    0.000    0.000    0.000    0.000 {method 'items' of 'dict' objects}
        2    0.000    0.000    0.000    0.000 {method 'remove' of 'list' objects}
  3331494    3.241    0.000    3.241    0.000 {method 'search' of '_sre.SRE_Pattern' objects}
        9    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
  6662529    0.737    0.000    0.737    0.000 {method 'strip' of 'str' objects}
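
One way to read the profile above: `re.py:139(search)` accounts for 10.772 s cumulative over 3,331,494 calls. Of that, 5.293 s is spent in `re.py:226(_compile)`, which is the per-call lookup of the pattern in the `re` module's cache, and 3.241 s in the `search` method of the compiled pattern itself. If the pattern is fixed, compiling it once with `re.compile` and calling the compiled object's `search` directly avoids that per-call lookup. The sketch below only illustrates this idea; the pattern and the name `extract_replies` are hypothetical, since the original `foo` function is not shown.

```python
import re

# Hypothetical pattern, used only for illustration; the real one is not shown.
REPLY_RE = re.compile(r"reply:(\w+)")


def extract_replies(lines):
    """Return the first captured group for every line that matches."""
    search = REPLY_RE.search  # bound method of the precompiled pattern
    replies = []
    for line in lines:
        match = search(line)  # no module-level cache lookup on each call
        if match:
            replies.append(match.group(1))
    return replies
```

`extract_replies` accepts any iterable of strings, so an open file object can be passed directly and is read line by line.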

1 answer

Anonymous user

#1 · Posted on 2024-10-03 15:22:13

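Thanks to the help of @MikeSatteson and @tobias_k, I finally figured it out.

To find all the comment strings (from the log file) that correspond to each given reply string (from the pattern file), the solution is:

Use a dict whose keys are reply strings and whose values are lists of comment strings.
Read every reply string from the pattern file and use them as the dict's key set.
Read every comment-reply pair from the log file; if the reply is in the dict's key set, append the comment to that reply's comment list.

The code is as follows: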

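# get_reply() and get_comment_reply_pair() are the poster's own parsing
# helpers; their definitions are not shown in the post.
my_dict = {}  # reply string -> list of comment strings
with open('pattern file', 'r') as pattern_file:
    for line in pattern_file:
        reply = get_reply(line)
        my_dict[reply] = []

with open('log file', 'r') as log_file:
    for line in log_file:
        pair = get_comment_reply_pair(line)
        reply = pair.reply
        comment = pair.comment
        # Dict membership is a hash lookup, not a regex search per pattern.
        if reply in my_dict:
            my_dict[reply].append(comment)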
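
For a version that can be run end to end, here is a minimal sketch of the steps above. The line formats are assumptions, because the post does not show them: each pattern line is taken to hold one reply string, and each log line a tab-separated comment and reply. `get_reply` and `get_comment_reply_pair` below are hypothetical stand-ins for the poster's helpers, and `Pair` and `collect_comments` are names introduced here.

```python
from collections import namedtuple

# Hypothetical record type; the post only shows the .reply and .comment attributes.
Pair = namedtuple("Pair", ["comment", "reply"])


def get_reply(line):
    """Assumed format: one reply string per pattern line."""
    return line.strip()


def get_comment_reply_pair(line):
    """Assumed format: 'comment<TAB>reply' per log line."""
    comment, reply = line.rstrip("\n").split("\t", 1)
    return Pair(comment=comment.strip(), reply=reply.strip())


def collect_comments(pattern_lines, log_lines):
    """Map each reply from the pattern source to its comments in the log."""
    comments_by_reply = {get_reply(line): [] for line in pattern_lines}
    for line in log_lines:
        pair = get_comment_reply_pair(line)
        # One hash lookup per log line, however many replies there are.
        if pair.reply in comments_by_reply:
            comments_by_reply[pair.reply].append(pair.comment)
    return comments_by_reply


if __name__ == "__main__":
    patterns = ["thanks\n", "see you\n"]
    log = [
        "great work\tthanks\n",
        "bye\tsee you\n",
        "hello\thi there\n",
        "nice\tthanks\n",
    ]
    print(collect_comments(patterns, log))
```

Both arguments can be any iterable of lines, so open file objects work and the log is never loaded into memory all at once. A log line without a tab would raise `ValueError` under the assumed format; real data may need a guard for that.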
