Grouping access log entries by IP in Python 2.7

Posted 2024-09-27 09:34:46

Suppose I have a log file like this:

88.191.254.20 - - [22/Mar/2009:07:00:32 +0100] "GET / HTTP/1.0"
66.249.66.231 - - [22/Mar/2009:07:06:20 +0100] "GET /popup.php?choix=-89 HTTP/1.1"
66.249.66.231 - - [22/Mar/2009:07:11:20 +0100] "GET /specialiste.php HTTP/1.1"
83.198.250.175 - - [22/Mar/2009:07:40:06 +0100] "GET / HTTP/1.1"
83.198.250.175 - - [22/Mar/2009:07:40:06 +0100] "GET /style.css HTTP/1.1"
83.198.250.175 - - [22/Mar/2009:07:40:06 +0100] "GET /images/ht1.gif HTTP/1.1"
.....

I want a result like this:

"88.191.254.20", 1 times,
"22/Mar/2009", "07:00:32", "+0100", "GET / HTTP/1.0"

"66.249.66.231", 2 times,
"22/Mar/2009", "07:06:20", "+0100", "GET /popup.php?choix=-89 HTTP/1.1"
"22/Mar/2009", "07:11:20", "+0100", "GET /specialiste.php HTTP/1.1"

"83.198.250.175", 3 times,
"22/Mar/2009", "07:40:06", "+0100", "GET / HTTP/1.1"
"22/Mar/2009", "07:40:06", "+0100", "GET /style.css HTTP/1.1"
"22/Mar/2009", "07:40:06", "+0100", "GET /images/ht1.gif HTTP/1.1"


and I want to save the result in a CSV file.


1 answer
User
#1 · posted 2024-09-27 09:34:46

Here is one way to do it:

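import re
from collections import OrderedDict

# Requests grouped by client IP, kept in order of first appearance
# (a plain dict is unordered in Python 2.7).
aggregate = OrderedDict()

# Template for one log line; each $name becomes a named, non-greedy group.
# Note: $milis actually captures the timezone offset (e.g. +0100).
conf = '$ip - $user [$date:$time $milis] "$request"'
regex = ''.join(
    '(?P<' + g + '>.*?)' if g else re.escape(c)
    for g, c in re.findall(r'\$(\w+)|(.)', conf))


with open('example.log', 'r') as f:
    for line in f:
        m = re.match(regex, line.strip())
        if m is None:
            continue  # skip blank or malformed lines instead of crashing
        d = m.groupdict()
        aggregate.setdefault(d['ip'], []).append(
            (d['date'], d['time'], d['milis'], d['request']))

with open('out.log', 'w') as out:
    for key in aggregate:
        # Header line for this IP: address and number of requests
        out.write('"{0}", {1} times,\n'.format(key, len(aggregate[key])))
        for item in aggregate[key]:
            out.write('"{0}","{1}","{2}","{3}"\n'.format(*item))
        out.write('\n')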
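
The code above writes a custom grouped text layout rather than a regular CSV table. If a plain CSV file is what is needed, a minimal sketch using the standard `csv` module might look like the following. The names `LINE_RE`, `group_by_ip` and `write_csv`, the column headers, and the one-row-per-request layout are assumptions made for illustration; they are not part of the original answer.

```python
import csv
import re
from collections import OrderedDict

# Hypothetical pattern for the log lines shown in the question.
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ '
    r'\[(?P<date>[^:]+):(?P<time>\S+) (?P<tz>[^\]]+)\] '
    r'"(?P<request>[^"]*)"')


def group_by_ip(lines):
    """Group (date, time, tz, request) tuples by IP, keeping first-seen order."""
    groups = OrderedDict()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip blank or malformed lines such as "....."
        groups.setdefault(m.group('ip'), []).append(
            m.group('date', 'time', 'tz', 'request'))
    return groups


def write_csv(groups, out):
    """Write one row per request: ip, hit count, date, time, tz, request."""
    writer = csv.writer(out, quoting=csv.QUOTE_ALL)
    writer.writerow(['ip', 'hits', 'date', 'time', 'tz', 'request'])
    for ip, items in groups.items():
        for item in items:
            writer.writerow([ip, len(items)] + list(item))
```

To use it, pass an open log file to `group_by_ip` and an open output file to `write_csv`. On Python 2.7 the output file should be opened with mode `'wb'`; on Python 3 use `open('out.csv', 'w', newline='')`. Repeating the IP and its hit count on every row keeps the file rectangular, so spreadsheet tools can load it directly.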
