Parse a text log and count occurrences of specific events/errors

Posted 2024-09-28 19:19:28


I'm writing a Python script that parses a log and counts how many times GET, POST, 200 and 404 occur for each IP address in the log.

Sample log file:

120.115.144.240 - - [29/Aug/2017:04:40:03 -0400] "GET /apng/assembler-2.0/assembler2.php HTTP/1.1" 200 231 "http://littlesvr.ca/apng/history.html" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36"

202.167.250.99 - - [29/Aug/2017:04:41:10 -0400] "GET /apng/images/o_sample.png?1424751982?1424776117 HTTP/1.1" 200 115656 "http://bbs.mydigit.cn/read.php?tid=2186780&fpage=3" "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"

120.115.144.240 - - [29/Aug/2017:04:40:03 -0400] "GET /apng/assembler-2.0/assembler2.php HTTP/1.1" 200 231 "http://littlesvr.ca/apng/history.html" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36"

60.4.236.27 - - [29/Aug/2017:04:42:46 -0400] "POST /apng/images/o_sample.png?1424751982?1424776117 HTTP/1.1" 404 115656 "http://bbs.mydigit.cn/read.php?tid=1952896" "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36"

120.115.144.240 - - [29/Aug/2017:04:40:03 -0400] "POST /apng/assembler-2.0/assembler2.php HTTP/1.1" 404 231 "http://littlesvr.ca/apng/history.html" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36"
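Each sample line follows Apache's combined log format: client IP, identd, user, [timestamp], "request line", status, bytes, "referer", "user agent". A minimal sketch, assuming every line has exactly that shape, that pulls out only the three fields the question needs using named groups. The pattern `LINE_RE` and the helper `parse_line` are illustrative names, not part of the question:

```python
import re

# Illustrative pattern for one combined-log-format line; assumes the quoted
# request line starts with the method and the status is three digits.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '   # client IP, identd, user, [timestamp]
    r'"(?P<method>[A-Z]+) [^"]*" '          # "METHOD /path HTTP/x.y"
    r'(?P<status>\d{3}) '                   # status code
)

def parse_line(line):
    """Return (ip, method, status) for a log line, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return m.group('ip'), m.group('method'), m.group('status')

# Fourth sample line from the question, with the user agent shortened.
sample = ('60.4.236.27 - - [29/Aug/2017:04:42:46 -0400] '
          '"POST /apng/images/o_sample.png?1424751982?1424776117 HTTP/1.1" '
          '404 115656 "http://bbs.mydigit.cn/read.php?tid=1952896" "Mozilla/5.0"')
print(parse_line(sample))  # ('60.4.236.27', 'POST', '404')
```

Because the method and status are read from fixed positions, a "GET" or "200" that happens to appear inside a URL or user agent is not counted.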

Expected output:

120.115.144.240: 200=2,404=1,GET=2,POST=1
202.167.250.99: 200=1,404=0,GET=1,POST=0
60.4.236.27: 200=0,404=1,GET=0,POST=1

I can already build a list of IPs from the file; how do I count how many times each status occurs for each IP address?

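import re
from collections import Counter

def countip(log_path):
    # One IPv4-looking token at the start of each line (re.M makes ^ match per line),
    # so IP-like substrings inside URLs or user agents are not counted
    rx = r'^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'
    with open(log_path) as f:
        text = f.read()
    iplist = re.findall(rx, text, re.M)
    ipcount = Counter(iplist)  # total number of lines per IP
    for k, v in ipcount.items():
        print(k, v)

countip(r"C:\Users\user\Desktop\Tests\apache_log.log")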
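`countip` produces one total per IP. To get per-IP counts of each method and status instead, one option is to keep a separate `Counter` for every IP inside a `collections.defaultdict`. A sketch, assuming the same line layout as the sample (IP first, then the quoted request, then the status); `count_by_ip` is a hypothetical name, and it accepts any iterable of lines, so a list or an open file both work:

```python
import re
from collections import Counter, defaultdict

# IP at the start of the line, method as the first word of the quoted
# request, status as the three digits right after the closing quote.
LINE_RE = re.compile(r'^(\S+) .*?"(\S+) [^"]*" (\d{3}) ')

def count_by_ip(lines):
    """Map each IP to a Counter of its request methods and status codes."""
    counts = defaultdict(Counter)
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # blank or malformed line
        ip, method, status = m.groups()
        counts[ip][method] += 1
        counts[ip][status] += 1
    return counts

# Shortened versions of three sample lines from the question.
lines = [
    '120.115.144.240 - - [29/Aug/2017:04:40:03 -0400] "GET /a HTTP/1.1" 200 231',
    '60.4.236.27 - - [29/Aug/2017:04:42:46 -0400] "POST /b HTTP/1.1" 404 115656',
    '120.115.144.240 - - [29/Aug/2017:04:40:03 -0400] "POST /a HTTP/1.1" 404 231',
]
for ip, c in count_by_ip(lines).items():
    print(ip, dict(c))
```

With a real file, pass the open file object, e.g. `with open(r"C:\Users\user\Desktop\Tests\apache_log.log") as f: counts = count_by_ip(f)`. A `Counter` returns 0 for keys it has never seen, so `counts[ip]['POST']` is 0 for an IP with no POST requests, which is what the expected output needs.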

1 answer
Anonymous user
#1 · Posted 2024-09-28 19:19:28

This script only considers GET and POST requests and the status codes 200 and 404:

data = '''

120.115.144.240 - - [29/Aug/2017:04:40:03 -0400] "GET /apng/assembler-2.0/assembler2.php HTTP/1.1" 200 231 "http://littlesvr.ca/apng/history.html" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36"

202.167.250.99 - - [29/Aug/2017:04:41:10 -0400] "GET /apng/images/o_sample.png?1424751982?1424776117 HTTP/1.1" 200 115656 "http://bbs.mydigit.cn/read.php?tid=2186780&fpage=3" "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"

120.115.144.240 - - [29/Aug/2017:04:40:03 -0400] "GET /apng/assembler-2.0/assembler2.php HTTP/1.1" 200 231 "http://littlesvr.ca/apng/history.html" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36"

60.4.236.27 - - [29/Aug/2017:04:42:46 -0400] "POST /apng/images/o_sample.png?1424751982?1424776117 HTTP/1.1" 404 115656 "http://bbs.mydigit.cn/read.php?tid=1952896" "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36"

120.115.144.240 - - [29/Aug/2017:04:40:03 -0400] "POST /apng/assembler-2.0/assembler2.php HTTP/1.1" 404 231 "http://littlesvr.ca/apng/history.html" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36"'''

import re
from pprint import pprint

# Keep only the non-empty lines of the sample
data = [line.strip() for line in data.splitlines() if line.strip()]

out = {}
for line in data:
    # Capture the leading IP, the method (GET/POST) and the status (200/404)
    m = re.search(r'([\d.]+).*?(GET|POST).*?"\s(200|404)', line)
    if not m:
        continue  # skip lines where no GET/POST with a 200/404 status is found
    ip_address, method, status_code = m.groups()
    # Initialise all four counters so that missing ones show up as 0
    counts = out.setdefault(ip_address, {'200': 0, '404': 0, 'GET': 0, 'POST': 0})
    counts[method] += 1
    counts[status_code] += 1

pprint(out, width=30)

Prints:

{'120.115.144.240': {'200': 2,
                     '404': 1,
                     'GET': 2,
                     'POST': 1},
 '202.167.250.99': {'200': 1,
                    '404': 0,
                    'GET': 1,
                    'POST': 0},
 '60.4.236.27': {'200': 0,
                 '404': 1,
                 'GET': 0,
                 'POST': 1}}
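The question asked for one line per IP rather than a nested dict. A small sketch that prints `out` in the `ip: 200=…,404=…,GET=…,POST=…` shape; the helper `format_counts` and the fixed key order in `KEYS` are assumptions chosen to match the expected output in the question:

```python
# Counts as produced by the script above (copied from its printed output).
out = {
    '120.115.144.240': {'200': 2, '404': 1, 'GET': 2, 'POST': 1},
    '202.167.250.99': {'200': 1, '404': 0, 'GET': 1, 'POST': 0},
    '60.4.236.27': {'200': 0, '404': 1, 'GET': 0, 'POST': 1},
}

KEYS = ('200', '404', 'GET', 'POST')  # column order used in the question

def format_counts(ip, counts):
    """Render one IP's counters as 'ip: 200=n,404=n,GET=n,POST=n'."""
    fields = ','.join(f'{k}={counts.get(k, 0)}' for k in KEYS)
    return f'{ip}: {fields}'

for ip, counts in out.items():
    print(format_counts(ip, counts))
```

This prints `120.115.144.240: 200=2,404=1,GET=2,POST=1`, `202.167.250.99: 200=1,404=0,GET=1,POST=0` and `60.4.236.27: 200=0,404=1,GET=0,POST=1`. Using `counts.get(k, 0)` means the same function also works on a per-IP `Counter` or on a dict that is missing some keys.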
