How to get the IP address after two consecutive 403s

Published 2024-09-28 19:25:40


I am writing a Python log parser script in which I need to print the IP that appears after 2 consecutive 403 responses.

12.115.14.240 - - [29/Aug/2017:04:40:03 -0400] "GET /apng/assembler-2.0/assembler2.php HTTP/1.1" 403 231 "http://littlesvr.ca/apng/history.html" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36"

202.167.250.99 - - [29/Aug/2017:04:41:10 -0400] "GET /apng/images/o_sample.png?1424751982?1424776117 HTTP/1.1" 403 115656 "http://bbs.mydigit.cn/read.php?tid=2186780&fpage=3" "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"

120.115.144.240 - - [29/Aug/2017:04:40:03 -0400] "GET /apng/assembler-2.0/assembler2.php HTTP/1.1" 200 231 "http://littlesvr.ca/apng/history.html" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36"

My code is below:

import re

with open(log) as f:
    log = f.read()

# TODO: condition or pattern (rx) that detects 2 consecutive 403s
iplist = re.findall(rx, log)

My expected output is:

120.115.144.240

3 answers

My guess is that this expression, used in s mode (re.DOTALL in Python), would likely return those IPs:

403.*?403.*?\s{2,}(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})

Or, with more boundaries:

"\s+\b403\b.*?"\s+\b403\b.*?\s{2,}(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})
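
As a quick check of the bounded pattern, the sketch below runs it against three shortened log entries. The make_line helper and the shortened entries are assumptions made for illustration, not part of the original log. It also shows that \s{2,} needs at least two whitespace characters before the captured IP, so the pattern relies on the blank line between entries.

```python
import re

# Bounded pattern from the answer, unchanged.
BOUNDED = r'"\s+\b403\b.*?"\s+\b403\b.*?\s{2,}(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'


def make_line(ip, status):
    # Hypothetical helper: builds a shortened access-log entry for illustration.
    return (f'{ip} - - [29/Aug/2017:04:40:03 -0400] '
            f'"GET /a.php HTTP/1.1" {status} 231 "-" "Mozilla/5.0"')


lines = [
    make_line("12.115.14.240", 403),
    make_line("202.167.250.99", 403),
    make_line("120.115.144.240", 200),
]

# Entries separated by a blank line, as in the question's sample.
with_blank_lines = re.findall(BOUNDED, "\n\n".join(lines), re.DOTALL)

# Entries separated by a single newline: \s{2,} finds nothing to match.
without_blank_lines = re.findall(BOUNDED, "\n".join(lines), re.DOTALL)

print(with_blank_lines)
print(without_blank_lines)
```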


Test

import re

regex = r"403.*?403.*?\s{2,}(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"

test_str = """

12.115.14.240 - - [29/Aug/2017:04:40:03 -0400] "GET /apng/assembler-2.0/assembler2.php HTTP/1.1" 403 231 "http://littlesvr.ca/apng/history.html" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36"

202.167.250.99 - - [29/Aug/2017:04:41:10 -0400] "GET /apng/images/o_sample.png?1424751982?1424776117 HTTP/1.1" 403 115656 "http://bbs.mydigit.cn/read.php?tid=2186780&fpage=3" "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"

120.115.144.240 - - [29/Aug/2017:04:40:03 -0400] "GET /apng/assembler-2.0/assembler2.php HTTP/1.1" 200 231 "http://littlesvr.ca/apng/history.html" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36"

12.115.14.240 - - [29/Aug/2017:04:40:03 -0400] "GET /apng/assembler-2.0/assembler2.php HTTP/1.1" 403 231 "http://littlesvr.ca/apng/history.html" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36"

202.167.250.99 - - [29/Aug/2017:04:41:10 -0400] "GET /apng/images/o_sample.png?1424751982?1424776117 HTTP/1.1" 403 115656 "http://bbs.mydigit.cn/read.php?tid=2186780&fpage=3" "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"

120.115.144.240 - - [29/Aug/2017:04:40:03 -0400] "GET /apng/assembler-2.0/assembler2.php HTTP/1.1" 200 231 "http://littlesvr.ca/apng/history.html" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36"


"""

print(re.findall(regex, test_str, re.DOTALL))

Output

['120.115.144.240', '120.115.144.240']

The expression is explained in the top-right panel of regex101.com, if you wish to explore, simplify, or modify it; there you can also see how it matches against sample inputs.

RegEx Circuit

jex.im visualizes regular expressions.

It seems that 403 always appears as the 9th space-separated field, and that the IP is always at the start of the line.
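
That observation can be checked on a single entry with a plain whitespace split. This is only a sketch: it assumes the request line consists of exactly three space-separated tokens, as in the sample entries, and the entry below is the first sample line with its referrer and user agent cut off.

```python
line = ('12.115.14.240 - - [29/Aug/2017:04:40:03 -0400] '
        '"GET /apng/assembler-2.0/assembler2.php HTTP/1.1" 403 231')

fields = line.split()  # split on runs of whitespace
ip = fields[0]         # first field: client IP
status = fields[8]     # ninth field: HTTP status code

print(ip, status)
```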

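Another option is to take advantage of those patterns and use quantifiers to find the right part, which prevents unnecessary backtracking.

Assuming there is exactly one space between the parts of a line, you can match up to the first 403. Then match every following line that does not have 403 in that position, until the next line that does.

After the second 403, capture the first IP number at the start of the next line.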

^\S+(?: \S+){7} 403 .*(?:\r?\n(?!\S+(?: \S+){7} 403 ).*)*\r?\n\S+(?: \S+){7} 403 .*(?:\r?\n|\r)+(\d{1,3}(?:\.\d{1,3}){3})

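Explanation

  • ^ Start of the line
  • \S+(?: \S+){7} 403 .* Match 403 in the 9th field, then the rest of the line
  • (?: Non-capturing group
    • \r?\n(?!\S+(?: \S+){7} 403 ).* Match a newline followed by a whole line that does not have 403 in the 9th field
  • )* Close the non-capturing group and repeat it 0+ times
  • \r?\n\S+(?: \S+){7} 403 .* Match a newline, then 403 in the 9th field and the rest of the line
  • (?:\r?\n|\r)+ Match 1+ newlines
  • (\d{1,3}(?:\.\d{1,3}){3}) Capture an IP-like pattern in group 1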
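
A minimal sketch of running this pattern from Python follows. Since the pattern starts with ^, it needs re.MULTILINE to match at the start of every line rather than only at the start of the string. The make_line helper, the shortened entries, and the function name are assumptions made for illustration.

```python
import re

# The pattern from the answer, split across lines for readability.
PATTERN = re.compile(
    r'^\S+(?: \S+){7} 403 .*'
    r'(?:\r?\n(?!\S+(?: \S+){7} 403 ).*)*'
    r'\r?\n\S+(?: \S+){7} 403 .*'
    r'(?:\r?\n|\r)+(\d{1,3}(?:\.\d{1,3}){3})',
    re.MULTILINE,
)


def make_line(ip, status):
    # Hypothetical helper: builds a shortened access-log entry for illustration.
    return (f'{ip} - - [29/Aug/2017:04:40:03 -0400] '
            f'"GET /a.php HTTP/1.1" {status} 231 "-" "Mozilla/5.0"')


def ip_after_two_403s(text):
    # Returns every IP captured by group 1 of the pattern.
    return PATTERN.findall(text)


sample = "\n\n".join([
    make_line("12.115.14.240", 403),
    make_line("202.167.250.99", 403),
    make_line("120.115.144.240", 200),
])

result = ip_after_two_403s(sample)
print(result)
```

Because the middle group skips lines that lack 403 in the 9th field, the two 403 lines do not have to be adjacent: a 200 entry between them still produces a match.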


To allow for spaces or tabs between the fields, a variant of this pattern can be used.
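
One way such a variant might look is sketched below. It is an assumption, not necessarily the pattern the answer had in mind: each single space is replaced with [ \t]+, so fields may be separated by one or more spaces or tabs. The names SEP, LINE_403, and make_tab_line are hypothetical.

```python
import re

SEP = r'[ \t]+'  # assumption: one or more spaces or tabs between fields
LINE_403 = rf'\S+(?:{SEP}\S+){{7}}{SEP}403{SEP}'  # 403 in the 9th field

PATTERN = re.compile(
    rf'^{LINE_403}.*'
    rf'(?:\r?\n(?!{LINE_403}).*)*'
    rf'\r?\n{LINE_403}.*'
    r'(?:\r?\n|\r)+(\d{1,3}(?:\.\d{1,3}){3})',
    re.MULTILINE,
)


def make_tab_line(ip, status):
    # Hypothetical helper: a shortened, tab-separated log entry.
    return "\t".join([ip, "-", "-", "[29/Aug/2017:04:40:03", "-0400]",
                      '"GET', "/a.php", 'HTTP/1.1"', str(status), "231"])


sample = "\n".join([
    make_tab_line("12.115.14.240", 403),
    make_tab_line("202.167.250.99", 403),
    make_tab_line("120.115.144.240", 200),
])

result = PATTERN.findall(sample)
print(result)
```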

Here you go:

result = re.findall(r'\d+\.\d+\.\d+\.\d+', log)[-2]

Output:

120.115.144.240
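
The [-2] index works on this sample because the Chrome version string in each user agent (for example 60.0.3112.101) also matches the pattern, so every entry contributes two matches. A sketch that does not depend on that coincidence is shown below. It assumes the status code is the 9th whitespace-separated field, and it reports the entry after any run of two or more 403s, which is one possible reading of the question. The function name and the shortened sample entries are hypothetical.

```python
def ips_after_consecutive_403s(lines, needed=2):
    """Yield the IP of each entry that directly follows `needed` or more 403 entries in a row."""
    streak = 0  # number of 403 entries seen in a row so far
    for line in lines:
        fields = line.split()
        if len(fields) < 9:
            continue  # skip blank or malformed lines
        if streak >= needed:
            yield fields[0]
        # Extend the streak on a 403, otherwise start over.
        streak = streak + 1 if fields[8] == "403" else 0


# Sample entries from the question, with referrer and user agent cut off.
sample = """\
12.115.14.240 - - [29/Aug/2017:04:40:03 -0400] "GET /apng/assembler-2.0/assembler2.php HTTP/1.1" 403 231

202.167.250.99 - - [29/Aug/2017:04:41:10 -0400] "GET /apng/images/o_sample.png HTTP/1.1" 403 115656

120.115.144.240 - - [29/Aug/2017:04:40:03 -0400] "GET /apng/assembler-2.0/assembler2.php HTTP/1.1" 200 231
"""

result = list(ips_after_consecutive_403s(sample.splitlines()))
print(result)
```

Reading line by line also avoids loading the whole log into memory, since an open file object can be passed in place of sample.splitlines().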
