Python regex to match Apache LogFormat "combinedvhost"

Published 2024-09-29 17:13:40


LogFormat "%v %a %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combinedvhost
CustomLog "/var/log/apache2/access_log" combinedvhost    

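I have an Apache configuration that produces an access log in the format above. I am trying to write a Python (2.7.13) regex that captures each field in a named group (ignoring the HTTP method and HTTP version).

Here is my regex so far: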

(?P<host>.*)\s+(?P<ip>\S+)\s+-\s+-\s+\[(?P<date>\S+)\s+(?P<timezone>.*)\]\s+"\S+\s+(?P<path>\S+)(?:\?(?P<querystring>\S+))?\s+\S+"\s+(?P<status>\S+)\s+(?P<length>\S+)\s+"(?P<referrer>.*)"\s+"(?P<user_agent>.*)"\s+

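My problem is with the first log line, where the expected result is path = / and querystring = simplode_ajax=true&simplode_query%5Border%5D=DESC. The path group seems to match greedily: querystring comes back as None, and path contains the entire request target, query string included.

I am testing the regex above against the log below at http://pythex.org.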
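The behaviour can be reproduced on the request part alone. The sketch below is an illustration, not a confirmed answer. It assumes the cause is that `\S+` in the path group also matches `?`, so the optional query-string group never gets a chance to match. The variable names (`greedy`, `fixed`, `g`, `f`) and the `[^?\s]+` character class are my own choices.

```python
import re

# Request part of the first log line.
request = '"GET /?simplode_ajax=true&simplode_query%5Border%5D=DESC HTTP/1.1"'

# Fragment from the original regex: \S+ also matches "?", so the path
# group takes the whole target and the optional group is skipped.
greedy = re.compile(r'"\S+\s+(?P<path>\S+)(?:\?(?P<querystring>\S+))?\s+\S+"')

# Possible fix (assumption): keep "?" out of the path character class.
fixed = re.compile(r'"\S+\s+(?P<path>[^?\s]+)(?:\?(?P<querystring>\S+))?\s+\S+"')

g = greedy.search(request).groupdict()
f = fixed.search(request).groupdict()
```

With the first pattern, `g['querystring']` is `None`. With the second, `f['path']` is `/` and the query string lands in its own group.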

default 1.2.3.4 - - [05/Jan/2017:10:56:18 -0800] "GET /?simplode_ajax=true&simplode_query%5Border%5D=DESC HTTP/1.1" 200 - "http://www.xxx.xx/xxx/xx/" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
default 1.2.3.4 - - [05/Jan/2017:10:56:20 -0800] "GET /xxx/xx/06/22/xxxxx/ HTTP/1.1" 200 11098 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
www.xxx.xx 1.2.3.4 - - [05/Jan/2017:10:56:20 -0800] "POST /xxxxxx.php HTTP/1.1" 200 370 "-" "-"
default 1.2.3.4 - - [05/Jan/2017:10:56:23 -0800] "GET /blog/xxx/01/22/xxxxx/ HTTP/1.1" 200 14404 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
www.xxx.xx 1.2.3.4 - - [05/Jan/2017:10:56:24 -0800] "GET /blog/xxxxx/ HTTP/1.1" 200 21901 "https://www.codingmerc.com/blog/" "Mozilla/5.0 (compatible; spbot/5.0.3; +http://OpenLinkProfiler.org/bot )"
www.xxx.xx 1.2.3.4 - - [05/Jan/2017:10:56:25 -0800] "POST /xxxxx.php HTTP/1.1" 200 370 "-" "-"
www.xxx.xx 1.2.3.4 - - [05/Jan/2017:10:56:29 -0800] "GET /blog/xxxxx/ HTTP/1.1" 200 13831 "https://www.xxx.xx/blog/" "Mozilla/5.0 (compatible; spbot/5.0.3; +http://OpenLinkProfiler.org/bot )"
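For reference, here is a sketch of a full pattern run against two of the sample lines above. It is one possible variant, not a definitive parser. `LOG_RE` and `parse_line` are hypothetical names. The pattern assumes that the quoted fields contain no escaped quotes, that `%l` and `%u` contain no spaces, and that `?` can be excluded from the path class.

```python
import re

# Assumed variant of the regex from the question; differences are noted inline.
LOG_RE = re.compile(
    r'(?P<host>\S+)\s+(?P<ip>\S+)\s+\S+\s+\S+\s+'        # %v %a %l %u
    r'\[(?P<date>\S+)\s+(?P<timezone>[^\]]+)\]\s+'       # %t
    r'"\S+\s+(?P<path>[^?\s]+)'                          # method skipped; path stops at "?"
    r'(?:\?(?P<querystring>\S+))?\s+\S+"\s+'             # optional query; version skipped
    r'(?P<status>\S+)\s+(?P<length>\S+)\s+'              # %>s %b
    r'"(?P<referrer>[^"]*)"\s+"(?P<user_agent>[^"]*)"'   # no trailing \s+ required
)


def parse_line(line):
    """Return a dict of named groups, or None if the line does not match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None


first = parse_line(
    'default 1.2.3.4 - - [05/Jan/2017:10:56:18 -0800] '
    '"GET /?simplode_ajax=true&simplode_query%5Border%5D=DESC HTTP/1.1" 200 - '
    '"http://www.xxx.xx/xxx/xx/" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)
third = parse_line(
    'www.xxx.xx 1.2.3.4 - - [05/Jan/2017:10:56:20 -0800] '
    '"POST /xxxxxx.php HTTP/1.1" 200 370 "-" "-"'
)
```

Using `[^"]*` and `\S+` instead of `.*` keeps each group from running past its own field, and dropping the trailing `\s+` lets a line without a final newline still match.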

