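<p>Your regular expression seems to work for me on both of the examples you provided (or I have misunderstood the question). I tested it with the following script (sorry for the long lines):</p>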
<pre><code>#!/usr/bin/env python
import re
lines = ['68.134.160.117 - - [09/Mar/2004:22:24:27 -0500] "GET http://www.glocksoft.net/cgi-bin/jenv.cgi HTTP/1.0" 200 1169 "-" "Mozilla/4.0"',
'220.175.18.42 - - [09/Mar/2004:22:47:30 -0500] "GET http://www.searchlikecrazy.com/cgi-bin/smartsearch.cgi?keywords=Web+Design%20&username=arongyi HTTP/1.0" \
200 26166 "http://www.yourwindow.com/searchlikecrazy.htm" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; MyIE2)"']
regex = re.compile(r'.*\[(\d*)/(\w*)/(\d*).*"(GET|POST)\s(https?://)[a-z].*?\.([a-z]+)[^\w.-].*200')
for line in lines:
    match = regex.match(line)
    if match:
        # print(...) with a single argument behaves the same on Python 2.7 and 3
        print(match.groups())
</code></pre>
<p>Output:</p>
<pre><code>('09', 'Mar', '2004', 'GET', 'http://', 'net')
('09', 'Mar', '2004', 'GET', 'http://', 'com')
</code></pre>
<p>Python version: 2.7.1</p>
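<p>In case it helps to see what each part of the pattern captures, here is a minimal sketch of the same regex written with <code>re.VERBOSE</code>. The name <code>LOG_PATTERN</code> and the comments are my own reading of the pattern, not something from your code, so treat them as assumptions:</p>
<pre><code>import re

# Hypothetical name; same pattern as in the script above, split up with re.VERBOSE.
LOG_PATTERN = re.compile(r'''
    .*\[(\d*)/(\w*)/(\d*)   # day, month and year inside the [timestamp]
    .*"(GET|POST)\s         # request method right after a double quote
    (https?://)             # URL scheme
    [a-z].*?\.([a-z]+)      # lazily skip ahead, then capture a label after a dot
    [^\w.-]                 # that label must be followed by a non-hostname character
    .*200                   # the digits 200 anywhere later (not tied to the status field)
''', re.VERBOSE)

sample = '68.134.160.117 - - [09/Mar/2004:22:24:27 -0500] "GET http://www.glocksoft.net/cgi-bin/jenv.cgi HTTP/1.0" 200 1169 "-" "Mozilla/4.0"'

match = LOG_PATTERN.match(sample)
if match:
    print(match.groups())
</code></pre>
<p>For the first sample line this should print the same tuple as above, <code>('09', 'Mar', '2004', 'GET', 'http://', 'net')</code>. Note that the trailing <code>.*200</code> would also accept a line where 200 only appears in another field, such as the byte count, so it may be worth tightening if you need to match the status code specifically.</p>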