<p>Make effective use of the <code>re.search</code> function (instead of <code>re.findall</code>) together with a <code>collections.defaultdict</code> object:</p>
<pre><code>from collections import defaultdict
import pprint
import re

data = '''... your data lines '''
ips = []  # IP addresses in order of first appearance
# Per-IP counters for status codes and request methods
stats_dict = defaultdict(lambda: {'200': 0, '404': 0, 'GET': 0, 'POST': 0})

for line in data.splitlines():
    line = line.strip()
    if not line:
        continue
    # Capture the IP address, the request method and the status code
    m = re.search(r'([\d.]+).*?(GET|POST).*?"\s(200|404)', line)
    if not m:
        continue
    ip_address, method, status_code = m.groups()
    stats_dict[ip_address][method] += 1
    stats_dict[ip_address][status_code] += 1
    if ip_address not in ips:
        ips.append(ip_address)

# Number the per-IP records starting at 1
res = [{i: dict({'IP': ip_addr}, **stats_dict[ip_addr])}
       for i, ip_addr in enumerate(ips, 1)]
pprint.pprint(res, width=20)
</code></pre>
<p>Output:</p>
<pre><code>[{1: {'200': 2,
      '404': 1,
      'GET': 2,
      'IP': '120.115.144.240',
      'POST': 1}},
 {2: {'200': 1,
      '404': 0,
      'GET': 1,
      'IP': '202.167.250.99',
      'POST': 0}},
 {3: {'200': 0,
      '404': 1,
      'GET': 0,
      'IP': '60.4.236.27',
      'POST': 1}}]
</code></pre>
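<p>To see what the pattern captures, here is a minimal self-contained sketch. The log lines in <code>SAMPLE_LOG</code> are hypothetical (made up for illustration, in an Apache-like access-log layout, not the original data), and <code>count_requests</code> is a helper name introduced only for this example. It assumes each entry starts with the IP address and has the status code right after the closing quote of the request string.</p>

```python
import re
from collections import defaultdict

# Hypothetical sample lines (an assumption, not the original data).
SAMPLE_LOG = '''\
120.115.144.240 - - [29/Aug/2017:04:40:00 -0400] "GET /index.html HTTP/1.1" 200 512
120.115.144.240 - - [29/Aug/2017:04:40:05 -0400] "POST /login HTTP/1.1" 404 128

202.167.250.99 - - [29/Aug/2017:04:41:10 -0400] "GET /about.html HTTP/1.1" 200 2048
this line is not a log entry
'''

# Same pattern as in the answer, compiled once for reuse.
PATTERN = re.compile(r'([\d.]+).*?(GET|POST).*?"\s(200|404)')


def count_requests(text):
    """Count methods and status codes per IP (hypothetical helper)."""
    stats = defaultdict(lambda: {'200': 0, '404': 0, 'GET': 0, 'POST': 0})
    for line in text.splitlines():
        m = PATTERN.search(line)
        if m is None:
            # Blank lines and lines that do not look like log entries are skipped.
            continue
        ip_address, method, status_code = m.groups()
        stats[ip_address][method] += 1
        stats[ip_address][status_code] += 1
    return dict(stats)


first = PATTERN.search(SAMPLE_LOG.splitlines()[0])
print(first.groups())  # ('120.115.144.240', 'GET', '200')
print(count_requests(SAMPLE_LOG))
```

<p>Since Python 3.7 dictionaries preserve insertion order, so in this sketch the keys of the returned dictionary already appear in order of first occurrence; the separate <code>ips</code> list in the answer is only required on older interpreters.</p>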