Extracting the host name, timestamp, HTTP request method, URI, and protocol

Posted 2024-07-06 04:31:44


I want to extract the host name, timestamp, HTTP request method, URI, and protocol from the log entries below

unicomp6.unicomp.net - - [01/Jul/1995:00:00:06 -0400] "GET /shuttle/countdown/ HTTP/1.0" 200 3985
199.120.110.21 - - [01/Jul/1995:00:00:09 -0400] "GET /shuttle/missions/sts-73/mission-sts-73.html HTTP/1.0" 200 4085

using regular expressions. Please let me know how to do this.


Tags: method, http, protocol, get, net, time, uri, jul
1 answer
User
#1 · Posted 2024-07-06 04:31:44

I tried the patterns and code below:

timestamp - r"\[\d+/\D+/.*\]
host name - (\d+\.\d+\.\d+\.\d+)\s* |(.+)\.(com|info|biz|tv|net)
status code - "\s\d{3}

But I did not get the expected result. Python reports that it expected a string or bytes-like object.

regex = r'\b(\d+\.\d+\.\d+\.\d+)\s* |(.+)\.(com|info|biz|tv|net)'

sample_text = ("[unicomp6.unicomp.net - - [01/Jul/1995:00:00:06 -0400] "GET /shuttle/countdown/ HTTP/1.0" 200 3985, 199.120.110.21 - - [01/Jul/1995:00:00:09 -0400] "GET /shuttle/missions/sts-73/mission-sts-73.html HTTP/1.0" 200 4085]")

matches = re.findall(regex, sample_text)
hosts = []
for matchNum, match in enumerate(matches, start=1):
    hosts.append(match.group()[1:27])
print(hosts)
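
A possible way forward, offered as a sketch rather than a definitive answer. The "expected string or bytes-like object" message is the TypeError that the `re` functions raise when they are given something that is not a single string, for example a list of log lines. Separately, when a pattern contains capturing groups, `re.findall` returns tuples of strings, so calling `.group()` on each result fails. The example below instead matches one line at a time with a single pattern that uses named groups. It assumes every line follows the layout of the two sample entries (host, two ignored fields, bracketed timestamp, quoted request, status, size); the names `LOG_PATTERN`, `sample_lines`, and `parse_line` are made up for illustration.

```python
import re

# Assumed layout of each line (taken from the two sample entries):
# host ident user [timestamp] "METHOD URI PROTOCOL" status size
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+)\s+\S+\s+\S+\s+'       # host, then two fields we ignore
    r'\[(?P<timestamp>[^\]]+)\]\s+'         # timestamp between square brackets
    r'"(?P<method>[A-Z]+)\s+(?P<uri>\S+)\s+(?P<protocol>[^"\s]+)"\s+'
    r'(?P<status>\d{3})\s+(?P<size>\d+|-)$'  # status code and response size
)

# Keep the log entries as separate strings, one per line.
sample_lines = [
    'unicomp6.unicomp.net - - [01/Jul/1995:00:00:06 -0400] "GET /shuttle/countdown/ HTTP/1.0" 200 3985',
    '199.120.110.21 - - [01/Jul/1995:00:00:09 -0400] "GET /shuttle/missions/sts-73/mission-sts-73.html HTTP/1.0" 200 4085',
]


def parse_line(line):
    """Return a dict of the named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


# Parse each line on its own instead of passing the whole list to re.findall.
records = [parse_line(line) for line in sample_lines]
for record in records:
    print(record)
```

Matching on `\S+` for the host avoids having to list top-level domains such as `com` or `net`, and it covers both host names and IP addresses. All captured values are strings; the timestamp could be converted afterwards with `datetime.strptime` if a real date object is needed.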
