Shortening a Python regular expression

Posted 2024-10-02 12:28:49


I am parsing a log file with regular expressions. Sample log entry:

<teststep timestamp="12040.310594" level="0" type="user" ident="1.2" result="pass">Signal STATUS_GET_VALUE response time Ok,\nSignal response time: 0.000000 [ms] \nSignal response time limit set: 100.000000 [ms]</teststep>

I need to extract the timestamp and the signal response time.

My current solution is:

import re

with open('report.xml') as f:
    for line in f:
        if 'Signal response time: ' in line:
            # Raw strings and an escaped dot, so "." matches only a literal decimal point
            timeStampL = re.findall(r'timestamp="\d*\.\d*"', line)
            responseTimeL = re.findall(r'Signal response time: \d*\.\d*',
                                       line, re.IGNORECASE)
            timeStamp = float(re.findall(r'\d+\.\d+', timeStampL[0])[0])
            responseTime = float(re.findall(r'\d+\.\d+', responseTimeL[0])[0])

I am sure this is not the shortest or best way to get this data. Can you suggest a better approach?
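For reference, one way to shorten this is a single pattern with two capture groups, so each value comes out in one step. This is only a sketch: `PATTERN` and `parse_line` are names made up here, and it assumes the timestamp and the response time sit on the same line, with `\n` stored as a literal backslash-n as in the sample entry.

```python
import re

# One pattern, two capture groups: the timestamp attribute and the response time.
# Assumption: both values appear on the same line of the log file.
PATTERN = re.compile(
    r'timestamp="(\d+\.\d+)".*?Signal response time: (\d+\.\d+)',
    re.IGNORECASE,
)


def parse_line(line):
    """Return (timestamp, response_time) as floats, or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


# Sample entry from the question; raw strings keep "\n" as two literal characters.
sample = (
    r'<teststep timestamp="12040.310594" level="0" type="user" ident="1.2" '
    r'result="pass">Signal STATUS_GET_VALUE response time Ok,\nSignal response '
    r'time: 0.000000 [ms] \nSignal response time limit set: 100.000000 [ms]</teststep>'
)
print(parse_line(sample))  # (12040.310594, 0.0)
```

The lazy `.*?` stops at the first "Signal response time: " that is followed by a number, so the "limit set" value is not picked up.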


2 Answers

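Another way to get the desired result is to use an XML/HTML parser such as BeautifulSoup to locate the element, read its timestamp attribute (in BeautifulSoup, an element can be treated like a dictionary when reading attributes), and extract the signal response time with a regular expression: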

In [1]: import re

In [2]: from bs4  import BeautifulSoup

In [3]: data = """<teststep timestamp="12040.310594" level="0" type="user" ident="1.2" result="pass">Signal STATUS_GET_VALUE response time Ok,\nSignal response time: 0.000000 [ms] \nSignal response time limit set: 100.000000 [ms]</teststep>"""

In [4]: soup = BeautifulSoup(data, "html.parser")

In [5]: pattern = re.compile(r"Signal response time: ([0-9.]+)")

In [6]: elm = soup.find("teststep", text=pattern)

In [7]: print(elm["timestamp"], pattern.search(elm.get_text()).group(1))
12040.310594 0.000000
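If installing bs4 is not an option, the same idea (find the element, read the attribute, run a regex on the text) can be sketched with the standard library's xml.etree.ElementTree instead of BeautifulSoup. This assumes the report is well-formed XML and that every teststep carries a timestamp attribute; the `<report>` wrapper element and the `extract_times` name are hypothetical, since the real file layout is not shown in the question.

```python
import re
import xml.etree.ElementTree as ET

RESPONSE_TIME = re.compile(r"Signal response time: (\d+(?:\.\d+)?)")


def extract_times(xml_text):
    """Yield (timestamp, response_time) float pairs for each matching <teststep>."""
    root = ET.fromstring(xml_text)
    # iter() walks the root and all descendants with the given tag.
    for step in root.iter("teststep"):
        match = RESPONSE_TIME.search(step.text or "")
        if match:
            # Assumption: the timestamp attribute is always present.
            yield float(step.get("timestamp")), float(match.group(1))


# Hypothetical <report> wrapper around the sample entry from the question.
sample = (
    '<report><teststep timestamp="12040.310594" level="0" type="user" '
    'ident="1.2" result="pass">Signal STATUS_GET_VALUE response time Ok,\n'
    'Signal response time: 0.000000 [ms] \n'
    'Signal response time limit set: 100.000000 [ms]</teststep></report>'
)
print(list(extract_times(sample)))  # [(12040.310594, 0.0)]
```

Unlike html.parser, ElementTree raises an error on malformed input, so this variant only fits reports that are valid XML.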

We can use BeautifulSoup to extract both the attribute value and the element text.

Because the text value Signal STATUS_GET_VALUE response time Ok,\nSignal response time: 0.000000 [ms] \nSignal response time limit set: 100.000000 [ms] is delimited by \n, you can split() the data on that separator and keep only 0.000000 [ms].

Code:

from bs4 import BeautifulSoup

html_code = '<teststep timestamp="12040.310594" level="0" type="user" ident="1.2" result="pass">Signal STATUS_GET_VALUE response time Ok,\nSignal response time: 0.000000 [ms] \nSignal response time limit set: 100.000000 [ms]</teststep>'

soup = BeautifulSoup(html_code, "html.parser")

for test in soup.find_all('teststep'):
    print(test.get('timestamp'))
    print(test.text.split("\n")[1].split(":")[1].strip())

Output:

12040.310594
0.000000 [ms]

Note: you can strip the [ms] from 0.000000 [ms] by changing this:

test.text.split("\n")[1].split(":")[1].strip()

to this:

test.text.split("\n")[1].split(":")[1].strip().replace(" [ms]", "")
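Since the question wants the value as a float, the split chain can be wrapped in a small helper that also does the conversion. This is a rough sketch: `response_time_ms` is a made-up name, and it assumes the response time is always on the second line of the element text and always ends in " [ms]".

```python
def response_time_ms(text):
    """Return the value on the second line of the element text as a float."""
    second_line = text.split("\n")[1]          # "Signal response time: 0.000000 [ms] "
    value = second_line.split(":")[1].strip()  # "0.000000 [ms]"
    return float(value.replace(" [ms]", ""))


# Element text from the sample entry, with real newlines as BeautifulSoup returns it.
element_text = (
    "Signal STATUS_GET_VALUE response time Ok,\n"
    "Signal response time: 0.000000 [ms] \n"
    "Signal response time limit set: 100.000000 [ms]"
)
print(response_time_ms(element_text))  # 0.0
```

With BeautifulSoup this would be called as response_time_ms(test.text) inside the loop. The positional indexing is brittle: if the line order in the log changes, the regex version is the safer choice.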
