Complex parsing in Python

<html> <pre> A Short Study of Notation Efficiency CACM August, 1960 Smith Jr., H. J. CA600802 JB March 20, 1978 9:02 PM 205 4 164 210 4 164 214 4 164 642 4 164 1 5 164 </pre> </html>

3 answers

User

Answer 1 · edited 2024-10-05 11:22:11

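Quazi, this calls for a regex, specifically <pre>(.+?)(?:\d+\s+){3} with the DOTALL flag enabled.

You can learn how to use regexes in Python at http://docs.python.org/library/re.html; if you do a lot of this kind of string extraction, you will be very glad you did. Going through the regex I provided piece by piece: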

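<pre> matches the literal pre tag
(.+?) matches and captures any characters, as few as possible (non-greedy)
(?:\d+\s+){3} matches a run of digits followed by whitespace, three times in a row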
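The pieces above can be tried on the sample from the question. This is only a minimal sketch: the names data, pattern and header are mine rather than part of the answer, and the sample is the single-line form shown in the question.

```python
import re

# Sample taken from the question (flattened onto one line).
data = (
    "<html> <pre> A Short Study of Notation Efficiency CACM August, 1960 "
    "Smith Jr., H. J. CA600802 JB March 20, 1978 9:02 PM "
    "205 4 164 210 4 164 214 4 164 642 4 164 1 5 164 </pre> </html>"
)

# re.DOTALL lets "." match newlines too, which matters when the
# <pre> block spans several lines.
pattern = re.compile(r"<pre>(.+?)(?:\d+\s+){3}", re.DOTALL)

match = pattern.search(data)
# Group 1 is everything between <pre> and the first run of three numbers.
header = match.group(1).strip() if match else None
print(header)
```

Because (.+?) is non-greedy, the capture stops at the first place where three whitespace-separated numbers follow each other, which in this sample is the start of the numeric rows.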

User

Answer 2 · edited 2024-10-05 11:22:11

Here is a regular expression that does this:

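import re

# re.S (DOTALL) lets "." match newlines; the lookarounds keep <pre>
# and the trailing numbers out of the match itself.
findData = re.compile(r'(?<=<pre>).+?(?=[\d\s]*</pre>)', re.S)

# ...

result = findData.search(data).group(0).strip()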

Here's a demo.
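One thing to watch with this approach: search() returns None when nothing matches, so calling .group(0) directly raises AttributeError on input without a matching <pre> block. A small wrapper can guard against that. The function name extract_header and the sample string are my own, used here only for illustration.

```python
import re

# Same pattern as in the answer, written as a raw string.
FIND_DATA = re.compile(r"(?<=<pre>).+?(?=[\d\s]*</pre>)", re.S)


def extract_header(html):
    """Return the text before the trailing numeric rows, or None if absent."""
    match = FIND_DATA.search(html)
    return match.group(0).strip() if match else None


sample = (
    "<html> <pre> A Short Study of Notation Efficiency CACM August, 1960 "
    "Smith Jr., H. J. CA600802 JB March 20, 1978 9:02 PM "
    "205 4 164 210 4 164 214 4 164 642 4 164 1 5 164 </pre> </html>"
)

found = extract_header(sample)
missing = extract_header("<p>no pre block here</p>")
print(found)
print(missing)
```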

User

Answer 3 · edited 2024-10-05 11:22:11

I would probably use lxml or BeautifulSoup. IMO, regex is overused, especially for parsing HTML.
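As a rough sketch of the parser-based route, the example below uses the standard library's html.parser as a stand-in so that it runs without installing anything; it is not lxml or BeautifulSoup. With BeautifulSoup installed, the text-extraction step would be roughly soup.pre.get_text(). The class PreTextCollector and the function split_pre_text are hypothetical names of mine, and peeling off trailing numeric tokens is my assumption about what the asker wants.

```python
from html.parser import HTMLParser


class PreTextCollector(HTMLParser):
    """Collect the text found inside <pre> elements."""

    def __init__(self):
        super().__init__()
        self._depth = 0   # greater than 0 while inside a <pre>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.chunks.append(data)


def split_pre_text(html):
    """Return (header_text, numbers) from the <pre> content."""
    parser = PreTextCollector()
    parser.feed(html)
    parser.close()
    # Splitting on whitespace collapses spaces and line breaks.
    tokens = "".join(parser.chunks).split()
    # Walk back from the end while tokens are plain numbers:
    # those are the trailing numeric rows.
    end = len(tokens)
    while end > 0 and tokens[end - 1].isdecimal():
        end -= 1
    return " ".join(tokens[:end]), [int(t) for t in tokens[end:]]


sample = (
    "<html> <pre> A Short Study of Notation Efficiency CACM August, 1960 "
    "Smith Jr., H. J. CA600802 JB March 20, 1978 9:02 PM "
    "205 4 164 210 4 164 214 4 164 642 4 164 1 5 164 </pre> </html>"
)

header, numbers = split_pre_text(sample)
print(header)
print(numbers)
```

The parser only locates the <pre> text; separating the header from the numbers is then plain string handling, with no regex involved.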
