How can I prevent an infinite loop when parsing with lxml in Python?

<html>
<head></head>
<body>
<dfn>A</dfn>sometext / '' (othertext)someothertext / ''
(...)
(...)
<dfn>B</dfn>sometext / '' (othertext)someothertext / '' blabla bubu
</body>
</html>

> A, sometext, othertext, someothertext.
>
> B, sometext, othertext, someothertext.
>
> C, sometext, othertext, someothertext.
>
> ...
>
> Z, sometext, othertext, someothertext.
>
> (2nd unnecessary loop):
>
> B, sometext, othertext, someothertext.
>
> C, sometext, othertext, someothertext.
>
> D, sometext, othertext, someothertext.
>
> ...
>
> Z, sometext, othertext, someothertext.
>
> (3rd unnecessary loop):
>
> C, sometext, othertext, someothertext.
>
> D, sometext, othertext, someothertext.
>
> E, sometext, othertext, someothertext.
>
> ...
>
> Z, sometext, othertext, someothertext...etc

1 Answer

User

#1 · Posted on 2024-06-28 20:41:13

To get all the text under every p element, a single query is enough:

tree.xpath("//p//text()")
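As a rough illustration, here is that query run against a small stand-in document. The `<p>` and `<i>` tags in `SAMPLE` are assumptions based on the XPath expressions in this answer, not the asker's exact page. The sketch also shows that the raw result keeps surrounding whitespace, which is why the later examples call `strip()`.

```python
from lxml import html

# Hypothetical markup: the <p> and <i> tags are assumptions, not the asker's page.
SAMPLE = (
    "<html><head></head><body>"
    "<p><dfn>A</dfn>sometext <i>othertext</i> someothertext</p>"
    "<p><dfn>B</dfn>sometext <i>othertext</i> someothertext <i>blabla</i> bubu</p>"
    "</body></html>"
)

tree = html.fromstring(SAMPLE)

# Every text node below any <p>, in document order, whitespace included.
raw = tree.xpath("//p//text()")

# Strip each node and drop the ones that contained only whitespace.
clean = [t.strip() for t in raw if t.strip()]
print(clean)
```

One query over the whole tree replaces any hand-written nested loop, so there is no way to visit the same paragraph twice.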

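If you need the text grouped by p element, use a nested comprehension. Note the leading dot in './/text()', which keeps the inner query relative to the current p: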

[[y.strip() for y in x.xpath('.//text()') if y.strip()] for x in tree.xpath('//p')]
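The leading dot in the inner expression deserves a closer look, because dropping it is one plausible way to end up with repeated output. In lxml, an expression that starts with `//` is evaluated from the document root even when it is called on a child element. The sketch below compares the two forms on a hypothetical document (the `<p>` and `<i>` tags and the helper name `count_text_nodes` are assumptions made for illustration).

```python
from lxml import html

# Hypothetical markup: the <p> and <i> tags are assumptions, not the asker's page.
SAMPLE = (
    "<html><head></head><body>"
    "<p><dfn>A</dfn>sometext <i>othertext</i> someothertext</p>"
    "<p><dfn>B</dfn>sometext <i>othertext</i> someothertext <i>blabla</i> bubu</p>"
    "</body></html>"
)

tree = html.fromstring(SAMPLE)


def count_text_nodes(expr):
    """Evaluate expr on each <p> and count the non-blank text nodes it returns."""
    return [
        len([t for t in p.xpath(expr) if t.strip()])
        for p in tree.xpath("//p")
    ]


# Relative query: each <p> only sees its own text nodes.
relative = count_text_nodes(".//text()")

# Absolute query: each <p> sees the text of the whole document again.
absolute = count_text_nodes("//text()")

print(relative)  # [4, 6]
print(absolute)  # [10, 10]
```

With the absolute form, every iteration re-reads the entire document, which looks like a loop that never finishes on a large page.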

To extract the text of the p that contains a given i element, matched by the i element's text:

>>> [y.strip() for y in tree.xpath('//i[.="blabla"]/..//text()') if y.strip()]
['B', 'sometext', 'othertext', 'someothertext', 'blabla', 'bubu']

Or match by the text of the dfn element:

>>> [y.strip() for y in tree.xpath('//dfn[.="B"]/..//text()') if y.strip()]
['B', 'sometext', 'othertext', 'someothertext', 'blabla', 'bubu']
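If the label changes from call to call, the lookup can be wrapped in a small function. This is only a sketch: the helper name `paragraph_text_for` and the sample markup (including its `<p>` and `<i>` tags) are assumptions. It passes the label as an XPath variable through a keyword argument to `xpath()`, which avoids building the expression with string formatting.

```python
from lxml import html

# Hypothetical markup: the <p> and <i> tags are assumptions, not the asker's page.
SAMPLE = (
    "<html><head></head><body>"
    "<p><dfn>A</dfn>sometext <i>othertext</i> someothertext</p>"
    "<p><dfn>B</dfn>sometext <i>othertext</i> someothertext <i>blabla</i> bubu</p>"
    "</body></html>"
)


def paragraph_text_for(tree, label):
    """Return the stripped text nodes of the parent of the <dfn> whose text is label."""
    # $label is an XPath variable, filled from the keyword argument of the same name.
    nodes = tree.xpath("//dfn[. = $label]/..//text()", label=label)
    return [t.strip() for t in nodes if t.strip()]


tree = html.fromstring(SAMPLE)
result_b = paragraph_text_for(tree, "B")
result_missing = paragraph_text_for(tree, "Z")  # no such <dfn> in this sample

print(result_b)
print(result_missing)  # []
```

A label that matches nothing simply yields an empty list, so the caller does not need a separate existence check.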
