Recursively searching for text under a td node in an HTML table

Posted 2024-09-30 20:17:17


I am scraping an HTML table with Python. So far I have managed to parse out the table:

from lxml import etree

root = etree.fromstring(browser.page_source, etree.HTMLParser())
rows = root.xpath("//table[@class='ms-listviewtable']/tbody/tr")

Now I want to parse the columns in each row one by one with a for loop, like this:

for row in rows:
    cols = row.xpath("./td")
    texts = [col.xpath("./findtextforme()") for col in cols]
    # findtextforme() is an imaginary function

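Why can't I simply use col.xpath("./text()") or col.findtext("./")? Because the location of the text is inconsistent between the columns of this table, and even within a single column: td/text(), td/div/a/text(), td/div/font/text(), td/div/div/text(), and so on.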

So what I want is something that recursively finds the text under a given td node. How can I do that?


1 answer

User

#1 · Posted 2024-09-30 20:17:17

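You can use text_content() to aggregate the "text" of an HTML element. The lxml.html documentation describes it as:

Returns the text content of the element, including the text content of its children, with no markup.

Note that text_content() exists only on elements parsed through lxml.html (for example lxml.html.fromstring(browser.page_source)); elements built with etree.HTMLParser(), as in the question, do not have it. Assuming the page was parsed that way: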

texts = [col.text_content() for col in cols]
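
For reference, here is a minimal sketch of two ways to collect all nested text under each td. The PAGE string and the helper names row_texts_html and row_texts_etree are hypothetical, made up for illustration; PAGE merely stands in for browser.page_source. The first helper parses with lxml.html so that text_content() is available. The second keeps the question's etree.HTMLParser() and joins itertext() instead, which walks all descendant text nodes.

```python
from lxml import etree, html

# Hypothetical sample markup standing in for browser.page_source.
PAGE = (
    "<html><body>"
    "<table class='ms-listviewtable'><tbody><tr>"
    "<td>plain</td>"
    "<td><div><a href='#'>link</a></div></td>"
    "<td><div><font>styled</font></div></td>"
    "<td><div><div>nested</div></div></td>"
    "</tr></tbody></table>"
    "</body></html>"
)

ROW_XPATH = "//table[@class='ms-listviewtable']/tbody/tr"


def row_texts_html(page):
    """Parse with lxml.html so every element offers text_content()."""
    root = html.fromstring(page)
    return [
        [col.text_content().strip() for col in row.xpath("./td")]
        for row in root.xpath(ROW_XPATH)
    ]


def row_texts_etree(page):
    """Keep the question's etree parser and join all descendant text."""
    root = etree.fromstring(page, etree.HTMLParser())
    return [
        ["".join(col.itertext()).strip() for col in row.xpath("./td")]
        for row in root.xpath(ROW_XPATH)
    ]


if __name__ == "__main__":
    print(row_texts_html(PAGE))
    print(row_texts_etree(PAGE))
```

Both helpers should return the same nested list for this sample. On a plain etree element, col.xpath("string(.)") is another option that concatenates descendant text in the same way.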
