How to read text from a website with Python

User

Reply #1 · edited 2024-09-22 16:37:28

If you like jQuery, use pyQuery.

Start with:

from pyquery import PyQuery as pq

d = pq(web_pg)

or even:

^{pr2}$

Now d works like $ in jQuery:

p = d("#hello")  # get the element with id="hello"
print(p.html())  # print its inner HTML

p = d('#content p:first')  # get the first <p> inside the element with id="content"
print(p.text())  # print its text content

User

Reply #2 · edited 2024-09-22 16:37:28

from bs4 import BeautifulSoup

soup = BeautifulSoup(web_pg, "html.parser")  # name the parser explicitly to avoid a bs4 warning
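Building the soup is only the first step; to actually read the text you still have to pull it out. Here is a minimal sketch of one way to do that. The sample markup and the helper name visible_text are made up for illustration, and dropping script and style tags is an assumption about what "text" should mean here.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; in the thread, web_pg holds the HTML.
web_pg = """
<html><head><title>Demo</title><script>var x = 1;</script></head>
<body><div id="content"><p>Hello,</p><p>world!</p></div></body></html>
"""


def visible_text(html):
    """Return the page text with script and style contents removed."""
    soup = BeautifulSoup(html, "html.parser")
    # Calling the soup is shorthand for find_all(); drop tags that hold code, not prose.
    for tag in soup(["script", "style"]):
        tag.decompose()
    # A separator plus strip=True keeps words from adjacent tags from running together.
    return soup.get_text(separator=" ", strip=True)


text = visible_text(web_pg)
print(text)
```

If you only want part of the page, you can call get_text() on a single element instead, for example the result of soup.find(id="content").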

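User

Reply #3 · edited 2024-09-22 16:37:28

We started out with BS (BeautifulSoup) a while ago but eventually switched to lxml.

from lxml import html

my_tree = html.fromstring(web_pg)  # parse the HTML string into an element tree
elements = list(my_tree.iter())  # every element in document order

Now you have to decide which elements you want, and you need to make sure that none of the elements you keep is a child of another element you have already decided to keep.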

^{pr2}$

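The table in the HTML above is a child of the div, so everything in the table is also contained in the div. You therefore need some logic that keeps only elements whose ancestors have not already been kept.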
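That filtering logic could be sketched roughly as follows. This is only an illustration: the sample markup, the helper name keep_outermost, and the choice of tags are assumptions, not something from the original reply. It relies on iter() walking the tree in document order, so a parent is always visited before its children.

```python
from lxml import html

# Hypothetical markup: a table nested inside a div, plus an unrelated paragraph.
web_pg = """
<html><body>
<div id="main"><table><tr><td>cell</td></tr></table></div>
<p>footer</p>
</body></html>
"""


def keep_outermost(tree, wanted_tags):
    """Return wanted elements that are not nested inside another kept element."""
    kept = []
    for el in tree.iter():  # document order: parents come before their children
        if el.tag not in wanted_tags:
            continue
        # Skip this element if one of its ancestors was already kept.
        if any(ancestor in kept for ancestor in el.iterancestors()):
            continue
        kept.append(el)
    return kept


my_tree = html.fromstring(web_pg)
kept = keep_outermost(my_tree, {"div", "table", "p"})
for el in kept:
    print(el.tag, el.text_content().strip())
```

Here the table is skipped because its parent div is already kept, and its text still shows up through the div's text_content(), so nothing is read twice.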
