Scraping a span with Python and BeautifulSoup returns no results

def manchete_11112011_30102012(b):
    soup = make_soup(b)
    data = [span.string for span in soup.find("font")]
    noticias = [b.text for b in soup.findAll("a")]
    return {"noticias": noticias, "data": data}

2 answers

User

Answer 1 · edited 2024-09-21 03:26:43

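If all you want is the date, you should look for it somewhere else. If you dump the soup and search for "2012", you will see that it appears in many places. The code below pulls it out of the page title.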

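import urllib.request
from bs4 import BeautifulSoup

url = "http://www1.folha.uol.com.br/fsp/mercado/index-20121030.shtml"
page = urllib.request.urlopen(url)
# name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(page.read(), "html.parser")
# the date also appears in the page title
theDateTag = soup.find("title")
theDateString = theDateTag.get_text()
print(theDateString)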
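Another place the date shows up is the URL itself, which ends in index-20121030.shtml. The sketch below is only an illustration of that idea, assuming every page in the range follows the same index-YYYYMMDD.shtml naming; the helper name date_from_url is hypothetical and not part of the original answer.

```python
import re
from datetime import date, datetime


def date_from_url(url: str) -> date:
    """Extract the YYYYMMDD stamp from an 'index-YYYYMMDD.shtml' style URL."""
    # Assumption: the date is always the eight digits after 'index-'.
    match = re.search(r"index-(\d{8})\.shtml", url)
    if match is None:
        raise ValueError(f"no date stamp found in {url!r}")
    return datetime.strptime(match.group(1), "%Y%m%d").date()


url = "http://www1.folha.uol.com.br/fsp/mercado/index-20121030.shtml"
print(date_from_url(url))
```

This avoids parsing the page at all when only the date is needed, at the cost of depending on the URL pattern staying the same.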

User

Answer 2 · edited 2024-09-21 03:26:43

To find the element with id = spanLongDate, use the following snippet:

# get the span you are looking for
span = soup.find("span", attrs={"id": "spanLongDate"})

# get the text out of the span
data = span.get_text()

Note that if you need to find more than one instance, use .find_all instead.
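The difference between the two calls can be sketched as follows. The HTML string, the "date" class, and the id noSuchId are made up for illustration and do not come from the page in question; the point is only that find returns one tag or None, while find_all returns a list.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only to show the two lookup styles.
html = """
<div>
  <span id="spanLongDate">first</span>
  <span class="date">second</span>
  <span class="date">third</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# find returns the first matching tag, or None when nothing matches,
# so guard before calling get_text().
span = soup.find("span", attrs={"id": "spanLongDate"})
first_text = span.get_text() if span is not None else None

# find_all returns every match as a list (empty when nothing matches).
all_texts = [s.get_text() for s in soup.find_all("span", attrs={"class": "date"})]

# A lookup that matches nothing gives None rather than raising.
missing = soup.find("span", attrs={"id": "noSuchId"})

print(first_text, all_texts, missing)
```

Guarding against None matters here because calling get_text() on a failed find raises AttributeError.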

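Edited to add:

Following your comment below, I looked at the page source and also ran the code on my machine. Here is a function that lets you dump what BeautifulSoup sees. This is helpful because BeautifulSoup does not always see what you see when you view the source in a browser.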

^{pr2}$
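The function itself did not survive in this copy of the answer. As a rough stand-in, and not the answerer's original code, a dump helper could look like the sketch below. The name dump_soup, its path parameter, and the sample markup are assumptions made for illustration; prettify() is the standard BeautifulSoup method for rendering the parsed tree as indented text.

```python
from bs4 import BeautifulSoup


def dump_soup(soup, path=None):
    """Return the parsed tree as indented text; also write it to path if given."""
    text = soup.prettify()
    if path is not None:
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(text)
    return text


# Hypothetical page content, standing in for the downloaded HTML.
sample = '<html><body><font size="1"><span id="spanLongDate"></span></font></body></html>'
dumped = dump_soup(BeautifulSoup(sample, "html.parser"))

# Print only the lines that mention the id being searched for.
hits = [line.strip() for line in dumped.splitlines() if "spanLongDate" in line]
print(hits)
```

Searching the dumped text rather than the browser's view shows exactly what the parser received.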

When I printed it out and searched for "spanLongDate", I got the following snippet of interest.

<td align="right" width="430"><font size="1"><span id="spanLongDate"></span></font><img alt="Mercado" hspace="10" src="images/mercado.gif"/></td>

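There is no "São Paulo" text in it. I then pressed F12 in Chrome to look at the original source, and the spanLongDate element has no text there either.

Maybe the page has been updated?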
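The empty span can be checked directly against the snippet quoted above. This is a small sketch that parses only that one td fragment, not the live page, so it says nothing about why the span is empty.

```python
from bs4 import BeautifulSoup

# The fragment quoted in the answer, parsed on its own.
snippet = (
    '<td align="right" width="430"><font size="1">'
    '<span id="spanLongDate"></span></font>'
    '<img alt="Mercado" hspace="10" src="images/mercado.gif"/></td>'
)
soup = BeautifulSoup(snippet, "html.parser")

span = soup.find("span", attrs={"id": "spanLongDate"})
found = span is not None
text = span.get_text(strip=True) if found else ""

# The tag exists, but it has no children, so .string is None
# and get_text() is an empty string.
print(found, repr(text), span.string)
```

So the lookup itself succeeds; there is simply no text inside the tag for .string or get_text() to return.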
