Python: extracting text together with its HTML tags using the newspaper module

from newspaper import Article
url = 'http://www.infomoney.com.br/mercados/acoes-e-indices/noticia/7345670/dow-jones-tem-nova-derrocada-puxa-ibovespa-para-segunda-semana'
a = Article(url, language='pt')
a.download()
a.parse()
print(a.text)

2 answers

User

#1 · edited 2024-10-04 05:26:09

You can get the HTML through the article's html attribute.

from newspaper import Article
url = 'http://www.infomoney.com.br/mercados/acoes-e-indices/noticia/7345670/dow-jones-tem-nova-derrocada-puxa-ibovespa-para-segunda-semana'
a = Article(url, language='pt')
a.download()
a.parse()
print(a.text)

html = a.html
print(html)

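User

#2 · edited 2024-10-04 05:26:09

This question was asked a year ago, but someone may still find it through Google.

You can use "a.article_html" to get the images and other HTML contained in the article text.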

from newspaper import Article

a = Article('https://www.nytimes.com/2019/04/25/us/politics/joe-biden-anita-hill.html', 
    keep_article_html=True, 
    language='en')
a.download()
a.parse()

print(a.html) # This article's unchanged and raw HTML
print(a.article_html) # The HTML of this article's main node

Remember to pass the parameter "keep_article_html=True"; without it, "a.article_html" stays empty.
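
Once the article's HTML is available as a string, the images mentioned above can be pulled out of it. The following is only a minimal sketch using the standard library's html.parser. The names ImageSrcCollector and image_sources and the sample fragment are hypothetical and are not part of newspaper. In real use you would pass a.article_html instead of the sample string.

```python
from html.parser import HTMLParser


class ImageSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:  # skip <img> tags that have no usable src
                self.sources.append(src)


def image_sources(article_html):
    """Return the image URLs found in an HTML fragment such as a.article_html."""
    collector = ImageSrcCollector()
    collector.feed(article_html)
    collector.close()
    return collector.sources


# Hypothetical fragment standing in for a.article_html
sample = (
    '<div><p>Intro</p><img src="https://example.com/a.jpg">'
    '<p>More</p><img alt="no src"></div>'
)
print(image_sources(sample))  # ['https://example.com/a.jpg']
```

An empty string yields an empty list, so the helper is safe to call on an article whose main-node HTML was not kept.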
