Example of web scraping news headlines and content in Python

import requests
from lxml import html
import pandas

url = "http://www.cnbc.com/"
response = requests.get(url)
doc = html.fromstring(response.text)

headlineNode = doc.xpath('//div[@class="headline"]')
len(headlineNode)

result_list = []
for node in headlineNode:
    url_node = node.xpath('./a/@href')
    title = node.xpath('./a/text()')
    soup = BeautifulSoup(url_node.content)
    text = [''.join(s.findAll(text=True)) for s in soup.findAll("div", {"class": "group"})]
    if (url_node and title and text):
        result_list.append({'URL': url + url_node[0].strip(),
                            'TITLE': title[0].strip(),
                            'TEXT': text[0].strip()})
print(result_list)
len(result_list)

1 Answer

User

#1 · Posted 2024-09-29 01:37:45

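The script is off to a good start. However, soup = BeautifulSoup(url_node.content) is wrong: url_node is a list, so it has no content attribute. You need to build the full news URL, fetch its HTML with requests, and then pass that HTML to BeautifulSoup.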
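The URL-building step can be sketched as follows. This is a minimal sketch, not the original poster's method: it swaps plain string concatenation for urllib.parse.urljoin from the standard library, and the helper name build_news_url and the sample hrefs are hypothetical.

```python
from urllib.parse import urljoin

BASE_URL = "http://www.cnbc.com"  # site root taken from the question


def build_news_url(base_url, href_list):
    """Turn an XPath result such as node.xpath('./a/@href') into one full URL.

    Returns None when the list is empty, so the caller can skip that node.
    """
    if not href_list:
        return None
    # urljoin copes with relative paths and with hrefs that are already absolute
    return urljoin(base_url, href_list[0].strip())


# Hypothetical hrefs, for illustration only
print(build_news_url(BASE_URL, ["/2016/01/01/story.html"]))
print(build_news_url(BASE_URL, []))
```

The returned string can then be handed to requests.get, and the response's content passed to BeautifulSoup.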

Apart from that, I would also look at the following points:

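I see an import problem: BeautifulSoup is not imported. Add from bs4 import BeautifulSoup. Are you using pandas? If not, remove that import.
When you query url_node = node.xpath('./a/@href'), some news divs on CNBC that carry a large banner image will produce a list of length 0. You also need to work out the right logic and selectors to get the URLs of those news items. I will leave that to you.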
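One possible way to approach the empty-list case is to try several selectors in order. This is only a sketch under stated assumptions: the sample markup, the banner class, and the second selector are hypothetical and are not taken from the real CNBC page, so the actual selectors still have to be found by inspecting the site.

```python
from lxml import html

# Hypothetical markup: one ordinary headline and one banner-style headline
SAMPLE = """
<html><body>
<div class="headline"><a href="/2016/story-one.html">Story one</a></div>
<div class="headline"><div class="banner">
  <a href="/2016/story-two.html"><img src="x.png" alt="Story two"></a>
</div></div>
</body></html>
"""

# Candidate selectors, tried in order; the second one is a guess
HREF_SELECTORS = ['./a/@href', './/a/@href']


def first_href(node):
    """Return the first href found by any candidate selector, or None."""
    for selector in HREF_SELECTORS:
        hrefs = node.xpath(selector)
        if hrefs:
            return hrefs[0].strip()
    return None


doc = html.fromstring(SAMPLE)
links = [first_href(node) for node in doc.xpath('//div[@class="headline"]')]
print(links)
```

Nodes for which first_href returns None can still be skipped, as the loop in the answer does with continue.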

Take a look at this:

import requests
from lxml import html
import pandas
from bs4 import BeautifulSoup

# Note: trailing slash removed
url = "http://www.cnbc.com"
response = requests.get(url)
doc = html.fromstring(response.text)

headlineNode = doc.xpath('//div[@class="headline"]')
print(len(headlineNode))

result_list  = []
for node in headlineNode:
    url_node = node.xpath('./a/@href')
    title = node.xpath('./a/text()')
    # Figure out logic to get that pic banner news URL
    if len(url_node) == 0:
        continue
    else:
        news_html = requests.get(url + url_node[0])
        # Name the parser explicitly so bs4 does not have to guess one
        soup = BeautifulSoup(news_html.content, "html.parser")
        text = [''.join(s.find_all(string=True)) for s in soup.find_all("div", {"class": "group"})]
        if (url_node and title and text) :
            result_list.append({'URL' : url + url_node[0].strip(),
                                'TITLE' : title[0].strip(),
                                'TEXT' : text[0].strip()})
print(result_list)
print(len(result_list))

An extra debugging tip:

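Start an ipython3 shell and run %run -d yourfile.py. Look up ipdb and its debugging commands. It is very useful for checking what your variables actually hold and whether you are calling the right methods.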
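If stepping through the whole file is too slow, a conditional breakpoint inside the script is another option. The sketch below is an assumption-laden illustration, not part of the original answer: the helper first_or_none and the SCRAPER_DEBUG environment variable are hypothetical names, and breakpoint() needs Python 3.7 or later.

```python
import os


def first_or_none(values):
    """Return the first item of a list such as an XPath result, or None."""
    if not values and os.environ.get("SCRAPER_DEBUG") == "1":
        # Pauses in pdb by default; setting PYTHONBREAKPOINT=ipdb.set_trace
        # switches to ipdb if it is installed.
        breakpoint()
    return values[0] if values else None


print(first_or_none(["/story.html"]))
print(first_or_none([]))
```

With SCRAPER_DEBUG unset the helper behaves like a plain lookup, so it can stay in the script between debugging sessions.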

Good luck.
