Displaying CSV output with Pandas in Python

Posted 2024-05-20 14:09:57


Here is my code:

import pandas as pd
import requests
from bs4 import BeautifulSoup

source = requests.get('https://www.vanglaini.org/').text
soup = BeautifulSoup(source, 'lxml')
for article in soup.find_all('article'):
    headline = article.a.text
    summary=article.p.text
    link = "https://www.vanglaini.org" +article.a['href']
    #print(headline)
    #print(summary)
    #print(link)

#print()

news_csv = pd.DataFrame({'Headline': headline,
                         'Summary': summary,
                         'Link': link,
                         })
print(news_csv)

I get this error:

    headline = article.a.text
AttributeError: 'NoneType' object has no attribute 'text'

Help!


1 answer
User
Answer #1 · posted 2024-05-20 14:09:57

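As you have already seen in my comment and in @AmiTavory's (now deleted) answer, not every article contains a link. Sometimes article.a returns None, and calling .text on None raises the error.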
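A minimal offline reproduction of the failure, assuming a hypothetical HTML snippet in place of the live page and the built-in `html.parser` instead of `lxml`. Attribute-style access such as `article.a` is shorthand for `article.find("a")`, which returns `None` when nothing matches; reading `.text` from that `None` is what raises the `AttributeError`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second <article> has no <a> tag.
html = """
<article><a href="/news/1">Headline 1</a><p>Summary 1</p></article>
<article><p>An article without a link</p></article>
"""
soup = BeautifulSoup(html, "html.parser")

errors = []
for article in soup.find_all("article"):
    # article.a is None for the second article, so .text fails there.
    try:
        print(article.a.text)
    except AttributeError as exc:
        errors.append(str(exc))
        print("AttributeError:", exc)
```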

You have to check whether article.a is None:

import requests
from bs4 import BeautifulSoup

source = requests.get('https://www.vanglaini.org/').text
soup = BeautifulSoup(source, 'lxml')

for article in soup.find_all('article'):
    if article.a is None:
        continue        

    headline = article.a.text
    summary = article.p.text
    link = "https://www.vanglaini.org" + article.a['href']
    print(headline)
    print(summary)
    print(link)

And it works.
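The guard above only covers a missing `<a>`; `summary = article.p.text` would fail the same way on an article without a `<p>`. A sketch that checks both tags, again on hypothetical markup with `html.parser`, and assuming an article missing either tag should simply be skipped:

```python
from bs4 import BeautifulSoup

BASE_URL = "https://www.vanglaini.org"

# Hypothetical markup: only the first article has both an <a> and a <p>.
html = """
<article><a href="/news/1"> Headline 1 </a><p> Summary 1 </p></article>
<article><p>No link here</p></article>
<article><a href="/news/3">Headline 3</a></article>
"""
soup = BeautifulSoup(html, "html.parser")

rows = []
for article in soup.find_all("article"):
    link_tag = article.find("a")
    summary_tag = article.find("p")
    # Skip articles that lack either tag instead of calling .text on None.
    if link_tag is None or summary_tag is None:
        continue
    rows.append((
        link_tag.get_text(strip=True),      # strips surrounding whitespace
        summary_tag.get_text(strip=True),
        BASE_URL + link_tag["href"],
    ))

print(rows)
```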


EDIT: You may then get the error

    raise ValueError("If using all scalar values, you must pass an index")
ValueError: If using all scalar values, you must pass an index

but that happens for a completely different reason, so it should really be asked as a new question.

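The problem is in the DataFrame: after the loop, headline, summary and link hold only the value from the last iteration (single strings, i.e. scalars), but DataFrame needs lists: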

{
    'Headline': list_with_headlines,
    'Summary': list_with_summaries,
    'Link' : list_with_links,
}
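To see the difference, here is a small comparison with made-up values: a dict of plain scalars gives pandas no way to infer a row count, while wrapping each value in a list does.

```python
import pandas as pd

scalars = {"Headline": "h", "Summary": "s", "Link": "l"}

try:
    pd.DataFrame(scalars)
except ValueError as exc:
    # pandas cannot infer how many rows to build from scalars alone.
    message = str(exc)
    print("ValueError:", message)

# Wrapping each value in a list gives pandas a row count (here, one row).
df = pd.DataFrame({key: [value] for key, value in scalars.items()})
print(df.shape)  # (1, 3)
```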

You should create empty lists before the for loop:

list_with_headlines = []
list_with_summaries = []
list_with_links = []

Inside the for loop, append() each value to its list:

list_with_headlines.append(headline)
list_with_summaries.append(summary)
list_with_links.append(link)
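Because the three lists become columns of one table, they must end up the same length. That holds in this answer because all three append() calls run in the same iteration, after the None check. A small illustration with made-up values of what pandas does when one list falls behind:

```python
import pandas as pd

headlines = ["h1", "h2"]
summaries = ["s1"]  # one value missing, e.g. an append() that was skipped
links = ["l1", "l2"]

try:
    pd.DataFrame({"Headline": headlines, "Summary": summaries, "Link": links})
except ValueError as exc:
    # Columns of different lengths cannot form a rectangular table.
    message = str(exc)
    print("ValueError:", message)
```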

Then create the DataFrame from the lists:

news_csv = pd.DataFrame({
    'Headline': list_with_headlines,
    'Summary': list_with_summaries,
    'Link' : list_with_links,
})
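An alternative pattern, not the one used in this answer: collect one dict per article and pass the list of dicts to pd.DataFrame, which takes the column names from the dict keys. The loop below iterates over made-up tuples standing in for the scraped values.

```python
import pandas as pd

rows = []
for headline, summary, link in [("h1", "s1", "l1"), ("h2", "s2", "l2")]:
    # One dict per article keeps the three fields of a row together.
    rows.append({"Headline": headline, "Summary": summary, "Link": link})

news_csv = pd.DataFrame(rows)
print(news_csv)
```

Keeping each row in a single dict makes it impossible for one column to get out of step with the others.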

Full code:

import pandas as pd
import requests
from bs4 import BeautifulSoup

source = requests.get('https://www.vanglaini.org/').text
soup = BeautifulSoup(source, 'lxml')

list_with_headlines = []
list_with_summaries = []
list_with_links = []

for article in soup.find_all('article'):
    if article.a is None:
        continue        
    headline = article.a.text.strip()
    summary = article.p.text.strip()
    link = "https://www.vanglaini.org" + article.a['href']
    list_with_headlines.append(headline)
    list_with_summaries.append(summary)
    list_with_links.append(link)

news_csv = pd.DataFrame({
    'Headline': list_with_headlines,
    'Summary': list_with_summaries,
    'Link' : list_with_links,
})

print(news_csv)
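The question title mentions CSV, but the code above only prints the DataFrame. A sketch of writing it out with DataFrame.to_csv, assuming the filename `news.csv` (hypothetical) and using a small stand-in DataFrame with made-up values. `index=False` leaves pandas' row numbers out of the file, and calling `to_csv()` without a path returns the CSV text as a string.

```python
import pandas as pd

# Stand-in for the news_csv DataFrame built above (made-up values).
news_csv = pd.DataFrame({
    "Headline": ["h1", "h2"],
    "Summary": ["s1", "s2"],
    "Link": ["https://www.vanglaini.org/a", "https://www.vanglaini.org/b"],
})

# Without a path, to_csv() returns the CSV text instead of writing a file.
csv_text = news_csv.to_csv(index=False)
print(csv_text)

# Write the file; the explicit encoding matters if headlines contain non-ASCII text.
news_csv.to_csv("news.csv", index=False, encoding="utf-8")
```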
