Add extracted data as a new column to an existing CSV in Python

Posted 2024-09-27 00:15:46

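Can anyone help? Please point out where I went wrong. When the extracted reviews are written to hotelreview.csv, they end up in 3 separate columns. How can I fix this so that they are written into 1 column, and how do I give that column the header "review", based on the code below? I also want to add the newly extracted data (the "review" column) to the existing csv 'hotel_FortWorth.csv'. So far I have only written the extracted information to a new csv, and I don't know how to combine the two files, or whether there is another way. The URL can be repeated to match each review. Thank you!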

The file 'hotel_FortWorth.csv' has 3 columns, for example:

           Name                         link
1    Omni Fort Worth Hotel     https://www.tripadvisor.com.au/Hotel_Review-g55857-d777199-Reviews-Omni_Fort_Worth_Hotel-Fort_Worth_Texas.html
2    Hilton Garden Hotel       https://www.tripadvisor.com.au/Hotel_Review-g55857-d2533205-Reviews-Hilton_Garden_Inn_Fort_Worth_Medical_Center-Fort_Worth_Texas.html
3......
...

I use the URLs in the existing csv to extract the reviews, with the code below:

import requests
from unidecode import unidecode
from bs4 import BeautifulSoup
import pandas as pd    

file = []
data = pd.read_csv('hotel_FortWorth.csv', header = None)
df = data[2]

for url in df[1:]:
    print(url)
    thepage = requests.get(url).text
    soup = BeautifulSoup(thepage, "html.parser")
    resultsoup = soup.find_all("p", {"class": "partial_entry"})
    file.extend(resultsoup)

with open('hotelreview.csv', 'w', newline='') as fid:
    for review in file:
        review_list = review.get_text()
        fid.write(unidecode(review_list+'\n'))

Expected result:

    name          link         review
1   ...           ...         ...
2
....

1 answer
Answer 1 · Posted 2024-09-27 00:15:46

One option is to create a new CSV.

For example:

import requests
from unidecode import unidecode
from bs4 import BeautifulSoup
import pandas as pd

data = pd.read_csv('hotel_FortWorth.csv')
review = []
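for url in data["link"]:
    print(url)
    thepage = requests.get(url).text
    soup = BeautifulSoup(thepage, "html.parser")
    resultsoup = soup.find_all("p", {"class": "partial_entry"})
    # find_all returns a list of tags, so join their text into one string per hotel
    text = " ".join(p.get_text(strip=True) for p in resultsoup)
    review.append(unidecode(text))
# one entry per URL, so the list lines up with the existing rows
data["review"] = review
data.to_csv('hotelreview.csv')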
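
Since the question says the URL can be repeated to match each review, here is a minimal sketch of a one-row-per-review variant. It is not the approach from the answer above, just an alternative under some assumptions: the helper names `extract_reviews` and `add_review_rows`, the `fetch_html` parameter, and the sample hotels and HTML are all made up for illustration, and the `partial_entry` class is taken from the question's code. The three-column symptom is likely caused by commas inside the review text being written without quoting; `to_csv` quotes such fields, so each review should stay in a single column.

```python
import pandas as pd
from bs4 import BeautifulSoup


def extract_reviews(html):
    """Return the text of every <p class="partial_entry"> in an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    tags = soup.find_all("p", {"class": "partial_entry"})
    return [p.get_text(strip=True) for p in tags]


def add_review_rows(hotels, fetch_html):
    """Return one row per review, repeating Name and link for each review.

    fetch_html is any callable mapping a URL to an HTML string,
    for example: lambda url: requests.get(url).text
    """
    out = hotels.copy()
    # Each cell temporarily holds the list of reviews for that hotel.
    out["review"] = [extract_reviews(fetch_html(url)) for url in out["link"]]
    # explode gives every list element its own row; a hotel with no
    # reviews keeps a single row with a missing review.
    return out.explode("review", ignore_index=True)


# Stand-in data so the sketch runs without network access (hypothetical values).
hotels = pd.DataFrame({
    "Name": ["Hotel A", "Hotel B"],
    "link": ["https://example.com/a", "https://example.com/b"],
})
fake_pages = {
    "https://example.com/a": (
        '<p class="partial_entry">Clean, quiet, and central.</p>'
        '<p class="partial_entry">Great staff</p>'
    ),
    "https://example.com/b": '<p class="partial_entry">Fine for one night.</p>',
}

result = add_review_rows(hotels, fake_pages.get)
# With no path, to_csv returns the CSV text; pass a filename to write a file.
csv_text = result.to_csv(index=False)
print(csv_text)
```

With the real file, you would read it with `pd.read_csv('hotel_FortWorth.csv')`, pass `lambda url: requests.get(url).text` as `fetch_html`, and call `result.to_csv('hotelreview.csv', index=False)` to write the combined table.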
