我想得到一些关于写这个网络垃圾程序的快速帮助。到目前为止,它是正确的刮取东西,但我有困难写入一个csv文件。在
每一个评论员和评论员的评语和评分
我想把复习成绩写在第一栏,把书面复习写在第二栏。但是,writerow只会逐行执行。在
谢谢你的帮助!:)
import os, requests, csv
from bs4 import BeautifulSoup
# Get URL of the page
URL = ('https://www.tripadvisor.com/Attraction_Review-g294265-d2149128-Reviews-Gardens_by_the_Bay-Singapore.html')
with open('GardensbytheBay.csv', 'w', newline='') as csvfile:
writer = csv.writer(csvfile)
# Looping until the 5th page of reviews
for pagecounter in range(3):
# Request get the first page
res = requests.get(URL)
res.raise_for_status
# Download the html of the first page
soup = BeautifulSoup(res.text, "html.parser")
# Match it to the specific tag for all 5 ratings
reviewElems = soup.findAll('img', {'class': ['sprite-rating_s_fill rating_s_fill s50', 'sprite-rating_s_fill rating_s_fill s40', 'sprite-rating_s_fill rating_s_fill s30', 'sprite-rating_s_fill rating_s_fill s20', 'sprite-rating_s_fill rating_s_fill s10']})
reviewWritten = soup.findAll('p', {'class':'partial_entry'})
if reviewElems:
for row, rows in zip(reviewElems, reviewWritten):
review_text = row.attrs['alt'][0]
review2_text = rows.get_text(strip=True).encode('utf8', 'ignore').decode('latin-1')
writer.writerow([review_text])
writer.writerow([review2_text])
print('Writing page', pagecounter + 1)
else:
print('Could not find clue.')
# Find URL of next page and update URL
if pagecounter == 0:
nextLink = soup.select('a[data-offset]')[0]
elif pagecounter != 0:
nextLink = soup.select('a[data-offset]')[1]
URL = 'http://www.tripadvisor.com' + nextLink.get('href')
print('Download complete')
您可以使用熊猫数据帧:
您可以将评审分数和文本放在同一行但不同的列中:
您最初的方法是将每个项目作为一个单独的行,并连续地编写它们,这不是您想要的。在
相关问题 更多 >
编程相关推荐