How do I correctly store scraped data in an item object and save each set of data to a single CSV file?

Posted 2024-09-29 00:13:23

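So I have my little spider working well and I'm getting all the data. I set up items.py to capture the 7 pieces of data we want, and I can write the data to a file. The problem now is that I need to write the data into 1 output file, in the order set in items.py. I can't figure out how to create the file if it doesn't exist (named with the site name and the date, so the filename is unique).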

Here is what I have so far, but it creates a file for every scraped page/URL. I want to consolidate all of these into one file per site we scrape.

I don't like the way itmDetails2 formats the data, unless that is the only way. Since dets[] holds my items.py values, I figured I could simply store each set in it and then write dets[] to the CSV.

Can anyone give a good example of how to do what I'm after? I found a Python web scraping cheat sheet and tried its example for saving data, but had no luck.

https://blog.hartleybrody.com/web-scraping-cheat-sheet/

            itmDetails2 = dets['sku'] +","+ dets['description']+","+ dets['price']+","+ dets['brand']+","+ dets['compurl']+","+ dets['reviewcount']+","+ dets['reviewrating']

            filename = 'dsg-%s.txt' % dets['description']

            with open(filename, 'w') as f:
                for its in itmDetails2:
                    f.write(str(its))

Here is my items.py file. Since I capture all or most of the data on each scrape loop, how can I write each set to the CSV as a comma-separated line?

import scrapy

class Dsg2Item(scrapy.Item):
    description = scrapy.Field()
    sku = scrapy.Field()
    price = scrapy.Field()
    brand = scrapy.Field()
    compurl = scrapy.Field()
    reviewcount = scrapy.Field()
    reviewrating = scrapy.Field()

2 Answers

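The problem here is that you open the file with the w mode. That truncates the file to zero length every time it is opened, so the previous contents are deleted. Python's mode strings have the same meaning as those of the C standard library function fopen():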

w Truncate to zero length or create text file for writing. The stream is positioned at the beginning of the file.

You should use a+ instead, which opens the file and appends to the end rather than truncating the existing contents:

a+ Open for reading and writing. The file is created if it does not exist. The stream is positioned at the end of the file. Subsequent writes to the file will always end up at the then current end of file, irrespective of any intervening fseek(3) or similar.
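The difference is easy to check with a throwaway file. This is only a minimal sketch: write_twice is a hypothetical helper (not from the question) that opens the same temp file twice in the given mode and returns what ends up on disk.

```python
import os
import tempfile


def write_twice(mode):
    """Open one file twice in `mode`, write a line each time, return the contents."""
    path = os.path.join(tempfile.mkdtemp(), "demo.txt")
    for line in ("first\n", "second\n"):
        with open(path, mode) as f:
            f.write(line)
    with open(path) as f:
        return f.read()


# With "w" the second open truncates the file, so only the last line survives.
print(repr(write_twice("w")))
# With "a+" each write lands at the end of the file, so both lines are kept.
print(repr(write_twice("a+")))
```

Plain "a" behaves the same way for writing; "a+" only adds the ability to read from the same handle.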

Example:

itmDetails2 = dets['sku'] +","+ dets['description']+","+ dets['price']+","+ dets['brand']+","+ dets['compurl']+","+ dets['reviewcount']+","+ dets['reviewrating']

filename = 'dsg-%s.txt' % dets['description']

# "a+" appends, and creates the file if it does not exist yet
with open(filename, "a+") as localLog:
    localLog.write(itmDetails2 + "\n")  # text mode handles the line ending

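I'll also point out that the reason a new file is created each time is that you build the filename from the description. If you want a single file, don't include the description in the filename.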

Example:

itmDetails2 = dets['sku'] +","+ dets['description']+","+ dets['price']+","+ dets['brand']+","+ dets['compurl']+","+ dets['reviewcount']+","+ dets['reviewrating']

# one fixed filename, so every item ends up in the same file
with open("dsg-all.txt", "a+") as localLog:
    localLog.write(itmDetails2 + "\n")  # text mode handles the line ending
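If the goal is one file per site per day, as the question describes, the name could be built from the site name and the date instead of the description. A rough sketch under that assumption follows; site_filename and append_line are made-up helper names, and "dsg" is just the prefix used in the question.

```python
import datetime


def site_filename(site, day=None):
    """Build a name like 'dsg-2024-09-29.csv' from a site name and a date."""
    day = day or datetime.date.today()
    return "%s-%s.csv" % (site, day.isoformat())


def append_line(path, line):
    """Append one line to `path`; mode "a" creates the file if it is missing."""
    with open(path, "a") as f:
        f.write(line + "\n")
```

Inside the spider this would be called as something like append_line(site_filename("dsg"), itmDetails2), so every page scraped on the same day goes into the same file.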

You can use a csv writer (Python's csv module), or open the file in append mode when writing it. There is also tinydb for local JSON storage.
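For the csv writer route, something along these lines might work. It is a sketch, not a drop-in solution: append_item is a hypothetical helper, and FIELDS is assumed to mirror the field order in the question's items.py. csv.DictWriter writes the columns in that order and quotes values that contain commas, which the hand-joined itmDetails2 string does not do.

```python
import csv
import os

# Column order, assumed to follow the fields as declared in items.py.
FIELDS = ["description", "sku", "price", "brand",
          "compurl", "reviewcount", "reviewrating"]


def append_item(path, item):
    """Append one item to the CSV at `path`, adding a header row if the file is new."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    # newline="" lets the csv module control the line endings itself.
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        # dict(item) accepts a plain dict or any mapping-like item object.
        writer.writerow(dict(item))
```

Calling append_item("dsg-all.csv", dets) once per scraped item keeps everything in one file. Fields missing from an item are written as empty cells, while a key that is not listed in FIELDS raises ValueError.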
