How do I correctly store scraped data in an item object and save each set of data to a CSV file?

https://blog.hartleybrody.com/web-scraping-cheat-sheet/

itmDetails2 = dets['sku'] +","+ dets['description']+","+ dets['price']+","+ dets['brand']+","+ dets['compurl']+","+ dets['reviewcount']+","+ dets['reviewrating']

filename = 'dsg-%s.txt' % dets['description']

with open(filename, 'w') as f:
    for its in itmDetails2:
        f.write(str(its))

import scrapy

class Dsg2Item(scrapy.Item):
    description = scrapy.Field()
    sku = scrapy.Field()
    price = scrapy.Field()
    brand = scrapy.Field()
    compurl = scrapy.Field()
    reviewcount = scrapy.Field()
    reviewrating = scrapy.Field()

2 Answers

User

#1 · Edited 2024-09-29 00:13:23

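The problem here is the w mode used when opening the file. It truncates the file to zero length every time it is opened, so the previous contents are deleted. In Python, the mode strings behave the same way as those of the C standard library function fopen():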

w Truncate to zero length or create text file for writing. The stream is positioned at the beginning of the file.

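You should use a+ instead, which opens the file and appends new content to the end rather than truncating what is already there (plain a is enough if you only write and never read):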

a+ Open for reading and writing. The file is created if it does not exist. The stream is positioned at the end of the file. Subsequent writes to the file will always end up at the then current end of file, irrespective of any intervening fseek(3) or similar.

Example:

itmDetails2 = dets['sku'] +","+ dets['description']+","+ dets['price']+","+ dets['brand']+","+ dets['compurl']+","+ dets['reviewcount']+","+ dets['reviewrating']

filename = 'dsg-%s.txt' % dets['description']

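# "a+" appends instead of truncating; the with block closes the file automatically
with open(filename, "a+") as localLog:
    # in text mode "\n" is translated to the platform line ending, so "\r\n" is not needed
    localLog.write(itmDetails2 + "\n")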
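
To see the difference between the two modes in isolation, here is a minimal sketch that reopens a file once per row, the way a scraper callback would. The helper names write_lines and read_lines and the sample rows are made up for illustration and are not part of the original code.

```python
import os
import tempfile


def write_lines(path, mode, lines):
    """Reopen `path` with `mode` for every line, like one open() per scraped item."""
    for line in lines:
        with open(path, mode) as f:
            f.write(line + "\n")


def read_lines(path):
    """Return the file's contents as a list of lines without line endings."""
    with open(path) as f:
        return f.read().splitlines()


tmpdir = tempfile.mkdtemp()
w_path = os.path.join(tmpdir, "w-mode.txt")
a_path = os.path.join(tmpdir, "a-mode.txt")
rows = ["sku1,Shoe,19.99", "sku2,Hat,9.99"]  # hypothetical sample rows

write_lines(w_path, "w", rows)  # each open() truncates, so only the last row survives
write_lines(a_path, "a", rows)  # each open() appends, so every row is kept

print(read_lines(w_path))  # ['sku2,Hat,9.99']
print(read_lines(a_path))  # ['sku1,Shoe,19.99', 'sku2,Hat,9.99']
```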

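I would further note that the reason a new file is created each time is that you build the file name from the description. If you want a single file, do not include the description in the name.

Example: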

itmDetails2 = dets['sku'] +","+ dets['description']+","+ dets['price']+","+ dets['brand']+","+ dets['compurl']+","+ dets['reviewcount']+","+ dets['reviewrating']

# one fixed file name, opened in append mode, so every item ends up in the same file
with open("dsg-all.txt", "a+") as localLog:
    localLog.write(itmDetails2 + "\n")

User

#2 · Edited 2024-09-29 00:13:23

You can use csv.writer, or open the file in append mode when writing. There is also tinydb for local JSON storage.
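
As a rough sketch of the csv suggestion, the function below appends one item per call using csv.DictWriter from the standard library. Unlike joining fields with ",", the csv module quotes values that themselves contain commas. The function name append_item, the FIELDS list (taken from the fields in the question) and the sample values are assumptions for illustration; dets is assumed to be a dict-like object with a get method.

```python
import csv
import os
import tempfile

# Column order, taken from the fields listed in the question
FIELDS = ["sku", "description", "price", "brand", "compurl",
          "reviewcount", "reviewrating"]


def append_item(path, dets):
    """Append one item as a CSV row; write the header only if the file is new or empty."""
    need_header = not os.path.exists(path) or os.path.getsize(path) == 0
    # newline="" lets the csv module control line endings itself
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if need_header:
            writer.writeheader()
        # missing fields become empty strings instead of raising KeyError
        writer.writerow({name: dets.get(name, "") for name in FIELDS})


# Hypothetical sample data, written to a temporary directory
demo_path = os.path.join(tempfile.mkdtemp(), "dsg-all.csv")
append_item(demo_path, {"sku": "123", "description": "Running shoe, blue", "price": "59.99"})
append_item(demo_path, {"sku": "456", "description": "Cap", "brand": "ExampleBrand"})

with open(demo_path, newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

print(len(rows))               # 2
print(rows[0]["description"])  # Running shoe, blue
```

In a Scrapy project, a function like this could be called from wherever the item is populated; Scrapy's own feed exports are another route to CSV output, though they are not covered in the answers above.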
