Python: compare two CSV files for duplicate values and write the matching rows to a separate CSV file

Posted 2024-10-04 05:21:33


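I have 2 .csv files containing thousands of rows of data (product inventory from vendors). I need to find the duplicates and remove the item with the higher price.

The problem is that the prices contain decimals. The code below is the closest I have come to what I need: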

with open('vendor1.csv', 'r') as venOne, open('vendor2.csv', 'r') as venTwo, open('filtered.csv', 'w') as outFile:

    z = csv.reader(venOne, delimiter = ',')
    m = csv.reader(venTwo, delimiter = ',')
    w = csv.writer(outFile, delimiter = ',')

    zupc = {row[5] for row in z}    #UPC is in column 5
    mupc = {row[5] for row in m}

    zprice = {row[9] for row in z}  #Price is in column 9
    mprice = {row[7] for row in m}  #Price is in column 7

    for row in z:
        if row[5] in mupc and row[9] < mprice:
            w.writerow(row)
        else:
            if row[5] not in mupc:
                w.writerow(row)

    #Do the same for m

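I am using Python 2.x.

Eventually this will be run by a cron job. All of the data is on a remote shared server.

One caveat is that I cannot use pandas (which would have saved me a lot of time writing various other scripts). The only modules available for import are Python's standard modules, and adding extra modules is not possible (that is, not without spending more money to upgrade to a dedicated server).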


1 answer
User
#1 · posted 2024-10-04 05:21:33

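First, you should probably use a dict rather than a set: the {... for row in z} comprehensions in your code build sets, which cannot map a UPC to its price. As for the prices, you can try converting them to Decimal.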
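To see why the conversion matters, here is a small sketch with made-up prices (the values are only for illustration). csv.reader hands back every field as a string, and strings are compared character by character rather than by numeric value:

```python
from decimal import Decimal

# Prices as csv.reader would return them: plain strings (example values).
a, b = "9.99", "10.50"

# String comparison looks at '9' vs '1' first, so "9.99" sorts AFTER "10.50".
string_says_a_is_cheaper = a < b

# Decimal compares by numeric value and keeps the cents exact.
decimal_says_a_is_cheaper = Decimal(a) < Decimal(b)

print(string_says_a_is_cheaper)   # False
print(decimal_says_a_is_cheaper)  # True
```

Note that Decimal raises decimal.InvalidOperation on non-numeric text, so if the files start with a header row, that row would need to be skipped before converting.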

Try the code below and let me know if it helps:

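import csv
from decimal import Decimal

def write_cheaper_items(output, rows, this_prices, other_prices, keep_ties=False):
    # write a row if its UPC is unique to this vendor or cheaper here;
    # keep_ties decides whether this vendor's row is kept when both prices are equal
    for row in rows:
        upc = row[5]
        if (upc not in other_prices
                or this_prices[upc] < other_prices[upc]
                or (keep_ties and this_prices[upc] == other_prices[upc])):
            output.writerow(row)

with open('vendor1.csv', 'r') as venOne, open('vendor2.csv', 'r') as venTwo, open('filtered.csv', 'w') as outFile:
    # a csv.reader can only be iterated once, so load the rows into lists
    z = list(csv.reader(venOne, delimiter = ','))
    m = list(csv.reader(venTwo, delimiter = ','))
    w = csv.writer(outFile, delimiter = ',')

    # these dicts will have the UPC as keys and their prices as values
    z_prices = {
        row[5]: Decimal(row[9])
        for row in z}
    m_prices = {
        row[5]: Decimal(row[7])
        for row in m}

    # vendor1 wins ties, so an equally priced item is written exactly once
    write_cheaper_items(w, z, z_prices, m_prices, keep_ties=True)
    write_cheaper_items(w, m, m_prices, z_prices)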
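An alternative you could consider, as a rough sketch only: keep one dict that maps each UPC to the cheapest row seen so far, and go through each file exactly once. The helper name cheapest_rows, the make_row test helper, and the sample data are all my own inventions for illustration; the column positions (UPC in 5, price in 9 and 7) are taken from your question.

```python
from decimal import Decimal

def cheapest_rows(sources, upc_col=5):
    """Return the cheapest row per UPC, in first-seen order.

    sources: iterable of (rows, price_col) pairs, one per vendor.
    Hypothetical helper, not part of the answer's code above.
    """
    best = {}    # upc -> (price, row)
    order = []   # UPCs in the order they were first seen
    for rows, price_col in sources:
        for row in rows:
            upc = row[upc_col]
            price = Decimal(row[price_col])
            if upc not in best:
                order.append(upc)
                best[upc] = (price, row)
            elif price < best[upc][0]:
                # strictly cheaper replaces; on a tie the earlier row stays
                best[upc] = (price, row)
    return [best[upc][1] for upc in order]

def make_row(upc, price, price_col, width=10):
    # build a dummy row with only the UPC and price columns filled in
    row = [""] * width
    row[5] = upc
    row[price_col] = price
    return row

vendor1 = [make_row("111", "9.99", 9), make_row("222", "5.00", 9), make_row("333", "2.50", 9)]
vendor2 = [make_row("111", "10.50", 7), make_row("222", "4.75", 7), make_row("444", "1.25", 7)]

result = cheapest_rows([(vendor1, 9), (vendor2, 7)])
print([row[5] for row in result])  # ['111', '222', '333', '444']
```

With real files you would pass the readers directly, for example cheapest_rows([(csv.reader(venOne), 9), (csv.reader(venTwo), 7)]), and hand the result to w.writerows(...). Each reader is consumed only once this way. The rows keep each vendor's original column layout, so the price column in the output still differs depending on which file a row came from.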
