Compare values in a dictionary and process each entry according to its value

Posted 2024-10-01 09:18:00


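I have a CSV with a bunch of columns, and I only want three of them. I import it into a Python script and convert the three columns into three lists.

I then add each list to a dictionary. List 1 supplies the keys, and the other two lists supply the two values for each key. (Maybe there is a better way?)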

key is a transaction id
value1 is a filename
value2 is a date
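
One possible "better way" is to read the file once and build the dictionary row by row, instead of opening the CSV three times. This is only a minimal sketch: it assumes the column order is date, file URL, id and that the delimiter is a comma. `build_dict` and its `skip_header` flag are hypothetical names, and whether the real file has a header row is an assumption.

```python
import csv
import io

def build_dict(lines, skip_header=True):
    """Map each transaction id to [filename, date], reading the rows once."""
    reader = csv.reader(lines, delimiter=",")
    if skip_header:
        next(reader, None)  # drop the header row (assumed to be present)
    result = {}
    for row in reader:
        date, filename, key = row[0], row[1], row[2]
        result[key] = [filename, date]
    return result

# In-memory stand-in for the CSV file (assumed layout: date, file URL, id)
sample = io.StringIO(
    "Date,fileURL,ID\n"
    "7/24/2018 16:04,https://localhost/file1.docx,2599302\n"
    "7/24/2018 16:03,https://localhost/file3.docx,2349302\n"
)
main_dict = build_dict(sample)
```

With a real file, the same function could be given an open file object, for example one opened with `open("jul.csv", newline="")`.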

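Ultimately what we want is:

  1. Run through the dict and find all duplicate filenames (there will be multiple sets of duplicates)
  2. For each set of duplicate filenames, find the one id (key) with the most recent date value (if the date and time are the same, the highest id (key))
  3. Print the key with the most recent date (I only need the id)
  4. For every other duplicate, print "this is a duplicate" + (key) (again, only the id of each duplicate is needed)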

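I want to repeat this for all keys until I am left with a list of just the ids (keys) of the latest items. Filename x might have 5 duplicates, filename y might have 100, filename t might have 30, and so on.

I am using an API to actually move the data, which is why I need to get the latest ID and move that ID to "x" in the external system, and move all the duplicate IDs to "y".

Here is what I have so far for building the dict (assuming it is built in the right order), but I really don't know where to go next: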

import csv

def readcsv(filename, column):
    # Collect one column of the CSV file into a list
    with open(filename, newline="") as file:
        reader = csv.reader(file, delimiter=",")
        values = []
        for row in reader:
            values.append(row[column])
    return values

def makeDict(id, fileName, detDate):
    # Map each id to [fileName, detDate]
    iList = {z[0]: list(z[1:]) for z in zip(id, fileName, detDate)}
    return iList

id = readcsv("jul.csv", 2)
fileName = readcsv("jul.csv", 1)
detDate = readcsv("jul.csv", 0)

mainDict = makeDict(id, fileName, detDate)

Sample data (columns extracted into a simpler table for testing):

Date             fileURL                        ID
7/24/2018 16:04  https://localhost/file1.docx   2599302
7/24/2018 16:03  https://localhost/file3.docx   2349302
7/24/2018 16:01  https://localhost/file1.docx   2599302
7/24/2018 16:04  https://localhost/fil232.xml   2599303
7/24/2018 16:03  https://localhost/file1.docx   2349333
7/24/2018 16:01  https://localhost/file3.docx   2529374
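
The dates in this sample are text in month/day/year hour:minute form, so comparing them as plain strings can give the wrong order. Here is a small sketch of the difference, assuming every date follows the `%m/%d/%Y %H:%M` pattern seen in the sample. `parse_date` is a hypothetical helper, and the October date is made up purely to show the problem.

```python
from datetime import datetime

def parse_date(text):
    """Turn text such as '7/24/2018 16:04' into a datetime object."""
    return datetime.strptime(text, "%m/%d/%Y %H:%M")

earlier = "7/24/2018 16:04"
later = "10/2/2018 09:00"   # made-up date, only to show the ordering problem

print(later > earlier)                          # False: "1" sorts before "7"
print(parse_date(later) > parse_date(earlier))  # True
```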

Update: based on the answer below, this is what I ended up with, and it worked for me:

import csv
from datetime import datetime

def readcsv(filename, column):
    # Collect one column of the CSV file into a list
    with open(filename, newline="") as file:
        reader = csv.reader(file, delimiter=",")
        return [row[column] for row in reader]

def makeDict(id, fileName, detDate):
    # Map each id to [fileName, detDate]
    return {z[0]: list(z[1:]) for z in zip(id, fileName, detDate)}


## Group keys by like file names ##
def groupKeys(mainDict):
    same_filename = {}
    for key, line in mainDict.items():
        name, date = line
        if name not in same_filename:
            same_filename[name] = [key]
        else:
            same_filename[name].append(key)
    return same_filename


def parseDate(text):
    # Dates look like "7/24/2018 16:04"; parse them so they compare correctly
    return datetime.strptime(text, "%m/%d/%Y %H:%M")


########################################### Get latest ID ##################
def getLatestID(same_filename, mainDict):
    moveProperList = []   # latest id for each file name
    moveDupeList = []     # every other id
    ## for each file
    for k in same_filename:
        curDate = None
        curId = None
        ## get each id value (aka matching ids holding same file)
        for v in same_filename[k]:
            moveDupeList.append(v)   ## add to a list of dupes
            date = parseDate(mainDict[v][1])
            ## first id seen, or a later date: set new highest date and id
            if curDate is None or date > curDate:
                curDate = date
                curId = v
            ## same date as the highest found so far: keep the highest id
            elif date == curDate and int(v) > int(curId):
                curId = v
        moveDupeList.remove(curId)    # remove latest from dupe list
        moveProperList.append(curId)  # add latest to proper list
    return moveProperList, moveDupeList
########################################### Get latest ID ##################


# Assumes jul.csv has no header row; skip that row first if it does
id = readcsv("jul.csv", 2)
fileName = readcsv("jul.csv", 1)
detDate = readcsv("jul.csv", 0)

mainDict = makeDict(id, fileName, detDate)
same_filename = groupKeys(mainDict)
moveProperList, moveDupeList = getLatestID(same_filename, mainDict)

1 answer
User
Reply #1 · Posted 2024-10-01 09:18:00

One starting point is to build another dictionary that gives, for each filename, the list of all corresponding keys (ids):

data = {2349302: ['7/24/2018 16:03', 'https://localhost/file3.docx'],
 2349333: ['7/24/2018 16:03', 'https://localhost/file1.docx'],
 2529374: ['7/24/2018 16:01', 'https://localhost/file3.docx'],
 2599302: ['7/24/2018 16:01', 'https://localhost/file1.docx'],
 2599303: ['7/24/2018 16:04', 'https://localhost/fil232.xml']}

similar_filename = {}
for key, line in data.items():
    date, name = line
    if name not in similar_filename:
        similar_filename[name] = [key]
    else:
        similar_filename[name].append( key )


similar_filename
>>> {'https://localhost/fil232.xml': [2599303],
 'https://localhost/file1.docx': [2599302, 2349333],
 'https://localhost/file3.docx': [2529374, 2349302]}

That covers your first point.
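
The remaining points could build on the same grouping. This is a minimal sketch rather than a definitive implementation: it assumes the `data` layout from this answer (id mapped to `[date, name]`) and dates in `%m/%d/%Y %H:%M` form. `split_latest` is a hypothetical helper name, not part of any library.

```python
from collections import defaultdict
from datetime import datetime

data = {2349302: ['7/24/2018 16:03', 'https://localhost/file3.docx'],
        2349333: ['7/24/2018 16:03', 'https://localhost/file1.docx'],
        2529374: ['7/24/2018 16:01', 'https://localhost/file3.docx'],
        2599302: ['7/24/2018 16:01', 'https://localhost/file1.docx'],
        2599303: ['7/24/2018 16:04', 'https://localhost/fil232.xml']}

def split_latest(data):
    """Return (latest_ids, duplicate_ids) across all file names."""
    # Point 1: group the ids by file name
    groups = defaultdict(list)
    for key, (date, name) in data.items():
        groups[name].append(key)

    def sort_key(key):
        # Latest date wins; on a tie the highest id wins
        return (datetime.strptime(data[key][0], "%m/%d/%Y %H:%M"), key)

    latest, duplicates = [], []
    for keys in groups.values():
        newest = max(keys, key=sort_key)   # point 2
        latest.append(newest)
        duplicates.extend(k for k in keys if k != newest)
    return latest, duplicates

latest, duplicates = split_latest(data)
for key in latest:
    print(key)                          # point 3
for key in duplicates:
    print("this is a duplicate", key)   # point 4
```

The two lists can then be handed to whatever call moves the ids in the external system: the `latest` ids go to "x" and the `duplicates` go to "y".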
