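I have a CSV with a bunch of columns, and I only want three of them. I import it into a Python script and convert the three columns into three lists,
then add each list to a dictionary. List 1 is the key and the other two lists are the two values. (Perhaps there is a better way?)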
key is a transaction id
value1 is a filename
value2 is a date
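What we want in the end is:
I want to repeat this step for all keys until I basically have just the id (key) of the latest item in each list. Filename x might have 5 duplicates, filename y might have 100, filename t might have 30, and so on.
I am using an API to actually move the data, which is why, in this external system, I need to get the latest ID and move that ID to "x", and move all of the duplicate IDs to "y".
Here is what I have so far for building the dict (assuming it is built in the right order), but I really don't know what to do next: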
import csv

def readcsv(filename, column):
    # Read one column of the CSV into a list (one entry per row)
    with open(filename, newline="") as file:
        reader = csv.reader(file, delimiter=",")
        return [row[column] for row in reader]

def makeDict(id, fileName, detDate):
    # Map each id to [fileName, detDate]; a repeated id keeps only its last row
    iList = {z[0]: list(z[1:]) for z in zip(id, fileName, detDate)}
    return iList

id = readcsv("jul.csv", 2)
fileName = readcsv("jul.csv", 1)
detDate = readcsv("jul.csv", 0)
mainDict = makeDict(id, fileName, detDate)
Sample data (columns extracted into a simpler table for testing):
Date             fileURL                        ID
7/24/2018 16:04  https://localhost/file1.docx   2599302
7/24/2018 16:03  https://localhost/file3.docx   2349302
7/24/2018 16:01  https://localhost/file1.docx   2599302
7/24/2018 16:04  https://localhost/fil232.xml   2599303
7/24/2018 16:03  https://localhost/file1.docx   2349333
7/24/2018 16:01  https://localhost/file3.docx   2529374
Update: based on the answer below, this is what I ended up with, and it worked for me:
import csv
from datetime import datetime

moveDupeList = []    # ids of the older duplicates
moveProperList = []  # latest id for each file name

def readcsv(filename, column):
    with open(filename, newline="") as file:
        reader = csv.reader(file, delimiter=",")
        return [row[column] for row in reader]

def makeDict(id, fileName, detDate):
    iList = {z[0]: list(z[1:]) for z in zip(id, fileName, detDate)}
    return iList

def parseDate(text):
    # Assumes every row, including the first, holds a date like 7/24/2018 16:04
    return datetime.strptime(text, "%m/%d/%Y %H:%M")

## Group Keys by like file names ##
def groupKeys(mainDict):
    same_filename = {}
    for key, line in mainDict.items():
        name, date = line
        if name not in same_filename:
            same_filename[name] = [key]
        else:
            same_filename[name].append(key)
    return same_filename

########################################### Get latest ID ##################
def getLatestID(same_filename, mainDict):
    ## for each file
    for k in same_filename:
        curDate = None
        curId = None
        ## get each id value (aka matching ids holding same file)
        for v in same_filename[k]:
            moveDupeList.append(v)  ## add to a list of dupes
            vDate = parseDate(mainDict[v][1])
            ## first id seen, or date later than the latest found so far: set new latest date and id
            if curDate is None or vDate > curDate:
                curDate = vDate
                curId = v
            ## same date as the latest found so far: keep whichever id is highest
            elif vDate == curDate and v > curId:
                curId = v
        if curId in moveDupeList:
            moveDupeList.remove(curId)    # remove latest from dupe list
            moveProperList.append(curId)  # add latest to proper list
########################################### Get latest ID ##################

id = readcsv("jul.csv", 2)
fileName = readcsv("jul.csv", 1)
detDate = readcsv("jul.csv", 0)
mainDict = makeDict(id, fileName, detDate)
same_filename = groupKeys(mainDict)
getLatestID(same_filename, mainDict)
A starting point could be to build another dictionary that gives, for each file name, the list of all corresponding keys (ids):
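A minimal sketch of that grouping, assuming mainDict maps each id to a [filename, date] pair as built by makeDict in the question. The helper name group_ids_by_filename and the three rows (taken from the sample data) are illustrative only:

```python
from collections import defaultdict


def group_ids_by_filename(main_dict):
    """Map each file name to the list of ids (keys) that reference it."""
    same_filename = defaultdict(list)
    for key, (name, _date) in main_dict.items():
        same_filename[name].append(key)
    return dict(same_filename)


# Illustrative rows in the shape makeDict produces: id -> [filename, date]
sample = {
    "2599302": ["https://localhost/file1.docx", "7/24/2018 16:04"],
    "2349302": ["https://localhost/file3.docx", "7/24/2018 16:03"],
    "2349333": ["https://localhost/file1.docx", "7/24/2018 16:03"],
}
groups = group_ids_by_filename(sample)
```

Using defaultdict(list) avoids the explicit "is this name already a key" check; a plain dict with an if/else works just as well.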
That covers your first point.
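For the second point, one possible sketch picks the id with the latest date in each group and treats the rest as duplicates. It assumes dates look like 7/24/2018 16:04 as in the sample data, that ids are numeric strings, and that ties on the date go to the higher id. The names split_latest and DATE_FORMAT are hypothetical:

```python
from datetime import datetime

DATE_FORMAT = "%m/%d/%Y %H:%M"  # assumption: matches the sample data


def split_latest(same_filename, main_dict):
    """Return (latest_ids, duplicate_ids) across all file name groups."""
    latest_ids, duplicate_ids = [], []
    for ids in same_filename.values():
        # Rank by parsed date first, then by numeric id to break ties.
        newest = max(
            ids,
            key=lambda i: (datetime.strptime(main_dict[i][1], DATE_FORMAT), int(i)),
        )
        latest_ids.append(newest)
        duplicate_ids.extend(i for i in ids if i != newest)
    return latest_ids, duplicate_ids


# Illustrative data: id -> [filename, date], and filename -> list of ids
main_dict = {
    "2599302": ["https://localhost/file1.docx", "7/24/2018 16:04"],
    "2349333": ["https://localhost/file1.docx", "7/24/2018 16:03"],
    "2349302": ["https://localhost/file3.docx", "7/24/2018 16:03"],
    "2529374": ["https://localhost/file3.docx", "7/24/2018 16:01"],
}
same_filename = {
    "https://localhost/file1.docx": ["2599302", "2349333"],
    "https://localhost/file3.docx": ["2349302", "2529374"],
}
latest, dupes = split_latest(same_filename, main_dict)
```

The ids in latest would then go to "x" and the ids in dupes to "y" through whatever API call the external system provides.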