Compare the first column of two CSV files in Python and print the matches

3 answers

User

Answer 1 · edited 2024-10-03 06:28:35

My script is comparing lines but only finds a match when the entire line (including the frequencies and relative frequencies) matches exactly. I realize that that is because I'm finding the intersection between two entire sets but I have no idea how to do this differently.

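This is exactly what dictionaries are for: cases where you have a separate key and value, or where only part of the value acts as the key. So: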

a_dict = {row[0]: row for row in alist}
b_dict = {row[0]: row for row in blist}

Now, you can't use set methods directly on a dictionary. Python 3 offers some help here, but you're using 2.7, so you have to write it out explicitly:

matches = {key for key in a_dict if key in b_dict}

Or:

matches = set(a_dict) & set(b_dict)
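
For readers on Python 3, the help mentioned above is that dict key views are set-like, so the intersection can be taken without building sets first. A minimal sketch, assuming Python 3 and using invented sample rows in place of the real CSV contents:

```python
# Hypothetical sample rows: [phrase, absolute frequency, relative frequency]
alist = [["the cat", "3", "0.03"], ["a dog", "5", "0.05"]]
blist = [["a dog", "2", "0.01"], ["one bird", "1", "0.005"]]

a_dict = {row[0]: row for row in alist}
b_dict = {row[0]: row for row in blist}

# In Python 3, dict.keys() returns a set-like view, so & works without set()
matches = a_dict.keys() & b_dict.keys()
print(matches)
```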

But you don't actually need the set at all; you only need to iterate over the keys. So:

for key in a_dict:
    if key in b_dict:
        a_values = a_dict[key]
        b_values = b_dict[key]
        do_stuff_with(a_values[2], b_values[2])
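
To make the loop concrete, here is a small self-contained sketch, assuming Python 3. The two `io.StringIO` objects and their contents are invented stand-ins for the real files, and collecting the pairs into `matched` is only one possible replacement for `do_stuff_with`:

```python
import csv
import io

# Made-up stand-ins for the two CSV files (phrase, absolute, relative)
file_a = io.StringIO("the cat,3,0.03\na dog,5,0.05\n")
file_b = io.StringIO("a dog,2,0.01\none bird,1,0.005\n")

a_dict = {row[0]: row for row in csv.reader(file_a)}
b_dict = {row[0]: row for row in csv.reader(file_b)}

matched = []
for key in a_dict:
    if key in b_dict:
        # Keep both relative frequencies (third column) for the shared phrase
        matched.append((key, a_dict[key][2], b_dict[key][2]))

for phrase, rel_a, rel_b in matched:
    print(phrase, rel_a, rel_b)
```

Only the first column decides whether two rows match, so rows whose frequencies differ are still paired up.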

As a side note, you really don't need to build lists first just to turn them into sets or dicts. Build the set or dict directly:

a_set = set()
with open("ngrams.csv", "rb") as fileA:
    reader = csv.reader(fileA, delimiter=',')
    for row in reader:
        a_set.add(tuple(row))

a_dict = {}
with open("ngrams.csv", "rb") as fileA:
    reader = csv.reader(fileA, delimiter=',')
    for row in reader:
        a_dict[row[0]] = row

Also, if you know about comprehensions, all three versions are crying out to be converted:

with open("ngrams.csv", "rb") as fileA:
    reader = csv.reader(fileA, delimiter=',')
    # Use exactly one of these: a csv.reader can only be iterated once
    a_list = list(reader)
    a_set = {tuple(row) for row in reader}
    a_dict = {row[0]: row for row in reader}
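
On Python 3 the same idea needs the file opened in text mode with `newline=""` rather than `"rb"`. A minimal sketch under that assumption, using a temporary file with invented contents; it also shows that the reader yields nothing on a second pass:

```python
import csv
import os
import tempfile

# Write a small made-up CSV so the example is self-contained
path = os.path.join(tempfile.mkdtemp(), "ngrams.csv")
with open(path, "w", newline="") as f:
    csv.writer(f).writerows([["the cat", "3", "0.03"], ["a dog", "5", "0.05"]])

with open(path, newline="") as file_a:  # Python 3: text mode, newline=""
    reader = csv.reader(file_a)
    a_dict = {row[0]: row for row in reader}
    leftover = list(reader)  # the reader is already exhausted here

print(a_dict)
print(leftover)
```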

User

Answer 2 · edited 2024-10-03 06:28:35

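You can store the relative frequencies from the first file in a dictionary, then iterate over the second file and, whenever its first column matches a key from the first file, write the result straight to the output file: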

import csv

tmp = {}

# if 1 file is much larger than the other, load the smaller one here
# make sure it will fit into the memory
with open("ngrams.csv", "rb") as fr:
    # using tuple unpacking to extract fixed number of columns from each row
    for txt, abs_freq, rel in csv.reader(fr):  # abs_freq avoids shadowing built-in abs()
        # converting strings like "1.435486010883783160220299732E-8"
        # to float numbers
        tmp[txt] = float(rel)

with open("matchedngrams.csv", "wb") as fw:
    writer = csv.writer(fw)

    # the 2nd input file will be processed per 1 line to save memory
    # the order of items from this file will be preserved
    with open("ngramstest.csv", "rb") as fr:
        for txt, abs_freq, rel in csv.reader(fr):  # abs_freq avoids shadowing built-in abs()
            if txt in tmp:
                # not sure what you want to do with absolute, I use 0 here:
                writer.writerow((txt, 0, tmp[txt] / float(rel)))
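
The code above targets Python 2.7. As a rough Python 3 sketch of the same streaming approach, the version below wraps it in a function; the name `write_matches`, the temporary directory, and the sample rows are all assumptions made for illustration:

```python
import csv
import os
import tempfile

def write_matches(small_path, large_path, out_path):
    """Write one row per phrase found in both files: phrase, 0, rel ratio."""
    rel_by_phrase = {}
    with open(small_path, newline="") as fr:
        for txt, _abs_freq, rel in csv.reader(fr):
            rel_by_phrase[txt] = float(rel)

    with open(out_path, "w", newline="") as fw, open(large_path, newline="") as fr:
        writer = csv.writer(fw)
        # Stream the second file row by row; its order is preserved
        for txt, _abs_freq, rel in csv.reader(fr):
            if txt in rel_by_phrase:
                writer.writerow((txt, 0, rel_by_phrase[txt] / float(rel)))

# Usage with invented data in a temporary directory
tmp_dir = tempfile.mkdtemp()
a_path = os.path.join(tmp_dir, "ngrams.csv")
b_path = os.path.join(tmp_dir, "ngramstest.csv")
out_path = os.path.join(tmp_dir, "matchedngrams.csv")
with open(a_path, "w", newline="") as f:
    csv.writer(f).writerows([["the cat", "3", "0.5"], ["a dog", "5", "0.25"]])
with open(b_path, "w", newline="") as f:
    csv.writer(f).writerows([["a dog", "2", "0.5"], ["one bird", "1", "0.125"]])

write_matches(a_path, b_path, out_path)
with open(out_path, newline="") as f:
    rows = list(csv.reader(f))
print(rows)
```

Only the first file is held in memory, so passing the smaller file as `small_path` keeps the dictionary small.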

User

Answer 3 · edited 2024-10-03 06:28:35

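I haven't dumped res to a new file (tedious). The first element of each row is the phrase and the other two are frequencies. I used a dict instead of a set so that matching and mapping happen together.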

import csv

alist, blist = [], []

with open("ngrams.csv", "rb") as fileA:
    reader = csv.reader(fileA, delimiter=',')
    for row in reader:
        alist.append(row)
with open("ngramstest.csv", "rb") as fileB:
    reader = csv.reader(fileB, delimiter=',')
    for row in reader:
        blist.append(row)

f_dict = {e[0]:e[1:] for e in alist}
s_dict = {e[0]:e[1:] for e in blist}

res = {}
for k,v in f_dict.items():
    if k in s_dict:
        res[k] = float(v[1])/float(s_dict[k][1])

print(res)
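
The step this answer skips, writing res to a file, could look roughly like the following on Python 3. The file name `ratios.csv`, the sorted output order, and the sample values standing in for the computed `res` are all assumptions:

```python
import csv
import os
import tempfile

# Invented stand-in for the res dict computed above: phrase -> frequency ratio
res = {"a dog": 0.5, "the cat": 2.0}

out_path = os.path.join(tempfile.mkdtemp(), "ratios.csv")  # hypothetical name
with open(out_path, "w", newline="") as f:
    writer = csv.writer(f)
    # Sort by phrase so the output order is reproducible
    for phrase, ratio in sorted(res.items()):
        writer.writerow([phrase, ratio])

with open(out_path, newline="") as f:
    written = list(csv.reader(f))
print(written)
```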
