A more efficient way to go through a .csv file?

Posted 2024-10-01 15:43:28


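I am trying to parse a .CSV file through several dictionaries, using two lists stored in separate .txt files so that the script knows what it is looking for. The goal is to find a row in the .CSV file that matches both a Word and an IDNumber and, if there is a match, pull out a third variable. However, the code runs very slowly. Is there any way to make it more efficient?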

import csv

IDNumberList_filename = 'IDs.txt'
WordsOfInterest_filename = 'dictionary_WordsOfInterest.txt'
Dictionary_filename = 'dictionary_individualwords.csv'

WordsOfInterest_ReadIn = open(WordsOfInterest_filename).read().split('\n')
#IDNumberListtoRead = open(IDNumberList_filename).read().split('\n')

for CurrentIDNumber in open(IDNumberList_filename).readlines():
    for CurrentWord in open(WordsOfInterest_filename).readlines():
        FoundCurrent = 0

        with open(Dictionary_filename, newline='', encoding='utf-8') as csvfile:
            reader = csv.DictReader(csvfile)
            for row in reader:
                if ((row['IDNumber'] == CurrentIDNumber) and (row['Word'] == CurrentWord)):
                    FoundCurrent = 1
                    CurrentProportion= row['CurrentProportion']

            if FoundCurrent == 0:
                CurrentProportion=0
            else:
                CurrentProportion=1
                print('found')

3 answers

First, consider loading the file dictionary_individualwords.csv into memory. A Python dictionary would be a suitable data structure for this.

You are opening the CSV file N times, where N = (# lines in IDs.txt) * (# lines in dictionary_WordsOfInterest.txt). If the file is not too large, you can avoid this by saving its contents into a dictionary or a list of lists.

The same applies to dictionary_WordsOfInterest.txt, which is reopened every time a new line is read from IDs.txt.
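A minimal sketch of this idea, assuming the CSV fits in memory and that each (IDNumber, Word) pair appears at most once (a later duplicate would overwrite an earlier one). The helper names `load_proportions`, `read_lines` and `lookup_all` are made up for illustration; the column names come from the question.

```python
import csv


def load_proportions(csv_path):
    """Read the CSV once and index it by (IDNumber, Word)."""
    proportions = {}
    with open(csv_path, newline='', encoding='utf-8') as csvfile:
        for row in csv.DictReader(csvfile):
            key = (row['IDNumber'], row['Word'])
            proportions[key] = row['CurrentProportion']
    return proportions


def read_lines(path):
    """Return the non-empty lines of a text file without trailing newlines."""
    with open(path, encoding='utf-8') as f:
        return [line.strip() for line in f if line.strip()]


def lookup_all(ids, words, proportions, default=0):
    """Look up every (ID, word) pair; pairs missing from the CSV get `default`."""
    return {(i, w): proportions.get((i, w), default)
            for i in ids for w in words}
```

With the file names from the question, this would be called as `lookup_all(read_lines('IDs.txt'), read_lines('dictionary_WordsOfInterest.txt'), load_proportions('dictionary_individualwords.csv'))`. Each file is then read exactly once, and every lookup is a constant-time dictionary access instead of a full scan of the CSV.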

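Also, it looks like you are searching for every possible pair (CurrentIDNumber, CurrentWord) built from the txt files. For example, you could store the IDs in one set and the words in another, and then, for each row of the csv file, check whether both the ID and the word are in their respective sets.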
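The set-based check could be sketched as follows. This is an illustration rather than tested production code: `find_matches` is a hypothetical helper, and it assumes the IDs and words have already been read with their trailing newlines stripped.

```python
import csv


def find_matches(csv_path, ids, words):
    """Scan the CSV once and collect rows whose ID and word are both wanted."""
    id_set = set(ids)        # membership tests on a set are O(1) on average
    word_set = set(words)
    matches = {}
    with open(csv_path, newline='', encoding='utf-8') as csvfile:
        for row in csv.DictReader(csvfile):
            if row['IDNumber'] in id_set and row['Word'] in word_set:
                key = (row['IDNumber'], row['Word'])
                matches[key] = row['CurrentProportion']
    return matches
```

Only matching rows are kept, so this variant suits a CSV that is too large to hold in memory as a whole. Pairs that never appear in the file are simply absent from the result, and `matches.get(pair, 0)` would give them the default of 0 used in the question.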

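When you call readlines on the .txt files, you already build an in-memory list from each of them. You should build those lists first and then parse the csv file only once. Something like: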

import csv

IDNumberList_filename = 'IDs.txt'
WordsOfInterest_filename = 'dictionary_WordsOfInterest.txt'
Dictionary_filename = 'dictionary_individualwords.csv'

# Read both lists once. splitlines() drops the trailing newlines that
# readlines() would keep and that would prevent any comparison from matching.
with open(IDNumberList_filename) as f:
    numberlist = f.read().splitlines()
with open(WordsOfInterest_filename) as f:
    wordlist = f.read().splitlines()

# Parse the CSV file only once.
with open(Dictionary_filename, newline='', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        for CurrentIDNumber in numberlist:
            for CurrentWord in wordlist:
                if row['IDNumber'] == CurrentIDNumber and row['Word'] == CurrentWord:
                    CurrentProportion = row['CurrentProportion']
                    print('found', CurrentIDNumber, CurrentWord, CurrentProportion)

Note: untested
