How can I quickly extract multiple columns from each line when processing a large file?

def amk(theLine, delimiter, columnList):
    ind = -1
    for col in columnList:
        # Skip forward over `col` delimiters to reach the start of the next wanted column.
        for _ in range(col):
            ind = theLine.find(delimiter, ind + 1)
        yield theLine[ind + 1: theLine.find(delimiter, ind + 1)]

def columnListProcessor(columnList):
    # Convert sorted absolute column indices into gaps between consecutive columns.
    columnList.sort(reverse=False)
    return [columnList[0]] + [columnList[i] - columnList[i - 1] for i in range(1, len(columnList))]

# Use an arbitrary set of columns here.
# The number of columns can be more than 500.
columnList = columnListProcessor([1, 3, 31, 232, 443, 514, 801, 1032, 1500, 2540, 2983, 3500, 4000, 4441, 4982])

with open("hugeFile.txt", "r") as theFile:
    theLine = theFile.readline()
    while theLine:
        for k in amk(theLine, "\t", columnList):
            # `condition` and `foo()` are placeholders for the real test and action.
            if condition:
                foo()
        theLine = theFile.readline()

1 answer

User

#1 · Posted 2024-10-02 18:22:04

If you want 500 out of 5000 columns, splitting the whole line on the delimiter seems more appropriate:

def amk(line, delimiter, column_list):
    split_line = line.split(delimiter)
    for col in column_list:
        yield split_line[col]

column_list = [1, 3, 31, 232, 443, 514, 801, 1032, 1500, 2540, 2983, 3500, 4000, 4441, 4982]

with open("hugeFile.txt", "r") as fobj:
    for line in fobj:
        for k in amk(line, "\t", column_list):
            print(k)

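The string method .split() is implemented in C, so it is very fast: a single call from Python splits the whole line. The .find() approach scans less of the line, and .find() itself is also implemented in C, but it has to be called from Python many times per line. Many Python-level calls are slower than one call that does all of its work in C.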
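To make the difference in call counts concrete, here is a small sketch that counts how many times the find-based generator from the question would call .find() for one line. The helper name count_find_calls is made up for this illustration, and the count assumes the column list from the question.

```python
def count_find_calls(columns):
    """Count the str.find() calls the find-based generator makes per line."""
    columns = sorted(columns)
    # One call per delimiter skipped; the gaps between sorted columns
    # add up to the largest column index.
    skip_calls = columns[-1]
    # One more call per selected column to locate the end of that field.
    end_calls = len(columns)
    return skip_calls + end_calls


column_list = [1, 3, 31, 232, 443, 514, 801, 1032, 1500, 2540, 2983, 3500, 4000, 4441, 4982]
print(count_find_calls(column_list))  # 4997 calls to find(), versus one call to split()
```

So for this column list, each line costs several thousand Python-level method calls with .find(), compared with a single .split() call plus 15 list lookups.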

In general, you should always measure the run time. It is often not obvious which approach is faster for your use case.
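One possible way to measure this is with the standard-library timeit module. The sketch below is only a rough benchmark under stated assumptions: the function names columns_by_find, columns_by_split and to_deltas are invented here, the test line is synthetic (5000 tab-separated fields, no trailing newline), and the find-based version has an extra check for a missing trailing delimiter that the code in the question does not have. Results on a real file may differ.

```python
import timeit


def columns_by_find(line, delimiter, deltas):
    # Find-based approach; `deltas` are the gaps between sorted column indices.
    ind = -1
    for gap in deltas:
        for _ in range(gap):
            ind = line.find(delimiter, ind + 1)
        end = line.find(delimiter, ind + 1)
        # Assumption added for this sketch: handle a field at the end of the line.
        yield line[ind + 1:end] if end != -1 else line[ind + 1:]


def columns_by_split(line, delimiter, columns):
    # Split-based approach: one split() call, then plain list indexing.
    fields = line.split(delimiter)
    for col in columns:
        yield fields[col]


def to_deltas(columns):
    # Convert absolute column indices into gaps between consecutive columns.
    columns = sorted(columns)
    return [columns[0]] + [b - a for a, b in zip(columns, columns[1:])]


columns = [1, 3, 31, 232, 443, 514, 801, 1032, 1500, 2540, 2983, 3500, 4000, 4441, 4982]
deltas = to_deltas(columns)

# Synthetic test line: field i holds the text "v<i>".
line = "\t".join(f"v{i}" for i in range(5000))

# Both approaches must return the same fields before timing means anything.
assert list(columns_by_find(line, "\t", deltas)) == list(columns_by_split(line, "\t", columns))

find_time = timeit.timeit(lambda: list(columns_by_find(line, "\t", deltas)), number=200)
split_time = timeit.timeit(lambda: list(columns_by_split(line, "\t", columns)), number=200)
print(f"find:  {find_time:.4f} s for 200 lines")
print(f"split: {split_time:.4f} s for 200 lines")
```

Running it with the real column list and a sample of real lines gives a better picture than the synthetic data, since field lengths and the number of selected columns both affect the outcome.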
