Speeding up the editing of a large 2 GB text file in Python

import io
import os
import sys

newData = ""
i=0
run=0
j=0
k=1
m=2
n=3
seqFile = open('temp100.txt', 'r')
seqData = seqFile.readlines()
while i < 14371315:
    sLine1 = seqData[j]
    editLine2 = seqData[k]
    sLine3 = seqData[m]
    editLine4 = seqData[n]
    tempLine1 = editLine2[0:20]
    tempLine2 = editLine4[0:20]
    newLine1 = editLine2.replace(editLine2, tempLine1)
    newLine2 = editLine4.replace(editLine4, tempLine2)
    newData = newData + sLine1 + newLine1 + '\n' + sLine3 + newLine2
    if len(seqData[k]) > 20:
        newData += '\n'
    i=i+1
    run=run+1
    j=j+4
    k=k+4
    m=m+4
    n=n+4
    print(run)
seqFile.close()
new = open("new_100temp.txt", "w")
sys.stdout = new
print(newData)

3 Answers

User

#1 · Edited 2024-09-26 22:49:53

If you read and process only 4 lines at a time, it will probably be much faster (untested):

with open('100temp.txt') as in_file, open('new_100temp.txt', 'w') as out_file:
    for line1, line2, line3, line4 in grouper(in_file, 4):
        # modify 4 lines
        out_file.writelines([line1, line2, line3, line4])

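Here grouper(it, n) is a function that yields the items of the iterable it in groups of n. It is given as one of the examples in the itertools module documentation (see also this answer on SO). Iterating over the file this way is similar to calling readlines() on it and then walking the resulting list by hand, but it only reads a few lines into memory at a time.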
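Since grouper is not defined in the snippet above, here is a minimal sketch of how it could look, modeled on the itertools recipe and built on itertools.zip_longest. The helper names cut and trim_records, the width parameter, and the sample data are assumptions made for illustration only; the 20-character limit is taken from the question.

```python
import io
from itertools import zip_longest


def grouper(iterable, n, fillvalue=""):
    """Yield tuples of n consecutive items; the last tuple is padded with fillvalue."""
    args = [iter(iterable)] * n  # n references to the same iterator
    return zip_longest(*args, fillvalue=fillvalue)


def cut(line, width):
    """Keep at most `width` characters and restore the newline that slicing would drop."""
    return line.rstrip("\n")[:width] + "\n" if line else line


def trim_records(in_file, out_file, width=20):
    """Copy 4-line records, shortening lines 2 and 4 of each record (hypothetical helper)."""
    for line1, line2, line3, line4 in grouper(in_file, 4):
        out_file.writelines([line1, cut(line2, width), line3, cut(line4, width)])


# Small in-memory trial; for the real file, pass two objects returned by open().
sample = io.StringIO("header 1\n" + "A" * 30 + "\nheader 2\n" + "B" * 30 + "\n")
result = io.StringIO()
trim_records(sample, result)
print(result.getvalue(), end="")
```

Because grouper pulls from a single shared iterator, only four lines are held in memory per step, regardless of the file size.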

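User

#2 · Edited 2024-09-26 22:49:53

You are holding both files (the input and the output) in memory. If the files are too large, this can cause timing problems because of paging. Try this instead (pseudocode):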

Open input file for read
Open output file for write
Initialize counter to 1
While not EOF in input file
    Read input line
    If counter is odd
        Write line to output file
    Else
        Write first 20 characters of line, plus a line break, to output file
    Increment counter
Close files
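The pseudocode above could be sketched in Python roughly as follows. The function name trim_even_lines and the width parameter are hypothetical; the function takes any two text streams, so it can be tried on a small in-memory sample before pointing it at the 2 GB file.

```python
import io


def trim_even_lines(src, dst, width=20):
    """Stream src to dst one line at a time, shortening every even-numbered line."""
    for counter, line in enumerate(src, start=1):
        if counter % 2 == 1:
            # Odd-numbered lines are copied unchanged.
            dst.write(line)
        else:
            # Even-numbered lines keep their first `width` characters plus a newline.
            dst.write(line.rstrip("\n")[:width] + "\n")


# In-memory trial with made-up data.
source = io.StringIO("name\n" + "x" * 25 + "\n")
target = io.StringIO()
trim_even_lines(source, target)
print(target.getvalue(), end="")
```

For the real file, the same function would be called with two objects returned by open(), one opened for reading and one for writing.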

User

#3 · Edited 2024-09-26 22:49:53

The biggest problem here seems to be that the whole file is read at once:

seqData = seqFile.readlines()

Instead, you should first open the source file and the output file. Then iterate over the source file and modify each line as needed:

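# Both files are closed automatically when the with block ends.
with open('input.txt', 'r') as infile, open('output.txt', 'w') as outfile:
    for i, line in enumerate(infile):
        if i % 2 == 0:
            # Lines 1, 3, 5, ... are copied unchanged.
            newline = line
        else:
            # Lines 2, 4, 6, ... are cut to 20 characters; slicing can drop
            # the trailing line break, so it is added back explicitly.
            newline = line.rstrip('\n')[:20] + '\n'

        outfile.write(newline)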
