Need to group roughly every five lines of a text file into one concatenated line

Posted 2024-09-25 08:33:59

For example, I have a UTF-8 dictionary text file that looks like this:

iguanodont
primer
blindfolder
pseudosperm
chanciest
givers
renascent
lecanine
struth
unionizers
autoriser
interpunctuation
monophylies
approximativeness

I need to iterate through it, group every five lines together (separated by spaces), and write out a new text file that looks like this:

iguanodont primer blindfolder pseudosperm chanciest
givers renascent lecanine struth unionizers
autoriser interpunctuation monophylies approximativeness

Here is what I have so far. I'm new to this, so sorry that it's so trivial. Thanks in advance.

import io
dictionary = io.open("shuffled.txt", 'r')

3 Answers
read_file_name = 'words.txt'
write_file_name = 'words_grouped.txt'

def chunks(l, n):
    """ Yield successive n-sized chunks from l.
        Thanks Ned Batchelder 
    """
    # range works on Python 2 and 3; xrange exists only on Python 2
    for i in range(0, len(l), n):
        yield l[i:i+n]

f = open(read_file_name)
words = f.read()
f.close()

words = words.split("\n")

grouped = list(chunks(words,5))

f2 = open(write_file_name, 'w+')
f2.write(str(grouped))
f2.close()

Not exactly what you asked for, but close. This builds a list of grouped words, converts it to a string, and saves it to a file.

Output:

[['iguanodont', 'primer', 'blindfolder', 'pseudosperm', 'chanciest'], ['givers', 'renascent', 'lecanine', 'struth', 'unionizers'], ['autoriser', 'interpunctuation', 'monophylies', 'approximativeness', '']]
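
To get from a grouped list like this to the space-separated lines the question asks for, each group can be joined into a single string. The following is only a minimal sketch: the function name `group_lines` and the sample text are made up for illustration, and it assumes blank lines should be dropped.

```python
def group_lines(text, n=5):
    """Return a list of strings, each holding up to n words joined by spaces."""
    # splitlines() avoids the trailing empty string that split("\n") leaves
    # behind when the text ends with a newline; blank lines are skipped.
    words = [w.strip() for w in text.splitlines() if w.strip()]
    return [" ".join(words[i:i + n]) for i in range(0, len(words), n)]


# Hypothetical sample input: seven words, one per line.
sample = "iguanodont\nprimer\nblindfolder\npseudosperm\nchanciest\ngivers\nrenascent\n"
for line in group_lines(sample):
    print(line)
```

Writing the result to a file is then a matter of writing each returned string followed by a newline.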

Unless your input file is too large to fit in memory, reading it into a list and slicing that list is simplest, and takes only a few lines:

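import io
# strip each line's trailing newline so the joined words stay on one line
allrows = [line.strip() for line in io.open("shuffled.txt", 'r')]
byfive = [allrows[i:i+5] for i in range(0, len(allrows), 5)]
io.open('out.txt', 'w').writelines(' '.join(x)+'\n' for x in byfive)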

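Of course, you could get fancier by handling unbounded files, ensuring the files are closed when an exception occurs, and so on, but it is best to keep things simple where feasible and add complexity only when it is warranted.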
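
If the extra robustness is ever needed, one possible shape is sketched below. It is an illustration under assumptions, not a drop-in replacement: the function name `group_file` is hypothetical, UTF-8 encoding is assumed from the question, and blank lines are skipped. It reads the input lazily with `itertools.islice`, so the whole file never has to fit in memory, and the `with` block closes both files even if an exception is raised.

```python
import io
from itertools import islice


def group_file(src, dst, n=5):
    """Stream src line by line, writing up to n words per output line to dst."""
    with io.open(src, "r", encoding="utf-8") as fin, \
            io.open(dst, "w", encoding="utf-8") as fout:
        # Lazy generator: lines are stripped one at a time, blanks are skipped.
        words = (line.strip() for line in fin if line.strip())
        while True:
            group = list(islice(words, n))  # take the next n words, or fewer at the end
            if not group:
                break
            fout.write(u" ".join(group) + u"\n")
```

Calling `group_file("shuffled.txt", "out.txt")` would process the file names used above.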

# assumes Python 3.x
from itertools import zip_longest

INPUT = "shuffled.txt"
OUTPUT = "by_fives.txt"

# from itertools documentation,
# https://docs.python.org/3.4/library/itertools.html
def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

def main():
    with open(INPUT) as wordfile, open(OUTPUT, "w") as result:
        wordlist = (line.strip() for line in wordfile)
        for fivewords in grouper(wordlist, 5, ""):
            # skip the "" padding in the last group to avoid a trailing space
            result.write(" ".join(w for w in fivewords if w) + "\n")

if __name__ == "__main__":
    main()
