Updating a master file with a change file by key in Python

Published 2024-10-01 15:49:23


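I am trying to update a master file with a change file that has the same layout. I want to replace or append the records (lines) in the master file using the keys in the change file. Both input files contain duplicate keys. Master records with a matching key need to be updated, and new records should be appended to the master file.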

The input files have the same layout, are '|'-delimited, and are huge (25-40 GB). Can you help me?

Example:

  • Master file:

Key1 | AAA | BBB | CCC
Key1 | AAA | BBB | DDD
Key1 | XXX | YYY | ZZZ
Key2 | ZZZ | YYY | 123
Key2 | EEE | FFF | RRR
Key3 | RRR | EEE | GGG
Key3 | SSS | TTT | GGG

  • Change file:

Key1 | 111 | 222 | 333
Key1 | 222 | 333 | 444
Key4 | 888 | 333 | 222
Key4 | 888 | 777 | 222

  • Output file:

Key1 | 111 | 222 | 333
Key1 | 222 | 333 | 444
Key2 | ZZZ | YYY | 123
Key2 | EEE | FFF | RRR
Key3 | RRR | EEE | GGG
Key3 | SSS | TTT | GGG
Key4 | 888 | 333 | 222
Key4 | 888 | 777 | 222
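The expected output above implies three rules: every master line whose key appears in the change file is dropped, the change-file lines for that key take the place of the first dropped line, and keys that exist only in the change file are appended at the end. Below is a minimal sketch of those rules. It streams the master file, but it assumes the change file is small enough to hold in memory as a dict, which may not be true for 25-40 GB inputs. The names `key_of` and `apply_changes` are hypothetical, not from the question.

```python
from typing import Dict, Iterable, Iterator, List


def key_of(line: str) -> str:
    """Return the key (first '|'-separated field) without surrounding spaces."""
    return line.split("|", 1)[0].strip()


def apply_changes(master: Iterable[str], changes: Iterable[str]) -> Iterator[str]:
    """Yield the updated master lines (hypothetical helper, assumes changes fit in RAM)."""
    # Collect all change lines per key; dicts keep insertion order (Python 3.7+).
    pending: Dict[str, List[str]] = {}
    for line in changes:
        pending.setdefault(key_of(line), []).append(line)
    changed_keys = set(pending)

    for line in master:
        key = key_of(line)
        if key not in changed_keys:
            yield line                    # untouched master record
        elif key in pending:
            yield from pending.pop(key)   # first occurrence: emit the replacement lines
        # later master lines for an already replaced key are dropped

    # Keys that only exist in the change file are appended at the end.
    for lines in pending.values():
        yield from lines


master_lines = [
    "Key1 | AAA | BBB | CCC",
    "Key1 | AAA | BBB | DDD",
    "Key1 | XXX | YYY | ZZZ",
    "Key2 | ZZZ | YYY | 123",
    "Key2 | EEE | FFF | RRR",
    "Key3 | RRR | EEE | GGG",
    "Key3 | SSS | TTT | GGG",
]
change_lines = [
    "Key1 | 111 | 222 | 333",
    "Key1 | 222 | 333 | 444",
    "Key4 | 888 | 333 | 222",
    "Key4 | 888 | 777 | 222",
]
for row in apply_changes(master_lines, change_lines):
    print(row)
```

With real files, the open file objects can be passed directly (each line then keeps its trailing newline), and the result written with `out.writelines(apply_changes(master_file, change_file))`.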



1 answer
User
#1 · Posted 2024-10-01 15:49:23

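So I gave it a try, because it sounded interesting and I am currently learning Python.
Please find the code below.
It works for your sample. However, if there is a large gap between consecutive keys in the master file, it messes up the order. I haven't been able to fix that.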

I really have trouble with a data structure in which duplicate primary keys are spread across several lines.

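I don't know what exactly you are doing, but I work with databases a lot, and I can tell you that this data structure is very unusual. You would probably benefit a lot from restructuring your dataset.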

With this amount of data, you would probably also benefit from storing it in a database, unless you are running deep learning algorithms on it.
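To make the database suggestion concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table names `master`/`changes`, the columns `k, a, b, c` (a key plus exactly three fields, as in the sample), and the in-memory database are assumptions for illustration; for 25-40 GB you would point `sqlite3.connect` at a file, and loading the data is a cost of its own. The update turns into two set operations: delete every master row whose key is in the change table, then insert all change rows.

```python
import sqlite3
from typing import Iterable, List


def load(conn: sqlite3.Connection, table: str, lines: Iterable[str]) -> None:
    """Insert '|'-delimited lines (key + three fields) into a table.

    `table` is one of the fixed names used below, never user input.
    """
    rows = ([field.strip() for field in line.split("|")] for line in lines)
    conn.executemany(f"INSERT INTO {table} (k, a, b, c) VALUES (?, ?, ?, ?)", rows)


def apply_changes_sql(master: Iterable[str], changes: Iterable[str]) -> List[str]:
    """Replace master rows by key with change rows and append new keys (sketch)."""
    conn = sqlite3.connect(":memory:")  # use a file path for data larger than RAM
    for table in ("master", "changes"):
        conn.execute(f"CREATE TABLE {table} (k TEXT, a TEXT, b TEXT, c TEXT)")
    load(conn, "master", master)
    load(conn, "changes", changes)
    conn.execute("CREATE INDEX changes_k ON changes (k)")
    # Remove every master row whose key appears in the change table...
    conn.execute("DELETE FROM master WHERE k IN (SELECT k FROM changes)")
    # ...then add all change rows, keeping their file order (new rowids ascend).
    conn.execute("INSERT INTO master (k, a, b, c) SELECT k, a, b, c FROM changes ORDER BY rowid")
    conn.commit()
    # Sort by key as text (so 'Key10' would come before 'Key2'), then by insertion order.
    rows = conn.execute("SELECT k, a, b, c FROM master ORDER BY k, rowid").fetchall()
    conn.close()
    return [" | ".join(row) for row in rows]


sql_master = [
    "Key1 | AAA | BBB | CCC", "Key1 | AAA | BBB | DDD", "Key1 | XXX | YYY | ZZZ",
    "Key2 | ZZZ | YYY | 123", "Key2 | EEE | FFF | RRR",
    "Key3 | RRR | EEE | GGG", "Key3 | SSS | TTT | GGG",
]
sql_changes = [
    "Key1 | 111 | 222 | 333", "Key1 | 222 | 333 | 444",
    "Key4 | 888 | 333 | 222", "Key4 | 888 | 777 | 222",
]
for row in apply_changes_sql(sql_master, sql_changes):
    print(row)
```

Once the data lives in a table, a key-based update is just `DELETE` plus `INSERT`, and the duplicate keys spread over several lines stop being a problem, because rows are matched by key rather than by position in the file.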

Example: here is a case where my code mixes up the order and does not work:

Master file

Key1|AAA|BBB|CCC
Key1|AAA|BBB|DDD
Key1|XXX|YYY|ZZZ
Key2|ZZZ|YYY|123
Key2|EEE|FFF|RRR
Key3|RRR|EEE|GGG
Key3|SSS|TTT|GGG
Key7|RRR|EEE|GGG
Key7|SSS|TTT|GGG

Change file

Key1|111|222|333
Key1|222|333|444
Key1|222|333|555
Key4|888|333|222
Key4|888|777|222
Key5|888|333|222
Key5|888|777|222
Key6|888|333|222
Key6|888|777|222
Key8|888|333|222
Key8|888|777|222
Key9|888|333|222
Key9|888|777|222

Code:

import fileinput

MASTER = 'test.txt'         # master file, rewritten in place
CHANGES = 'changefile.txt'  # change file with the same '|'-delimited layout

with open(CHANGES) as infile:
    keyindex = []  # keys that have already been processed
    for line in infile:
        linelist = line.strip().split("|")  # split the line on '|'
        key = linelist[0]                   # the key, e.g. 'Key1'
        keyid = key[3:]                     # numeric part of the key, e.g. '1'
        keylist = []                        # every change-file line for this key

        # A key can span several lines, but the inner loop below collects all of
        # them the first time the key is seen, so later duplicates are skipped.
        if key in keyindex:
            continue
        keyindex.append(key)

        # Re-read the change file and collect all lines for the current key.
        # Matching on key + '|' keeps 'Key1' from also matching 'Key10'.
        with open(CHANGES) as infile2:
            for line2 in infile2:
                if line2.startswith(key + '|'):
                    # make sure every line ends with a newline (the last one may not)
                    keylist.append(line2 if line2.endswith('\n') else line2 + '\n')
        print(keylist)

        # Delete every line with the current key from the master file.
        for linem in fileinput.input(MASTER, inplace=True):
            if linem.startswith(key + '|'):
                continue
            print(linem, end='')

        # Insert the collected lines in front of the first line of the next key.
        # This only works if 'Key<n+1>' exists in the master file, which is why
        # a large gap between keys mixes up the order. The match is case-sensitive.
        nextkey = 'Key' + str(int(keyid) + 1) + '|'
        for linei in fileinput.input(MASTER, inplace=True):
            if linei.startswith(nextkey):
                for item in keylist:
                    print(item, end='')
                keylist = []
            print(linei, end='')

        # I had problems with not being able to go to the next line at the beginning
        # of this code. If you fix that, it would be better than reopening the file:
        ##    if last in linei and keylist:
        ##        for item in keylist:
        ##            print(item, end='')
        ##        keylist = []
        ##        print('\n')

        # Append leftover lines to the end of the master file, e.g. Key9 is in the
        # change file but the master file only goes up to Key8.
        # This block may cause memory problems; the commented block above might fix that.
        if keylist:
            with open(MASTER, 'a') as myfile:
                myfile.writelines(keylist)

    # Print all keys that have been changed.
    print('Following keys have been changed:', keyindex)
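The ordering problem in the gap example comes from inserting new keys only in front of `Key<n+1>`. One alternative, sketched here instead of a fix to the code above, is a merge join: if both files are already sorted by the numeric part of the key, they can be walked in parallel like the merge step of merge sort. Each file is read once and only one key's lines are held in memory at a time, which also suits 25-40 GB inputs. The `Key<n>` key format, the sort-order assumption, and the helper names `key_number`, `grouped` and `merge_files` are assumptions; blank lines would make `key_number` fail.

```python
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple


def key_number(line: str) -> int:
    """Numeric part of the key, e.g. 'Key12|...' -> 12 (assumes the 'Key<n>' format)."""
    return int(line.split("|", 1)[0].strip()[3:])


def grouped(lines: Iterable[str]) -> Iterator[Tuple[int, List[str]]]:
    """Yield (key number, all consecutive lines with that key) from a key-sorted input."""
    for number, group in groupby(lines, key=key_number):
        yield number, list(group)


def merge_files(master: Iterable[str], changes: Iterable[str]) -> Iterator[str]:
    """Merge two inputs sorted by key number; change groups replace master groups."""
    m_iter, c_iter = grouped(master), grouped(changes)
    m = next(m_iter, None)
    c = next(c_iter, None)
    while m is not None or c is not None:
        if c is None or (m is not None and m[0] < c[0]):
            yield from m[1]            # key only in the master file: keep it
            m = next(m_iter, None)
        elif m is None or c[0] < m[0]:
            yield from c[1]            # key only in the change file: insert it here
            c = next(c_iter, None)
        else:
            yield from c[1]            # same key: change lines replace master lines
            m = next(m_iter, None)
            c = next(c_iter, None)


gap_master = [
    "Key1|AAA|BBB|CCC", "Key1|AAA|BBB|DDD", "Key1|XXX|YYY|ZZZ",
    "Key2|ZZZ|YYY|123", "Key2|EEE|FFF|RRR",
    "Key3|RRR|EEE|GGG", "Key3|SSS|TTT|GGG",
    "Key7|RRR|EEE|GGG", "Key7|SSS|TTT|GGG",
]
gap_changes = [
    "Key1|111|222|333", "Key1|222|333|444", "Key1|222|333|555",
    "Key4|888|333|222", "Key4|888|777|222",
    "Key5|888|333|222", "Key5|888|777|222",
    "Key6|888|333|222", "Key6|888|777|222",
    "Key8|888|333|222", "Key8|888|777|222",
    "Key9|888|333|222", "Key9|888|777|222",
]
for row in merge_files(gap_master, gap_changes):
    print(row)
```

For the gap example this puts Key4, Key5 and Key6 between Key3 and Key7, and Key8 and Key9 after Key7. With real files, the two open file objects can be passed in and the result written with `writelines` to a new output file rather than rewriting the master in place.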
