Python script moves part of a line onto the next line while reading data

Posted 2024-10-02 08:30:11


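I wrote a script that counts the delimiters in each line. If a line has more or fewer delimiters than expected, the line is printed and copied to another file (Lines_FILE.txt) for analysis. For example: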

1,a,b,c,d
2,e,f,g,h
3,r,h,,u,j

The third line above would be copied into the new file. The script is:

### PLEASE DELETE THE FILE "Lines_FILE.txt" BEFORE RUNNING THIS SCRIPT (it is opened in append mode)

linecount = 0

with open('Mock.txt', encoding="latin1") as myfile:  # input file name with extension; update the encoding if required
    for line in myfile:
        linecount = linecount + 1
        #k = line.count('"|"')  # uncomment and update the delimiter and text qualifier if a text qualifier is present
        k = line.count(',')     # update the delimiter if there is no text qualifier
        print("Lines:", linecount)
        print(k)
        if k != 94:  # update to the number of delimiters in the first line, i.e. the expected delimiters per line
            print(line)
            with open("Lines_FILE.txt", "a") as f:  # closed again after each write
                f.write(line)
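The counting rule itself can be checked in isolation against the three example lines at the top of the question. Each good example line has 4 commas, so 4 stands in for the script's 94 here. This is a minimal sketch; `flag_lines` is a hypothetical helper, not part of the original script.

```python
def flag_lines(lines, expected, delimiter=","):
    """Return (line_no, line) for every line whose delimiter count != expected."""
    return [
        (line_no, line)
        for line_no, line in enumerate(lines, start=1)
        if line.count(delimiter) != expected
    ]

# The three example lines from the question; each good line has 4 commas.
sample = ["1,a,b,c,d\n", "2,e,f,g,h\n", "3,r,h,,u,j\n"]
print(flag_lines(sample, expected=4))  # [(3, '3,r,h,,u,j\n')]
```

For a file with a text qualifier, passing `delimiter='"|"'` mirrors the commented-out line in the script.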

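This worked fine, but then I noticed a file in which the script picked up a line that is not actually an error and wrote it to Lines_FILE.txt. In Lines_FILE.txt, half of the line had been moved onto the next line, although in the actual data this is not the case. These are the lines: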

10804395,1,10/4/2018 6:45:27 PM,742443,23,2122804,OCT-18,10/4/2018,P,10/4/2018 6:44:34 PM,742443,,,2779094.44,,2779094.44,Reclass since no Physical inventory with Sanmina    ,,,,,,,,,JE_AUTO_FILE_renurana_Sep-18_11_6720973_10-04-2018_104704_36,,,,,,,,,,,,,,,,,,Manual JE File Name,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2
10804396,1,10/4/2018 6:45:27 PM,742443,23,2122805,OCT-18,10/4/2018,P,10/4/2018 6:44:35 PM,742443,,235530.26,,235530.26,,Fresh billing to Jabil against sanmina inventory movement reconciled to open POs from Jabil    ,,,,,,,,,JE_AUTO_FILE_renurana_Sep-18_11_6720973_10-04-2018_104704_36,,,,,,,,,,,,,,,,,,Manual JE File Name,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2

The extracted lines look like this:

10804395,1,10/4/2018 6:45:27 PM,742443,23,2122804,OCT-18,10/4/2018,P,10/4/2018 6:44:34 PM,742443,,,2779094.44,,2779094.44,Reclass since no Physical inventory with Sanmina
,,,,,,,,,JE_AUTO_FILE_renurana_Sep-18_11_6720973_10-04-2018_104704_36,,,,,,,,,,,,,,,,,,Manual JE File Name,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2
10804396,1,10/4/2018 6:45:27 PM,742443,23,2122805,OCT-18,10/4/2018,P,10/4/2018 6:44:35 PM,742443,,235530.26,,235530.26,,Fresh billing to Jabil against sanmina inventory movement reconciled to open POs from Jabil
,,,,,,,,,JE_AUTO_FILE_renurana_Sep-18_11_6720973_10-04-2018_104704_36,,,,,,,,,,,,,,,,,,Manual JE File Name,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2

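After the text "with Sanmina" and "Jabil", the rest of the line is pushed onto the next line. I noticed the same pattern in a few other lines, and I think it has something to do with the gap after the text. In short, while reading the data the script breaks some lines in two and then treats them as error lines. As I am new to Python, any guidance on this would be very helpful.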
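One way to test the guess about the gap after the text is to look at the raw bytes of the affected records. Python's `open()` in text mode uses universal newlines by default, so a lone carriage return (`'\r'`) ends a line just like `'\n'` does. If the gap hides such a character (an assumption, not confirmed in the post), the record would be split exactly as in the extracted lines. A minimal sketch: `find_control_chars` is a hypothetical helper that reads the file in binary mode, where iteration splits on `b"\n"` only, and reports records that contain control characters.

```python
import os
import tempfile

# ASCII control bytes except tab; "\r" (13) is the prime suspect here.
CONTROL_BYTES = set(range(0x20)) - {0x09}

def find_control_chars(path):
    """Return (line_no, repr(record)) for records containing control bytes.

    Binary iteration splits on b"\n" only, so a stray b"\r" stays inside
    its record instead of being treated as a line break.
    """
    hits = []
    with open(path, "rb") as f:
        for line_no, raw in enumerate(f, start=1):
            body = raw.rstrip(b"\r\n")  # drop the normal "\n" or "\r\n" ending
            if any(byte in CONTROL_BYTES for byte in body):
                hits.append((line_no, repr(body)))
    return hits

# Usage with a hypothetical sample: record 1 has a "\r" after trailing spaces.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(sample_path, "wb") as f:
    f.write(b"1,a,with Sanmina    \r,,x\r\n2,b,ok,x\r\n")

for line_no, text in find_control_chars(sample_path):
    print(line_no, text)
```

Run against Mock.txt, the `repr()` output shows exactly which character, if any, sits in the gap.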


1 answer

Answer 1 · posted 2024-10-02 08:30:11

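The cause may be that the two files are handled differently: the first file is opened with a specific encoding, while the second uses the default encoding. I can suggest a few improvements to the script you are using: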

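with open("Mock.txt", "r", encoding="latin1") as infile:
    # "w" recreates Lines_FILE.txt on every run, so it no longer needs to be deleted first;
    # both files use the same encoding
    with open("Lines_FILE.txt", "w", encoding="latin1") as outfile:
        for line_no, line in enumerate(infile, start=1):  # 1-based line number
            delim_count = line.count(",")
            print("Line:", line_no)
            if delim_count != 94:  # expected number of delimiters per line
                print(line)
                outfile.write(line)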

This reads and writes both files with the same encoding.
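If the split persists with matching encodings and the cause turns out to be a stray carriage return inside a field (an assumption, not confirmed in the post), a variant of the script above can also pin down newline handling. With `newline="\n"` only `"\n"` ends a line when reading, and `newline=""` writes text back without translation. A minimal sketch; `filter_bad_lines` and its parameters are hypothetical.

```python
import os
import tempfile

def filter_bad_lines(in_path, out_path, expected=94, delimiter=",", encoding="latin1"):
    """Copy every line whose delimiter count differs from `expected` to out_path.

    Returns the 1-based numbers of the flagged lines.
    """
    flagged = []
    # newline="\n": only "\n" ends a line, so a stray "\r" inside a field stays
    # in its record (Windows "\r\n" endings keep their "\r", which has no comma).
    # newline="" on the output: lines are written back without translation.
    with open(in_path, "r", encoding=encoding, newline="\n") as infile:
        with open(out_path, "w", encoding=encoding, newline="") as outfile:
            for line_no, line in enumerate(infile, start=1):
                if line.count(delimiter) != expected:
                    flagged.append(line_no)
                    outfile.write(line)
    return flagged

# Usage with a hypothetical 3-comma sample; record 1 contains a stray "\r".
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample.txt")
report_path = os.path.join(tmp_dir, "Lines_FILE.txt")
with open(sample_path, "wb") as f:
    f.write(b"1,a,b\r,c\n2,a,b,c\n3,a,,b,c\n")

print(filter_bad_lines(sample_path, report_path, expected=3))  # [3]
```

With the default `newline=None`, record 1 of the sample would instead be split at the `"\r"` into two short lines, and both halves would be flagged.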
