Remove mid-field CRLF from data read from a .CSV file and replace it with "||"

Posted 2024-06-02 10:29:19


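I am using Python 3.8.2 on Windows to read lines of comma-separated data.

Some fields have a CRLF embedded in the middle, as in this particular record, which contains multi-line data, for example: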

“John SmithCRLFJaneDoe”

The data in the source was entered as:

"John Smith
Jane Doe"

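When I read this line and try to convert it to a string, I get str = "John Smith", because the parser truncates at the CRLF.

So I tried to replace the mid-field CRLF with other characters: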

with open('bogus_line.csv', 'r') as MyLine:
    str = MyLine.read()
    print (str)  
    raw_string = str.replace('\\r\\n'," || ")
    print (raw_string)   # the problem is generating the correct raw string format???

This produces:

"John Smith
Jane Doe"
"John Smith
Jane Doe"

But I want:

"John Smith
Jane Doe"
"John Smith || Jane Doe"

The following does not work and returns an error:

with open('bogus_line.csv', 'r') as MyLine:
    str = MyLine.read()
    print (str)  
    raw_string = r'str.replace('\\r\\n'," || ")
    print (raw_string)

You might expect the following code to work, but, as in the first example, it runs to completion without merging the two lines:

with open('bogus_line.csv', 'r') as MyLine:
    str = MyLine.read()
    print (str)  
    raw_string = r"{}".format(str).replace('\\r\\n'," || ")
    print (raw_string)

This produces:

"John Smith
Jane Doe"
"John Smith
Jane Doe"

2 Answers

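I managed to do it, but it is a bit tricky. You have to remove \n and \r separately! Don't ask me why or how. But this seems to work on Windows 10: first remove the newline, replacing it with a space; then remove the carriage return.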

raw_string = str.replace('\n', ' ').replace('\r', '')
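A likely explanation, offered here as a sketch and not as part of the original answer: in default text mode Python applies universal newlines, so every \r\n in the file arrives in the string as a single \n and a search for '\r\n' finds nothing. Separately, '\\r\\n' in the question is four literal characters (backslash, r, backslash, n), not a CRLF at all. The snippet below writes a throwaway sample file (the temporary path is an assumption for illustration) and compares the two read modes.

```python
import os
import tempfile

# Hypothetical sample file with an explicit CRLF inside the quoted field.
path = os.path.join(tempfile.mkdtemp(), "bogus_line.csv")
with open(path, "wb") as f:
    f.write(b'"John Smith\r\nJane Doe"\r\n')

# Default text mode: universal newlines turn each \r\n into \n.
with open(path, "r") as f:
    translated = f.read()

# newline="" disables that translation, so \r\n survives in the string.
with open(path, "r", newline="") as f:
    untranslated = f.read()

print(repr(translated))
print(repr(untranslated))

# '\\r\\n' is four ordinary characters, so it never matches a real CRLF.
print(len('\\r\\n'), len('\r\n'))

# With the untranslated text, a plain '\r\n' replacement merges the lines.
merged = untranslated.rstrip('\r\n').replace('\r\n', ' || ')
print(merged)
```

If this explanation holds, replacing '\n' alone is enough when the file is opened in default text mode, and the extra .replace('\r', '') only matters when stray carriage returns are present.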

The usual approach is to use the csv module, which understands newlines embedded in quoted fields:

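import csv

# newline='' leaves line endings untouched so the csv module can handle them
with open('bogus_line.csv', 'r', newline='') as MyLine:
    rd = csv.reader(MyLine)
    field = next(rd)[0]    # a reader is an iterator on lists of fields
    print(field)
    # join the lines of the multi-line field with " || " (works for \r\n or \n)
    raw_string = " || ".join(field.splitlines())
    print(raw_string)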
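To apply the same idea to every field of every row, one possible sketch is shown below. The helper name flatten_embedded_newlines, the " || " default, and the extra sample columns are assumptions made for illustration; an in-memory string stands in for the file so the example is self-contained.

```python
import csv
import io


def flatten_embedded_newlines(text, sep=" || "):
    """Return CSV text in which line breaks inside fields are replaced by sep."""
    # newline="" keeps \r\n intact so the csv module sees the raw line endings.
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out)  # default line terminator is \r\n
    for row in reader:
        # splitlines() handles \r\n, \n and \r (and a few rarer line boundaries).
        writer.writerow([sep.join(field.splitlines()) for field in row])
    return out.getvalue()


# Hypothetical two-row sample; the second column is made up for the demo.
sample = '"John Smith\r\nJane Doe",42\r\nAlice,7\r\n'
print(flatten_embedded_newlines(sample))
```

For a real file, the same reader and writer calls would take file objects opened with newline='' in place of the io.StringIO objects.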
