Reading a file of strings in Python changes the format when the lines are added to a list. Why?


Posted 2024-05-18 14:31:02


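I am loading a txt file that contains phone numbers. When I print the lines as they are loaded, they look fine. I then append each line to a list. When I display the list itself, I get what looks like a different encoding instead of plain strings. When I write the contents of the list to a new file, I still get unwanted \n and other characters, even after stripping and specifying UTF-8.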

original_file = open("original.txt", "r", encoding="UTF-8", errors="replace")
pl = []
for item in original_file:
    pl.append(item)
target_file = open("target.txt", "w", encoding="UTF-8")
for item in pl:
    target_file.write(item)  # or .write("{}\n".format(item))
                             # neither gets me the desired new line


Contents of the original file:

(248) 370-0000
(706) 862-2128
(863) 763-8632
(682) 404-0051
(734) 667-2877
...

When I load the lines into the list pl and print the items:

for item in pl: print(item)

I get:

(248) 370-0000
(706) 862-2128
(863) 763-8632
(682) 404-0051
(734) 667-2877

But when I simply enter the list name pl, I get:

[..., '...\x004\x000\x004\x00-\x000\x000\x005\x001\x00\n', '\x00(\x007\x003\x004\x00)\x00 \x006\x006\x007\x00-\x002\x008\x007\x007\x00']

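I bring this up because when I take the items from pl and write them to the target file, I do not get a list of phone numbers in the new text file. I get the following instead: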

[Output garbled in the source. The recoverable fragments, such as "3 7 0 - 0 0 0" and "( 7 0 6 )", show the characters of the phone numbers separated from one another and run together on a single line.]

There are no new lines, and there are spaces between the items instead.


1 answer

User

#1 · Posted 2024-05-18 14:31:02

Your original file is encoded as UTF-16, big-endian.

>>> bs = b'\x00(\x006\x001\x000\x00)\x00 \x003\x009\x002\x00-\x003\x001\x001\x005\x00\n'
>>> bs.decode('utf-8')
'\x00(\x006\x001\x000\x00)\x00 \x003\x009\x002\x00-\x003\x001\x001\x005\x00\n'
>>> bs.decode('utf-16-be')
'(610) 392-3115\n'

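(A null byte b'\x00' in front of every ASCII character strongly suggests that the encoding is UTF-16. Because the null byte comes before each character rather than after it, the byte order is big-endian.)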
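The same check can be done on the raw bytes before picking an encoding. The following is only a rough sketch: the function name guess_utf16_byte_order is made up for illustration, and the heuristic assumes the data is almost entirely ASCII text (as a list of phone numbers is), so it will return None for anything else.

```python
import codecs


def guess_utf16_byte_order(raw: bytes):
    """Heuristically guess the UTF-16 flavour of mostly-ASCII data.

    Hypothetical helper, not part of the standard library.
    Returns "utf-16" if a byte order mark is present, "utf-16-be" or
    "utf-16-le" if null bytes fill exactly the even or odd positions,
    and None otherwise.
    """
    # A byte order mark settles the question; the plain "utf-16"
    # codec reads it and picks the byte order itself.
    if raw.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "utf-16"

    even = raw[0::2]  # bytes at positions 0, 2, 4, ...
    odd = raw[1::2]   # bytes at positions 1, 3, 5, ...
    if even and odd:
        # Big-endian ASCII: the null byte comes first in each pair.
        if even.count(0) == len(even) and odd.count(0) == 0:
            return "utf-16-be"
        # Little-endian ASCII: the null byte comes second in each pair.
        if odd.count(0) == len(odd) and even.count(0) == 0:
            return "utf-16-le"
    return None


sample = b'\x00(\x006\x001\x000\x00)\x00 \x003\x009\x002\x00-\x003\x001\x001\x005\x00\n'
print(guess_utf16_byte_order(sample))  # utf-16-be
```

In practice you would pass it the first few hundred bytes of the file, read with open(path, "rb").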

Opening the file like this should work:

original_file = open("original.txt", "r", encoding="utf-16-be")
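Putting it together, the whole read-and-rewrite step might look like the sketch below. The function name copy_lines and the temporary demo files are assumptions made for illustration; the file names original.txt and target.txt come from the question. The sketch strips the trailing newline from each line on the way in and adds exactly one on the way out, and it drops a leading byte order mark in case the file has one, since the utf-16-be codec does not remove it.

```python
import os
import tempfile


def copy_lines(src_path, dst_path, src_encoding="utf-16-be"):
    """Read lines from a UTF-16 file and write them out again as UTF-8.

    Hypothetical helper for illustration. Returns the list of lines
    without their trailing newlines.
    """
    with open(src_path, "r", encoding=src_encoding) as src:
        # Drop the line ending so the list holds only the phone numbers.
        lines = [line.rstrip("\n") for line in src]

    # "utf-16-be" keeps a byte order mark as a character, if one exists.
    if lines and lines[0].startswith("\ufeff"):
        lines[0] = lines[0][1:]

    with open(dst_path, "w", encoding="utf-8") as dst:
        for line in lines:
            dst.write(line + "\n")  # one number per line
    return lines


# Demo with a throwaway UTF-16 big-endian file.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "original.txt")
    dst = os.path.join(tmp, "target.txt")
    with open(src, "w", encoding="utf-16-be") as f:
        f.write("(248) 370-0000\n(706) 862-2128\n")

    numbers = copy_lines(src, dst)
    print(numbers)  # ['(248) 370-0000', '(706) 862-2128']

    with open(dst, encoding="utf-8") as f:
        result = f.read()
```

Using with blocks also closes both files, which the code in the question never does; an unclosed target file can leave buffered output unwritten.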
