Detecting newline characters with Python

import sys
import fileinput
import os
import os.path
import re

# Descriptions: iterates over files in source directory, removes whitespace characters and saves to destination directory.

print ('Source Directory:', str(sys.argv[1]))
print ('Destination Directory:', str(sys.argv[2]))

for i in os.listdir(sys.argv[1]):
    fullSource = (os.path.join(sys.argv[1], i))
    fullDestination = (os.path.join(sys.argv[2], i))
    newfile = open(fullDestination, "x")
    for line in fileinput.input(fullSource):
        matchObj = re.search('(?<!\r)\n', line)
        if matchObj:
            newfile.write(line.rstrip('\r\n'))
        else:
            newfile.write(line)
    newfile.close
    print ("created " + fullDestination)

2 Answers

User

Answer 1 · edited 2024-09-24 16:34:41

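Well, this result is not surprising. The fileinput module opens files in text mode by default, so every \r\n is automatically converted to a single \n. The regex therefore matches on every line and every \n gets stripped; the \r characters were already removed by fileinput.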

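So you have to request binary mode explicitly. Unfortunately, if you are using Python 3.x (as your print syntax suggests), binary mode gives you bytes, which need to be converted to strings. Your code could become: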

import sys
import fileinput
import os
import os.path
import re

# Descriptions: iterates over files in source directory, removes whitespace characters and saves to destination directory.


print ('Source Directory:', str(sys.argv[1]))
print ('Destination Directory:', str(sys.argv[2]))

for i in os.listdir(sys.argv[1]):
    fullSource = (os.path.join(sys.argv[1], i))
    fullDestination = (os.path.join(sys.argv[2], i))
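    # latin1 round-trips every byte; newline='' stops Python from translating line endings on write
    newfile = open(fullDestination, "x", encoding='latin1', newline='')
    for line in fileinput.input(fullSource, mode='rb'):  # explicit binary mode
        line = line.decode('latin1')   # convert to string in Python 3
        matchObj = re.search('(?<!\r)\n', line)
        if matchObj:
            newfile.write(line.rstrip('\r\n'))
        else:
            newfile.write(line)
    newfile.close()  # close() has to be called, not just referenced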
    print ("created " + fullDestination)
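
To make the text-mode versus binary-mode difference concrete, here is a minimal sketch. The temporary file and its contents are made up for illustration; it only assumes that fileinput.input() is used as a context manager with its default mode and with mode='rb'.

```python
import fileinput
import os
import tempfile

# Write a small file with both Windows (\r\n) and Unix (\n) line endings.
# The file name and contents are hypothetical, chosen only for this demo.
path = os.path.join(tempfile.mkdtemp(), "mixed.txt")
with open(path, "wb") as f:
    f.write(b"first\r\nsecond\n")

# Default text mode: \r\n has already become \n when the line is returned.
with fileinput.input(files=path) as f:
    text_lines = list(f)

# Binary mode: the bytes come back untouched, so \r\n is still visible.
with fileinput.input(files=path, mode="rb") as f:
    binary_lines = list(f)

print(text_lines)    # ['first\n', 'second\n']
print(binary_lines)  # [b'first\r\n', b'second\n']
```

In text mode the two lines are indistinguishable, which is why a regex looking for a \n without a preceding \r matches both of them.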

User

Answer 2 · edited 2024-09-24 16:34:41

Your regex correctly matches \n characters that are not preceded by \r:

>>> re.search('(?<!\r)\n', 'abc\r')
>>> re.search('(?<!\r)\n', 'abc\r\n')
>>> re.search('(?<!\r)\n', 'abc\n')
<_sre.SRE_Match object; span=(3, 4), match='\n'>

Your if and write are wrong:

if matchObj:  # "If line ends with a bare '\n'"
    # rstrip('\r\n') strips any trailing '\r' or '\n' characters, so the bare '\n' is removed and the lines get joined.
    newfile.write(line.rstrip('\r\n'))
else:
    newfile.write(line)
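
As a quick check of what rstrip('\r\n') does to each kind of line, here is a small sketch. The sample strings are made up; the only point is that rstrip() treats its argument as a set of characters to strip, not as a suffix.

```python
# Hypothetical sample lines, one per line-ending style.
line_unix = 'abc\n'
line_windows = 'abc\r\n'

# rstrip() removes any trailing characters found in the given set,
# so both a bare '\n' and a full '\r\n' are stripped.
stripped_unix = line_unix.rstrip('\r\n')
stripped_windows = line_windows.rstrip('\r\n')

print(repr(stripped_unix))     # 'abc'
print(repr(stripped_windows))  # 'abc'
```

Either way the line ending is lost entirely, so writing the result puts the next line directly after this one.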

You probably want to do this instead:

if not matchObj:  # "If line ends with '\r\n'"
    # Note that strip('\r\n') removes these two characters, but does not add '\n' back.
    newfile.write(line.replace('\r\n', '\n'))
else:
    newfile.write(line)

By the way, you don't need a regex for what you are trying to do; endswith() should be enough:

if line.endswith('\r\n'):
    newfile.write(line.replace('\r\n', '\n'))
else:
    newfile.write(line)

In fact, replace() on its own is enough:

newfile.write(line.replace('\r\n', '\n'))
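
Putting the two answers together, one possible end-to-end version reads and writes in binary mode and applies replace() directly to the bytes, so no decoding is needed. This is only a sketch: normalize_newlines and convert_file are hypothetical helper names, the demo file is made up, and the whole file is read into memory, which assumes the files are reasonably small.

```python
import os
import tempfile


def normalize_newlines(data: bytes) -> bytes:
    """Replace every Windows line ending (CR LF) with a Unix one (LF)."""
    return data.replace(b"\r\n", b"\n")


def convert_file(source: str, destination: str) -> None:
    """Copy source to destination, converting CR LF to LF along the way."""
    with open(source, "rb") as src:
        data = src.read()
    # "xb" fails if the destination already exists, like "x" in the question.
    with open(destination, "xb") as dst:
        dst.write(normalize_newlines(data))


# Small demonstration on a temporary file (names are made up).
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "in.txt")
dst_path = os.path.join(workdir, "out.txt")
with open(src_path, "wb") as f:
    f.write(b"one\r\ntwo\nthree\r\n")

convert_file(src_path, dst_path)
with open(dst_path, "rb") as f:
    result = f.read()

print(result)  # b'one\ntwo\nthree\n'
```

Working on bytes sidesteps both the automatic newline translation of text mode and the question of which encoding the files use.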
