Regex to distinguish Windows and Linux line-ending characters

import regex

winpattern = regex.compile("[(?m)[\r][\n]$", regex.DEBUG|regex.MULTILINE)
linuxpattern = regex.compile("^*.[^\r][\n]$", regex.DEBUG)

for i, line in enumerate(open('file8.py')):
    for match in regex.finditer(linuxpattern, line):
        print 'Found on line %s: %s' % (i+1, match.groups())

1 Answer

User

#1 · Posted on 2024-09-28 20:20:31

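When you open a file as a text file, Python uses universal newlines mode by default (see PEP 278), which means it converts all three newline types, \r\n, \r and \n, to \n. Your regular expressions are therefore beside the point: by the time you have read the file, the information about which newline type it used is already gone.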
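The translation is easy to observe by writing known bytes to a file and reading them back. The following is a minimal sketch for Python 3; the helper name `read_text_and_bytes` and the sample data are made up for illustration.

```python
import os
import tempfile

def read_text_and_bytes(data):
    """Write raw bytes to a temp file, then read them back in text and binary mode."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # Default text mode: universal newlines turn \r\n and \r into \n.
        with open(path, encoding="ascii") as f:
            as_text = f.read()
        # Binary mode: the bytes come back exactly as stored on disk.
        with open(path, "rb") as f:
            as_bytes = f.read()
    finally:
        os.remove(path)
    return as_text, as_bytes

as_text, as_bytes = read_text_and_bytes(b"Hello\r\nWorld\r\n")
print(repr(as_text))   # 'Hello\nWorld\n'
print(repr(as_bytes))  # b'Hello\r\nWorld\r\n'
```

The text-mode result contains no \r at all, so no pattern applied to it can tell a DOS file from a Unix one.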

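To disable newline conversion, pass the newline='' argument to open() (on Python < 3, use io.open(), since the built-in open() there has no newline parameter):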

$ echo 'Hello
> World
> ' > test.unix
$ cp test.unix test.dos
$ unix2dos test.dos
unix2dos: converting file test.dos to DOS format...
$ python3
Python 3.5.3 (default, Nov 23 2017, 11:34:05) 
[GCC 6.3.0 20170406] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> unix = open('test.unix', newline='').read()
>>> dos = open('test.dos', newline='').read()
>>> unix
'Hello\nWorld\n\n'
>>> dos
'Hello\r\nWorld\r\n\r\n'

After that, regular expressions can tell the two newline types apart.
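One possible pair of patterns that match only the line terminator is sketched below. The names `DOS_NEWLINE` and `UNIX_NEWLINE` are hypothetical and this is an assumption about what works, not necessarily the patterns the answer originally showed. The strings are the ones read with newline='' in the session above.

```python
import re

# Hypothetical pattern names, chosen for this sketch.
DOS_NEWLINE = re.compile(r'\r\n')        # CR immediately followed by LF
UNIX_NEWLINE = re.compile(r'(?<!\r)\n')  # LF that is not preceded by CR

unix = 'Hello\nWorld\n\n'
dos = 'Hello\r\nWorld\r\n\r\n'

print(len(DOS_NEWLINE.findall(dos)), len(UNIX_NEWLINE.findall(dos)))    # 3 0
print(len(DOS_NEWLINE.findall(unix)), len(UNIX_NEWLINE.findall(unix)))  # 0 3
```

The negative lookbehind keeps the Unix pattern from matching the LF half of a CRLF pair.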

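Note that with re.MULTILINE, $ matches just before each newline as well as at the end of the string; without it, $ matches only at the end of the string (or just before a newline that ends the string). In both cases $ is a zero-width assertion that sits before the \n, so a pattern that consumes the newline and then requires $ succeeds only at the end of the string or before an empty line. To match any newline correctly, simply drop the $.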
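The effect of $ can be checked directly by listing the positions where it matches. This is a small sketch with a made-up two-line sample string.

```python
import re

text = 'a\r\nb\r\n'

# Positions where $ matches without and with re.MULTILINE.
plain_positions = [m.start() for m in re.finditer(r'$', text)]
multi_positions = [m.start() for m in re.finditer(r'$', text, re.MULTILINE)]
print(plain_positions)  # [5, 6]
print(multi_positions)  # [2, 5, 6]

# Requiring $ after the newline misses the first line ending,
# because the character after it is 'b', not a newline or the end.
with_dollar = re.findall(r'\r\n$', text, re.MULTILINE)
without_dollar = re.findall(r'\r\n', text)
print(len(with_dollar), len(without_dollar))  # 1 2
```

Every position reported for $ lies in front of a \n or at the very end, never after a newline that is followed by more text.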

If you want regular expressions that match complete lines, use something like this:

>>> import re
>>> unix_lines = re.compile(r'^(.*[^\r\n]\n|\n)', re.MULTILINE)
>>> dos_lines = re.compile(r'^.*\r\n', re.MULTILINE)
>>> unix_lines.findall(dos)
[]
>>> unix_lines.findall(unix)
['Hello\n', 'World\n', '\n']
>>> dos_lines.findall(unix)
[]
>>> dos_lines.findall(dos)
['Hello\r\n', 'World\r\n', '\r\n']
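The two line patterns above can be combined into a small classifier. This is only a sketch: the function name `classify_line_endings` and its return labels are invented here, and it assumes the text was read with newline='' so that the original terminators are still present.

```python
import re

# Same patterns as in the session above.
unix_lines = re.compile(r'^(.*[^\r\n]\n|\n)', re.MULTILINE)
dos_lines = re.compile(r'^.*\r\n', re.MULTILINE)

def classify_line_endings(text):
    """Return 'unix', 'dos', 'mixed' or 'none' for untranslated text."""
    n_unix = len(unix_lines.findall(text))
    n_dos = len(dos_lines.findall(text))
    if n_unix and n_dos:
        return 'mixed'
    if n_dos:
        return 'dos'
    if n_unix:
        return 'unix'
    return 'none'  # no terminated lines at all

print(classify_line_endings('Hello\nWorld\n\n'))        # unix
print(classify_line_endings('Hello\r\nWorld\r\n\r\n'))  # dos
print(classify_line_endings('a\nb\r\n'))                # mixed
```

Counting matches instead of testing for a single one makes files with inconsistent line endings visible.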
