Upgrading a script to Python 3: \r\n in text vs. binary mode

import re
import sys
import time

with open('4 - raw.txt', 'rb') as content_file:
    content = content_file.read()

newLinePos = [m.start() for m in re.finditer('\n', content)]
for line in newLinePos:
    if (content[line-1]) != '\r':
        print(repr(content[line-20:line]))

print("end")
time.sleep(1000)

2 answers

User

Answer 1 · edited 2024-10-03 00:28:26

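To find the positions of line feed (\n) characters that are not preceded by a carriage return (\r), you can use a regular expression with a negative lookbehind assertion.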

>>> import re
>>> lines = ['foo', 'ba\nr', 'baz', 'quux']
>>> content = '\r\n'.join(lines).encode('utf-8')
>>> content
b'foo\r\nba\nr\r\nbaz\r\nquux'
>>> pattern = b'(?<!\r)\n'
>>> newLinePos = [m.start() for m in re.finditer(pattern, content)]
>>> newLinePos
[7]
>>> content[5:8]
b'ba\n'

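A few things to note:

content is a bytes instance; reading a file in 'rb' mode returns bytes.
pattern must also be a bytes instance, because the sequence being searched is bytes.
The pattern (?<!\r)\n matches a \n only when the character immediately before it is not \r; the (?<!...) part is the negative lookbehind. See the re documentation for the full description.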
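The points above can be put together in a small, self-contained sketch. The helper name find_bare_lf and the sample data are illustrative assumptions, not part of the original script; the sample bytes are the ones from the session above.

```python
import re

# Bytes pattern: a line feed that is NOT immediately preceded by a carriage return.
BARE_LF = re.compile(rb"(?<!\r)\n")


def find_bare_lf(data):
    """Return the offsets of every bare \\n (no preceding \\r) in a bytes object."""
    # find_bare_lf is a hypothetical helper name used only for this sketch.
    return [m.start() for m in BARE_LF.finditer(data)]


sample = b"foo\r\nba\nr\r\nbaz\r\nquux"
positions = find_bare_lf(sample)
print(positions)

# Mixing a str pattern with bytes data is rejected by the re module.
try:
    re.findall("\n", sample)
except TypeError as exc:
    print("str pattern on bytes data fails:", exc)
```

Because the lookbehind also succeeds at offset 0, a \n at the very start of the data is reported without any special-casing.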

User

Answer 2 · edited 2024-10-03 00:28:26

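Python 3 draws a clear line between byte strings (bytes) and Unicode text strings (str). content[line-1] returns an integer in the range 0-255, and you are comparing it with the string '\r'. An implicit conversion might seem reasonable, but Python is strongly typed: no conversion takes place, so the integer never equals the string, whatever character it represents, and the != test is always true. To get the byte value that corresponds to \r, use: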

(content[line-1]) != ord('\r')
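To make the type mismatch concrete, here is a minimal sketch; the variable names and the sample bytes are assumptions chosen for illustration. It shows that indexing a bytes object gives an int, slicing gives bytes, and only like-for-like comparisons can succeed.

```python
data = b"ab\r\ncd\n"

# Indexing a bytes object yields an int; slicing yields a bytes object.
indexed = data[2]    # the integer byte value of "\r"
sliced = data[2:3]   # a one-byte bytes object

# An int never compares equal to a str, so this is True for every byte.
mismatch = indexed != "\r"

# Two comparisons that do work in Python 3.
by_ord = indexed == ord("\r")
by_slice = sliced == b"\r"

print(type(indexed).__name__, type(sliced).__name__)
print(mismatch, by_ord, by_slice)
```

Comparing a one-byte slice against b'\r' is an alternative to ord() that keeps both sides as bytes.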

Similarly, use a bytes pattern to build the iterator:

newLinePos = [m.start() for m in re.finditer(b'\n', content)]
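Applying both changes to the script from the question might look like the sketch below. The function name lines_missing_cr and the context parameter are hypothetical, and the guard for offset 0 is an addition: without it, content[line-1] would wrap around to the last byte when the data starts with \n.

```python
import re
from typing import List


def lines_missing_cr(data: bytes, context: int = 20) -> List[bytes]:
    """Return up to `context` bytes preceding each \\n that lacks a leading \\r."""
    snippets = []
    for m in re.finditer(b"\n", data):  # bytes pattern for bytes data
        pos = m.start()
        # data[pos - 1] is an int, so compare it with ord("\r").
        # pos == 0 is handled separately because data[-1] is the LAST byte.
        if pos == 0 or data[pos - 1] != ord("\r"):
            # max(0, ...) keeps the slice from wrapping to a negative index.
            snippets.append(data[max(0, pos - context):pos])
    return snippets


if __name__ == "__main__":
    sample = b"foo\r\nba\nr\r\nbaz\r\nquux"
    for snippet in lines_missing_cr(sample):
        print(repr(snippet))
    print("end")
```

To run it against a real file, read the file in 'rb' mode as in the question and pass the resulting bytes to the function.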
