Removing newlines from wrapped text

Posted 2024-09-28 21:58:37


I want to remove the newlines from text that has been wrapped to a certain width, e.g.:

import re
x = 'the meaning\nof life'
re.sub("([,\w])\n(\w)", "\1 \2", x)
'the meanin\x01 \x02f life'

I want to get back 'the meaning of life'. What am I doing wrong?


2 answers

You need to escape the backslashes in the replacement string, like this:

>>> import re
>>> x = 'the meaning\nof life'

>>> re.sub("([,\w])\n(\w)", "\1 \2", x)
'the meanin\x01 \x02f life'

>>> re.sub("([,\w])\n(\w)", "\\1 \\2", x)
'the meaning of life'

>>> re.sub("([,\w])\n(\w)", r"\1 \2", x)
'the meaning of life'
>>>

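Without escaping, Python reads \1 in the string literal as an octal escape before re.sub ever sees it, so the replacement contains the character '\x01' instead of a backreference: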

>>> '\1'
'\x01'
>>> 

This is why we need to write '\\\\' or r'\\' to represent a single literal \ in a Python regular expression.
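To make the difference concrete, here is a minimal sketch comparing the three spellings of the replacement string. The variable names plain, doubled, raw, and pattern are illustrative and not part of the original answer.

```python
import re

plain = "\1 \2"      # octal escapes: chr(1), a space, chr(2)
doubled = "\\1 \\2"  # backslash, '1', space, backslash, '2'
raw = r"\1 \2"       # same five characters as `doubled`

assert plain == "\x01 \x02"
assert len(plain) == 3 and len(raw) == 5
assert doubled == raw

x = 'the meaning\nof life'
pattern = r"([,\w])\n(\w)"

print(repr(re.sub(pattern, plain, x)))  # 'the meanin\x01 \x02f life'
print(repr(re.sub(pattern, raw, x)))    # 'the meaning of life'
```

Only the doubled and raw forms deliver an actual backslash to re.sub, which is what it needs in order to recognise \1 and \2 as group references.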

On this point, from this answer:

If you're putting this in a string within a program, you may actually need to use four backslashes (because the string parser will remove two of them when "de-escaping" it for the string, and then the regex needs two for an escaped regex backslash).

And from the documentation:

As stated earlier, regular expressions use the backslash character ('\') to indicate special forms or to allow special characters to be used without invoking their special meaning. This conflicts with Python's usage of the same character for the same purpose in string literals.

Let's say you want to write a RE that matches the string \section, which might be found in a LaTeX file. To figure out what to write in the program code, start with the desired string to be matched. Next, you must escape any backslashes and other metacharacters by preceding them with a backslash, resulting in the string \\section. The resulting string that must be passed to re.compile() must be \\section. However, to express this as a Python string literal, both backslashes must be escaped again.
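The \section case from the quoted passage can be checked directly. This is a small sketch under the assumption that the sample text below stands in for a LaTeX file; the names text, four, and two are made up for illustration.

```python
import re

# Sample input containing one literal backslash (raw literal for readability).
text = r"see \section{Intro}"

# The regex engine must receive two backslashes to match one literal backslash.
four = re.compile("\\\\section")  # ordinary literal: four backslashes in source
two = re.compile(r"\\section")    # raw literal: two backslashes in source

# Both spellings produce the same pattern and the same match.
assert four.pattern == two.pattern
assert four.search(text).group() == r"\section"
assert two.search(text).group() == r"\section"
```

The raw-string form is usually easier to read because what you type is what the regex engine receives.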


As brittenb suggested, you don't need a regex in this case:

>>> x = 'the meaning\nof life'
>>> x.replace("\n", " ")
'the meaning of life'
>>> 

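Use raw string literals. Both Python's string literal syntax and the regex engine interpret backslashes: \1 in an ordinary Python string literal is interpreted as an octal escape, but in a raw string literal it is left alone: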

re.sub(r"([,\w])\n(\w)", r"\1 \2", x)

The alternative is to double every backslash so that the backslashes survive string parsing and reach the regex engine.

See the Backslash Plague section of the Python regex HOWTO.

Demo:

>>> import re
>>> x = 'the meaning\nof life'
>>> re.sub(r"([,\w])\n(\w)", r"\1 \2", x)
'the meaning of life'

It may be easier just to split on newlines: use the str.splitlines() method, then rejoin the pieces with spaces using str.join():

' '.join(x.splitlines())

Admittedly, though, this does not distinguish between newlines that fall between words and extra newlines elsewhere.
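The following sketch illustrates that caveat on a hypothetical input containing a paragraph break (the wrapped string is invented for this example). The split-and-join approach flattens the blank line, whereas the regex only touches a newline that sits directly between a word character or comma and another word character.

```python
import re

# Hypothetical wrapped text with a blank line separating two paragraphs.
wrapped = "the meaning\nof life.\n\nA new paragraph"

# splitlines() yields an empty string for the blank line, so joining
# with spaces turns the paragraph break into two consecutive spaces.
joined = " ".join(wrapped.splitlines())
print(repr(joined))     # 'the meaning of life.  A new paragraph'

# The regex leaves the paragraph break in place.
unwrapped = re.sub(r"([,\w])\n(\w)", r"\1 \2", wrapped)
print(repr(unwrapped))  # 'the meaning of life.\n\nA new paragraph'
```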
