Formatting a document using regular expressions

Authority: soaplab.icapture.ubc.ca - EMBOSS seqret program. The sequence source is USA (Uniform Sequence Address). This means that you pass in a database name as the namespace.and an entry from the db as the id, e.g.db = embl and id = X13776. The input format is swiss.The output format is fasta.

Input - The input format is swiss.The output format is fasta.
Output - The input format is swiss. The output format is fasta.

Input - from the db as the id, e.g.db = embl and id = X13776.
Output - from the db as the id, e.g. db = embl and id = X13776.

1 answer

User

#1 · Posted on 2024-10-03 13:26:42

You can pass re.sub either a regex with a capture group or one based on a positive lookahead.

>>> import re
>>> s = '''The input format is swiss.The output format is fasta.
from the db as the id, e.g.db = embl and id = X13776.'''
>>> print(re.sub(r'\.([^.]{7,}\.)', r'. \1', s))
The input format is swiss. The output format is fasta.
from the db as the id, e.g. db = embl and id = X13776.

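[^.]{7,} matches any character except a dot, seven or more times, so there must be at least 7 characters between the two dots. That is why the dots inside "e.g." are left alone: only one character separates them.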
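To make the capture group concrete, here is a small sketch that lists what group 1 captures on the sample string. The variable names are only illustrative.

```python
import re

s = ("The input format is swiss.The output format is fasta.\n"
     "from the db as the id, e.g.db = embl and id = X13776.")

# The pattern matches a dot, then captures the following run of
# 7+ non-dot characters together with the dot that closes it.
pattern = re.compile(r'\.([^.]{7,}\.)')

# With a single group, findall returns the text of that group
# for each non-overlapping match.
captured = pattern.findall(s)
print(captured)
```

Each captured piece is what \1 refers to in the replacement string, so '. \1' re-emits the dot, adds a space, and puts the captured sentence back.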

>>> print(re.sub(r'\.(?=[^.]{7,}\.)', r'. ', s))
The input format is swiss. The output format is fasta. 
from the db as the id, e.g. db = embl and id = X13776.

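\.(?=[^.]{7,}\.) matches a dot only when it is followed by a sentence of at least 7 characters that ends in another dot. The lookahead only checks the following text and does not consume it, so just the matched dot is replaced, with a dot plus a space.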
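One way to check that the lookahead consumes nothing is to inspect the matches directly. This sketch, using the same sample string, collects the matched text and the first few characters after each match.

```python
import re

s = ("The input format is swiss.The output format is fasta.\n"
     "from the db as the id, e.g.db = embl and id = X13776.")

matches = list(re.finditer(r'\.(?=[^.]{7,}\.)', s))

# Each match is a single dot; the looked-ahead text is not part of it.
matched_text = [m.group() for m in matches]

# Peek at the 10 characters that follow each matched dot.
following = [s[m.end():m.end() + 10] for m in matches]

print(matched_text)
print(following)
```

Note that [^.] also matches a newline, so the dot at the end of the first line qualifies as well. That is where the trailing space after "fasta." in the output comes from.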

>>> print(re.sub(r'(?<=\.)(?=[^.]{7,}\.)', r' ', s))
The input format is swiss. The output format is fasta. 
from the db as the id, e.g. db = embl and id = X13776.
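The last pattern, (?<=\.)(?=[^.]{7,}\.), matches the empty position between a dot and a qualifying sentence and inserts a space there. Both lookaround variants also add a space after "fasta." at the end of the first line, because [^.] matches the newline. If that is unwanted, one possible variation (a sketch, not part of the original answer) is to exclude newlines and require the first following character to be non-whitespace. The helper name add_sentence_spaces is hypothetical.

```python
import re

# Assumed variation: the character after the dot must be neither a dot nor
# whitespace, and the remaining 6+ characters must stay on the same line.
_DOT_BEFORE_SENTENCE = re.compile(r'\.(?=[^.\s][^.\n]{6,}\.)')

def add_sentence_spaces(text: str) -> str:
    """Insert a space after a dot that is directly followed by a sentence."""
    return _DOT_BEFORE_SENTENCE.sub('. ', text)

s = ("The input format is swiss.The output format is fasta.\n"
     "from the db as the id, e.g.db = embl and id = X13776.")

fixed = add_sentence_spaces(s)
print(fixed)
```

Because a dot that is already followed by a space no longer matches, running the helper a second time leaves the text unchanged.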
