Truncating the text before and after a marker string - Q&A - Python中文网

Truncating the text before and after a marker string

Posted 2024-10-01 09:20:03



I am reading a large number of texts with Python.

Each text has the following format:

blablabla
***** END HEADER ******

valid content


***** start footer *****
blablalba

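I need to remove the header and footer from every text, by deleting everything up to and including ***** END HEADER ****** and everything from ***** start footer ***** onward.

Any help would be greatly appreciated.

This is what I have tried: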

import re

chop = re.compile('(/.+)*** END HEADER *****', re.DOTALL)

data_chopped = chop.sub('', text_file)

But I keep getting this error:

sre_constants.error: multiple repeat at position


1 answer

User

#1 · Posted 2024-10-01 09:20:03

There may be other, more efficient approaches, but one way is to split the text twice:

txt = """blablabla
***** END HEADER ******

valid content


***** start footer *****
blablalba
"""

# split once on the header marker and keep everything after it
tmp = txt.split('***** END HEADER ******', 1)[1]
# split once on the footer marker and keep everything before it
tmp2 = tmp.split('***** start footer *****', 1)[0]
# drop the blank lines left around the body
result = tmp2.strip()
print(result)

Result:

valid content
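
As for the original error: in a regular expression `*` is a quantifier, so the unescaped `***` in the pattern stacks one repeat on top of another, which is what "multiple repeat" is complaining about. Below is a minimal sketch of a regex version that escapes the markers with `re.escape`, assuming the markers appear exactly as in the sample text. The names `HEADER_END`, `FOOTER_START` and `extract_body` are made up for illustration.

```python
import re

# Marker strings copied from the sample text (assumed to be exact).
HEADER_END = '***** END HEADER ******'
FOOTER_START = '***** start footer *****'

# re.escape makes every '*' literal instead of a quantifier.
# re.DOTALL lets '.' match newlines, and '(.*?)' is non-greedy,
# so the group stops at the first footer marker.
pattern = re.compile(
    re.escape(HEADER_END) + r'(.*?)' + re.escape(FOOTER_START),
    re.DOTALL,
)


def extract_body(text):
    """Return the text between the two markers, or None if either is missing."""
    match = pattern.search(text)
    return match.group(1).strip() if match else None


txt = """blablabla
***** END HEADER ******

valid content


***** start footer *****
blablalba
"""

print(extract_body(txt))
```

Returning `None` when a marker is missing is just one possible choice; with the split approach above, a missing header marker raises `IndexError` instead, so pick whichever failure mode suits your files.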
