Removing consecutive characters from the start of a string

User

Reply 1 · edited 2024-09-27 22:23:17

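If you are copying and pasting text from the rendered web page rather than processing the HTML, some of the problems mentioned in the question are unavoidable. If you process the HTML with htmllib instead (the relevant line is shown below), you can remove an item such as the anchor element for c (which renders as "c") as a single unit. [Edit: I now see that htmllib is deprecated; I don't know the proper replacement, but I believe it is HTMLParser.]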

The displayed line looks something like

^ ^a^b^c^d^eStar Wars: Episode III Revenge of the Sith DVD commentary featuring George Lucas, Rick McCallum, Rob Coleman, John Knoll and Roger Guyett, [2005]

The HTML source for this line is

<li id="cite_note-DVDcom-13">^ <a href="#cite_ref-DVDcom_13-0">a</a> <a href="#cite_ref-DVDcom_13-1">b</a> <a href="#cite_ref-DVDcom_13-2">c</a> <a href="#cite_ref-DVDcom_13-3">d</a> <a href="#cite_ref-DVDcom_13-4">e</a> Star Wars: Episode III Revenge of the Sith DVD commentary featuring George Lucas, Rick McCallum, Rob Coleman, John Knoll and Roger Guyett, [2005]</li>
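As a rough sketch of that idea: in Python 3, htmllib no longer exists and the standard library parser is `html.parser.HTMLParser`. The class name `CitationTextExtractor` and the helper `strip_citation_links` below are hypothetical names made up for this illustration, and the code assumes that every `<a>` element in the line is a back-reference marker that should be dropped, which may not hold for other citation lines.

```python
from html.parser import HTMLParser


class CitationTextExtractor(HTMLParser):
    """Collect text while skipping anything inside <a>...</a> elements."""

    def __init__(self):
        super().__init__()
        self._anchor_depth = 0  # greater than 0 while inside an <a> element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._anchor_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self._anchor_depth > 0:
            self._anchor_depth -= 1

    def handle_data(self, data):
        # Keep only text that is not part of a link.
        if self._anchor_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse the whitespace left behind by the removed links,
        # then drop the leading "^" marker (an assumption about this format).
        collapsed = " ".join("".join(self._chunks).split())
        return collapsed.lstrip("^ ")


def strip_citation_links(html):
    parser = CitationTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


source_line = (
    '<li id="cite_note-DVDcom-13">^ '
    '<a href="#cite_ref-DVDcom_13-0">a</a> '
    '<a href="#cite_ref-DVDcom_13-1">b</a> '
    '<a href="#cite_ref-DVDcom_13-2">c</a> '
    '<a href="#cite_ref-DVDcom_13-3">d</a> '
    '<a href="#cite_ref-DVDcom_13-4">e</a> '
    'Star Wars: Episode III Revenge of the Sith DVD commentary featuring '
    'George Lucas, Rick McCallum, Rob Coleman, John Knoll and Roger Guyett, '
    '[2005]</li>'
)

print(strip_citation_links(source_line))
```

Because each marker is removed together with its link, no guessing about which leading letters belong to the title is needed.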

User

Reply 2 · edited 2024-09-27 22:23:17

How about using a character class in a regular expression, i.e.:

re.sub('^([a-z] )*', '', ...)

This removes any number of leading occurrences of a single lowercase letter followed by a space.

User

Reply 3 · edited 2024-09-27 22:23:17

I would probably do something like this:

title = re.sub(r'^([a-z]\s)*', '', 'a b c d Wikipedia Reference')

That handles your current case. However, as @joran beasley pointed out, you may need something smarter for more complex cases.
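To make the "more complex cases" concrete, here is a small sketch showing where the pattern over-strips: a title that itself begins with a single lowercase letter loses that word too. The function names are hypothetical, and `strip_sequential_markers` is only one possible stricter heuristic (it assumes the markers always run a, b, c, ... in order), not something proposed in the thread.

```python
import re

LEADING_LETTERS = re.compile(r'^([a-z]\s)*')


def strip_leading_letters(text):
    """The regex approach from the replies above."""
    return LEADING_LETTERS.sub('', text)


def strip_sequential_markers(text):
    """Hypothetical stricter variant: strip only a, b, c, ... in that order."""
    tokens = text.split(' ')
    expected = ord('a')
    index = 0
    # Stop before the last token so that something is always left over.
    while (index < len(tokens) - 1
           and expected <= ord('z')
           and tokens[index] == chr(expected)):
        index += 1
        expected += 1
    return ' '.join(tokens[index:])


print(strip_leading_letters('a b c d Wikipedia Reference'))
print(strip_leading_letters('a b a tale of two cities'))     # also eats the title's "a"
print(strip_sequential_markers('a b a tale of two cities'))  # keeps the title's "a"
```

The stricter variant is still a heuristic: a line with no markers whose title starts with "a" would lose that word, which is why working from the HTML source is more reliable when it is available.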
