Python: replacing the URL between <br/> and <br/>

Published 2024-09-28 17:18:22


I have a string like this:

<p>Millions of people watch TV.</p><br/>https://sites.google.com/aaa-net.bb.cc/be-do-have/%E3%83%9B%E3%83%BC%E3%83%A0<br/><p>Good boy!</p><br/>

I want to remove this part:

https://sites.google.com/aaa-net.bb.cc/be-do-have/%E3%83%9B%E3%83%BC%E3%83%A0

and keep the rest:

<p>Millions of people watch TV.</p><br/><br/><p>Good boy!</p><br/>

My code so far:

mystring = '<p>Millions of people watch TV.</p><br/>https://sites.google.com/aaa-net.bb.cc/be-do-have/%E3%83%9B%E3%83%BC%E3%83%A0<br/><p>Good boy!</p><br/>'

How can I do this?


3 answers

You can use re.sub from the standard-library re module:

import re
mystring = '<p>Millions of people watch TV.</p><br/>https://sites.google.com/aaa-net.bb.cc/be-do-have/%E3%83%9B%E3%83%BC%E3%83%A0<br/><p>Good boy!</p><br/>'
print(re.sub(r'http[^<]+', '', mystring))

Output:

<p>Millions of people watch TV.</p><br/><br/><p>Good boy!</p><br/>
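Note that the pattern http[^<]+ matches any run of text that starts with "http", not only URLs. A slightly stricter variant might look like the sketch below. The helper name strip_urls, the constant URL_RE, and the sample string are assumptions made for illustration, and the pattern only targets URLs that appear as bare text; markup with URLs inside attributes would likely need a real HTML parser.

```python
import re

# Hypothetical names; this is a tightened variant, not the answer's original pattern.
# It requires the "://" after the scheme and stops at whitespace or the next tag.
URL_RE = re.compile(r'https?://[^<\s]+')

def strip_urls(html: str) -> str:
    """Remove every http:// or https:// URL that appears as bare text."""
    return URL_RE.sub('', html)

sample = '<p>See the http docs.</p><br/>https://example.com/a%20b<br/>'
print(strip_urls(sample))
# The plain word "http" in the paragraph is kept; only the URL is removed.
```

With the original pattern, the text "http docs." in this sample would be removed as well, which is probably not what is wanted.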

You can do this with a regex find-and-replace:

Find: <br/>https?://[^<]*<br/>

Replace with: <br/><br/>
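In Python, this find-and-replace could be applied with re.sub, roughly as in the sketch below. It assumes both tags are written <br/>, as they are in the question's string; the variable names pattern and result are only illustrative.

```python
import re

mystring = ('<p>Millions of people watch TV.</p><br/>'
            'https://sites.google.com/aaa-net.bb.cc/be-do-have/%E3%83%9B%E3%83%BC%E3%83%A0'
            '<br/><p>Good boy!</p><br/>')

# Match a URL only when it sits directly between two <br/> tags.
pattern = r'<br/>https?://[^<]*<br/>'

# Put the two tags back, dropping the URL that was between them.
result = re.sub(pattern, '<br/><br/>', mystring)
print(result)
```

Anchoring the pattern on the surrounding tags means a URL elsewhere in the string, for example inside a paragraph, is left untouched.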

mystring = '<p>Millions of people watch TV.</p><br/>https://sites.google.com/aaa-net.bb.cc/be-do-have/%E3%83%9B%E3%83%BC%E3%83%A0<br/><p>Good boy!</p><br/>'
# remove 'https://sites.google.com/aaa-net.bb.cc/be-do-have/%E3%83%9B%E3%83%BC%E3%83%A0'
resultstring = '<p>Millions of people watch TV.</p><br/><br/><p>Good boy!</p><br/>'

length = len(mystring)
startPos = -1
endPos = -1
for i in range(length):
    subString = mystring[i:]
    if subString.startswith('<br/>'):
        if(startPos == -1):
            startPos = i
            continue # check from next character to get endPos

        if(endPos == -1):
            endPos = i


firstSubString = mystring[:startPos + 5] # 5 = the character length of '<br/>'
lastSubString = mystring[endPos:]


completeResult = firstSubString + lastSubString
print(completeResult, completeResult == resultstring)
print(completeResult, resultstring)
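The same idea, locating the first two <br/> tags and cutting out whatever lies between them, can probably be written more compactly with str.find. This is a sketch rather than a drop-in replacement: the names tag, start, end and result are illustrative, and like the loop above it removes any text between the first two tags, whether or not that text is a URL.

```python
mystring = ('<p>Millions of people watch TV.</p><br/>'
            'https://sites.google.com/aaa-net.bb.cc/be-do-have/%E3%83%9B%E3%83%BC%E3%83%A0'
            '<br/><p>Good boy!</p><br/>')
tag = '<br/>'

# Index of the first <br/>, then of the next <br/> after it.
start = mystring.find(tag)
end = mystring.find(tag, start + len(tag)) if start != -1 else -1

if start != -1 and end != -1:
    # Keep everything up to and including the first tag,
    # then resume at the second tag.
    result = mystring[:start + len(tag)] + mystring[end:]
else:
    # Fewer than two tags: leave the string unchanged.
    result = mystring

print(result)
```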
