Extracting part of the text from an href using bs4

from bs4 import BeautifulSoup

soup = BeautifulSoup("""
<div class="cdAllIn"><a href="/footba/all.aspx?lang=EN&tmatchid=6be0690b-93e3-4300-87e9-7d0aa5797ae0" title="All Odds"><img src="/football/info/images/btn_odds.gif?CV=L302R1g" alt="All Odds" title="All Odds"></a></div>
<div class="cdAllIn"><a href="/footba/all.aspx?lang=EN&tmatchid=6be0690b-93e3-4300-87e9-7d0aa5797ae0" title="All Odds"><img src="/football/info/images/btn_odds.gif?CV=L302R1g" alt="All Odds" title="All Odds"></a></div>
<div class="cdAllIn"><a href="/footba/all.aspx?lang=EN&tmatchid=6be0690b-93e3-4300-87e9-7d0aa5797ae0" title="All Odds"><img src="/football/info/images/btn_odds.gif?CV=L302R1g" alt="All Odds" title="All Odds"></a></div>
<div class="cdAllIn"><a href="/footba/all.aspx?lang=EN&tmatchid=6be0690b-93e3-4300-87e9-7d0aa5797ae0" title="All Odds"><img src="/football/info/images/btn_odds.gif?CV=L302R1g" alt="All Odds" title="All Odds"></a></div>
""", 'html.parser')

# Collect every <a> tag and print its href attribute
lines = soup.find_all('a')
for line in lines:
    print(line['href'])

/footba/all.aspx?lang=EN&tmatchid=6be0690b-93e3-4300-87e9-7d0aa5797ae0
/footba/all.aspx?lang=EN&tmatchid=6be0690b-93e3-4300-87e9-7d0aa5797ae0
/footba/all.aspx?lang=EN&tmatchid=6be0690b-93e3-4300-87e9-7d0aa5797ae0
/footba/all.aspx?lang=EN&tmatchid=6be0690b-93e3-4300-87e9-7d0aa5797ae0

2 Answers

User

#1 · Edited 2024-10-03 09:10:09

Since you only need the tmatchid value, find the substring tmatchid= in the URL and take everything after it.

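lines = soup.find_all('a')
key = 'tmatchid='
for line in lines:
    # Position just past 'tmatchid='; this assumes the key is present in every href
    index = line['href'].find(key) + len(key)
    print(line['href'][index:])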

Output

6be0690b-93e3-4300-87e9-7d0aa5797ae0
6be0690b-93e3-4300-87e9-7d0aa5797ae0
6be0690b-93e3-4300-87e9-7d0aa5797ae0
6be0690b-93e3-4300-87e9-7d0aa5797ae0
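The slice above relies on tmatchid= being present and being the last parameter in the URL. A minimal sketch of a slightly more defensive version follows; the helper name extract_after is made up for illustration and is not part of bs4. It returns None when the marker is missing and stops at the next & if other parameters follow.

```python
def extract_after(href, marker="tmatchid="):
    """Return the value following `marker` in `href`, or None if the marker is absent.

    `extract_after` is a hypothetical helper name used only for this sketch.
    """
    start = href.find(marker)
    if start == -1:
        # str.find returns -1 when the marker is not in the string
        return None
    start += len(marker)
    # Stop at the next '&' so trailing parameters are not included
    end = href.find("&", start)
    return href[start:] if end == -1 else href[start:end]


href = "/footba/all.aspx?lang=EN&tmatchid=6be0690b-93e3-4300-87e9-7d0aa5797ae0"
print(extract_after(href))
```

In the loop from the answer, this would be called as extract_after(line['href']).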

User

#2 · Edited 2024-10-03 09:10:09

Split the string on = and take the last element.

for line in lines:
    print(line['href'].split('=')[-1])
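Splitting on = works here because tmatchid happens to be the last parameter. If the parameter order could change, one alternative is to parse the query string with the standard library's urllib.parse. This is only a sketch; the function name get_tmatchid is an assumption made for the example.

```python
from urllib.parse import urlparse, parse_qs


def get_tmatchid(href):
    """Return the tmatchid query parameter of `href`, or None if it is missing.

    `get_tmatchid` is a hypothetical helper name used only for this sketch.
    """
    # urlparse also handles relative URLs such as '/footba/all.aspx?...'
    query = urlparse(href).query
    # parse_qs maps each parameter name to a list of its values
    values = parse_qs(query).get("tmatchid")
    return values[0] if values else None


href = "/footba/all.aspx?lang=EN&tmatchid=6be0690b-93e3-4300-87e9-7d0aa5797ae0"
print(get_tmatchid(href))
```

Inside the loop it would be used as print(get_tmatchid(line['href'])).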

Hope this helps! Cheers!
