想从href中提取文本,似乎我只能从HTML中提取整个href
from bs4 import BeautifulSoup
soup=BeautifulSoup("""<div class="cdAllIn"><a href="/footba/all.aspx?lang=EN&tmatchid=6be0690b-93e3-4300-87e9-7d0aa5797ae0" title="All Odds"><img src="/football/info/images/btn_odds.gif?CV=L302R1g" alt="All Odds" title="All Odds"></a></div>
<div class="cdAllIn"><a href="/footba/all.aspx?lang=EN&tmatchid=6be0690b-93e3-4300-87e9-7d0aa5797ae0" title="All Odds"><img src="/football/info/images/btn_odds.gif?CV=L302R1g" alt="All Odds" title="All Odds"></a></div>
<div class="cdAllIn"><a href="/footba/all.aspx?lang=EN&tmatchid=6be0690b-93e3-4300-87e9-7d0aa5797ae0" title="All Odds"><img src="/football/info/images/btn_odds.gif?CV=L302R1g" alt="All Odds" title="All Odds"></a></div>
<div class="cdAllIn"><a href="/footba/all.aspx?lang=EN&tmatchid=6be0690b-93e3-4300-87e9-7d0aa5797ae0" title="All Odds"><img src="/football/info/images/btn_odds.gif?CV=L302R1g" alt="All Odds" title="All Odds"></a></div>
""",'html.parser')
lines=soup.find_all('a')
for line in lines:
print(line['href'])
结果:
/footba/all.aspx?lang=EN&tmatchid=6be0690b-93e3-4300-87e9-7d0aa5797ae0
/footba/all.aspx?lang=EN&tmatchid=6be0690b-93e3-4300-87e9-7d0aa5797ae0
/footba/all.aspx?lang=EN&tmatchid=6be0690b-93e3-4300-87e9-7d0aa5797ae0
/footba/all.aspx?lang=EN&tmatchid=6be0690b-93e3-4300-87e9-7d0aa5797ae0
预期结果:
6be0690b-93e3-4300-87e9-7d0aa5797ae0
6be0690b-93e3-4300-87e9-7d0aa5797ae0
6be0690b-93e3-4300-87e9-7d0aa5797ae0
6be0690b-93e3-4300-87e9-7d0aa5797ae0
因为您只需要检索tmatchid值,所以在url中找到子字符串tmatchid=,并从该索引中提取剩余的url
输出
使用
=
分割字符串并获取最后一个索引。你知道吗希望这有帮助!干杯!你知道吗
相关问题 更多 >
编程相关推荐