Parsing HTML with a complex structure using Beautiful Soup

Posted 2024-09-28 20:45:51


Sorry for the noob HTML-scraping question, but I'm dealing with complex HTML, and every case is unique.

I'm trying to extract every URL that comes right after {"actionType":"navigate","actionUrl":

In the example below, that would be https://www.ABCD.com.

I'm using Python, preferably with Beautiful Soup. Any ideas on how to approach this?

</a>
<a aria-label="ABCD." class="we-lockup targeted-link l-column small-2 medium-3 large-2 we-lockup--shelf-align-top ember-view" data-metrics-click='{"actionType":"navigate","actionUrl":"https://www.ABCD.com","targetType":"card","targetId":"12345"}' data-metrics-location='{"locationType":"shelfCustomersAlsoBoughtMovie"}' href="https://www.ABCD.com" id="ember123"> <picture class="we-lockup__artwork we-artwork--lockup we-artwork--fullwidth we-artwork--vhs-movie-pic we-artwork ember-view" dir="ltr" id="ember123">
<noscript>

1 answer
Anonymous user · posted 2024-09-28 20:45:51

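The data-metrics-click attribute holds a JSON string, so you can parse it with the built-in json module into a Python dictionary (dict) and then read actionUrl: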

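import json
from bs4 import BeautifulSoup

# html holds the page source; here, a trimmed copy of the <a> tag from the question
html = """<a class="we-lockup targeted-link l-column small-2 medium-3 large-2 we-lockup--shelf-align-top ember-view" data-metrics-click='{"actionType":"navigate","actionUrl":"https://www.ABCD.com","targetType":"card","targetId":"12345"}' href="https://www.ABCD.com"></a>"""

soup = BeautifulSoup(html, "html.parser")

# Find the element by its full class string (it must match the attribute
# exactly, in the same order) and read its data-metrics-click JSON text
data = soup.find(
    class_='we-lockup targeted-link l-column small-2 medium-3 large-2 '
           'we-lockup--shelf-align-top ember-view'
)['data-metrics-click']

# Parse the JSON string into a Python dict
json_data = json.loads(data)

print(type(json_data))
print(json_data['actionUrl'])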

Output:

<class 'dict'>
https://www.ABCD.com
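The question asks for all such URLs, and matching on the full class string only works when the class attribute is identical and in the same order. A more general approach is sketched below, assuming every relevant element stores its JSON in a data-metrics-click attribute. It finds every tag that has that attribute, parses each value, and keeps actionUrl whenever actionType is "navigate". The function name extract_navigate_urls and the second sample link (https://www.example.com) are hypothetical, made up for illustration.

```python
import json
from bs4 import BeautifulSoup


def extract_navigate_urls(html):
    """Return every actionUrl whose actionType is "navigate".

    Hypothetical helper: it checks any tag carrying a data-metrics-click
    attribute, so it does not depend on the element's class list.
    """
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    # attrs={name: True} matches tags that have the attribute with any value
    for tag in soup.find_all(attrs={"data-metrics-click": True}):
        try:
            metrics = json.loads(tag["data-metrics-click"])
        except json.JSONDecodeError:
            continue  # skip attribute values that are not valid JSON
        if (isinstance(metrics, dict)
                and metrics.get("actionType") == "navigate"
                and "actionUrl" in metrics):
            urls.append(metrics["actionUrl"])
    return urls


# Sample markup modeled on the question's snippet; the second link and the
# <button> are hypothetical additions to show filtering
sample = """
<a class="we-lockup targeted-link" data-metrics-click='{"actionType":"navigate","actionUrl":"https://www.ABCD.com","targetType":"card","targetId":"12345"}' href="https://www.ABCD.com"></a>
<a class="other" data-metrics-click='{"actionType":"navigate","actionUrl":"https://www.example.com"}'></a>
<button data-metrics-click='{"actionType":"buy","targetId":"67890"}'></button>
"""

print(extract_navigate_urls(sample))
```

soup.select("[data-metrics-click]") is an equivalent CSS-selector way to find the same tags. Values that fail to parse as JSON are skipped instead of raising, which helps when every page is structured a little differently.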
