How do I parse the data in the example below with BeautifulSoup?

<tr>
    <td class="num cell-icon-string" data-sort-value="6">
    <td class="cell-icon-string"><a class="ent-name" href="/pokedex/charizard" title="View pokedex for #006 Charizard">Charizard</a></td>
</tr>
<tr>
    <td class="num cell-icon-string" data-sort-value="6">
    <td class="cell-icon-string"><a class="ent-name" href="/pokedex/charizard" title="View pokedex for #006 Charizard">Charizard</a><br>
    <small class="aside">Mega Charizard X</small></td>
</tr>

#!/usr/bin/env python3
from bs4 import BeautifulSoup

soup = BeautifulSoup(open("data.html"), "lxml")
poke_boxes = soup.findAll('a', attrs = {'class': 'ent-name'})
for poke_box in poke_boxes:
    poke_name = poke_box.text.strip()
    print(poke_name)

2 Answers

User

#1 · Edited 2024-09-28 14:15:54

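You need to change the logic so that it iterates over the rows and checks whether a small element is present. If it is, print that element's text; otherwise print the anchor text as you do now.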

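from bs4 import BeautifulSoup

# html holds the table markup from the question
soup = BeautifulSoup(html, 'lxml')
for tr in soup.find_all('tr'):
    # Prefer the variant name in <small> when the row has one
    small = tr.find('small')
    if small is not None:
        print(small.text)
    else:
        # Otherwise fall back to the anchor text, as the original code did
        print(tr.find('a').text)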

User

#2 · Edited 2024-09-28 14:15:54

import bs4
html = '''<tr>
    <td class="num cell-icon-string" data-sort-value="6">
    <td class="cell-icon-string"><a class="ent-name" href="/pokedex/charizard" title="View pokedex for #006 Charizard">Charizard</a></td>

</tr>

<tr>
    <td class="num cell-icon-string" data-sort-value="6">
    <td class="cell-icon-string"><a class="ent-name" href="/pokedex/charizard" title="View pokedex for #006 Charizard">Charizard</a><br>
    <small class="aside">Mega Charizard X</small></td>
</tr>'''
soup = bs4.BeautifulSoup(html, 'lxml')

In:

[tr.get_text(strip=True) for tr in soup('tr')]

Out:

['Charizard', 'CharizardMega Charizard X']

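You can use get_text() to concatenate all the text inside a tag. With strip=True, each piece of text has its leading and trailing whitespace stripped before joining, and pieces that are empty after stripping are skipped.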
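The output above runs the two names together ('CharizardMega Charizard X'). If that is not what you want, here is a minimal sketch of two ways to keep them apart: passing a separator to get_text(), or iterating over stripped_strings. The markup is a trimmed version of the question's sample with the cells closed properly, the built-in "html.parser" is used instead of lxml only to avoid the extra dependency, and the variable names names and parts are just for illustration.

```python
from bs4 import BeautifulSoup

# Trimmed sample based on the question's markup (assumption: cells are closed properly)
html = """
<table>
<tr>
  <td class="cell-icon-string"><a class="ent-name" href="/pokedex/charizard">Charizard</a></td>
</tr>
<tr>
  <td class="cell-icon-string"><a class="ent-name" href="/pokedex/charizard">Charizard</a><br>
  <small class="aside">Mega Charizard X</small></td>
</tr>
</table>
"""

# "html.parser" ships with Python; the answers above use lxml
soup = BeautifulSoup(html, "html.parser")

# A separator of " " is placed between the stripped text fragments of each row
names = [tr.get_text(" ", strip=True) for tr in soup.find_all("tr")]
print(names)

# stripped_strings yields the same fragments one by one, so they can be kept separate
parts = [list(tr.stripped_strings) for tr in soup.find_all("tr")]
print(parts)
```

The first list gives one readable string per row; the second keeps the base name and the variant name as separate items, which is handy if you only want one of them.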
