How to access data in nested span tags

<div style="width:100%; display:inline-block; position:relative; text-align:center; border-top:thin solid #fff; background-image:linear-gradient(#333,#000);">
    <div style="width:100%; max-width:1400px; display:inline-block; position:relative; text-align:left; padding:20px 15px 20px 15px;">
        <a href="/manpower-fit-for-military-service.asp" title="Manpower Fit for Military Service ranked by country">
            <div class="smGraphContainer"><img class="noBorder" src="/imgs/graph.gif" alt="Small graph icon"></div>
        </a>
        <span class="textLarge textWhite"><span class="textBold">FIT-FOR-SERVICE:</span> 18,740,382</span>
    </div>
    <div class="blockSheen"></div>
</div>

for y in soup.find_all('span', class_ = "textBold"):
    print(y.text)  # this gets FIT-FOR-SERVICE:

for x in soup.find_all('span', class_ = "textLarge textWhite"):
    print(x.text)  # this gets FIT-FOR-SERVICE: 18,740,382 but I only want the number

3 answers

User

#1 · edited 2024-10-02 14:23:34

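I believe you have two options:

1. Use a regex on the text of the parent span tag to extract only the number.

2. Use the decompose() function to remove the child span tag from the tree, then extract the remaining text, as shown below: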

from bs4 import BeautifulSoup

h = """<div style="width:100%; display:inline-block; position:relative; text-
align:center; border-top:thin solid #fff; background-image:linear-
gradient(#333,#000);">
    <div style="width:100%; max-width:1400px; display:inline-block;
position:relative; text-align:left; padding:20px 15px 20px 15px;">
        <a href="/manpower-fit-for-military-service.asp" title="Manpower
Fit for Military Service ranked by country">
            <div class="smGraphContainer"><img class="noBorder"
src="/imgs/graph.gif" alt="Small graph icon"></div>
        </a>
        <span class="textLarge textWhite"><span
class="textBold">FIT-FOR-SERVICE:</span> 18,740,382</span>
    </div>
    <div class="blockSheen"></div>
</div>"""

soup = BeautifulSoup(h, "lxml")
soup.find('span', class_ = "textLarge textWhite").span.decompose()
res = soup.find('span', class_ = "textLarge textWhite").text.strip()

print(res)
#18,740,382
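
Option 1 (the regex route) is not shown in this answer, so here is a minimal sketch of how it might look. The pattern `\d[\d,]*` is an assumption about the number format (digit groups separated by commas), the markup is trimmed down to the relevant span, and "html.parser" is used only so that the sketch does not depend on lxml.

```python
import re
from bs4 import BeautifulSoup

# Trimmed-down version of the markup from the question.
html = ('<span class="textLarge textWhite">'
        '<span class="textBold">FIT-FOR-SERVICE:</span> 18,740,382</span>')

soup = BeautifulSoup(html, "html.parser")
parent = soup.find("span", class_="textLarge textWhite")

# Assumed pattern: a digit followed by any mix of digits and commas.
match = re.search(r"\d[\d,]*", parent.text)
number_text = match.group(0) if match else None

# Optionally convert to an integer by dropping the thousands separators.
number = int(number_text.replace(",", "")) if number_text else None

print(number_text)
print(number)
```

Unlike decompose(), this leaves the parsed tree untouched, which may matter if the label span is needed later.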

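User

#2 · edited 2024-10-02 14:23:34

Instead of using x.text to get the text, you can use x.find_all(text=True, recursive=False) to get all of the node's top-level text (as a list of strings) without descending into child nodes. Here is an example using your data: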

for x in soup.find_all('span', class_ = "textLarge textWhite"):
    res = x.find_all(text=True, recursive=False)
    # join and strip the strings then print
    print(" ".join(map(str.strip, res)))

# prints: 18,740,382
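
The snippet above relies on a `soup` object defined elsewhere. As a self-contained sketch of the same idea, the version below keeps only the strings that are direct children of the parent span by checking for `NavigableString`. This is a swapped-in equivalent rather than the exact call from this answer, and the markup is a trimmed-down copy of the question's HTML. As far as I know, newer Beautiful Soup releases prefer the argument name `string=` over `text=`, and filtering the children directly sidesteps that naming question.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Trimmed-down version of the markup from the question.
html = ('<span class="textLarge textWhite">'
        '<span class="textBold">FIT-FOR-SERVICE:</span> 18,740,382</span>')

soup = BeautifulSoup(html, "html.parser")
parent = soup.find("span", class_="textLarge textWhite")

# Keep only text nodes that sit directly under the parent span;
# the nested <span class="textBold"> is a Tag, so it is skipped.
top_level = [
    str(child).strip()
    for child in parent.children
    if isinstance(child, NavigableString)
]

# Join the non-empty pieces, mirroring the join-and-strip step above.
direct_text = " ".join(piece for piece in top_level if piece)

print(direct_text)
```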

User

#3 · edited 2024-10-02 14:23:34

Here is one way to do it:

# Remove the nested label span from the tree, then read what is left.
soup.find('span', {'class':'textLarge textWhite'}).find('span').extract()
output = soup.find('span', {'class':'textLarge textWhite'}).text.strip()
print(output)

Output:

18,740,382
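
One difference from decompose() that may be useful here: extract() returns the element it removes, so the label can be kept alongside the number. The sketch below assumes a trimmed-down copy of the question's markup and uses illustrative variable names (`label_tag`, `label`, `value`) that are not part of the original answer.

```python
from bs4 import BeautifulSoup

# Trimmed-down version of the markup from the question.
html = ('<span class="textLarge textWhite">'
        '<span class="textBold">FIT-FOR-SERVICE:</span> 18,740,382</span>')

soup = BeautifulSoup(html, "html.parser")
parent = soup.find("span", class_="textLarge textWhite")

# extract() detaches the nested span from the tree and returns it.
label_tag = parent.find("span").extract()

# The label comes from the detached tag, the value from what remains.
label = label_tag.text.strip().rstrip(":")
value = parent.text.strip()

print(label, value)
```

Note that both extract() and decompose() modify the parsed tree, so run them on a fresh soup if the original structure is still needed.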
