Using BeautifulSoup to get the first of several elements with the same class

Posted 2024-07-01 07:58:41


I am trying to extract information from this page. Its HTML looks like this:

I am trying to extract the text of the first class="currentServers" element. For example, from the line <span class="currentServers">745,807</span> I want to get 745807.

The problem is that each row contains two spans with class="currentServers". I want the value in the first column of the row.

HTML:

<tr class="player_count_row" style="">
    <td align="right">
        <span class="currentServers">745,807</span>
    </td>
    <td align="right">
        <span class="currentServers">836,540</span>
    </td>
    <td width="20">&nbsp;</td>
    <td>
        <a class="gameLink" onmouseover="GameHover( this, event, 'global_hover', {&quot;type&quot;:&quot;app&quot;,&quot;id&quot;:570,&quot;v6&quot;:1} );" onmouseout="HideGameHover( this, event, 'global_hover' )" href="http://store.steampowered.com/app/570/">Dota 2</a>
    </td>
</tr>
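The row above can be handled by searching inside the row rather than the whole page. A minimal sketch, assuming the bs4 package is installed and using a trimmed copy of the row (the onmouseover/onmouseout attributes are left out): find() returns only the first matching span in the row, and stripping the thousands separator gives the integer 745807 that the question asks for.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the row from the question (event-handler attributes omitted)
HTML = """
<table>
<tr class="player_count_row">
  <td align="right"><span class="currentServers">745,807</span></td>
  <td align="right"><span class="currentServers">836,540</span></td>
  <td width="20">&nbsp;</td>
  <td><a class="gameLink" href="http://store.steampowered.com/app/570/">Dota 2</a></td>
</tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")
row = soup.find("tr", class_="player_count_row")

# find() scoped to this row returns only its first currentServers span,
# which sits in the first column
first_span = row.find("span", class_="currentServers")

# Remove the thousands separator and convert to an int
current_players = int(first_span.get_text(strip=True).replace(",", ""))

# find_all() returns both spans; index 0 is the same first-column span
all_spans = row.find_all("span", class_="currentServers")

print(current_players)  # 745807
```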

I feel like I'm close, but I can't figure it out.

This is what I tried:

^{pr2}$

The output is:

[u'346110', u'http://store.steampowered.com/app/346110/', u'745,807']
[u'230410', u'http://store.steampowered.com/app/230410/', u'745,807']
[u'252950', u'http://store.steampowered.com/app/252950/', u'745,807']
[u'482730', u'http://store.steampowered.com/app/482730/', u'745,807']
[u'252490', u'http://store.steampowered.com/app/252490/', u'745,807']
[u'4000', u'http://store.steampowered.com/app/4000/', u'745,807']
[u'444090', u'http://store.steampowered.com/app/444090/', u'745,807']
[u'359550', u'http://store.steampowered.com/app/359550/', u'745,807']
[u'588430', u'http://store.steampowered.com/app/588430/', u'745,807']
[u'374320', u'http://store.steampowered.com/app/374320/', u'745,807']
[u'8930', u'http://store.steampowered.com/app/8930/', u'745,807']
[u'107410', u'http://store.steampowered.com/app/107410/', u'745,807']
[u'238960', u'http://store.steampowered.com/app/238960/', u'745,807']
[u'304930', u'http://store.steampowered.com/app/304930/', u'745,807']
[u'10', u'http://store.steampowered.com/app/10/', u'745,807']
[u'72850', u'http://store.steampowered.com/app/72850/', u'745,807']
[u'289070', u'http://store.steampowered.com/app/289070/', u'745,807']
[u'105600', u'http://store.steampowered.com/app/105600/', u'745,807']
[u'377160', u'http://store.steampowered.com/app/377160/', u'745,807']
[u'236390', u'http://store.steampowered.com/app/236390/', u'745,807']
[u'292030', u'http://store.steampowered.com/app/292030/', u'745,807']
[u'227300', u'http://store.steampowered.com/app/227300/', u'745,807']
[u'386360', u'http://store.steampowered.com/app/386360/', u'745,807']
[u'236850', u'http://store.steampowered.com/app/236850/', u'745,807']
[u'364360', u'http://store.steampowered.com/app/364360/', u'745,807']
[u'381210', u'http://store.steampowered.com/app/381210/', u'745,807']
[u'363970', u'http://store.steampowered.com/app/363970/', u'745,807']
[u'453480', u'http://store.steampowered.com/app/453480/', u'745,807']

... ... ...

I want to extract the values circled in red:

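(appid, Current Players, Game Name). I can successfully get the appid and game name for each game, but I can't get the Current Players values in the matching order; every row gets 745,807.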

3 answers

I would grab each row and then grab the first instance of .currentServers inside it, like this:

rows = soup.find_all(class_='player_count_row')
for row in rows:
    # find() searched within the row returns only that row's first currentServers span
    print(row.find(class_='currentServers').text)
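The same per-row idea can be written with CSS selectors. A minimal sketch, assuming bs4 4.7 or later (which provides select_one()) and a hypothetical second row added only so there is more than one row to compare: searching inside each row gives the first column of that row, while searching the whole soup returns the first span in the document every time. That is likely why every line of the question's output shows 745,807.

```python
from bs4 import BeautifulSoup

# Two rows shaped like the question's HTML; the second row's numbers are
# hypothetical, added only for comparison
HTML = """
<table>
<tr class="player_count_row">
  <td><span class="currentServers">745,807</span></td>
  <td><span class="currentServers">836,540</span></td>
</tr>
<tr class="player_count_row">
  <td><span class="currentServers">500,000</span></td>
  <td><span class="currentServers">600,000</span></td>
</tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")
rows = soup.select("tr.player_count_row")

# Searching from each row: the first matching span inside that row only
per_row = [row.select_one("span.currentServers").get_text(strip=True) for row in rows]

# Searching from the whole soup: always the first span in the document,
# so every row ends up with the same value
from_soup = [soup.select_one("span.currentServers").get_text(strip=True) for row in rows]

print(per_row)    # ['745,807', '500,000']
print(from_soup)  # ['745,807', '745,807']
```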

I solved the problem by changing the code as follows:

rows = soup.findAll("tr", {"class": "player_count_row"})

for row in rows:
    # The first currentServers span in this row is the "Current Players" column
    players = row.findAll("span", {"class": "currentServers"})[0].text
    # Look up the game link inside the same row so appid, link and players stay aligned
    link = row.find("a", {"class": "gameLink"})
    try:
        appid = link.get('onmouseover')
        appid = findAppIdFromStats(appid, '"id":', ',"public":1')
        linkg = link.get('href')
    except AttributeError:
        # No gameLink in this row: record a placeholder and skip to the next row
        r.append(["N/A", "N/A", "N/A"])
        continue
    r.append([appid, linkg, players])
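The row-first approach can also produce the full (appid, Current Players, Game Name) record from each row without a second loop. A minimal sketch under stated assumptions: the helper parse_rows and the regex APP_ID_RE are hypothetical, the appid is taken from the store URL (assuming it always has the form .../app/<appid>/) instead of the question's findAppIdFromStats helper, and the second row's player counts are made up for illustration.

```python
import re
from bs4 import BeautifulSoup

# The second row's player counts are hypothetical
HTML = """
<table>
<tr class="player_count_row">
  <td><span class="currentServers">745,807</span></td>
  <td><span class="currentServers">836,540</span></td>
  <td width="20">&nbsp;</td>
  <td><a class="gameLink" href="http://store.steampowered.com/app/570/">Dota 2</a></td>
</tr>
<tr class="player_count_row">
  <td><span class="currentServers">1,234</span></td>
  <td><span class="currentServers">5,678</span></td>
  <td width="20">&nbsp;</td>
  <td><a class="gameLink" href="http://store.steampowered.com/app/10/">Counter-Strike</a></td>
</tr>
</table>
"""

# Assumption: store links look like .../app/<appid>/
APP_ID_RE = re.compile(r"/app/(\d+)/")


def parse_rows(soup):
    """Return (appid, current_players, game_name) for each player_count_row."""
    records = []
    for row in soup.find_all("tr", class_="player_count_row"):
        span = row.find("span", class_="currentServers")  # first column only
        link = row.find("a", class_="gameLink")
        if span is None or link is None:
            records.append(("N/A", "N/A", "N/A"))
            continue
        match = APP_ID_RE.search(link.get("href", ""))
        appid = match.group(1) if match else "N/A"
        players = int(span.get_text(strip=True).replace(",", ""))
        records.append((appid, players, link.get_text(strip=True)))
    return records


records = parse_rows(BeautifulSoup(HTML, "html.parser"))
print(records)  # [('570', 745807, 'Dota 2'), ('10', 1234, 'Counter-Strike')]
```

Because every field comes from the same tr, the values cannot drift out of order the way two independent page-wide lists can.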

Why are you using two loops?

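If you don't need both, you can use a single loop: while looping over the links, find the preceding (enclosing) tr, then the first td in it, which contains the player count you want.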

Example:

for link in links:
    try:
        appid = link.get('onmouseover')
        appid = findAppIdFromStats(appid, '"id":', ',"public":1')
        linkg = link.get('href')
    except AttributeError:
        # Record a placeholder and move on instead of appending a second, stale entry
        r.append(["N/A", "N/A", "N/A"])
        continue
    # Walk back to this link's row; its first <td> holds the Current Players value
    row = link.find_previous("tr", class_="player_count_row")
    r.append([appid, linkg, row.find("td").get_text(strip=True)])
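For a link that sits inside its row, find_previous() and find_parent() land on the same tr: find_previous() walks backwards in document order and the enclosing tr was opened before the link, while find_parent() walks up the ancestors and states the intent more directly. A minimal sketch, assuming bs4 and a trimmed copy of the question's row:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's row (event-handler attributes omitted)
HTML = """
<table>
<tr class="player_count_row">
  <td><span class="currentServers">745,807</span></td>
  <td><span class="currentServers">836,540</span></td>
  <td><a class="gameLink" href="http://store.steampowered.com/app/570/">Dota 2</a></td>
</tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")
link = soup.find("a", class_="gameLink")

# Backwards in document order: the enclosing <tr> is the nearest match
row_prev = link.find_previous("tr", class_="player_count_row")

# Up the ancestor chain: the same <tr>
row_parent = link.find_parent("tr", class_="player_count_row")

# The first <td> of the row holds the Current Players value
players = row_parent.find("td").get_text(strip=True)
print(row_prev is row_parent, players)  # True 745,807
```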
