Extracting a row from a table fetched from a URL

from bs4 import BeautifulSoup
import urllib2

html = urllib2.urlopen("http://www.bseindia.com/stock-share-price/stockreach_financials.aspx?scripcode=500180&expandable=0").read()
soup = BeautifulSoup(html)
table = soup.find('table', {'id': 'acr'})
# the code below wasn't working as I expected it to
tr = table.find('tr', text='EPS')

1 answer

User

#1 · Posted on 2024-10-01 07:43:24

The text is inside a td, not the tr, so find the td by its text and then call .parent to get the tr:

In [12]: table = soup.find('table',{'id' :'acr'})

In [13]: tr = table.find('td', text='EPS').parent

In [14]: print(tr)
<tr><td class="TTRow_left" style="padding-left: 30px;">EPS</td><td class="TTRow_right">48.80</td>
<td class="TTRow_right">42.10</td>
<td class="TTRow_right">35.50</td>
<td class="TTRow_right">28.50</td>
<td class="TTRow_right">22.10</td>
</tr>
In [15]: [td.text for td in tr.select("td + td")]
Out[15]: [u'48.80', u'42.10', u'35.50', u'28.50', u'22.10']

You will see that this matches the values on the page exactly.
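
For anyone who wants to try this without hitting the site, here is a minimal, self-contained sketch of the same lookup. It is an illustration under assumptions, not the asker's actual setup: the inline `HTML` string stands in for the downloaded page (the EPS row is copied from the output above, the `Other` row is made up), `row_values` is a hypothetical helper name, and it targets Python 3 and uses `string=`, which is the newer name for the `text=` argument in Beautiful Soup 4.4 and later.

```python
from bs4 import BeautifulSoup

# Stand-in for the downloaded page. The EPS row mirrors the output shown
# above; the "Other" row is made-up filler so the lookup has something to skip.
HTML = """
<table id="acr">
<tr><td class="TTRow_left">Other</td><td class="TTRow_right">1.00</td><td class="TTRow_right">2.00</td></tr>
<tr><td class="TTRow_left" style="padding-left: 30px;">EPS</td><td class="TTRow_right">48.80</td>
<td class="TTRow_right">42.10</td>
<td class="TTRow_right">35.50</td>
<td class="TTRow_right">28.50</td>
<td class="TTRow_right">22.10</td>
</tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")


def row_values(soup, label):
    """Return the cell texts that follow the td whose text equals `label`.

    Hypothetical helper for illustration; returns None if no such td exists.
    """
    table = soup.find("table", {"id": "acr"})
    label_cell = table.find("td", string=label)  # match the td, not the tr
    if label_cell is None:
        return None
    tr = label_cell.parent  # climb from the label cell to its row
    # "td + td" selects every td that directly follows another td,
    # i.e. all cells in the row except the first (label) cell.
    return [td.get_text(strip=True) for td in tr.select("td + td")]


# Searching the tr by text finds nothing: a tr with several child cells
# has no single string of its own to compare against "EPS".
print(soup.find("tr", string="EPS"))  # None
print(row_values(soup, "EPS"))  # ['48.80', '42.10', '35.50', '28.50', '22.10']
```

The first print shows why the attempt in the question came back empty, and the second reproduces the list from the session above.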

Another way is to call find_next_siblings:

In [17]: tds = table.find('td', text='EPS').find_next_siblings("td")

In [18]: tds
Out[18]:
[<td class="TTRow_right">48.80</td>,
 <td class="TTRow_right">42.10</td>,
 <td class="TTRow_right">35.50</td>,
 <td class="TTRow_right">28.50</td>,
 <td class="TTRow_right">22.10</td>]
In [20]: [td.text for td in tds]
Out[20]: [u'48.80', u'42.10', u'35.50', u'28.50', u'22.10']
