<p>Knowing the <a href="https://stackoverflow.com/questions/28978362/scraping-a-website-with-clickable-content-in-python">source of the input data</a> and taking into account that it is HTML, here is a solution involving an <em>HTML parser</em>, <a href="http://www.crummy.com/software/BeautifulSoup/bs4/doc/" rel="nofollow noreferrer"><code>BeautifulSoup</code></a>:</p>
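<pre><code>import re
from bs4 import BeautifulSoup

soup = BeautifulSoup(input_data, 'html.parser')
for row in soup.select('div#tab-growth table tr'):
    # Keep only the cells whose headers attribute mentions gr-eps.
    for td in row.find_all('td', headers=re.compile(r'gr-eps')):
        print(td.text)
</code></pre>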
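<p>Basically, for every row in the "growth" table, we look for the cells that have <code>gr-eps</code> in their <code>headers</code> attribute (the "EPS %" part of the table). It prints:</p>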
^{pr2}$
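<p>To try the same idea without fetching the page, here is a minimal, self-contained sketch. The sample markup, its values, and the helper name <code>extract_eps_cells</code> are made up for illustration and only assume that the real table marks its cells with a <code>headers</code> attribute containing <code>gr-eps</code>; the actual page may be structured differently.</p>

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page's structure and values may differ.
SAMPLE_HTML = """
<div id="tab-growth">
  <table>
    <tr>
      <th id="gr-eps">EPS %</th>
      <th id="gr-rev">Revenue %</th>
    </tr>
    <tr>
      <td headers="gr-eps Y0">12.5</td>
      <td headers="gr-rev Y0">8.1</td>
    </tr>
    <tr>
      <td headers="gr-eps Y1">-3.2</td>
      <td headers="gr-rev Y1">4.4</td>
    </tr>
  </table>
</div>
"""


def extract_eps_cells(html):
    """Return the text of every growth-table cell whose headers mention gr-eps."""
    soup = BeautifulSoup(html, "html.parser")
    values = []
    # Walk each row of the table inside the div with id "tab-growth".
    for row in soup.select("div#tab-growth table tr"):
        # The regex is tested against the values of the headers attribute.
        for td in row.find_all("td", headers=re.compile(r"gr-eps")):
            values.append(td.get_text(strip=True))
    return values


eps_values = extract_eps_cells(SAMPLE_HTML)
print(eps_values)
```

<p>Returning a list instead of printing inside the loop makes the result easier to reuse, for example to convert the strings to floats afterwards.</p>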
<p><a href="https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags">This is a good read</a> as well.</p>