<p><strong>See the Scrapy edit below.</strong></p>
<p>Based on your HTML, you can extract the values with the BeautifulSoup library like this:</p>
<pre class="lang-py prettyprint-override"><code>from bs4 import BeautifulSoup
page = """<td><span="green">$33.99</span></td>
<td>Out of stock</td>
<td><span="green">$27.99</span></td>
<td><span="green">$35.00</span></td>"""
soup = BeautifulSoup(page, features="lxml")
tds = soup.body.find_all('td')  # get all table cells
for td in tds:
    # if the cell contains a span element, print the span's text
    if td.find('span'):
        print(td.find('span').text)
    # if not, just print the cell's own text (here it's "Out of stock")
    else:
        print(td.text)
</code></pre>
<p>Output:</p>
<pre><code>$33.99
Out of stock
$27.99
$35.00
</code></pre>
<p>With Scrapy:</p>
<pre class="lang-py prettyprint-override"><code>import scrapy
page = """<td><span="green">$33.99</span></td>
<td>Out of stock</td>
<td><span="green">$27.99</span></td>
<td><span="green">$35.00</span></td>"""
response = scrapy.Selector(text=page, type="html")
tds = response.xpath('//td')
for td in tds:
    # if the cell contains a span element, print the span's text
    if td.xpath('span'):
        print(td.xpath('span//text()').get())
    # if not, just print the cell's own text (here it's "Out of stock")
    else:
        print(td.xpath('text()').get())
</code></pre>
<p>Output:</p>
<pre><code>$33.99
Out of stock
$27.99
$35.00
</code></pre>
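<p>If you only need the visible text of each cell, the span check may not be necessary. The following is a minimal sketch, not part of the original answer: it assumes the markup uses <code>&lt;span class="green"&gt;</code>, uses the built-in <code>html.parser</code> instead of lxml, and the helper name <code>cell_texts</code> is hypothetical.</p>

```python
from bs4 import BeautifulSoup

# Sample markup; the class="green" attribute is an assumption about the real page.
page = """<table><tr>
<td><span class="green">$33.99</span></td>
<td>Out of stock</td>
<td><span class="green">$27.99</span></td>
<td><span class="green">$35.00</span></td>
</tr></table>"""


def cell_texts(html):
    """Return the stripped text of every <td>, whether or not it wraps a <span>."""
    soup = BeautifulSoup(html, features="html.parser")
    # get_text() collects the text of the cell and all of its descendants,
    # so a price inside a span and plain cell text are handled the same way.
    return [td.get_text(strip=True) for td in soup.find_all("td")]


for text in cell_texts(page):
    print(text)
```

<p>This trades the explicit branch for a single call, which should be fine as long as each cell holds only one piece of text.</p>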