How can I use XPath to get text whether it sits inside a nested tag or directly in the outer tag?

product_prices_tds = response.xpath('//td/')
product_prices = []
for td in product_prices_tds:
    if td.xpath('//span'):
        product_prices = td.xpath('//span/text()').extract()
    else:
        product_prices = td.xpath('//text()').extract()
for n in range(len(product_names)):
    items['price'] = product_prices[n]
    yield items

product_prices_tds = response.xpath('//td')
product_prices = []
for td in product_prices_tds:
    if td.xpath('span'):
        product_prices.append(td.xpath('span//text()').extract())
    else:
        product_prices.append(td.xpath('/text()').extract())
for n in range(len(product_names)):
    items['price'] = product_prices[n]
    yield items

2 Answers

User

Answer 1 · edited 2024-10-16 17:19:37

See the edit below for a Scrapy version.

Based on your HTML, you can get the information with the BeautifulSoup library as follows:

from bs4 import BeautifulSoup

page = """<td><span="green">$33.99</span></td>
          <td>Out of stock</td>
            <td><span="green">$27.99</span></td>
            <td><span="green">$35.00</span></td>"""

soup = BeautifulSoup(page, features="lxml")
tds = soup.body.find_all('td')  # get all td cells

for td in tds:

    # if the cell contains a span element
    if td.find('span'):
        print(td.find('span').text)
    # if not, just print inner text (here it's out of stock)
    else:
        print(td.text)

Output:

$33.99
Out of stock
$27.99
$35.00

With Scrapy:

import scrapy

page = """<td><span="green">$33.99</span></td>
          <td>Out of stock</td>
            <td><span="green">$27.99</span></td>
            <td><span="green">$35.00</span></td>"""

response = scrapy.Selector(text=page, type="html")
tds = response.xpath('//td')

for td in tds:

    # if the cell contains a span element
    if td.xpath('span'):
        print(td.xpath('span//text()')[0].extract())
    # if not, just print inner text (here it's out of stock)
    else:
        print(td.xpath('text()')[0].extract())

Output:

$33.99
Out of stock
$27.99
$35.00

User

Answer 2 · edited 2024-10-16 17:19:37

An XPath solution (requires XPath 2.0 or later), using the same logic as the answer posted earlier by @piratefache:

for $td in //td 
return 
if ($td[span]) 
then
$td/span/data() 
else 
$td/data()

Applied to:

<root>
    <td>
        <span>$33.99</span>
    </td>
    <td>Out of stock</td>
    <td>
        <span>$27.99</span>
    </td>
    <td>
        <span>$35.00</span>
    </td>
</root>

Result:

$33.99
Out of stock
$27.99
$35.00

By the way, <span="green"> is not valid XML. An attribute name such as @color (or something similar) is probably missing.
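
Scrapy selectors are built on lxml, which implements XPath 1.0, so the for / if / then / else expression above cannot be passed to response.xpath() as is. As a rough sketch, the same branching can be done in Python with the standard library's xml.etree.ElementTree, assuming the well-formed <root> document shown above. The names xml_doc and values are illustrative only.

```python
import xml.etree.ElementTree as ET

# The well-formed sample document from this answer.
xml_doc = """<root>
    <td>
        <span>$33.99</span>
    </td>
    <td>Out of stock</td>
    <td>
        <span>$27.99</span>
    </td>
    <td>
        <span>$35.00</span>
    </td>
</root>"""

root = ET.fromstring(xml_doc)

values = []
for td in root.iter('td'):
    span = td.find('span')  # direct child span, or None
    # Same branch as the XPath 2.0 expression:
    # take the span's text if there is a span, otherwise the td's text.
    node = span if span is not None else td
    values.append(''.join(node.itertext()).strip())

print(values)
```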
