Need the numeric part when using response.css in Scrapy

Posted 2024-07-02 12:12:08


I need to get the product name and price from this page: "http://www.fabfurnish.com/Koryo-KLE40DLBH1-39-inches-HD-Ready-LED-TV-Black-294567.html". I get the product name, but not the price.

item["Product_Name"] = response.css("#product_name::text").extract()[0]
item["Price"] = response.xpath("#price_box::text").extract()[0]

So the output should be:
Product name: Koryo KLE40DLBH1 39 inches HD Ready LED TV Black (I get this)
Price: 22990 (I cannot get this)


1 answer
User
#1 · Posted 2024-07-02 12:12:08

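For the price, you are passing a CSS selector to .xpath(), which expects an XPath expression (the XPath equivalent here would be //*[@id="price_box"]/text()). Running this raises an exception, which probably shows up in your logs.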

So change .xpath() to .css() for the price value:

$ scrapy shell http://www.fabfurnish.com/Koryo-KLE40DLBH1-39-inches-HD-Ready-LED-TV-Black-294567.html
2016-12-15 11:25:01 [scrapy] INFO: Scrapy 1.2.2 started (bot: scrapybot)

>>> response.css("#product_name::text").extract()
[u'Koryo KLE40DLBH1 39 inches HD Ready LED TV Black']
>>> response.css("#product_name::text").extract_first()
u'Koryo KLE40DLBH1 39 inches HD Ready LED TV Black'


>>> response.xpath("#price_box::text").extract()[0]
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/paul/.virtualenvs/scrapy12/local/lib/python2.7/site-packages/scrapy/http/response/text.py", line 115, in xpath
    return self.selector.xpath(query)
  File "/home/paul/.virtualenvs/scrapy12/local/lib/python2.7/site-packages/parsel/selector.py", line 207, in xpath
    six.reraise(ValueError, ValueError(msg), sys.exc_info()[2])
  File "/home/paul/.virtualenvs/scrapy12/local/lib/python2.7/site-packages/parsel/selector.py", line 203, in xpath
    **kwargs)
  File "src/lxml/lxml.etree.pyx", line 1587, in lxml.etree._Element.xpath (src/lxml/lxml.etree.c:57924)
  File "src/lxml/xpath.pxi", line 307, in lxml.etree.XPathElementEvaluator.__call__ (src/lxml/lxml.etree.c:167085)
  File "src/lxml/xpath.pxi", line 227, in lxml.etree._XPathEvaluatorBase._handle_result (src/lxml/lxml.etree.c:166044)
ValueError: XPath error: Invalid expression in #price_box::text
>>> response.css("#price_box::text").extract()[0]
u'26,990'
>>> response.css("#price_box::text").extract_first()
u'26,990'

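Note the use of .extract_first(): it is generally safer than .extract()[0], which raises an IndexError when the selector matches nothing, whereas .extract_first() returns None.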
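Since the question asks for the numeric part, here is a minimal sketch of turning the extracted text into a number. It is not part of Scrapy: parse_price is a hypothetical helper, and it assumes whole-number prices formatted like the '26,990' seen in the shell session (a decimal price such as '26,990.50' would need different handling).

```python
import re


def parse_price(text):
    """Return the digits of a price string as an int, or None if there are none.

    Hypothetical helper; assumes whole-number prices such as '26,990'.
    """
    # .extract_first() returns None when the selector matched nothing
    if text is None:
        return None
    # Drop everything that is not a digit (commas, currency symbols, spaces)
    digits = re.sub(r"[^\d]", "", text)
    return int(digits) if digits else None


print(parse_price(u"26,990"))
print(parse_price(None))
```

In the spider this could be used as item["Price"] = parse_price(response.css("#price_box::text").extract_first()), so a missing price box yields None instead of an exception.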
