Extracting a product's 'price' and 'title' from Flipkart.com with Python

Posted 2024-10-06 13:22:40


I wrote the Python code below to extract the price of a given item from the flipkart.com website:

import sys
import urllib2
import bs4

item = "Wilco Classic Library: Autobiography Of a Yogi (Hardcover)"
# str.replace returns a new string, so assign the result back
item = item.replace(" ", "+")
link = 'http://www.flipkart.com/search/a/all?query={0}&vertical=all&dd=0&autosuggest[as]=off&autosuggest[as-submittype]=entered&autosuggest[as-grouprank]=0&autosuggest[as-overallrank]=0&autosuggest[orig-query]=&autosuggest[as-shown]=off&Search=%C2%A0&otracker=start&_r=YSWdYULYzr4VBYklfpZRbw--&_l=pMHn9vNCOBi05LKC_PwHFQ--&ref=a2c6fadc-2e24-4412-be6a-ce02c9707310&selmitem=All+Categories'.format(item)
r = urllib2.Request(link, headers={"User-Agent": "Python-urlli~"})
try:
    response = urllib2.urlopen(r)
except urllib2.URLError:
    # stop here; otherwise `response` is undefined on the next line
    print "Internet connection error"
    sys.exit(1)
thePage = response.read()
# name the parser explicitly so bs4 does not have to guess one
soup = bs4.BeautifulSoup(thePage, "html.parser")
# the first search-result block
firstBlockSoup = soup.find('div', attrs={'class': 'fk-srch-item'})
priceSoup = firstBlockSoup.find('b', attrs={'class': 'fksd-bodytext price final-price'})
price = priceSoup.contents[0]
print price

titleSoup = firstBlockSoup.find('a', attrs={'class': 'fk-srch-title-text fksd-bodytext'})
# findAll returns a list of <b> tags, not a single string
title = titleSoup.findAll('b')
print title
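On building the query string: str.replace returns a new string and leaves item itself unchanged, and the title also contains ':' and parentheses, which are reserved characters in URLs. A minimal sketch of the difference using the standard library's percent-encoding, written for Python 3 (in Python 2 the same function is urllib.quote_plus):

```python
from urllib.parse import quote_plus

item = "Wilco Classic Library: Autobiography Of a Yogi (Hardcover)"

# str.replace builds a new string; `item` itself is never modified
spaces_only = item.replace(" ", "+")

# quote_plus also turns spaces into '+', and additionally percent-encodes
# reserved characters such as ':' and '(' / ')'
encoded = quote_plus(item)

print(spaces_only)  # Wilco+Classic+Library:+Autobiography+Of+a+Yogi+(Hardcover)
print(encoded)      # Wilco+Classic+Library%3A+Autobiography+Of+a+Yogi+%28Hardcover%29
```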

When run, the code above prints the price without any problem.

But the title comes out like this:

[<b>Wilco</b>, <b>Classic</b>, <b>Library</b>, <b>Autobiography</b>, <b>Of</b>, <b>a</b>, <b>Yogi</b>, <b>Hardcover</b>] 

The reason is obvious if you look at the source of the product page (with 'Inspect element'): each word of the title is wrapped in its own <b> tag.

So how do I extract the title in the proper format, so that it prints:

Wilco Classic Library: Autobiography Of a Yogi (Hardcover)

2 Answers

It would be easier to get the title from the data-item-name attribute of the firstBlockSoup tag:

>>> firstBlockSoup.attrs['data-item-name']
'Wilco Classic Library: Autobiography Of a Yogi (Hardcover)'
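As a rough sketch of the same lookup without BeautifulSoup, Python 3's standard html.parser can read the attribute directly. The markup below is a hypothetical, trimmed-down stand-in for a search result; it assumes only what this answer relies on (a div with class fk-srch-item carrying data-item-name), and the class name FirstItemName is illustrative:

```python
from html.parser import HTMLParser

# Hypothetical, trimmed-down markup; the real Flipkart page is assumed, not copied
SAMPLE = (
    '<div class="fk-srch-item" '
    'data-item-name="Wilco Classic Library: Autobiography Of a Yogi (Hardcover)">'
    '<a class="fk-srch-title-text fksd-bodytext"><b>Wilco</b> ...</a>'
    '</div>'
    '<div class="fk-srch-item" data-item-name="Another Item"></div>'
)


class FirstItemName(HTMLParser):
    """Remember data-item-name of the first <div> that has class fk-srch-item."""

    def __init__(self):
        super().__init__()
        self.name = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # html.parser passes attributes as (name, value) pairs
        classes = (attrs.get("class") or "").split()
        # like BeautifulSoup's find, match when fk-srch-item is one of the classes
        if self.name is None and tag == "div" and "fk-srch-item" in classes:
            self.name = attrs.get("data-item-name")


parser = FirstItemName()
parser.feed(SAMPLE)
parser.close()
print(parser.name)  # Wilco Classic Library: Autobiography Of a Yogi (Hardcover)
```

Only the first matching div is recorded, mirroring how soup.find returns the first match.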

Just use the text property of titleSoup:

>>> titleSoup=firstBlockSoup.find('a',attrs={'class':'fk-srch-title-text fksd-bodytext'})
>>> titleSoup.text
u'Wilco Classic Library: Autobiography Of a Yogi (Hardcover)'
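Why .text works where findAll('b') did not: judging from the output in the question, each word sits in its own <b> tag, while the spaces, the colon and the parentheses are plain text between those tags. .text concatenates every text node under the <a>, so those pieces come back. The sketch below mimics both behaviours with Python 3's standard html.parser on a hypothetical reconstruction of the title link (the exact Flipkart markup is an assumption); TitleText is an illustrative name:

```python
from html.parser import HTMLParser

# Hypothetical reconstruction of the title link: words in <b>, punctuation outside
SAMPLE = (
    '<a class="fk-srch-title-text fksd-bodytext">'
    '<b>Wilco</b> <b>Classic</b> <b>Library</b>: <b>Autobiography</b> '
    '<b>Of</b> <b>a</b> <b>Yogi</b> (<b>Hardcover</b>)</a>'
)


class TitleText(HTMLParser):
    """Collect all text inside the <a>, and separately the text inside its <b> tags."""

    def __init__(self):
        super().__init__()
        self.in_a = False
        self.in_b = False
        self.all_text = []   # every text node under <a>, like Tag.text
        self.bold_text = []  # only text inside <b>, like findAll('b')

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_a = True
        elif tag == "b":
            self.in_b = True

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_a = False
        elif tag == "b":
            self.in_b = False

    def handle_data(self, data):
        if self.in_a:
            self.all_text.append(data)
            if self.in_b:
                self.bold_text.append(data)


parser = TitleText()
parser.feed(SAMPLE)
parser.close()
print("".join(parser.all_text))   # Wilco Classic Library: Autobiography Of a Yogi (Hardcover)
print(" ".join(parser.bold_text))  # Wilco Classic Library Autobiography Of a Yogi Hardcover
```

Joining the <b> contents with spaces comes close but drops the colon and the parentheses, which is why taking all of the link's text is preferable here.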

