As a practice exercise, I am trying to parse this search results page from Amazon using the BeautifulSoup library.
Here is my code.
from urllib import urlopen
from bs4 import BeautifulSoup

SourceURL = "http://www.amazon.in/s/ref=nb_sb_noss_2?url=search-alias%3Daps&field-keywords=android"
ResultsPage = urlopen(SourceURL)
Soup = BeautifulSoup(ResultsPage)
print "<SearchResults>"
for SearchResult in Soup.findAll('li', attrs={'class': 's-result-item celwidget'}):
    # Read Result Title
    Title = SearchResult.find("h2", {"class": "a-size-medium a-color-null s-inline s-access-title a-text-normal"})
    ResultTag = "\t<Result><![CDATA["
    if Title is not None:
        ResultTag += Title.text
    ResultTag += "]]></Result>"
    print ResultTag
print "</SearchResults>"
The output is shown below:
<SearchResults>
<Result><![CDATA[Micromax Bolt S301 (Black, No charger, No earphone inbox)]]></Result>
<Result><![CDATA[Android Application Development (with Kitkat Support), Black Book]]></Result>
<Result><![CDATA[ZTE Blade Buzz White V815W]]></Result>
<Result><![CDATA[Android: App Development & Programming Guide: Learn In A Day! (Android, Rails, Ruby Programming, App Development...]]></Result>
<Result><![CDATA[]]></Result>
<Result><![CDATA[Karbonn Titanium S21 (Grey)]]></Result>
<Result><![CDATA[Head First Android Development]]></Result>
<Result><![CDATA[Micromax Canvas A1 Android One (White, 8GB)]]></Result>
<Result><![CDATA[Professional Android 4 Application Development (Wrox)]]></Result>
<Result><![CDATA[OnePlus X (Onyx) - Invite Only]]></Result>
<Result><![CDATA[Lenovo Vibe S1 (4G, White)]]></Result>
<Result><![CDATA[Micromax Bolt D320 (Black, 4GB)]]></Result>
<Result><![CDATA[2 in 1 Capacitive Stylus Pen With Black Ball Pen for Android Touch Sceen Mobile Phones and Tablets All iPads and...]]></Result>
<Result><![CDATA[Moto E 2nd Generation XT1506 (3G, Black)]]></Result>
<Result><![CDATA[Android: App Development & Programming Guide: Learn In A Day!]]></Result>
<Result><![CDATA[Lenovo Vibe S1 (4G, Dark Blue)]]></Result>
</SearchResults>
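As you can see, the fifth result is empty in the output for some reason, even though the same code prints every other row. Basically, the SearchResult.find() method returns None for just that one record.
Could you tell me if I am missing something?
Thanks, Nikhil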
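If you look at the link http://www.amazon.in/s/ref=nb_sb_noss_2?url=search-alias%3Daps&field-keywords=android, the fifth li element matches the class-name condition s-result-item celwidget, but it is actually the "Customers shopped for android in" block rather than a product. Its heading does not exactly match the second condition, a-size-medium a-color-null s-inline s-access-title a-text-normal, which causes Title to be set to None. You can update your condition to account for this case and print the desired output.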
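One way to do this is sketched below. It is only an illustration, not necessarily the exact fix: it is written for Python 3, it parses a small made-up HTML snippet (SAMPLE_HTML) instead of fetching the live page, and the class on the non-matching h2 is invented for the example. The helper names extract_titles and to_xml are hypothetical as well. The idea is simply to skip any li whose h2 lookup returns None, so no empty Result tag is emitted.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for the real results page.
# The second <li> imitates the "Customers shopped for android in" block;
# the class names on its <h2> are made up for this example.
SAMPLE_HTML = """
<ul>
  <li class="s-result-item celwidget">
    <h2 class="a-size-medium a-color-null s-inline s-access-title a-text-normal">ZTE Blade Buzz White V815W</h2>
  </li>
  <li class="s-result-item celwidget">
    <h2 class="a-size-base a-color-null">Customers shopped for android in</h2>
  </li>
  <li class="s-result-item celwidget">
    <h2 class="a-size-medium a-color-null s-inline s-access-title a-text-normal">Karbonn Titanium S21 (Grey)</h2>
  </li>
</ul>
"""

TITLE_CLASS = "a-size-medium a-color-null s-inline s-access-title a-text-normal"


def extract_titles(html):
    """Return product titles, skipping result items that have no matching h2."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for result in soup.find_all("li", attrs={"class": "s-result-item celwidget"}):
        # A class string containing spaces is compared against the full
        # class attribute value, so the second <li> yields None here.
        title = result.find("h2", {"class": TITLE_CLASS})
        if title is None:
            continue  # skip the item instead of printing an empty tag
        titles.append(title.text.strip())
    return titles


def to_xml(titles):
    """Format the titles in the same layout as the original output."""
    lines = ["<SearchResults>"]
    lines += ["\t<Result><![CDATA[%s]]></Result>" % t for t in titles]
    lines.append("</SearchResults>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_xml(extract_titles(SAMPLE_HTML)))
```

In your original loop the equivalent change would be to build and print ResultTag only when Title is not None. Whether the real page still uses these class names is something you would need to check against the current markup.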