Missing values when parsing a web page with BeautifulSoup 4

Posted 2024-10-02 12:25:27

As a practice exercise, I am trying to parse this search results page from Amazon with the BeautifulSoup library.

Here is my code:

from urllib.request import urlopen  # Python 3 location of urlopen
from bs4 import BeautifulSoup


SourceURL = "http://www.amazon.in/s/ref=nb_sb_noss_2?url=search-alias%3Daps&field-keywords=android"
ResultsPage = urlopen(SourceURL)
Soup = BeautifulSoup(ResultsPage, "html.parser")


print("<SearchResults>")

for SearchResult in Soup.find_all('li', attrs={'class': 's-result-item celwidget'}):
    # Read the result title
    Title = SearchResult.find("h2", {"class": "a-size-medium a-color-null s-inline s-access-title a-text-normal"})

    ResultTag = "\t<Result><![CDATA["
    if Title is not None:
        ResultTag += Title.text

    ResultTag += "]]></Result>"
    print(ResultTag)

print("</SearchResults>")

The output looks like this:

<SearchResults>
    <Result><![CDATA[Micromax Bolt S301 (Black, No charger, No earphone inbox)]]></Result>
    <Result><![CDATA[Android Application Development (with Kitkat Support), Black Book]]></Result>
    <Result><![CDATA[ZTE Blade Buzz White V815W]]></Result>
    <Result><![CDATA[Android:  App Development & Programming Guide: Learn In A Day! (Android, Rails, Ruby Programming, App Development...]]></Result>
    <Result><![CDATA[]]></Result>
    <Result><![CDATA[Karbonn Titanium S21 (Grey)]]></Result>
    <Result><![CDATA[Head First Android Development]]></Result>
    <Result><![CDATA[Micromax Canvas A1 Android One (White, 8GB)]]></Result>
    <Result><![CDATA[Professional Android 4 Application Development (Wrox)]]></Result>
    <Result><![CDATA[OnePlus X (Onyx) - Invite Only]]></Result>
    <Result><![CDATA[Lenovo Vibe S1 (4G, White)]]></Result>
    <Result><![CDATA[Micromax Bolt D320 (Black, 4GB)]]></Result>
    <Result><![CDATA[2 in 1 Capacitive Stylus Pen With Black Ball Pen for Android Touch Sceen Mobile Phones and Tablets All iPads and...]]></Result>
    <Result><![CDATA[Moto E 2nd Generation XT1506 (3G, Black)]]></Result>
    <Result><![CDATA[Android: App Development & Programming Guide: Learn In A Day!]]></Result>
    <Result><![CDATA[Lenovo Vibe S1 (4G, Dark Blue)]]></Result>
</SearchResults>

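As you can see, the fifth result is missing from the output, even though the same code prints every other row. In other words, SearchResult.find() returns None for exactly one record.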
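One way to find out which `<li>` produced the empty row is to list every result item whose title lookup returns None, together with a short preview of its text. The sketch below is only an illustration: the HTML string is hypothetical, simplified markup (not the real Amazon page, whose structure has since changed), `items_without_title` is a made-up helper name, and it matches on single class names rather than the full class strings used above.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for the search results page.
HTML = """
<ul>
  <li class="s-result-item celwidget"><h2 class="a-size-medium s-access-title">Phone A</h2></li>
  <li class="s-result-item celwidget"><div>Customers shopped for android in</div></li>
  <li class="s-result-item celwidget"><h2 class="a-size-medium s-access-title">Phone B</h2></li>
</ul>
"""


def items_without_title(html):
    """Return (index, text preview) for each result <li> that has no title <h2>."""
    soup = BeautifulSoup(html, "html.parser")
    missing = []
    # class_ with a single class name matches any element whose class list contains it
    for index, item in enumerate(soup.find_all("li", class_="s-result-item")):
        if item.find("h2", class_="s-access-title") is None:
            # Keep the first 60 characters of the item's text as a preview
            missing.append((index, item.get_text(" ", strip=True)[:60]))
    return missing


print(items_without_title(HTML))  # [(1, 'Customers shopped for android in')]
```

Running the same loop against the real page shows what the untitled item actually contains.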

Can you tell me what I am missing?

Thanks,
Nikhil


1 answer
User
#1 · Posted 2024-10-02 12:25:27

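If you look at http://www.amazon.in/s/ref=nb_sb_noss_2?url=search-alias%3Daps&field-keywords=android, the fifth li element does match your first condition (class s-result-item celwidget), but it is not a product listing: it is the "Customers shopped for android in" block. It contains no h2 whose class attribute exactly matches your second condition, a-size-medium a-color-null s-inline s-access-title a-text-normal, so SearchResult.find() returns None and Title is None. Your loop still builds and prints a Result tag for that item, which is why an empty CDATA entry appears in the output.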
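The "exactly matches" part matters. When you pass a space-separated string as the class value, BeautifulSoup compares it against the whole class attribute string, so the same classes in a different order, or with an extra class, do not match. A CSS selector passed to `select()` only requires that each listed class is present. The sketch below demonstrates this on hypothetical sample markup; the class names are shortened for readability.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the same two classes in exact order, swapped order, and with an extra class.
HTML = """
<h2 class="a-size-medium s-access-title">exact order</h2>
<h2 class="s-access-title a-size-medium">swapped order</h2>
<h2 class="a-size-medium s-access-title a-text-normal">extra class</h2>
"""

soup = BeautifulSoup(HTML, "html.parser")

# A space-separated string is compared against the full class attribute value.
exact = [h.get_text() for h in soup.find_all("h2", {"class": "a-size-medium s-access-title"})]

# A CSS selector matches any element that has both classes, in any order.
css = [h.get_text() for h in soup.select("h2.a-size-medium.s-access-title")]

print(exact)  # ['exact order']
print(css)    # ['exact order', 'swapped order', 'extra class']
```

This is why a long exact class string is brittle against markup changes, even on items that really are products.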

To print only the entries that actually have a title, move the whole tag-building and printing step inside the None check:

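for SearchResult in Soup.find_all('li', attrs={'class': 's-result-item celwidget'}):
    Title = SearchResult.find("h2", {"class": "a-size-medium a-color-null s-inline s-access-title a-text-normal"})
    # Skip items without a title, such as the "Customers shopped for" block
    if Title is not None:
        ResultTag = "\t<Result><![CDATA["
        ResultTag += Title.text
        ResultTag += "]]></Result>"
        print(ResultTag)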
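A variation on the fix above (not the answerer's code) is to select the title elements directly with a CSS selector, so that items without a title are never visited and no None check is needed. In the sketch below, the function name `search_results_xml`, the sample markup, and the shortened selector `li.s-result-item h2.s-access-title` are all assumptions for illustration. Check the selector against the live page before relying on it.

```python
from bs4 import BeautifulSoup


def search_results_xml(html):
    """Build the <SearchResults> listing from result titles only."""
    soup = BeautifulSoup(html, "html.parser")
    lines = ["<SearchResults>"]
    # Only <h2> titles nested inside result items are selected, so blocks
    # such as "Customers shopped for ..." never produce a row.
    for title in soup.select("li.s-result-item h2.s-access-title"):
        lines.append("\t<Result><![CDATA[" + title.get_text(strip=True) + "]]></Result>")
    lines.append("</SearchResults>")
    return "\n".join(lines)


# Hypothetical sample markup with one non-product item in the middle.
SAMPLE = """
<ul>
  <li class="s-result-item celwidget"><h2 class="s-access-title a-text-normal">Phone A</h2></li>
  <li class="s-result-item celwidget"><div>Customers shopped for android in</div></li>
  <li class="s-result-item celwidget"><h2 class="s-access-title a-text-normal">Phone B</h2></li>
</ul>
"""

print(search_results_xml(SAMPLE))
```

Because the selector requires individual classes rather than one exact class string, it also tolerates extra or reordered classes on the title element.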
