Web scraping "itemprop" output

Posted 2024-10-01 05:07:18


Hi, I wrote the code below to get the city of a property listing:

import requests
from bs4 import BeautifulSoup

# Load the webpage
r = requests.get("https://www.century21.com/for-sale-homes/Westport-CT-20647c", headers={'User-agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0'})
# Grab the raw content of this page
c = r.content

if "blocked" in r.text:
    print("we've been blocked")

# Parse the HTML so it can be searched
soup = BeautifulSoup(c, "html.parser")

# Print out the parsed content
#print(soup.prettify())

# Find all property listings on the page
all = soup.find_all("div", {"class": "sr-card js-safe-link"})

# Find the city/state block of the property of interest (the second listing)
x = all[1].find("div", {"class": "sr-card__city-state"})

for itemprop in x:
    print(x.find("span", itemprop="addressLocality").text)

The output of x looks like this:

<div class="sr-card__city-state">
<span itemprop="addressLocality">Westport</span>,
            <span itemprop="addressRegion">CT</span>
<span itemprop="postalCode">06880</span>
</div>
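A minimal offline check, assuming beautifulsoup4 is installed: it parses only the markup shown above (not the live page) and lists what "for itemprop in x" actually iterates over.

```python
from bs4 import BeautifulSoup

# The sr-card__city-state markup from above, copied as-is (no network request).
html = """<div class="sr-card__city-state">
<span itemprop="addressLocality">Westport</span>,
            <span itemprop="addressRegion">CT</span>
<span itemprop="postalCode">06880</span>
</div>"""

x = BeautifulSoup(html, "html.parser").find("div", class_="sr-card__city-state")

# Iterating over a Tag walks its direct children (x.contents): the three
# <span> tags plus the newline/comma text nodes between them.
for i, child in enumerate(x):
    print(i, type(child).__name__, repr(child))

# len() of a Tag counts those same direct children.
print(len(x))  # prints 7

# The original loop body calls x.find(...) and ignores its loop variable,
# so each of those iterations finds and prints the same first match.
```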

When the for loop runs, I get the following output:

Westport
Westport
Westport
Westport
Westport
Westport
Westport

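It prints the correct value, but I don't understand why it prints it 7 times. I know I'm making a mistake somewhere, but I can't tell where. I'd be grateful if someone could point me in the right direction.

Thanks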


1 answer
User
Answer 1 · Posted 2024-10-01 05:07:18

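x has a length of 7, which is why the output is printed 7 times. Iterating over a BeautifulSoup Tag loops over its direct children: the three <span> tags plus the whitespace and comma text between them. Because the loop body calls x.find(...) instead of using the loop variable, every iteration finds and prints the same first addressLocality span. You might want to try something like this: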

# Find all property listings on the page
all = soup.find_all("div", {"class": "sr-card js-safe-link"})

# Find the city/state block of the property of interest (the second listing)
x = all[1].find("div", {"class": "sr-card__city-state"})

print(x)

print(len(x))  # Number of direct children of x

# Iterating over the span itself yields its single text child, so this prints once
for prop in x.find("span", itemprop="addressLocality"):
    print(prop)
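If the goal is the locality (and perhaps the region and ZIP code) for every listing rather than only all[1], a sketch along these lines drops the loop over x entirely. It assumes beautifulsoup4 is installed and runs against a small hand-written stand-in for the page: the class names come from the question and are not verified against the live century21.com markup, the second card is made up, and extract_location is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the listings page, using the question's class names.
# The second card is invented sample data.
html = """
<div class="sr-card js-safe-link">
  <div class="sr-card__city-state">
    <span itemprop="addressLocality">Westport</span>,
    <span itemprop="addressRegion">CT</span>
    <span itemprop="postalCode">06880</span>
  </div>
</div>
<div class="sr-card js-safe-link">
  <div class="sr-card__city-state">
    <span itemprop="addressLocality">Weston</span>,
    <span itemprop="addressRegion">CT</span>
    <span itemprop="postalCode">06883</span>
  </div>
</div>
"""

ITEMPROPS = ("addressLocality", "addressRegion", "postalCode")


def extract_location(card):
    """Return (locality, region, postal_code) for one card, or None if absent."""
    block = card.find("div", class_="sr-card__city-state")
    if block is None:
        return None
    # find() already returns the first matching <span>, so no loop over children.
    spans = [block.find("span", itemprop=prop) for prop in ITEMPROPS]
    return tuple(s.get_text(strip=True) if s is not None else None for s in spans)


soup = BeautifulSoup(html, "html.parser")
# class_="sr-card" matches any div whose class list contains "sr-card";
# "cards" also avoids shadowing the built-in all().
cards = soup.find_all("div", class_="sr-card")
locations = [extract_location(card) for card in cards]

for loc in locations:
    print(loc)
```

The loop belongs over the cards, not over one card's children, and get_text(strip=True) trims the surrounding whitespace from each span's text.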
