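I'm new to both Python and the BeautifulSoup library. I'm writing a script to scrape some images from a web page, but the site stores the images as JSON inside the page source. A further problem is that the page also embeds the images of related listings.
I need every image that has the "full_screen" attribute, but only from the first group in the source: I don't want the other listings' images, just those of the listing on the current page.
My code: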
import os
import requests
from bs4 import BeautifulSoup, Tag
import json


def getResponse(url):
    # Keep requesting the page until it downloads and parses.
    while True:
        try:
            page = requests.get(url)
            soup = BeautifulSoup(page.text, 'html.parser')
            return soup
        except requests.RequestException:
            # Retry on network errors only, so Ctrl+C can still stop the loop.
            print("retrying...")


url = "https://www.propertyfinder.ae/en/rent/apartment-for-rent-dubai-dubai-marina-botanica-tower-7469382.html"
soup = getResponse(url)
script = soup.find_all("script")
# This is the call that raises the error shown below.
val = json.loads(script[7].text)
print(val)
Sample of the page source:
{"homepage":"https:\/\/www.propertyfinder.ae\/property\/2c86eb83cbe5c9588b9347ef0c0f50b9\/338\/248\/MODE\/6cf3ec\/7481797-75cceo.jpg","cts":"https:\/\/www.propertyfinder.ae\/property\/2c86eb83cbe5c9588b9347ef0c0f50b9\/668\/452\/MODE\/782bc1\/7481797-75cceo.jpg","small":"https:\/\/www.propertyfinder.ae\/property\/2c86eb83cbe5c9588b9347ef0c0f50b9\/260\/185\/MODE\/686c22\/7481797-75cceo.jpg","medium":"https:\/\/www.propertyfinder.ae\/property\/2c86eb83cbe5c9588b9347ef0c0f50b9\/668\/452\/MODE\/782bc1\/7481797-75cceo.jpg","thumb":"https:\/\/www.propertyfinder.ae\/property\/2c86eb83cbe5c9588b9347ef0c0f50b9\/95\/95\/MODE\/2f9a70\/7481797-75cceo.jpg","new_big":"https:\/\/www.propertyfinder.ae\/property\/2c86eb83cbe5c9588b9347ef0c0f50b9\/856\/550\/MODE\/7cbb67\/7481797-75cceo.jpg","new_small":"https:\/\/www.propertyfinder.ae\/property\/2c86eb83cbe5c9588b9347ef0c0f50b9\/416\/272\/MODE\/724ffe\/7481797-75cceo.jpg","full_screen":"https:\/\/www.propertyfinder.ae\/property\/2c86eb83cbe5c9588b9347ef0c0f50b9\/1312\/894\/MODE\/57d3b7\/7481797-75cceo.jpg"}},{"type":"property_image","id":"118819718","attributes":{"id":"118819718","path":"7481797-a0120o.jpg","number":2,"version":"537f08c43e0437e41778534772d1659a","is_default":false},"links":{"homepage":"https:\/\/www.propertyfinder.ae\/property\/537f08c43e0437e41778534772d1659a\/338\/248\/MODE\/a56d8f\/7481797-a0120o.jpg","cts":"https:\/\/www.propertyfinder.ae\/property\/537f08c43e0437e41778534772d1659a\/668\/452\/MODE\/094349\/7481797-a0120o.jpg","small":"https:\/\/www.propertyfinder.ae\/property\/537f08c43e0437e41778534772d1659a\/260\/185\/MODE\/b5637b\/7481797-a0120o.jpg","medium":"https:\/\/www.propertyfinder.ae\/property\/537f08c43e0437e41778534772d1659a\/668\/452\/MODE\/094349\/7481797-a0120o.jpg","thumb":"https:\/\/www.propertyfinder.ae\/property\/537f08c43e0437e41778534772d1659a\/95\/95\/MODE\/8d79d7\/7481797-a0120o.jpg","new_big":"https:\/\/www.propertyfinder.ae\/property\/537f08c43e0437e41778534772d1659a\/856\/550\/MODE\/30ee
0f\/7481797-a0120o.jpg","new_small":"https:\/\/www.propertyfinder.ae\/property\/537f08c43e0437e41778534772d1659a\/416\/272\/MODE\/ee84d8\/7481797-a0120o.jpg","full_screen":"https:\/\/www.propertyfinder.ae\/property\/537f08c43e0437e41778534772d1659a\/1312\/894\/MODE\/8afdf1\/7481797-a0120o.jpg"}},{"type":"property_image","id":"118819719","attributes":{"id":"118819719","path":"7481797-f337do.jpg","number":3,"version":"3523f4921a89e87ea7d4b752038e93ef","is_default":false},"links":
Error:
No JSON object could be decoded
Could anyone help me get the first set of images with the "full_screen" key?
Pyfiddle link: https://pyfiddle.io/fiddle/8e039908-e713-43be-9513-ef4bab9dfb9d/?i=true
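The simplest way is to go through the API, but it can also be done from the <script> tag. Note that not every property has a "full_screen" attribute, so check for the key before reading it.
With the <script> tag: pull the JSON out of the script body with a regular expression before decoding it.
Using the API: request the listing data directly and read the image links from the response.
If you are targeting the data key instead, apply the same lookup to that key.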
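As a rough illustration of the <script>/regex route, here is a minimal sketch that works on the raw page text rather than on a decoded document. The function name `full_screen_urls`, the `example.com` URLs, and the sample string are hypothetical. The grouping rule is also an assumption: it treats the part of the file name before the first hyphen as the listing identifier (as in `7481797-75cceo.jpg` from the sample above) and keeps only the URLs that share the identifier of the first match. It may need adjusting if the site names its files differently.

```python
import json
import re

# Captures the quoted value of every "full_screen" key in the raw source.
FULL_SCREEN_RE = re.compile(r'"full_screen"\s*:\s*("(?:[^"\\]|\\.)*")')


def full_screen_urls(source):
    """Return the full_screen URLs that belong to the first listing in source."""
    # Decoding the quoted match as JSON turns the escaped "\/" back into "/".
    urls = [json.loads(match) for match in FULL_SCREEN_RE.findall(source)]
    if not urls:
        return []  # not every listing has a full_screen link

    def listing_of(url):
        # Assumption: file names look like "<listing>-<suffix>.jpg".
        return url.rsplit("/", 1)[-1].split("-", 1)[0]

    first_listing = listing_of(urls[0])
    return [url for url in urls if listing_of(url) == first_listing]


# Hypothetical page fragment: two images of listing 111, one of listing 222.
sample = (
    r'{"links":{"thumb":"https:\/\/example.com\/95\/111-aa.jpg",'
    r'"full_screen":"https:\/\/example.com\/1312\/111-aa.jpg"}},'
    r'{"links":{"full_screen":"https:\/\/example.com\/1312\/111-bb.jpg"}},'
    r'{"links":{"full_screen":"https:\/\/example.com\/1312\/222-cc.jpg"}}'
)

print(full_screen_urls(sample))
# ['https://example.com/1312/111-aa.jpg', 'https://example.com/1312/111-bb.jpg']
```

To use it on a live page, pass `requests.get(url).text` (or the text of the relevant script element) as `source`. Matching on the raw text avoids calling `json.loads` on a whole script body that may not be bare JSON.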