How do I get Beautiful Soup to grab only what's between a set of "[:" and ":]" on a web page?

import bs4 as bs
import urllib.request

source = urllib.request.urlopen("https://login.microsoftonline.com/common/discovery/keys").read()
soup = bs.BeautifulSoup(source,'lxml')

# ---------------------------------------------
# prior script that I was playing with trying to tackle this issue
import requests
import urllib.request
import time
from bs4 import BeautifulSoup

# Set URL to scrape new certs from
newcerts = "https://login.microsoftonline.com/common/discovery/keys"

# Connect to the URL
response = requests.get(newcerts)

# Parse HTML and save to BeautifulSoup Object
soup = BeautifulSoup(response.text, "html.parser")
keys = soup.find("span", attrs = {"class": "objectBox objectBox-string"})

2 answers

User

Answer 1 · edited 2024-10-02 14:20:47

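The data you get from that URL is already structured as JSON, which maps directly onto a Python dict. I would fetch it with requests and use ast to convert it from a string into a dict.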

Here's an example:

import requests, ast

# get the response data
response = requests.get("https://login.microsoftonline.com/common/discovery/keys")

# convert the response body from a string to a dict with ast
my_dict = ast.literal_eval(response.text)

# print the parsed data
print(my_dict)
# check that it's a dict
print(type(my_dict))

From here, you can use ordinary Python dict operations to access each value.
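A small offline sketch of that access step, and of why `json.loads` is usually the safer parser for this kind of response: the body is JSON, while `ast.literal_eval` only accepts Python literals, so it rejects the body if it ever contains `true`, `false` or `null`. The `sample` string is a made-up, shortened stand-in for the real response; the `kid` and `x5c` values and the `placeholder_flag` field are hypothetical placeholders, not what the endpoint actually returns.

```python
import ast
import json

# Made-up, shortened stand-in for the endpoint's response body.
# "placeholder_flag" is hypothetical, included only to show a JSON boolean.
sample = '{"keys": [{"kid": "example-kid", "x5c": ["MIIC-placeholder"], "placeholder_flag": false}]}'

# json.loads understands JSON literals such as true/false/null.
data = json.loads(sample)

# Once parsed, it is ordinary dict/list access.
for key in data["keys"]:
    print(key["kid"], key["x5c"][0])

# ast.literal_eval only accepts Python literals, so `false` is rejected.
try:
    ast.literal_eval(sample)
except ValueError as exc:
    print("ast.literal_eval failed:", exc)
```

If the real response happens to contain no JSON-only literals, both parsers give the same dict; `json.loads` simply avoids depending on that.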

User

Answer 2 · edited 2024-10-02 14:20:47

Not sure if this is what you're after, but try the following script:

import json
import requests

url = 'https://login.microsoftonline.com/common/discovery/keys'

res = requests.get(url)
jsonobject = json.loads(res.content)
for item in jsonobject['keys']:
    print(item['x5c'])
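A sketch of the same idea using only the standard library (`urllib.request` and `json`, which the question already imports), with the download and the extraction split apart so the extraction can be tried on data you already have. The helper names `fetch_jwks` and `extract_x5c` and the 10-second timeout are my own illustrative choices, not part of any API.

```python
import json
import urllib.request

URL = "https://login.microsoftonline.com/common/discovery/keys"


def fetch_jwks(url: str = URL, timeout: float = 10.0) -> dict:
    """Download the key set and parse the JSON body (hypothetical helper)."""
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses,
    # so an error page is never handed to the JSON parser.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def extract_x5c(jwks: dict) -> list:
    """Collect every certificate string from the "x5c" lists (hypothetical helper)."""
    certs = []
    for key in jwks.get("keys", []):
        # Keys without an "x5c" entry contribute nothing.
        certs.extend(key.get("x5c", []))
    return certs
```

Usage (needs network access): `for cert in extract_x5c(fetch_jwks()): print(cert)`. Unlike `print(item['x5c'])`, which prints each list as a whole, this flattens all certificate strings into one list.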
