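Using this code:
import urllib.request
from bs4 import BeautifulSoup

# GitHub search for repositories with more than one star, sorted by stars (descending)
url = "https://github.com/search?o=desc&p=1&q=stars%3A%3E1&s=stars&type=Repositories"
with urllib.request.urlopen(url) as response:
    html = response.read().decode('utf-8')
# Save a copy of the page for inspection
with open('page_content.html', 'w', encoding='utf-8') as new_file:
    new_file.write(html)
soup = BeautifulSoup(html, 'lxml')
# Each search result link carries the class "v-align-middle"
g_data = soup.find_all("a", {"class": "v-align-middle"})
print(g_data[0])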
The output is:
<a class="v-align-middle" data-hydro-click='{"event_type":"search_result.click","payload":{"page_number":1,"query":"stars:>1","result_position":1,"click_id":28457823,"result":{"id":28457823,"global_relay_id":"MDEwOlJlcG9zaXRvcnkyODQ1NzgyMw==","model_name":"Repository","url":"https://github.com/freeCodeCamp/freeCodeCamp"},"originating_request_id":"ECC6:1DF24:CE9C0F:1667572:5A8DDD6F"}}' data-hydro-hmac="42c4e038b86cefc302d5637e870e6d746ee7fa95eadf2b26930cb893c6a3bc53" href="/freeCodeCamp/freeCodeCamp">freeCodeCamp/freeCodeCamp</a>
How can I extract the following URL from the output: https://github.com/freeCodeCamp/freeCodeCamp
It is inside a JSON string, so I am having trouble getting at it.
Thanks!
Get the value of the data-hydro-click attribute, load it with json.loads(), and use the result as a regular Python dict.
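The steps described above can be sketched roughly as follows. This is a minimal illustration, not necessarily the answerer's exact code: the helper name extract_result_url is hypothetical, and the sample string is a shortened copy of the data-hydro-click value shown in the output, hard-coded so the snippet runs without a network request. It assumes the payload keeps the "payload" → "result" → "url" nesting seen in that output.

```python
import json


def extract_result_url(hydro_click: str) -> str:
    """Return the repository URL stored in a data-hydro-click JSON payload.

    Hypothetical helper; assumes the payload/result/url nesting seen in the
    question's output.
    """
    data = json.loads(hydro_click)  # JSON text -> regular Python dict
    return data["payload"]["result"]["url"]


# With the soup from the question, the attribute value would come from:
#     raw = g_data[0]["data-hydro-click"]
# Here a shortened copy of that value is hard-coded instead.
raw = (
    '{"event_type":"search_result.click",'
    '"payload":{"page_number":1,"query":"stars:>1","result_position":1,'
    '"result":{"id":28457823,"model_name":"Repository",'
    '"url":"https://github.com/freeCodeCamp/freeCodeCamp"}}}'
)

print(extract_result_url(raw))  # prints https://github.com/freeCodeCamp/freeCodeCamp
```

If GitHub changes the structure of this attribute, the key lookups would raise KeyError, so real code may want to guard them.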