How do I read a website's content?

import urllib
from bs4 import BeautifulSoup
import re
import json

html_aqi = urllib.urlopen("http://aqicn.org/city/shenyang/usconsulate/json").read().decode('utf-8')
soup = BeautifulSoup(html_aqi)
l = soup.p.get_text()
aqi = json.loads(l)

city_name = url_format1.split("/")[5]
site_name = url_format1.split("/")[6]
url_format2 = "http://aqicn.org/aqicn/json/android/" + city_name + "/" + site_name

### Reason why it's hard in practice

There are 1559 sites to take care of, and they differ by location: some are in a city, some are in a county. Their URLs do not follow the same pattern. For example:

Type1 --> http://aqicn.org/city/hebi/json
Type2 --> http://aqicn.org/city/jiangsu/huaian/json
Type3 --> http://aqicn.org/city/china/xinzhou/jiyin/json
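The three URL types differ only in how many path segments sit between `city` and `json`, so fixed indices into `split("/")` cannot cover all of them. A minimal sketch of one way to handle this is to take every segment between those two markers, however many there are. The helper names `location_segments` and `android_json_url` are hypothetical, and joining all segments onto the android base URL is an assumption: the question only shows that target format for the two-segment case.

```python
from urllib.parse import urlsplit


def location_segments(url):
    """Return the path segments between 'city' and 'json' in a site URL."""
    # Drop empty strings produced by the leading slash.
    parts = [p for p in urlsplit(url).path.split("/") if p]
    if len(parts) < 3 or parts[0] != "city" or parts[-1] != "json":
        raise ValueError("unexpected URL layout: %s" % url)
    return parts[1:-1]


def android_json_url(url, base="http://aqicn.org/aqicn/json/android/"):
    """Build the second URL format from the first.

    Assumption: the target accepts the same segments in the same order.
    The question only confirms this for the city/site (two-segment) case.
    """
    return base + "/".join(location_segments(url))


for u in (
    "http://aqicn.org/city/hebi/json",
    "http://aqicn.org/city/jiangsu/huaian/json",
    "http://aqicn.org/city/china/xinzhou/jiyin/json",
):
    print(location_segments(u))
```

Whether the one-segment and three-segment results map onto valid target URLs would have to be checked against the site itself.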

2 answers

User

#1 · edited 2024-09-30 16:20:05

If you are interested in the air quality index, find the div with the class aqivalue:

>>> import urllib
>>> from bs4 import BeautifulSoup
>>> 
>>> url = "http://aqicn.org/city/shenyang/usconsulate/json"
>>> soup = BeautifulSoup(urllib.urlopen(url), "html.parser")
>>> soup.find("div", class_="aqivalue").get_text()
u'171'

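User

#2 · edited 2024-09-30 16:20:05

The first URL, http://aqicn.org/city/shenyang/usconsulate/json, does not actually return JSON data. It returns HTML. If you really are interested in that content, you have to parse the HTML.

You can use BeautifulSoup's HTML parser to do this, although the lxml.html package is slightly simpler.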
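To illustrate what parsing the HTML involves, here is a minimal sketch that pulls the text out of the first div with class aqivalue. It deliberately swaps in the standard library's `html.parser` instead of BeautifulSoup or lxml.html, so it runs without third-party packages. The class `AqiValueParser`, the function `extract_aqi`, and the sample markup are all hypothetical; the real page layout may differ.

```python
from html.parser import HTMLParser


class AqiValueParser(HTMLParser):
    """Collect the text of the first <div> whose class list has 'aqivalue'."""

    def __init__(self):
        super().__init__()
        self._depth = 0    # > 0 while inside the target div
        self._chunks = []  # text fragments seen inside the target div
        self.value = None  # final text, set when the target div closes

    def handle_starttag(self, tag, attrs):
        if self.value is not None:
            return  # already found the first match
        if self._depth:
            if tag == "div":
                self._depth += 1  # track nested divs so we close correctly
        elif tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if "aqivalue" in classes:
                self._depth = 1

    def handle_endtag(self, tag):
        if self._depth and tag == "div":
            self._depth -= 1
            if self._depth == 0:
                self.value = "".join(self._chunks).strip()

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


def extract_aqi(html):
    """Return the aqivalue text, or None if no such div exists."""
    parser = AqiValueParser()
    parser.feed(html)
    parser.close()
    return parser.value


# Made-up sample markup; in practice the HTML would come from the page itself.
sample = '<html><body><div class="aqivalue" id="x">171</div></body></html>'
print(extract_aqi(sample))  # 171
```

With BeautifulSoup installed, the one-line `find("div", class_="aqivalue")` from the first answer does the same job with far less code.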

