Beautiful Soup: cleanup and pitfalls

from bs4 import BeautifulSoup
import urllib2
from lxml import html
from lxml.etree import tostring

trees = urllib2.urlopen('http://aviationweather.gov/adds/metars/index?station_ids=KJFK&std_trans=translated&chk_metars=on&hoursStr=most+recent+only&chk_tafs=on&submit=Submit').read()
soup = BeautifulSoup(open(trees))
print soup.get_text()
item = soup.findAll(id="info")
print item

1 answer

User

#1 · Posted 2024-10-08 18:24:17

The first problem is in this part:

trees = urllib2.urlopen('http://aviationweather.gov/adds/metars/index?station_ids=KJFK&std_trans=translated&chk_metars=on&hoursStr=most+recent+only&chk_tafs=on&submit=Submit').read()
soup = BeautifulSoup(open(trees))

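trees already holds the page's HTML as a string, because .read() was called on the response. It is not a file name, so there is no need to call open() on it. The fix: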

soup = BeautifulSoup(trees, "html.parser")

We also explicitly set html.parser as the underlying parser.
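To see why the open() call fails, here is a minimal sketch in Python 3 (the question uses Python 2, so the version is an assumption), using only the standard library. The names html_text and try_open are made up for illustration, and the HTML is a stand-in for the real response body. open() treats its argument as a file path, so handing it markup raises an error, while BeautifulSoup accepts either a string of markup or an open file-like object directly.

```python
import io

# Stand-in for what urlopen(...).read() returns; invented for illustration.
html_text = "<html><body><strong>METAR text:</strong></body></html>"


def try_open(value):
    """Return the name of the exception open(value) raises, or None on success."""
    try:
        with open(value):
            return None
    except OSError as exc:
        return type(exc).__name__


# open() interprets the markup as a file path, which does not exist.
error_name = try_open(html_text)
print(error_name)

# A real file-like object wraps the same text and exposes .read().
file_like = io.StringIO(html_text)
round_trip = file_like.read()
print(round_trip == html_text)
```

Either html_text or an object like file_like could be passed to BeautifulSoup as its first argument; wrapping a string in open() is never needed.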

Next, you need to be specific about what you want to extract from the page. Here is sample code that gets the METAR text value:

from bs4 import BeautifulSoup
import urllib2

# Fetch the page; .read() returns the HTML as a string.
trees = urllib2.urlopen('http://aviationweather.gov/adds/metars/index?station_ids=KJFK&std_trans=translated&chk_metars=on&hoursStr=most+recent+only&chk_tafs=on&submit=Submit').read()
soup = BeautifulSoup(trees, "html.parser")

# Find the "METAR text:" label, then read the text of the next <strong> element.
item = soup.find("strong", text="METAR text:").find_next("strong").get_text(strip=True).replace("\n", "")
print item

This prints KJFK 220151Z 20016KT 10SM BKN250 24/21 A3007 RMK AO2 SLP183 T02440206.
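The label-then-next-element lookup can be tried offline. The following is a rough sketch in Python 3 that assumes the bs4 package is installed. SAMPLE_HTML is invented markup that only mimics the label/value layout the answer relies on, not the real page, and metar_text is a hypothetical helper name. It uses the string= keyword, the newer spelling of the text= argument shown above.

```python
from bs4 import BeautifulSoup

# Invented markup imitating the label/value layout; not the real page.
SAMPLE_HTML = """
<table>
  <tr><td><strong>METAR text:</strong></td>
      <td><strong>KJFK 220151Z 20016KT
10SM BKN250 24/21</strong></td></tr>
</table>
"""


def metar_text(html):
    """Return the text of the <strong> following the 'METAR text:' label, or None."""
    soup = BeautifulSoup(html, "html.parser")
    label = soup.find("strong", string="METAR text:")
    if label is None:
        return None  # label missing: the page layout differs from what we expect
    value = label.find_next("strong")
    if value is None:
        return None
    # Collapse newlines and runs of whitespace into single spaces.
    return " ".join(value.get_text().split())


result = metar_text(SAMPLE_HTML)
print(result)
```

Checking for None before chaining avoids an AttributeError when the label is absent. Joining on whitespace, rather than deleting "\n", keeps two tokens apart when a newline is the only thing separating them.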
