Python data scraper

#!/usr/bin/python
#weather.scraper

from bs4 import BeautifulSoup
import urllib

def main():
    """weather scraper"""
    r = urllib.urlopen("https://www.wunderground.com/history/airport/KPHL/2016/1/1/MonthlyHistory.html?&reqdb.zip=&reqdb.magic=&reqdb.wmo=&MR=1").read()
    soup = BeautifulSoup(r, "html.parser")
    table = soup.find_all("table", class_="responsive airport-history-summary-table")
    tr = soup.find_all("tr")
    td = soup.find_all("td")
    print table

if __name__ == "__main__":
    main()

1 answer

Anonymous user

#1 · Posted 2024-06-28 11:25:10

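To get the text content of an element, call its .getText() method (an alias of .get_text()). Because find_all returns a list of elements, you have to pick one of them first, for example td[0].getText().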
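To illustrate the point above, here is a minimal sketch on a small inline HTML fragment. The markup and its values are invented for the example and are not taken from the Weather Underground page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
html = "<table><tr><td>Max</td><td>41</td></tr></table>"
soup = BeautifulSoup(html, "html.parser")

cells = soup.find_all("td")        # list-like ResultSet of Tag objects
first_text = cells[0].get_text()   # text of one element; getText() is an alias
all_texts = [cell.get_text() for cell in cells]

print(first_text)
print(all_texts)
```

Printing cells directly would show the tags with their markup, which is what the code in the question does with table.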

Alternatively, you can loop over the rows, for example:

for tr in soup.find_all("tr"):
    print('>>>> NEW row <<<<')
    print('|'.join([x.getText() for x in tr.find_all('td')]))

The loop above prints the cells of each row side by side, separated by |.

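Note that your code finds every td and every tr in the page, but you probably want only the ones inside the table.

To search for elements inside the table, do the following:

Use table.find('tr') instead of soup.find('tr'), so that BeautifulSoup looks for tr inside the table rather than in the whole HTML document. Here table must be a single element, that is, the result of soup.find or one item of the list returned by find_all.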
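The difference between searching from the table and searching from soup can be seen in the following sketch. The two-table markup and the class names are assumptions made up for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup with two tables, used only for illustration.
html = """
<table class="summary"><tr><td>a</td></tr></table>
<table class="other"><tr><td>b</td></tr><tr><td>c</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

table = soup.find("table", class_="summary")  # a single Tag, not a list
rows_in_table = table.find_all("tr")          # searches only inside that table
rows_in_page = soup.find_all("tr")            # searches the whole document

print(len(rows_in_table), len(rows_in_page))
```

Calling .find or .find_all on the list returned by soup.find_all("table", ...) raises an AttributeError, so select one table first.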

Your code, modified (following your comment that the page has more than one table):

#!/usr/bin/python
# weather.scraper

from bs4 import BeautifulSoup

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib import urlopen  # Python 2


def main():
    """weather scraper"""
    r = urlopen("https://www.wunderground.com/history/airport/KPHL/2016/1/1/MonthlyHistory.html?&reqdb.zip=&reqdb.magic=&reqdb.wmo=&MR=1").read()
    soup = BeautifulSoup(r, "html.parser")
    tables = soup.find_all("table")

    for table in tables:
        print('>>>>>>> NEW TABLE <<<<<<<<<')

        trs = table.find_all("tr")

        for tr in trs:
            # for each row of the current table, print its cells separated by |
            print('|'.join([x.get_text().replace('\n', '') for x in tr.find_all('td')]))


if __name__ == "__main__":
    main()
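If the rows are needed as data rather than printed output, the same nested loops can collect the cell text into lists. The sketch below is one possible way to do that, not part of the original answer: the function name extract_tables and the sample markup are hypothetical, it uses get_text(strip=True) in place of replace('\n', ''), and it skips rows that contain no td cells (for example header rows made of th).

```python
from bs4 import BeautifulSoup


def extract_tables(html):
    """Return a list of tables, each a list of rows, each a list of cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    tables = []
    for table in soup.find_all("table"):
        rows = []
        for tr in table.find_all("tr"):
            # strip=True trims surrounding whitespace, including newlines
            cells = [td.get_text(strip=True) for td in tr.find_all("td")]
            if cells:  # skip rows with no <td>, e.g. header rows of <th>
                rows.append(cells)
        tables.append(rows)
    return tables


# Hypothetical sample markup, used only for illustration.
sample = """
<table>
  <tr><th>Metric</th><th>Value</th></tr>
  <tr><td>Max
  </td><td>41</td></tr>
</table>
"""
result = extract_tables(sample)
print(result)
```

Keeping the parsing in a function that takes an HTML string makes it possible to try it on a saved copy of the page without fetching the URL each time.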
