Fetching a list of URLs

import urllib.request
from bs4 import BeautifulSoup

url_list = ['URL1', 'URL2', 'URL3']

def soup():
    for url in url_list:
        sauce = urllib.request.urlopen(url)
        for things in sauce:
            soup_maker = BeautifulSoup(things, 'html.parser')
            return soup_maker

# Scraping
def getPropNames():
    for propName in soup.findAll('div', class_="property-cta"):
        for h1 in propName.findAll('h1'):
            print(h1.text)

def getPrice():
    for price in soup.findAll('p', class_="room-price"):
        print(price.text)

def getRoom():
    for theRoom in soup.findAll('div', class_="featured-item-inner"):
        for h5 in theRoom.findAll('h5'):
            print(h5.text)

for soups in soup():
    getPropNames()
    getPrice()
    getRoom()

2 answers

Anonymous user

Answer 1 · Edited 2024-10-01 07:24:58

Think about what this code does:

def soup():
    for url in url_list:
        sauce = urllib.request.urlopen(url)
        for things in sauce:
            soup_maker = BeautifulSoup(things, 'html.parser')
            return soup_maker

Let me give you an example:

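def soup2():
    for url in url_list:
        print(url)
        for thing in ['a', 'b', 'c']:
            print(url, thing)
            maker = 2 * thing
            # return exits the whole function on the very first inner iteration
            return maker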

With url_list = ['one', 'two', 'three'], the output is:

one
one a

Do you see it now? What is going on here?

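Basically, the soup function hits return on the very first iteration. It does not return an iterator or a list, only the first BeautifulSoup object. That object happens (luckily or unluckily) to be iterable :) so for soups in soup() quietly loops over its child nodes instead of raising an error.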

So change the code:

def soup3():
    soups = []
    for url in url_list:
        print(url)
        for thing in ['a', 'b', 'c']:
            print(url, thing)
            maker = 2 * thing
            soups.append(maker)
    return soups

Then the output is:

one
one a
one b
one c
two
two a
two b
two c
three
three a
three b
three c

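But I believe this still won't work :) Check what sauce = urllib.request.urlopen(url) actually returns. Then check what your code is really iterating over in for things in sauce, in other words what each things is.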
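You can answer that question without hitting the network. For HTTP URLs, urllib.request.urlopen() returns a file-like response object. Iterating over a binary file-like object yields one line of bytes per step. The sketch below assumes io.BytesIO as a stand-in for that response (it iterates the same way), and the HTML body is made up for illustration:

```python
import io

# Hypothetical page body; io.BytesIO stands in for the object urlopen() returns.
body = b"<html>\n<body>\n<h1>Flat 1</h1>\n</body>\n</html>\n"

# Iterating a binary file-like object yields one line (as bytes) per step,
# which is what "for things in sauce" does in the question.
lines = list(io.BytesIO(body))
print(lines)

# read() returns the whole document as one bytes object, which is what a
# parser such as BeautifulSoup needs in order to see the complete page.
whole = io.BytesIO(body).read()
print(whole == body)
```

So in the question's loop, each BeautifulSoup(things, 'html.parser') call would only ever see a single line of the page, never the full document.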

Happy coding.

Anonymous user

Answer 2 · Edited 2024-10-01 07:24:58

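Each get* function reads a global variable soup that is never set properly anywhere. At module level, soup is actually the soup() function itself, so soup.findAll(...) fails with an AttributeError. Even if the global were set correctly, this would not be a good approach. Make soup a function parameter instead, for example: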

def getRoom(soup):
    for theRoom in soup.findAll('div', class_="featured-item-inner"):
        for h5 in theRoom.findAll('h5'):
            print(h5.text)

# getPropNames(soup) and getPrice(soup) take the soup parameter the same way.

for soup in soups():
    getPropNames(soup)
    getPrice(soup)
    getRoom(soup)
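The following is a minimal runnable sketch of the same idea: each extractor receives the parsed document as an argument instead of reading a global. To avoid third-party dependencies, it swaps in the standard library's xml.etree.ElementTree for BeautifulSoup. The two pages are hypothetical, well-formed snippets. ElementTree cannot parse typical real-world HTML, so treat this as an illustration of the structure only. The functions also return lists instead of printing, which makes them easier to reuse and test.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-ins for two downloaded pages.
PAGES = [
    "<html><body>"
    "<div class='property-cta'><h1>Flat A</h1></div>"
    "<p class='room-price'>500</p>"
    "<div class='featured-item-inner'><h5>Double room</h5></div>"
    "</body></html>",
    "<html><body>"
    "<div class='property-cta'><h1>Flat B</h1></div>"
    "<p class='room-price'>650</p>"
    "</body></html>",
]


def get_prop_names(doc):
    """Text of every <h1> inside a <div class="property-cta">."""
    names = []
    for div in doc.iter("div"):
        # Exact string comparison; BeautifulSoup's class_ also matches
        # elements that carry several classes.
        if div.get("class") == "property-cta":
            names.extend(h1.text for h1 in div.iter("h1"))
    return names


def get_prices(doc):
    """Text of every <p class="room-price">."""
    return [p.text for p in doc.iter("p") if p.get("class") == "room-price"]


def get_rooms(doc):
    """Text of every <h5> inside a <div class="featured-item-inner">."""
    rooms = []
    for div in doc.iter("div"):
        if div.get("class") == "featured-item-inner":
            rooms.extend(h5.text for h5 in div.iter("h5"))
    return rooms


# Each parsed document is passed explicitly to every extractor.
results = []
for doc in (ET.fromstring(page) for page in PAGES):
    results.append((get_prop_names(doc), get_prices(doc), get_rooms(doc)))

for names, prices, rooms in results:
    print(names, prices, rooms)
```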

Second, you should yield from soup() instead of using return, which turns it into a generator. Otherwise, you need to return a list of BeautifulSoup objects.
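Here is a minimal sketch of the generator version, under a few assumptions. The function is named soups() to match the loop above. The parser is passed in as a hypothetical parse argument; with bs4 installed, that could be lambda body: BeautifulSoup(body, 'html.parser'). The hypothetical opener argument defaults to urllib.request.urlopen and exists only so the sketch can be exercised with fake pages. The sketch also calls read(), so each document is parsed whole rather than line by line.

```python
import io
import urllib.request


def soups(urls, parse, opener=urllib.request.urlopen):
    """Yield one parsed document per URL.

    yield hands back a result and resumes the loop on the next iteration,
    whereas return would end the function after the first URL.
    """
    for url in urls:
        with opener(url) as response:
            body = response.read()  # the whole page, not one line of it
        yield parse(body)


# Usage with fake pages instead of real HTTP requests (hypothetical data).
fake_pages = {"URL1": b"<h1>A</h1>", "URL2": b"<h1>B</h1>", "URL3": b"<h1>C</h1>"}
fetched = []


def fake_opener(url):
    fetched.append(url)
    return io.BytesIO(fake_pages[url])


gen = soups(["URL1", "URL2", "URL3"], parse=bytes.decode, opener=fake_opener)
pending = len(fetched)  # a generator does no work until it is iterated
docs = list(gen)
print(docs)
```

With BeautifulSoup installed, the loop from this answer would then read roughly: for soup in soups(url_list, lambda body: BeautifulSoup(body, 'html.parser')):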


I'd also recommend using XPath or CSS selectors to extract HTML elements: https://stackoverflow.com/a/11466033/2997179.
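As a rough illustration of the selector style, the standard library's xml.etree.ElementTree understands a limited subset of XPath. That subset is enough to express each nested findAll loop from the question as a single path. ElementTree is used here only because it needs no third-party packages. Full XPath on real HTML usually means lxml (lxml.html with .xpath()), and BeautifulSoup itself offers CSS selectors through select(). The page below is a hypothetical, well-formed snippet.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page snippet.
page = (
    "<html><body>"
    "<div class='property-cta'><h1>Flat A</h1></div>"
    "<p class='room-price'>500</p>"
    "<div class='featured-item-inner'><h5>Double room</h5><h5>Single room</h5></div>"
    "</body></html>"
)
doc = ET.fromstring(page)

# One path per query replaces the nested loops. ElementTree supports only a
# subset of XPath, e.g. [@attr='value'] predicates and // for descendants.
names = [e.text for e in doc.findall(".//div[@class='property-cta']//h1")]
prices = [e.text for e in doc.findall(".//p[@class='room-price']")]
rooms = [e.text for e in doc.findall(".//div[@class='featured-item-inner']//h5")]

print(names, prices, rooms)
```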
