Best way to get the "hrefs" from a CSS selector in BeautifulSoup?

Posted 2024-06-24 12:26:11


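I'm writing a script that will eventually gather data for every census block in a given census block group. To do that, though, I first need to be able to get the links to all block groups in a given tract. The tracts are defined by a list of their URLs, each of which returns a page that lists its block groups under the CSS selector "div#rList3 a". When I run this code: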

from bs4 import BeautifulSoup
from urllib.request import urlopen

tracts = ['http://www.usa.com/NY023970800.html','http://www.usa.com/NY023970900.html',
       'http://www.usa.com/NY023970600.html','http://www.usa.com/NY023970700.html',
       'http://www.usa.com/NY023970500.html']

class Scrape:
    def scrapeTracts(self):
        for i in tracts:
            html = urlopen(i)
            soup = BeautifulSoup(html.read(), 'lxml')
            bgs = soup.select("div#rList3 a")
            print(bgs)

s = Scrape()
s.scrapeTracts()

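This gives me output like the following: [<a href="/NY0239708001.html">NY0239708001</a>] (the actual list of links is cut short to keep this post brief). My question is: how can I get the string that follows 'href', which in this example is /NY0239708001.html?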


2 Answers

Anonymous user

Answer 1 · edited 2024-06-24 12:26:11

You can do this in one line with:

bgs = [i.attrs.get('href') for i in soup.select("div#rList3 a")]

Output (for the single link shown in the question):

['/NY0239708001.html']
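As a minimal sketch of the one-liner above, the snippet below runs it against a small inline HTML fragment instead of the live site. The markup is hypothetical: it is modeled on the link shown in the question, and the second link and the href-less anchor are invented for illustration. It also shows one possible refinement, the attribute selector "a[href]", which skips anchors that carry no href, so the result contains no None entries.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, modeled on the link shown in the question.
# The second link and the anchor without an href are made up for illustration.
SAMPLE_HTML = """
<div id="rList3">
  <a href="/NY0239708001.html">NY0239708001</a>
  <a name="no-link">anchor without href</a>
  <a href="/NY0239708002.html">NY0239708002</a>
</div>
"""

# html.parser ships with Python, so no extra parser is needed for this sketch.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# .get('href') returns None when an anchor has no href attribute.
with_none = [a.get("href") for a in soup.select("div#rList3 a")]

# The attribute selector a[href] matches only anchors that carry an href,
# so indexing with a['href'] is safe here.
hrefs = [a["href"] for a in soup.select("div#rList3 a[href]")]

print(with_none)
print(hrefs)
```

Whether to filter in the selector or to drop None values afterwards is a matter of taste; filtering in the selector keeps the comprehension short.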

Anonymous user

Answer 2 · edited 2024-06-24 12:26:11

Every node has an attrs dictionary containing that node's attributes, including its CSS classes or, in this case, its href.

hrefs = []
for bg in bgs:
    hrefs.append(bg.attrs['href'])
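The hrefs collected this way are site-relative paths such as /NY0239708001.html, so they would likely need to be resolved against the page URL before they can be passed to urlopen. A minimal sketch, assuming the standard library's urljoin is acceptable for this; the helper name absolute_links is hypothetical and not part of either answer:

```python
from urllib.parse import urljoin


def absolute_links(page_url, hrefs):
    """Resolve each href against the URL of the page it was found on."""
    return [urljoin(page_url, href) for href in hrefs]


# The tract URL and the href are the ones shown in the question.
page_url = "http://www.usa.com/NY023970800.html"
hrefs = ["/NY0239708001.html"]

links = absolute_links(page_url, hrefs)
print(links)  # ['http://www.usa.com/NY0239708001.html']
```

Inside the question's loop, page_url would be the current tract URL (the loop variable i).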
