Getting only one column from a website

Published 2024-09-30 10:40:09


How can I get only the "User" column from this website? https://datarecovery.com/rd/default-passwords/

I tried doing:

from bs4 import BeautifulSoup
import urllib.request

url = "https://datarecovery.com/rd/default-passwords/"

soup = BeautifulSoup(urllib.request.urlopen(url), "html.parser")
for tag in soup.find_all("span", "paraEight"):
    print(tag)

But then I realized that every column's cells carry the "paraEight" class, so I got the values from every column, not just "User".

Update:

soup = BeautifulSoup(urllib.request.urlopen(url), "html.parser")
for tag in soup.select(".table-responsive table tr td:nth-of-type(5) span"):
    print(tag)

3 Answers

I'm not sure whether BeautifulSoup supports the full range of CSS selectors, but could you try finding the elements with this selector:

.table-responsive table tr td:nth-of-type(5) span

I tried this on the page you linked and it returned exactly the spans in the User column (e.g. "root", "tech", and so on).
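A minimal sketch of the selector in action, using a small hypothetical inline table that stands in for the real page's markup (the actual site's HTML may differ):

```python
from bs4 import BeautifulSoup

# Hypothetical table mimicking the page's layout: the User value sits in the 5th cell.
html = """
<div class="table-responsive">
  <table>
    <tr><td>a</td><td>b</td><td>c</td><td>d</td><td><span class="paraEight">root</span></td></tr>
    <tr><td>a</td><td>b</td><td>c</td><td>d</td><td><span class="paraEight">tech</span></td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
# td:nth-of-type(5) matches only the fifth <td> in each row, so the other
# paraEight spans never show up.
users = [span.get_text() for span in
         soup.select(".table-responsive table tr td:nth-of-type(5) span")]
print(users)  # ['root', 'tech']
```

Note that `soup.select` with `:nth-of-type` requires BeautifulSoup 4.7+ (it delegates CSS matching to the soupsieve package).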

Alternatively, you can:

  1. Find all the rows in the table
  2. Then walk the cells of each row; in your case "User" is at position 5, so just check for that position. Here is a code example:
    from bs4 import BeautifulSoup
    import urllib.request

    url = "https://datarecovery.com/rd/default-passwords/"

    soup = BeautifulSoup(urllib.request.urlopen(url), 'html.parser')
    table = soup.find('table')
    for tr in table.find_all('tr'):
        # Enumerate only the cell tags; iterating the <tr> directly would also
        # yield whitespace text nodes and throw the count off.
        for ct, td in enumerate(tr.find_all(['td', 'th']), 1):
            text = td.get_text(strip=True)
            if ct == 5:
                print(text)
    #output as: User
    #            root
    #            tech
    #            SNMPWrite
    #            (none)
    #            (none)
    #            DOCSIS_APP
    #            admin

Try the lxml module with XPath. I think this is about right:

import urllib.request
from lxml import etree

url = "https://datarecovery.com/rd/default-passwords/"

htmlparser = etree.HTMLParser()

response = urllib.request.urlopen(url)
tree = etree.parse(response, htmlparser)

user_list = []

xpathparent = '/html/body/div[1]/div[2]/div[2]/div/div/div/div[1]/div/div/table/tbody/tr'
table_user = tree.xpath(xpathparent)
for item in table_user:
    # Rebuild each row's absolute path, then grab the span in its fifth cell
    x_path = tree.getpath(item)
    user = tree.xpath(x_path + '/td[5]/span')
    user_name = user[0].text if user else ''
    user_list.append(user_name)
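The per-row `getpath` round trip above can usually be collapsed into a single relative XPath. A minimal sketch, again with a hypothetical inline table in place of the real page (the live site's nesting may differ, so the absolute path in the answer is kept only as a fallback):

```python
from lxml import etree

# Hypothetical snippet assuming the same column layout as the real page.
html = """
<table><tbody>
  <tr><td>a</td><td>b</td><td>c</td><td>d</td><td><span>root</span></td></tr>
  <tr><td>a</td><td>b</td><td>c</td><td>d</td><td><span>tech</span></td></tr>
</tbody></table>
"""

tree = etree.fromstring(html, etree.HTMLParser())
# One relative XPath collects every fifth-cell span's text in a single pass.
user_list = tree.xpath('//table//tr/td[5]/span/text()')
print(user_list)  # ['root', 'tech']
```

Relative paths like `//table//tr/td[5]` are also more robust than a full `/html/body/div[1]/...` chain, which breaks as soon as the page's wrapper divs change.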
