Getting only one column from a website

Published 2024-09-30 10:40:09


How can I get only the "User" column from this website? https://datarecovery.com/rd/default-passwords/

I tried doing:

from bs4 import BeautifulSoup
import urllib.request

url = "https://datarecovery.com/rd/default-passwords/"

soup = BeautifulSoup(urllib.request.urlopen(url), "html.parser")
for tag in soup.find_all("span", "paraEight"):
    print(tag)

But then I realized that every column's cells carry the "paraEight" class, so I got the values from every column, not just "User".

Update:

soup = BeautifulSoup(urllib.request.urlopen(url), "html.parser")
for tag in soup.select(".table-responsive table tr td:nth-of-type(5) span"):
    print(tag)

3 Answers

I'm not sure whether BeautifulSoup supports the full range of CSS selectors, but could you try finding the elements with this selector:

.table-responsive table tr td:nth-of-type(5) span

I tried this on the page you linked and it returned exactly the spans in the User column (e.g. "root", "tech", and so on).
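A minimal sketch of the selector in action, using a small hypothetical inline table that stands in for the real page's markup (the actual site's HTML may differ):

```python
from bs4 import BeautifulSoup

# Hypothetical table mimicking the page's layout: the User value sits in the 5th cell.
html = """
<div class="table-responsive">
  <table>
    <tr><td>a</td><td>b</td><td>c</td><td>d</td><td><span class="paraEight">root</span></td></tr>
    <tr><td>a</td><td>b</td><td>c</td><td>d</td><td><span class="paraEight">tech</span></td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
# td:nth-of-type(5) matches only the fifth <td> in each row, so the other
# paraEight spans never show up.
users = [span.get_text() for span in
         soup.select(".table-responsive table tr td:nth-of-type(5) span")]
print(users)  # ['root', 'tech']
```

Note that `soup.select` with `:nth-of-type` requires BeautifulSoup 4.7+ (it delegates CSS matching to the soupsieve package).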

Alternatively, you can:

  1. Find all the rows in the table
  2. Then walk the cells of each row; in your case "User" is at position 5, so just check for that position. Here is a code example:
    from bs4 import BeautifulSoup
    import urllib.request

    url = "https://datarecovery.com/rd/default-passwords/"

    soup = BeautifulSoup(urllib.request.urlopen(url), 'html.parser')
    table = soup.find('table')
    for tr in table.find_all('tr'):
        # Enumerate only the cell tags; iterating the <tr> directly would also
        # yield whitespace text nodes and throw the count off.
        for ct, td in enumerate(tr.find_all(['td', 'th']), 1):
            text = td.get_text(strip=True)
            if ct == 5:
                print(text)
    #output as: User
    #            root
    #            tech
    #            SNMPWrite
    #            (none)
    #            (none)
    #            DOCSIS_APP
    #            admin

Try the lxml module with XPath. I think this is about right:

import urllib.request
from lxml import etree

url = "https://datarecovery.com/rd/default-passwords/"

htmlparser = etree.HTMLParser()

response = urllib.request.urlopen(url)
tree = etree.parse(response, htmlparser)

user_list = []

xpathparent = '/html/body/div[1]/div[2]/div[2]/div/div/div/div[1]/div/div/table/tbody/tr'
table_user = tree.xpath(xpathparent)
for item in table_user:
    # Rebuild each row's absolute path, then grab the span in its fifth cell
    x_path = tree.getpath(item)
    user = tree.xpath(x_path + '/td[5]/span')
    user_name = user[0].text if user else ''
    user_list.append(user_name)
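The per-row `getpath` round trip above can usually be collapsed into a single relative XPath. A minimal sketch, again with a hypothetical inline table in place of the real page (the live site's nesting may differ, so the absolute path in the answer is kept only as a fallback):

```python
from lxml import etree

# Hypothetical snippet assuming the same column layout as the real page.
html = """
<table><tbody>
  <tr><td>a</td><td>b</td><td>c</td><td>d</td><td><span>root</span></td></tr>
  <tr><td>a</td><td>b</td><td>c</td><td>d</td><td><span>tech</span></td></tr>
</tbody></table>
"""

tree = etree.fromstring(html, etree.HTMLParser())
# One relative XPath collects every fifth-cell span's text in a single pass.
user_list = tree.xpath('//table//tr/td[5]/span/text()')
print(user_list)  # ['root', 'tech']
```

Relative paths like `//table//tr/td[5]` are also more robust than a full `/html/body/div[1]/...` chain, which breaks as soon as the page's wrapper divs change.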
