Problem with a Selenium dropdown button

Published 2024-09-30 03:25:42


I'm having some trouble clicking a dropdown button and then selecting a different option in order to change the web page. I'm using Selenium in Python to extract this data. The URL is https://www.transfermarkt.com/premierleague/startseite/wettbewerb/GB1/plus/?saison_id=2019

Code so far:

driver = webdriver.Chrome('C:/Users/bzholle/chromedriver.exe')
driver.get('https://www.transfermarkt.com/premierleague/startseite/wettbewerb/GB1/plus/?saison_id=2019')

#click out of iframe pop-up window
driver.switch_to.frame(driver.find_element_by_css_selector('iframe[title="SP Consent Message"]'))
accept_button = driver.find_element_by_xpath("//button[@title='ACCEPT ALL']")
accept_button.click()

driver.find_element_by_id("choosen-country").click()

I keep getting: NoSuchElementException: Message: no such element: Unable to locate element

In the HTML, the country list only appears once the dropdown arrow has been clicked; however, no matter what I try, I cannot click the button. Does anyone have any suggestions?


2 Answers

You didn't mention what information you actually want to scrape, so the alternative solution I suggest below can only take you so far. If you elaborate and let me know what you're trying to get, I can tailor my solution.

Logging one's network traffic (while viewing the page in a browser) reveals multiple XHR (XmlHttpRequest) HTTP GET requests being made to various REST API endpoints. The responses are JSON, and they contain all the information you are likely to want.

My suggestion would be to simply imitate those HTTP GET requests to the necessary REST API endpoints. No Selenium required:

def get_country_id(country_name):
    import requests

    url = "https://www.transfermarkt.com/quickselect/countries"

    headers = {
        "user-agent": "Mozilla/5.0"
    }

    response = requests.get(url, headers=headers)
    response.raise_for_status()

    return next((country["id"] for country in response.json() if country["name"] == country_name), None)


def get_competitions(country_id):
    import requests

    url = "https://www.transfermarkt.com/quickselect/competitions/{}".format(country_id)

    headers = {
        "user-agent": "Mozilla/5.0"
    }

    response = requests.get(url, headers=headers)
    response.raise_for_status()

    return response.json()

def main():

    country_name = "Iceland"

    country_id = get_country_id(country_name)
    assert country_id is not None

    print("Competitions in {}:".format(country_name))
    for competition in get_competitions(country_id):
        print(competition["name"])
    
    return 0


if __name__ == "__main__":
    import sys
    sys.exit(main())

Output:

Competitions in Iceland:
Pepsi Max deild
Lengjudeild
Mjólkurbikarinn
Lengjubikarinn

EDIT - Unfortunately, the table data you are trying to get does not come from an API. It is baked directly into the page's HTML. You still don't need Selenium for that, though: BeautifulSoup is good enough:

def get_entries():
    import requests
    from bs4 import BeautifulSoup as Soup
    from operator import attrgetter

    url = "https://www.transfermarkt.com/premierleague/startseite/wettbewerb/GB1/plus/"

    params = {
        "saison_id": "2019"
    }

    headers = {
        "user-agent": "Mozilla/5.0"
    }

    response = requests.get(url, params=params, headers=headers)
    response.raise_for_status()

    soup = Soup(response.content, "html.parser")

    table = soup.find("table", {"class": "items"})
    assert table is not None

    # Get text from header cells whose class does not contain the substring "hide"
    fieldnames = list(map(attrgetter("text"), table.select("thead > tr > th:not([class*=\"hide\"])")))
    yield fieldnames

    for row in table.select("tbody > tr"):
        # Assuming the first column will always be an img
        columns = list(map(attrgetter("text"), row.select("td:not([class*=\"hide\"])")[1:]))
        yield dict(zip(fieldnames, columns))

def main():

    from csv import DictWriter

    entries = get_entries()
    fieldnames = next(entries)

    with open("output.csv", "w", newline="") as file:
        writer = DictWriter(file, fieldnames=fieldnames)
        writer.writeheader()
        
        for entry in entries:
            writer.writerow(entry)
    
    return 0


if __name__ == "__main__":
    import sys
    sys.exit(main())

CSV output:

club,Squad,Total MV,ø MV
Man City,34,€1.27bn,€37.46m
Liverpool,56,€1.09bn,€19.53m
Spurs,36,€1.04bn,€28.94m
Chelsea,36,€797.00m,€22.14m
Man Utd,43,€775.20m,€18.03m
Arsenal,38,€680.55m,€17.91m
Everton,35,€525.50m,€15.01m
Leicester,32,€384.75m,€12.02m
West Ham,38,€371.75m,€9.78m
Wolves,44,€315.40m,€7.17m
Newcastle,41,€312.58m,€7.62m
Bournemouth,39,€311.20m,€7.98m
Watford,43,€270.65m,€6.29m
Southampton,36,€259.80m,€7.22m
Crystal Palace,33,€248.65m,€7.53m
Brighton,45,€225.83m,€5.02m
Burnley,35,€205.58m,€5.87m
Aston Villa,38,€184.60m,€4.86m
Norwich,38,€110.85m,€2.92m
Sheff Utd,34,€110.80m,€3.26m

The real solution would probably combine requests to the REST API with scraping the table data via BeautifulSoup: you would iterate over every country, every competition in each country, and every year of each competition. The updated code I posted assumes we are only interested in the competition with ID GB1 (in Great Britain), and only in the year 2019.
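The iteration over seasons can be sketched as a small URL generator. Note an assumption here: the slug segment of the URL (here "-" in place of "premierleague") is guessed to be interchangeable based on the GB1 URL above, and has not been verified against the site:

```python
def season_urls(competition_id, years):
    """Yield one (url, params) pair per season for a given competition ID.

    Assumes the club-name slug in the path is interchangeable ("-"),
    which is an unverified guess based on the GB1 URL in the question.
    """
    base = "https://www.transfermarkt.com/-/startseite/wettbewerb/{}/plus/"
    for year in years:
        yield base.format(competition_id), {"saison_id": str(year)}


# Each pair could then be fed to a get_entries-style scraper; the full
# loop would first call get_competitions() for every country ID.
for url, params in season_urls("GB1", range(2017, 2020)):
    print(url, params)
```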

EDIT - You will have to tweak my solution a bit. I filter for and keep only those columns whose class does not contain the substring "hide", but it turns out some of those columns are actually important (the age column, for example).
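One possible tweak is to stop filtering on the class attribute altogether and keep every column, dropping only the leading image cell. The snippet below demonstrates this against a hand-made miniature of the table (the class names are illustrative, not copied from the live page):

```python
from bs4 import BeautifulSoup

# Miniature of the squad table: here the "Age" column carries a class
# containing "hide", so a :not([class*="hide"]) selector would drop it.
html = """
<table class="items">
  <thead><tr><th>club</th><th class="zentriert hide-for-small">Age</th><th>Total MV</th></tr></thead>
  <tbody><tr><td><img/></td><td>Man City</td><td class="zentriert hide-for-small">27.1</td><td>\u20ac1.27bn</td></tr></tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"class": "items"})

# Keep ALL header cells instead of filtering out "hide" classes
fieldnames = [th.text for th in table.select("thead > tr > th")]

rows = []
for row in table.select("tbody > tr"):
    cells = [td.text for td in row.select("td")][1:]  # skip the img column
    rows.append(dict(zip(fieldnames, cells)))

print(rows[0])
```

With this change the age column survives into the CSV; if the live page pads some rows with extra hidden cells, the zip may need adjusting.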

There are two issues here:

  1. After pressing the accept button, you need to add the line driver.switch_to.default_content() to switch back out of the iframe
  2. The element you are trying to identify is inside a shadow root. The only way I know of to identify such an element is a hack: it involves executing JavaScript to get the shadow root, then finding the element within that shadow root. I can click the element if I use this code:
driver = webdriver.Chrome('C:/Users/bzholle/chromedriver.exe')
driver.get('https://www.transfermarkt.com/premierleague/startseite/wettbewerb/GB1/plus/?saison_id=2019')

#click out of iframe pop-up window
driver.switch_to.frame(driver.find_element_by_css_selector('iframe[title="SP Consent Message"]'))
accept_button = driver.find_element_by_xpath("//button[@title='ACCEPT ALL']")
accept_button.click()

driver.switch_to.default_content()

shadow_section = driver.execute_script('''return document.querySelector("tm-quick-select-bar").shadowRoot''')

shadow_section.find_element_by_id("choosen-country").click()
