Problem with a Selenium dropdown button

Published 2024-09-30 03:25:42


I'm having some trouble clicking a dropdown button and then selecting a different option in order to change the web page. I'm using Selenium in Python to extract this data. The URL is https://www.transfermarkt.com/premierleague/startseite/wettbewerb/GB1/plus/?saison_id=2019

Code so far:

driver = webdriver.Chrome('C:/Users/bzholle/chromedriver.exe')
driver.get('https://www.transfermarkt.com/premierleague/startseite/wettbewerb/GB1/plus/?saison_id=2019')

#click out of iframe pop-up window
driver.switch_to.frame(driver.find_element_by_css_selector('iframe[title="SP Consent Message"]'))
accept_button = driver.find_element_by_xpath("//button[@title='ACCEPT ALL']")
accept_button.click()

driver.find_element_by_id("choosen-country").click()

I keep getting: NoSuchElementException: Message: no such element: Unable to locate element

In the HTML, the country list only appears once the dropdown arrow has been clicked; however, no matter what I try, I cannot click the button. Does anyone have any suggestions?


2 Answers

You didn't mention what information you actually want to scrape, so the alternative solution I suggest below can only take you so far. If you elaborate and let me know what you're trying to get, I can tailor my solution.

Logging one's network traffic (while viewing the page in a browser) reveals multiple XHR (XmlHttpRequest) HTTP GET requests being made to various REST API endpoints. The responses are JSON, and they contain all the information you are likely to want.

My suggestion would be to simply imitate those HTTP GET requests to the necessary REST API endpoints. No Selenium required:

def get_country_id(country_name):
    import requests

    url = "https://www.transfermarkt.com/quickselect/countries"

    headers = {
        "user-agent": "Mozilla/5.0"
    }

    response = requests.get(url, headers=headers)
    response.raise_for_status()

    return next((country["id"] for country in response.json() if country["name"] == country_name), None)


def get_competitions(country_id):
    import requests

    url = "https://www.transfermarkt.com/quickselect/competitions/{}".format(country_id)

    headers = {
        "user-agent": "Mozilla/5.0"
    }

    response = requests.get(url, headers=headers)
    response.raise_for_status()

    return response.json()

def main():

    country_name = "Iceland"

    country_id = get_country_id(country_name)
    assert country_id is not None

    print("Competitions in {}:".format(country_name))
    for competition in get_competitions(country_id):
        print(competition["name"])
    
    return 0


if __name__ == "__main__":
    import sys
    sys.exit(main())

Output:

Competitions in Iceland:
Pepsi Max deild
Lengjudeild
Mjólkurbikarinn
Lengjubikarinn

EDIT - Unfortunately, the table data you are trying to get does not come from an API. It is baked directly into the page's HTML. You still don't need Selenium for that, though: BeautifulSoup is good enough:

def get_entries():
    import requests
    from bs4 import BeautifulSoup as Soup
    from operator import attrgetter

    url = "https://www.transfermarkt.com/premierleague/startseite/wettbewerb/GB1/plus/"

    params = {
        "saison_id": "2019"
    }

    headers = {
        "user-agent": "Mozilla/5.0"
    }

    response = requests.get(url, params=params, headers=headers)
    response.raise_for_status()

    soup = Soup(response.content, "html.parser")

    table = soup.find("table", {"class": "items"})
    assert table is not None

    # Get text from header cells whose class does not contain the substring "hide"
    fieldnames = list(map(attrgetter("text"), table.select("thead > tr > th:not([class*=\"hide\"])")))
    yield fieldnames

    for row in table.select("tbody > tr"):
        # Assuming the first column will always be an img
        columns = list(map(attrgetter("text"), row.select("td:not([class*=\"hide\"])")[1:]))
        yield dict(zip(fieldnames, columns))

def main():

    from csv import DictWriter

    entries = get_entries()
    fieldnames = next(entries)

    with open("output.csv", "w", newline="") as file:
        writer = DictWriter(file, fieldnames=fieldnames)
        writer.writeheader()
        
        for entry in entries:
            writer.writerow(entry)
    
    return 0


if __name__ == "__main__":
    import sys
    sys.exit(main())

CSV output:

club,Squad,Total MV,ø MV
Man City,34,€1.27bn,€37.46m
Liverpool,56,€1.09bn,€19.53m
Spurs,36,€1.04bn,€28.94m
Chelsea,36,€797.00m,€22.14m
Man Utd,43,€775.20m,€18.03m
Arsenal,38,€680.55m,€17.91m
Everton,35,€525.50m,€15.01m
Leicester,32,€384.75m,€12.02m
West Ham,38,€371.75m,€9.78m
Wolves,44,€315.40m,€7.17m
Newcastle,41,€312.58m,€7.62m
Bournemouth,39,€311.20m,€7.98m
Watford,43,€270.65m,€6.29m
Southampton,36,€259.80m,€7.22m
Crystal Palace,33,€248.65m,€7.53m
Brighton,45,€225.83m,€5.02m
Burnley,35,€205.58m,€5.87m
Aston Villa,38,€184.60m,€4.86m
Norwich,38,€110.85m,€2.92m
Sheff Utd,34,€110.80m,€3.26m

The real solution would probably combine requests to the REST API with scraping the table data via BeautifulSoup: you would iterate over every country, every competition in each country, and every year of each competition. The updated code I posted assumes we are only interested in the competition with ID GB1 (in Great Britain), and only in the year 2019.
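The iteration over seasons can be sketched as a small URL generator. Note an assumption here: the slug segment of the URL (here "-" in place of "premierleague") is guessed to be interchangeable based on the GB1 URL above, and has not been verified against the site:

```python
def season_urls(competition_id, years):
    """Yield one (url, params) pair per season for a given competition ID.

    Assumes the club-name slug in the path is interchangeable ("-"),
    which is an unverified guess based on the GB1 URL in the question.
    """
    base = "https://www.transfermarkt.com/-/startseite/wettbewerb/{}/plus/"
    for year in years:
        yield base.format(competition_id), {"saison_id": str(year)}


# Each pair could then be fed to a get_entries-style scraper; the full
# loop would first call get_competitions() for every country ID.
for url, params in season_urls("GB1", range(2017, 2020)):
    print(url, params)
```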

EDIT - You will have to tweak my solution a bit. I filter for and keep only those columns whose class does not contain the substring "hide", but it turns out some of those columns are actually important (the age column, for example).
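One possible tweak is to stop filtering on the class attribute altogether and keep every column, dropping only the leading image cell. The snippet below demonstrates this against a hand-made miniature of the table (the class names are illustrative, not copied from the live page):

```python
from bs4 import BeautifulSoup

# Miniature of the squad table: here the "Age" column carries a class
# containing "hide", so a :not([class*="hide"]) selector would drop it.
html = """
<table class="items">
  <thead><tr><th>club</th><th class="zentriert hide-for-small">Age</th><th>Total MV</th></tr></thead>
  <tbody><tr><td><img/></td><td>Man City</td><td class="zentriert hide-for-small">27.1</td><td>\u20ac1.27bn</td></tr></tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"class": "items"})

# Keep ALL header cells instead of filtering out "hide" classes
fieldnames = [th.text for th in table.select("thead > tr > th")]

rows = []
for row in table.select("tbody > tr"):
    cells = [td.text for td in row.select("td")][1:]  # skip the img column
    rows.append(dict(zip(fieldnames, cells)))

print(rows[0])
```

With this change the age column survives into the CSV; if the live page pads some rows with extra hidden cells, the zip may need adjusting.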

There are two issues here:

  1. After pressing the accept button, you need to add the line driver.switch_to.default_content() to switch back out of the iframe
  2. The element you are trying to identify is inside a shadow root. The only way I know of to identify such an element is a hack: it involves executing JavaScript to get the shadow root, then finding the element within that shadow root. I can click the element if I use this code:
driver = webdriver.Chrome('C:/Users/bzholle/chromedriver.exe')
driver.get('https://www.transfermarkt.com/premierleague/startseite/wettbewerb/GB1/plus/?saison_id=2019')

#click out of iframe pop-up window
driver.switch_to.frame(driver.find_element_by_css_selector('iframe[title="SP Consent Message"]'))
accept_button = driver.find_element_by_xpath("//button[@title='ACCEPT ALL']")
accept_button.click()

driver.switch_to.default_content()

shadow_section = driver.execute_script('''return document.querySelector("tm-quick-select-bar").shadowRoot''')

shadow_section.find_element_by_id("choosen-country").click()
