Answers to the question: Problem with a Selenium dropdown button


1 answer

Anonymous, 1 day ago

Skilled in: python, mysql, java

You didn't mention what information you're actually trying to scrape, so the alternative solution I suggest below can only help so much. If you elaborate and let me know what data you're after, I can tailor my solution.

If you log your network traffic while viewing the page in a browser, you'll see that multiple XHR (XmlHttpRequest) HTTP GET requests are made to various REST API endpoints. Their responses are JSON and contain all the information you're likely to want.

My suggestion is to simply imitate the HTTP GET requests to the necessary REST API endpoints. No Selenium required:

<pre><code>import sys

import requests


def get_country_id(country_name):
    # Fetch the list of countries and return the ID of the one whose name matches
    url = "https://www.transfermarkt.com/quickselect/countries"

    headers = {
        "user-agent": "Mozilla/5.0"
    }

    response = requests.get(url, headers=headers)
    response.raise_for_status()

    # None is returned if no country has the given name
    return next((country["id"] for country in response.json() if country["name"] == country_name), None)


def get_competitions(country_id):
    # Fetch all competitions listed for the given country ID
    url = "https://www.transfermarkt.com/quickselect/competitions/{}".format(country_id)

    headers = {
        "user-agent": "Mozilla/5.0"
    }

    response = requests.get(url, headers=headers)
    response.raise_for_status()

    return response.json()


def main():
    country_name = "Iceland"
    country_id = get_country_id(country_name)

    assert country_id is not None

    print("Competitions in {}:".format(country_name))
    for competition in get_competitions(country_id):
        print(competition["name"])

    return 0


if __name__ == "__main__":
    sys.exit(main())
</code></pre>

Output:

<pre><code>Competitions in Iceland:
Pepsi Max deild
Lengjudeild
Mjólkurbikarinn
Lengjubikarinn
</code></pre>

<hr/>

EDIT - Unfortunately, the table data you're trying to scrape does not come from an API. It is baked directly into the page's HTML. You still don't need Selenium for this, though. BeautifulSoup is good enough:

<pre><code>import sys
from csv import DictWriter
from operator import attrgetter

import requests
from bs4 import BeautifulSoup as Soup


def get_entries():
    # Generator: yields the list of column names first, then one dict per table row
    url = "https://www.transfermarkt.com/premierleague/startseite/wettbewerb/GB1/plus/"

    params = {
        "saison_id": "2019"
    }

    headers = {
        "user-agent": "Mozilla/5.0"
    }

    response = requests.get(url, params=params, headers=headers)
    response.raise_for_status()

    soup = Soup(response.content, "html.parser")

    table = soup.find("table", {"class": "items"})
    assert table is not None

    # Get text from header cells whose class does not contain the substring "hide"
    fieldnames = list(map(attrgetter("text"), table.select("thead > tr > th:not([class*=\"hide\"])")))
    yield fieldnames

    for row in table.select("tbody > tr"):
        # Assuming the first column will always be an img
        columns = list(map(attrgetter("text"), row.select("td:not([class*=\"hide\"])")[1:]))
        yield dict(zip(fieldnames, columns))


def main():
    entries = get_entries()
    # The first item yielded is the header row
    fieldnames = next(entries)

    with open("output.csv", "w", newline="") as file:
        writer = DictWriter(file, fieldnames=fieldnames)
        writer.writeheader()

        for entry in entries:
            writer.writerow(entry)

    return 0


if __name__ == "__main__":
    sys.exit(main())
</code></pre>

CSV output:

<pre><code>club,Squad,Total MV,ø MV
Man City,34,€1.27bn,€37.46m
Liverpool,56,€1.09bn,€19.53m
Spurs,36,€1.04bn,€28.94m
Chelsea,36,€797.00m,€22.14m
Man Utd,43,€775.20m,€18.03m
Arsenal,38,€680.55m,€17.91m
Everton,35,€525.50m,€15.01m
Leicester,32,€384.75m,€12.02m
West Ham,38,€371.75m,€9.78m
Wolves,44,€315.40m,€7.17m
Newcastle,41,€312.58m,€7.62m
Bournemouth,39,€311.20m,€7.98m
Watford,43,€270.65m,€6.29m
Southampton,36,€259.80m,€7.22m
Crystal Palace,33,€248.65m,€7.53m
Brighton,45,€225.83m,€5.02m
Burnley,35,€205.58m,€5.87m
Aston Villa,38,€184.60m,€4.86m
Norwich,38,€110.85m,€2.92m
Sheff Utd,34,€110.80m,€3.26m
</code></pre>

A real solution would probably combine requests to the REST API with scraping the table data via BeautifulSoup: you would iterate over every country, every competition in that country, and every season of that competition. The updated code I posted assumes we are only interested in the competition with ID <code>GB1</code> (in Great Britain), and only in the 2019 season.

EDIT - You will have to tweak my solution a bit. I only keep the columns whose class does not contain the substring "hide", but it turns out that some of the hidden columns are important (for example, the <code>age</code> column).
