How to scrape data with Selenium based on dependent dropdown selections

Posted 2024-09-29 21:40:27


I have the following sample HTML:

<form action="/action_page.php">
  <label for="continent">Choose Continent:</label>
  <select name="continent" id="continent">
    <option value="asia">Asia</option>
    <option value="africa">Africa</option>
    <option value="Europe">Europe</option>
    <option value="South America">South America</option>
  </select>
  <br><br>
  <label for="country">Choose a Country:</label>
  <select name="country" id="country">
    <option value="Afghanistan">Afghanistan</option>
    <option value="Albania">Albania</option>
    <option value="Algeria">Algeria</option>
    <option value="Bahamas">Bahamas</option>
  </select>
  <label for="city">Choose a City:</label>
  <select name="city" id="city">
    <option value="Kabul">Kabul</option>
    <option value="Kandahar">Kandahar</option>
    <option value="Herat">Herat</option>
    <option value="Bajzë">Bajzë</option>
  </select>
  <input type="submit" value="Submit">
</form>

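Assume the dropdowns contain 7 continents, about 200 countries, and more than 1,000 cities. The country and city options change according to the selected continent. For example, if the first continent is Asia, the country dropdown automatically switches to the countries in Asia. If the first country is Afghanistan, I want all the cities of Afghanistan. In the second iteration, Asia should still be selected in the first dropdown, but Azerbaijan should be selected in the country dropdown, and I want all the cities of Azerbaijan.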

I don't know how to approach this. How should I structure my loops?
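The loops being asked about amount to a depth-first walk: select an option in the outer dropdown, then walk every option of the next dropdown, and only read (never select) the innermost one. Below is a minimal sketch under stated assumptions: `choose(level, label)` and `options(level)` are hypothetical callables that would wrap the Selenium calls in a real scraper (for example `Select(...).select_by_visible_text(...)` and reading the texts of `Select(...).options`), and `FakePage` is an in-memory stand-in with made-up data so the logic runs without a browser.

```python
def scrape_dependent(levels, choose, options):
    """Depth-first walk over dependent dropdowns.

    levels: dropdown names from outermost to innermost,
            e.g. ["continent", "country", "city"].
    Returns nested dicts, with a plain list of labels at the innermost level.
    """
    if len(levels) == 1:
        # Innermost dropdown: nothing to select, just read its options.
        return list(options(levels[0]))
    outer, inner = levels[0], levels[1:]
    result = {}
    # Copy the labels first: choosing an option re-renders the dropdowns below it.
    for label in list(options(outer)):
        choose(outer, label)  # e.g. Asia stays selected while its countries are walked
        result[label] = scrape_dependent(inner, choose, options)
    return result


# Hypothetical in-memory page with illustrative data, standing in for Selenium.
TREE = {
    "Asia": {"Afghanistan": ["Kabul", "Kandahar", "Herat"], "Azerbaijan": ["Baku", "Ganja"]},
    "Europe": {"Albania": ["Tirana", "Bajzë"]},
}


class FakePage:
    def __init__(self, tree):
        self.tree = tree
        self.selected = {}

    def choose(self, level, label):
        self.selected[level] = label

    def options(self, level):
        if level == "continent":
            return list(self.tree)
        if level == "country":
            return list(self.tree[self.selected["continent"]])
        return self.tree[self.selected["continent"]][self.selected["country"]]


page = FakePage(TREE)
data = scrape_dependent(["continent", "country", "city"], page.choose, page.options)
print(data["Asia"]["Afghanistan"])  # ['Kabul', 'Kandahar', 'Herat']
```

Keeping the traversal separate from the browser calls makes the nesting easy to check; only `choose` and `options` need to change when switching to the real page.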

I think the output should be JSON along these lines:

[{"Asia": [{"Afghanistan": ["Kabul", "Kandahar", "Herat"]}, {"Azerbaijan": ["cities of Azerbaijan", "..."]}]}, {"Europe": [{"Albania": ["cities of Albania", "..."]}]}]
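This list-of-single-key-objects shape is awkward to build directly inside the scraping loops. One option is to collect a plain nested dict (continent → country → cities) while scraping and convert it once at the end. A small sketch; `to_list_shape` is a hypothetical helper name and the data is placeholder data:

```python
import json


def to_list_shape(nested):
    """Convert {"Asia": {"Afghanistan": [...]}} into
    [{"Asia": [{"Afghanistan": [...]}]}], one single-key object per entry."""
    return [
        {continent: [{country: cities} for country, cities in countries.items()]}
        for continent, countries in nested.items()
    ]


nested = {
    "Asia": {"Afghanistan": ["Kabul", "Kandahar", "Herat"]},
    "Europe": {"Albania": ["Bajzë"]},
}
# ensure_ascii=False keeps non-ASCII names such as "Bajzë" readable.
print(json.dumps(to_list_shape(nested), ensure_ascii=False))
# [{"Asia": [{"Afghanistan": ["Kabul", "Kandahar", "Herat"]}]}, {"Europe": [{"Albania": ["Bajzë"]}]}]
```

The plain nested dict is arguably easier to query (`data["Asia"]["Afghanistan"]`), and `json.dump` can write either shape.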

For reference, on the first page of the website select client type = Individual. On the third page you can see the province, district, and municipality dropdowns.

I want to scrape all of this data.


1 Answer
Anonymous user
Reply #1 · Posted 2024-09-29 21:40:27

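Here is code that scrapes all three dropdowns while respecting their dependency: selecting an option in the first dropdown changes the options available in the other two.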

Code:

import time
import json
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.select import Select
from selenium.webdriver.support.wait import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.support import expected_conditions as EC

# Selenium 4.10+ no longer accepts the driver path as a positional argument,
# so pass it through a Service object.
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

driver.maximize_window()
driver.implicitly_wait(30)
driver.get("https://tms17.nepsetms.com.np/client-registration")
wait = WebDriverWait(driver, 10)

# Page 1: choose "Individual" as the client type.
first_option = wait.until(
    EC.visibility_of_element_located((By.CSS_SELECTOR, "select[formcontrolname='clientDealerType']")))
select = Select(first_option)
select.select_by_visible_text('Individual')


def scroll_till_end():
    driver.execute_script(
        "var scrollingElement = (document.scrollingElement || document.body);scrollingElement.scrollTop = "
        "scrollingElement.scrollHeight;")


# Click "Next" twice to reach page 3 (province / district / municipality).
for _ in range(2):
    scroll_till_end()
    wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button[class$='next-btn']"))).click()

######### Scrape every option from all three dropdowns.
PROVINCE_SELECT = "select[formcontrolname='province']"
DISTRICT_SELECT = "select[formcontrolname='district']"
MUNICIPALITY_OPTIONS = "select[formcontrolname='municipality'] option"


def real_options(select_css):
    # Real options have Angular-style values such as "3: Object"; the placeholder
    # option does not, so the [value*='Ob'] filter skips it. Value and label are
    # read up front because selecting an option re-renders the dependent dropdowns.
    return [(o.get_attribute("value"), o.text.strip())
            for o in driver.find_elements(By.CSS_SELECTOR, f"{select_css} option[value*='Ob']")]


final_json = {}  # {province: {district: [municipality, ...]}}
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, PROVINCE_SELECT)))
time.sleep(2)
for province_value, province in real_options(PROVINCE_SELECT):
    select_province = Select(
        wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, PROVINCE_SELECT))))
    select_province.select_by_value(province_value)
    time.sleep(2)  # let the district list for this province load
    final_json[province] = {}

    for district_value, district in real_options(DISTRICT_SELECT):
        select_district = Select(
            wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, DISTRICT_SELECT))))
        select_district.select_by_value(district_value)
        time.sleep(2)  # let the municipality list for this district load

        # Collect every municipality except the placeholder entry.
        final_json[province][district] = [
            m.text.strip()
            for m in driver.find_elements(By.CSS_SELECTOR, MUNICIPALITY_OPTIONS)
            if m.text.strip() != 'Select Municipality'
        ]

print(final_json)
with open('data.json', 'w', encoding='utf-8') as f:
    json.dump(final_json, f, ensure_ascii=False, indent=4)
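The fixed `time.sleep(2)` pauses are a guess: too long on a fast connection, possibly too short on a slow one. `WebDriverWait.until` accepts any callable that takes the driver and returns a truthy value once the wait is over. One alternative, then, is a custom condition that polls until the municipality list has real entries that differ from the previous district's list. This is a sketch under assumptions: `options_changed` is a hypothetical helper, the placeholder label "Select Municipality" is taken from the answer, and the wait would end in a `TimeoutException` if two consecutive districts happened to have identical lists.

```python
class options_changed:
    """Hypothetical custom condition for WebDriverWait.until().

    Resolves to the list of option texts once the options matched by
    `css_selector` include at least one non-placeholder entry AND that list
    differs from `previous` (the texts read for the previous parent selection).
    Returning False tells WebDriverWait to keep polling.
    """

    def __init__(self, css_selector, previous, placeholder="Select Municipality"):
        # ("css selector", ...) is the same locator as (By.CSS_SELECTOR, ...).
        self.locator = ("css selector", css_selector)
        self.previous = previous
        self.placeholder = placeholder

    def __call__(self, driver):
        texts = [e.text.strip() for e in driver.find_elements(*self.locator)]
        texts = [t for t in texts if t and t != self.placeholder]
        if texts and texts != self.previous:
            return texts
        return False


# Usage with a real driver (not executed here):
# from selenium.common.exceptions import StaleElementReferenceException
# wait = WebDriverWait(driver, 10, ignored_exceptions=[StaleElementReferenceException])
# municipalities = []
# ...after select_district.select_by_value(district_value):
# municipalities = wait.until(options_changed(
#     "select[formcontrolname='municipality'] option", previous=municipalities))
```

Ignoring `StaleElementReferenceException` matters here because the options may be re-rendered while the condition is reading their text.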

Imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
