I am scraping the link "https://www.kayak.it/flights/MIL-BCN/2021-09-01/2021-09-02/?sort=bestflight_a" with Selenium in Python.
I want to collect the information in a list of dictionaries, with output like this:
[{'data_partenza': '2021-09-01', 'data_ritorno': '2021-09-02', 'price': '40', 'flights': '22:20 – 23:55\n'
'BGY Bergamo Orio al Serio\n'
'‐\n'
'BCN Barcellona-El Prat\n'
'diretto\n'
'1h 35m\n'
'6:20 – 8:00\n'
'BCN Barcellona-El Prat\n'
'‐\n'
'BGY Bergamo Orio al Serio\n'
'diretto\n'
'1h 40m'},
{'data_partenza': '2021-09-01', 'data_ritorno': '2021-09-02', 'price': '34', 'flights': '16:35 – 18:05\n'
'LIN Aeroporto Milano Linate\n'
'‐\n'
'BCN Barcellona-El Prat\n'
'diretto\n'
'1h 30m\n'
'6:20 – 8:00\n'
'BCN Barcellona-El Prat\n'
'‐\n'
'BGY Bergamo Orio al Serio\n'
'diretto\n'
'1h 40m'},
....]
But I get this output instead:
[{'data_partenza': '2021-09-01',
'data_ritorno': '2021-09-02',
'flights': '22:20 – 23:55\n'
'BGY Bergamo Orio al Serio\n'
'‐\n'
'BCN Barcellona-El Prat\n'
'diretto\n'
'1h 35m\n'
'6:20 – 8:00\n'
'BCN Barcellona-El Prat\n'
'‐\n'
'BGY Bergamo Orio al Serio\n'
'diretto\n'
'1h 40m',
'price': None}]
[{'data_partenza': '2021-09-01',
'data_ritorno': '2021-09-02',
'flights': '22:20 – 23:55\n'
'BGY Bergamo Orio al Serio\n'
'‐\n'
'BCN Barcellona-El Prat\n'
'diretto\n'
'1h 35m\n'
'6:20 – 8:00\n'
'BCN Barcellona-El Prat\n'
'‐\n'
'BGY Bergamo Orio al Serio\n'
'diretto\n'
'1h 40m',
'price': None}, ...
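So I have two problems: I get many separate lists of dictionaries, but I want just one; also, when I try to scrape the flight price, I always get None.
This is my code: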
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import pprint

# chrome_options is not defined in the posted snippet; presumably it was created earlier
chrome_options = webdriver.ChromeOptions()
wd = webdriver.Chrome('chromedriver',chrome_options = chrome_options)
wd.maximize_window()
wd.implicitly_wait(50)
#driver.get("https://account.battle.net/creation/flow/creation-full")
wait = WebDriverWait(wd, 20)
link = 'https://www.kayak.it/flights/MIL-BCN/2021-09-01/2021-09-02/?sort=bestflight_a'
wd.get(link)

# Dismiss the consent dialogs if they appear
try:
    wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button[title='Accetta']"))).click()
except:
    pass
try:
    wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button[id='onetrust-accept-btn-handler']"))).click()
except:
    pass

detail_flights = []
j = 0
lngth = len(wd.find_elements_by_css_selector(".mainInfo"))
for i in range(lngth):
    flights = ""
    data_partenza = ""
    data_ritorno = ""
    price = ""
    try:
        if len(wd.find_elements_by_css_selector(".mainInfo")) > 0:
            elements = wd.find_elements_by_css_selector(".mainInfo")
            wd.execute_script("arguments[0].scrollIntoView(true);", elements[j])
            #print(elements[j].get_attribute('innerText'))
            j = j + 1
            #detail_flights.append(elements[j].get_attribute('innerText'))
            # Take the two dates from the URL path
            date = link.replace('https://www.kayak.it/flights/', '')
            data_partenza = date[8:18]
            data_ritorno = date[19:29]
            detail_flights.append({'flights': elements[j].get_attribute('innerText'),
                                   'price': elements[j].get_attribute('SrON-mb'),
                                   'data_partenza': data_partenza,
                                   'data_ritorno': data_ritorno})
            pprint.pprint(detail_flights[0:])
        else:
            print('Nothing more to scrape')
    except:
        pass
Check this code. It should be able to extract the details you want.
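The answer's code is not reproduced here, so what follows is only a minimal sketch of the kind of fix the question seems to need, not the answerer's actual solution. Three things in the posted loop stand out. `pprint.pprint` is called inside the loop, so the growing list is printed once per iteration, which is why many lists appear. `get_attribute('SrON-mb')` asks for an HTML attribute with that name, and `SrON-mb` looks like a CSS class, so the call returns None; the price has to be read from the text of the price element instead. And `j` is incremented before `elements[j]` is read, so the first card is skipped and the final out-of-range index is hidden by the bare `except`. The sketch keeps the Selenium part in comments and uses plain (flights_text, price_text) pairs so that it runs without a browser. The helper names `parse_dates` and `build_records`, the `.resultWrapper` and `.price-text` selectors, and the sample price format are all assumptions, not taken from Kayak's markup.

```python
import re
from pprint import pprint


def parse_dates(link):
    """Return (departure, return) dates found in the URL as YYYY-MM-DD strings."""
    dates = re.findall(r"\d{4}-\d{2}-\d{2}", link)
    return dates[0], dates[1]


def build_records(cards, link):
    """Build ONE list of dicts from (flights_text, price_text) pairs.

    With Selenium the pairs might be collected roughly like this
    (the two class names below are hypothetical and need checking
    against the live page):

        cards = []
        for card in wd.find_elements(By.CSS_SELECTOR, ".resultWrapper"):
            info = card.find_element(By.CSS_SELECTOR, ".mainInfo")
            price = card.find_element(By.CSS_SELECTOR, ".price-text")
            cards.append((info.get_attribute("innerText"), price.text))
    """
    data_partenza, data_ritorno = parse_dates(link)
    records = []
    for flights_text, price_text in cards:
        # Keep only the digits of the price text, e.g. '40 €' -> '40'.
        digits = re.sub(r"\D", "", price_text or "")
        records.append({
            "data_partenza": data_partenza,
            "data_ritorno": data_ritorno,
            "price": digits or None,
            "flights": flights_text,
        })
    return records


link = "https://www.kayak.it/flights/MIL-BCN/2021-09-01/2021-09-02/?sort=bestflight_a"

# Sample pairs standing in for scraped cards (price format is assumed).
sample_cards = [
    ("22:20 – 23:55\nBGY Bergamo Orio al Serio\ndiretto\n1h 35m", "40 €"),
    ("16:35 – 18:05\nLIN Aeroporto Milano Linate\ndiretto\n1h 30m", "34 €"),
]

detail_flights = build_records(sample_cards, link)
pprint(detail_flights)  # printed once, after the loop has finished
```

Iterating over the cards directly, rather than indexing with a separate counter, removes the off-by-one problem, and pulling the dates out with a regular expression avoids depending on fixed character offsets in the URL.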