Finding an element by id, class, XPath, or CSS selector returns None when web scraping with Selenium and BeautifulSoup

Published 2024-09-27 22:19:49


I am new to web scraping and am using BeautifulSoup and Selenium. I am trying to scrape data from the following web page:

    https://epl.bibliocommons.com/item/show/2300646980

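I am trying to scrape the "Staff Lists that include this Title" section. Specifically, I want to get the number of <li> tags, because I only need the number of items/links in the staff lists.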

I tried the following against the HTML shown by "Inspect". Here is the block of HTML I am trying to extract from:

<div class="ugc_bandage">
  <div class="lists_heading clearfix">
    <h3 data-test-id="ugc-lists-heading">
      Listed
    </h3>
    <div class="ugc_add_link">
      <div class="dropdown saveToButton clearfix" id="save_to_2300646980_id_7a3ateh0panp1uv0he1v7aqmj9" data-test-id="add-to-list-dropdown-container">
  <a href="#" aria-expanded="false" aria-haspopup="true" class=" dropdown-toggle dropdown-toggle hide_trigger_icon" data-test-id="add-to-list-save-button" data-toggle="dropdown" id="save_button_2300646980_id_7a3ateh0panp1uv0he1v7aqmj9" rel="nofollow">
       <i aria-hidden="true" class=" icon-plus"></i>
<span aria-hidden="true">Add</span><span class="sr-only" data-js="sr-only-dropdown-toggle" data-text-collapsed="Add, collapsed" data-text-expanded="Add, expanded">Add, collapsed</span><span aria-hidden="true" class="icon-arrow"></span></a>  
  <ul class="dropdown-menu">
      <li>
        <a href="/user_lists/new?bib=2300646980&amp;origin=https%3A%2F%2Fepl.bibliocommons.com%2Fitem%2Fload_ugc_content%2F2300646980" class="newList">Create a New List</a>
      </li>
      <li>
        <a href="/lists/add_bib/mine?bib=2300646980_fangirl" data-js="cp-overlay" id="more_lists_id_7a3ateh0panp1uv0he1v7aqmj9">Existing Lists »</a>
      </li>

  </ul>
</div>

    </div>
  </div>
  <h4 data-test-id="staff-lists-that-include-this-title">Staff Lists that include this Title</h4>
  <div data-analytics="{ &quot;SubFeature&quot;: &quot;Lists that include this title&quot; }" class="expand clearfix" id="all_lists_expand" testid="text_listsincluding">
    <ul class="further_list">
      <li> [LIST ENTRIES START HERE, BUT THERE ARE SO MANY THAT IT WOULD MAKE THIS POST TOO LONG.] </li>

  1. I used the XPath copied from Inspect for the staff lists section (id="all_lists_expand") of the HTML above:
    element = driver.find_elements_by_xpath('//*[@id="rightBar"]/div[3]/div')
  2. I tried scraping the section by its class name:
    element = driver.find_element_by_class_name('expand clearfix')
  3. I also tried scraping it with a CSS selector:
    element = driver.find_element_by_css_selector('#all_lists_expand')

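I also tried other variations of the code above, looking up the element's parent classes, other XPaths, and so on.

All of the above attempts return NONE. I'm not sure what I'm doing wrong. Should I be triggering an event, or using some other feature of Selenium? I'm not clicking any of the links in the list, or even saving a list of the links; I only need to count how many links there are to begin with.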


3 Answers

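You don't need the overhead of Selenium. You can make the same GET request that the page itself makes for that content, extract the HTML from the returned JSON, and then parse it with bs4 to extract the links.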

import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://epl.bibliocommons.com/item/load_ugc_content/2300646980').json()
soup = bs(r['html'], 'lxml')
links = [i['href'] for i in soup.select('[data-test-id="staff-lists-that-include-this-title"] + div [href]')]
print(len(links))
print(links)
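Since the question only asks for the number of entries, the selector logic can be checked offline against a small stand-in payload. This is a minimal sketch, not the site's actual response: the `payload` string is a made-up, shortened imitation of the JSON (only the `html` key is taken from the answer above), the two `/list/share/...` hrefs are placeholders, the helper name `staff_list_links` is hypothetical, and the built-in `html.parser` backend is used instead of `lxml`.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical, shortened stand-in for the JSON returned by load_ugc_content.
payload = json.dumps({
    "html": (
        '<div class="ugc_bandage">'
        '<h4 data-test-id="staff-lists-that-include-this-title">'
        'Staff Lists that include this Title</h4>'
        '<div class="expand clearfix" id="all_lists_expand">'
        '<ul class="further_list">'
        '<li><a href="/list/share/example_user_1/example_list_1">One</a></li>'
        '<li><a href="/list/share/example_user_2/example_list_2">Two</a></li>'
        '</ul></div></div>'
    )
})


def staff_list_links(raw_json):
    """Return the hrefs inside the div that immediately follows the h4 heading."""
    fragment = json.loads(raw_json)["html"]
    soup = BeautifulSoup(fragment, "html.parser")
    # "h4 + div" is the adjacent-sibling combinator; "[href]" keeps only
    # descendants that actually carry an href attribute.
    selector = '[data-test-id="staff-lists-that-include-this-title"] + div [href]'
    return [tag["href"] for tag in soup.select(selector)]


links = staff_list_links(payload)
print(len(links))
```

Against the live endpoint, the same function could presumably be fed `requests.get(...).text` in place of `payload`, assuming the response keeps the shape the answer describes.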

To get all the anchor tags under "Staff Lists that include this Title", induce WebDriverWait with presence_of_all_elements_located(). This yields 100 links.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver=webdriver.Chrome()
driver.get("https://epl.bibliocommons.com/item/show/2300646980")
elements=WebDriverWait(driver,10).until(EC.presence_of_all_elements_located((By.XPATH,'//h4[@data-test-id="staff-lists-that-include-this-title"]/following::div[1]//li/a')))
print(len(elements))
for ele in elements:
    print(ele.get_attribute('href'))

Output

https://epl.bibliocommons.com/list/share/114110843_schoolcorps1/1495892159_native_american,_rl_k-3,_spanish_middle_amp_high_school_multcolib_assignments
https://epl.bibliocommons.com/list/share/1467158627_stpl_crystal/1491354799_am_i_seeing_double
https://epl.bibliocommons.com/list/share/568630227_vpl_childrens_teens_info/1490175639_books_just_for_you_-_thought_provoking_amp_charming_ya_reads
https://epl.bibliocommons.com/list/share/1176606007_overdue_finds/1485773789_overdue_finds_episode_39_guilty_pleasures
https://epl.bibliocommons.com/list/share/1312082177_aloha_youthservices/1468001367_its_okay_to_not_be_okay_for_teens
https://epl.bibliocommons.com/list/share/631739687_eplpersonalpicks2/1484211504_epl_personal_picks_ya_novels
https://epl.bibliocommons.com/list/share/186066773_jclemmaf/837858917_favorite_and_my_best
https://epl.bibliocommons.com/list/share/569286917_oplteenbooklists/1476340687_teen_lit_chat_booklist_august_2019
https://epl.bibliocommons.com/list/share/569286917_oplteenbooklists/1459365327_astrology_teen_booklist_books_you_might_like_if_youre_a_virgo
https://epl.bibliocommons.com/list/share/1058529507_pplteen/1258199057_best_back_to_school_reads
https://epl.bibliocommons.com/list/share/1216909347_anna_libraryt/1478214359_ya_novels_about_school
https://epl.bibliocommons.com/list/share/106274081_wplstaffpicks/1477722487_wpl_summer_reads_2019
https://epl.bibliocommons.com/list/share/173100305_jclangelicar/1226682237_amazing_reads_for_teens_and_up
https://epl.bibliocommons.com/list/share/73092242_pickeringteens/1117926097_tag_recommends_continued
https://epl.bibliocommons.com/list/share/73092242_pickeringteens/744582537_tag_recommends_2018
https://epl.bibliocommons.com/list/share/73092242_pickeringteens/1184991797_lets_talk_mental_health
https://epl.bibliocommons.com/list/share/73092242_pickeringteens/822272858_ppl_teens_love,_loss,_and_all_the_feels
https://epl.bibliocommons.com/list/share/73092242_pickeringteens/692256398_aampe_picks
https://epl.bibliocommons.com/list/share/73977058_jclbeckyc/1385964387_the_best_books_of_2019
https://epl.bibliocommons.com/list/share/1059338207_readingadviser_sally/1439607877_books_for_20_somethings-fvrl-2019
https://epl.bibliocommons.com/list/share/279600817_lpl_readersservices/1457670767_2019_squad_goals_read_a_book_set_on_a_college_or_university_campus
https://epl.bibliocommons.com/list/share/631739687_eplpersonalpicks2/1458857587_epl_personal_picks_just_a_little_bit_of_love
https://epl.bibliocommons.com/list/share/1275085237_beaverton_teens/1291469057_female_pov
https://epl.bibliocommons.com/list/share/104627853_princetonpl/1128194327_susans_picks
https://epl.bibliocommons.com/list/share/69155564_kantoniw/376769097_teen_-_terrific_titles
https://epl.bibliocommons.com/list/share/1275085237_beaverton_teens/1292121977_realistic_fiction
https://epl.bibliocommons.com/list/share/1300215227_beaverton_iand/1303358407_books_where_the_parents_are_cool
https://epl.bibliocommons.com/list/share/215214545_multcolib_dianaa/1450141617_casting_a_wide_net_for_tammy_from_multcolib_my_librarian_diana
https://epl.bibliocommons.com/list/share/681590123_scl_kaylin/1030053197_kaylins_picks
https://epl.bibliocommons.com/list/share/173530091_jclhebaha/1171128547_hebahs_staff_picks
https://epl.bibliocommons.com/list/share/1275085237_beaverton_teens/1288931697_recommended_reads_11-12
https://epl.bibliocommons.com/list/share/275252227_martinregionalreads/1369306597_diversity_teenya_books
https://epl.bibliocommons.com/list/share/72152117_steacy_library/1204064657_classic_teen_reads
https://epl.bibliocommons.com/list/share/700233957_snoislelib_suggests/1436626997_harry_potter_y_la_piedra_filosofal
https://epl.bibliocommons.com/list/share/235700377_pomolibrary/1436872057_pomo_picks_-_teen_-_tsrc_2019_-_book_that_is_not_in_a_series_-_grades_9,_10,_11,_12
https://epl.bibliocommons.com/list/share/694280209_kimberlyreads/752020447_level_up_your_reading_-_books_for_gamers_(teen_edition)
https://epl.bibliocommons.com/list/share/1216909347_anna_libraryt/1220688167_ya_reads_for_reluctant_readers
https://epl.bibliocommons.com/list/share/569286917_oplteenbooklists/1405453637_teen_book_chat_april_2019
https://epl.bibliocommons.com/list/share/223261407_burien_teens_read/1424507527_srp_book_talk_glendale_lutheran_8th_grade
https://epl.bibliocommons.com/list/share/1216909347_anna_libraryt/1412382807_top_10_ya_coming-of-age_reads
https://epl.bibliocommons.com/list/share/80402800_vpl_booksjustforyou11/1413011449_vpl_-_books_just_for_you_-_biography,_humour,_inspiration,_short_stories,_and_animal_fiction
https://epl.bibliocommons.com/list/share/760546357_scteenprogramming/1411563307_cmlibrary_suggests_imagicon_2019
https://epl.bibliocommons.com/list/share/1078894377_lisadempster/1411364207_celebrate_your_inner_geek
https://epl.bibliocommons.com/list/share/682768697_arapahoekati/1055224107_published_nanowrimo_authors
https://epl.bibliocommons.com/list/share/1382187347_mollywally/1404738807_mental_health
https://epl.bibliocommons.com/list/share/568630227_vpl_childrens_teens_info/1395459037_books_just_for_you_-_ya_contemporary_amp_mystery
https://epl.bibliocommons.com/list/share/550038607_spl_brittany/1322718057_one_word_titles
https://epl.bibliocommons.com/list/share/1170754297_sppl_recommends/1383661857_no,_you_cant_read_these_books
https://epl.bibliocommons.com/list/share/639095537_sausalito_staff_erin/1377322417_ya_realistic_fiction_for_middle_schoolers
https://epl.bibliocommons.com/list/share/1060442917_readingadviser_jacque/1364177797_teen_favorites
https://epl.bibliocommons.com/list/share/69193241_pepl_knoeske/269126130_ya_reads
https://epl.bibliocommons.com/list/share/155181971_surreylibraries_teens/385766437_hilarity_ensues
https://epl.bibliocommons.com/list/share/1136103357_hfxpl_teens/1374745777_hey_what_are_you_reading
https://epl.bibliocommons.com/list/share/155181971_surreylibraries_teens/1349496509_valentines_day_2019_young_adult_fiction
https://epl.bibliocommons.com/list/share/138070021_surreylibraries_reads/1304148677_staff_picks_what_we_loved_in_2014
https://epl.bibliocommons.com/list/share/80402800_vpl_booksjustforyou11/1365444807_vpl_-_new_adult_-_top_picks
https://epl.bibliocommons.com/list/share/715647058_st8ceyw8/1365437547_recommendations_for_teen_girls
https://epl.bibliocommons.com/list/share/1131250757_lvccld_saharawest/1363494177_geeks_rule_books_for_teens
https://epl.bibliocommons.com/list/share/548538121_spl_merley/1358151383_help_for_anxious_teens
https://epl.bibliocommons.com/list/share/679797892_dbrl_idaf/1355664913_matryoshka_fiction
https://epl.bibliocommons.com/list/share/1315907392_indypl_kirstenw/1315916377_staff_recommendations_great_reads_for_teens
https://epl.bibliocommons.com/list/share/1303998627_tigard_teens/1351425041_put_a_heart_on_it
https://epl.bibliocommons.com/list/share/515946100_tacomalibrary/1343962909_a_book_about_books,_as_part_of_the_extreme_reader_challenge
https://epl.bibliocommons.com/list/share/1216909347_anna_libraryt/1342688089_ya_with_geek_themes
https://epl.bibliocommons.com/list/share/1282688857_indypl_katieb/1285699927_nanowrimo-_a_survival_guide
https://epl.bibliocommons.com/list/share/104627853_princetonpl/1333071229_libfaves
https://epl.bibliocommons.com/list/share/550038607_spl_brittany/1329175977_fresh_starts,_new_beginnings_and_second_chances
https://epl.bibliocommons.com/list/share/710260400_annag_kcmo/1322113517_fandoms
https://epl.bibliocommons.com/list/share/558294898_jclemilyd/1326533547_monticello_youth_services_recommendsya_books
https://epl.bibliocommons.com/list/share/429022740_loganlib_meg/1324424287_2019_reading_challenge
https://epl.bibliocommons.com/list/share/95681271_samcmar/1318184807_mpl_2019_reading_challenge_-_a_one_word_title
https://epl.bibliocommons.com/list/share/768705057_dcpl_teens/1322057871_if_you_like_dumplin
https://epl.bibliocommons.com/list/share/803717002_adult_custom_reading_list/1321396267_omaha_custom_list_page-turners_122018
https://epl.bibliocommons.com/list/share/134340301_vpl_booksjustforyou/1160285087_vpl_-_books_just_for_you_-_fun_reads
https://epl.bibliocommons.com/list/share/1303998627_tigard_teens/1320248908_do_you_ship_them
https://epl.bibliocommons.com/list/share/768705057_dcpl_teens/1030069518_a_fandom_life_for_me
https://epl.bibliocommons.com/list/share/1066057257_mcpl_readerslounge/1314212917_woodneath_staff_picks_babysitters_club_reads
https://epl.bibliocommons.com/list/share/1081387957_pacl_teens/1313796687_tlab_recommends_romance_for_teens
https://epl.bibliocommons.com/list/share/768695927_dcpl_adults/1311059977_dcpl_staff_picks_for_2018
https://epl.bibliocommons.com/list/share/186066773_jclemmaf/1313674757_ya_books_about_teen_writers
https://epl.bibliocommons.com/list/share/888940897_cmlibrary_corvolunteens/1306009547_calians_favorites
https://epl.bibliocommons.com/list/share/344916587_chapel_hill_teenstaff/687974851_unusual_formats
https://epl.bibliocommons.com/list/share/1204935759_jclmegb/1303553797_teen_reads_to_tickle_your_funny_bone_amp_warm_your_heart
https://epl.bibliocommons.com/list/share/95796007_jessicagma/1302711427_book_smack_j%C3%B3lab%C3%B3kafl%C3%B3%C3%B0i%C3%B0_2018_jessica
https://epl.bibliocommons.com/list/share/219559045_kclsaarene/1302650609_best-selling_nanowrimo_winners
https://epl.bibliocommons.com/list/share/569520567_hholley/710149067_opl_staff_picks
https://epl.bibliocommons.com/list/share/491055517_cals_readers/1298323449_nanowrimo_books_that_got_published
https://epl.bibliocommons.com/list/share/73877511_jcltracim/1296589167_nanowrimo_-_published_wrimos
https://epl.bibliocommons.com/list/share/219559045_kclsaarene/1296304497_pizza_and_books_einstein_ms_november_2018
https://epl.bibliocommons.com/list/share/104627853_princetonpl/1295497427_nanowrimo
https://epl.bibliocommons.com/list/share/675410617_orlreads/1295410127_orl_recommends_-_nanowrimo_reads
https://epl.bibliocommons.com/list/share/768705057_dcpl_teens/1294054347_family_stories
https://epl.bibliocommons.com/list/share/1165043747_sppl_teens/1282475677_lets_talk_about_mental_health
https://epl.bibliocommons.com/list/share/685936385_arapahoebridget/723765118_breaking_out_of_nanowrimo_writers_block
https://epl.bibliocommons.com/list/share/1106377937_mckenzingtonc/1277464857_disability_awareness
https://epl.bibliocommons.com/list/share/105396413_youthcollection/1260776227_fall_2018_must-read_ya_novels
https://epl.bibliocommons.com/list/share/105396413_youthcollection/1261651207_ya_books_about_social_anxiety
https://epl.bibliocommons.com/list/share/1244999997_jcls_youth_services/1259372807_libraries_rock_talent_teen_five_star_books
https://epl.bibliocommons.com/list/share/79828372_vpl_informationservice/1254087617_vpl_-_new_adult_fiction
https://epl.bibliocommons.com/list/share/308506797_kclsreads/1253264637_to_all_the_boys_ive_loved_before

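I looked through your page and wrote an XPath that finds all the li elements under "Staff Lists that include this Title". Updated to include a wait for all the relevant elements.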

WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, "//div[h4[text()='Staff Lists that include this Title']]/div[2]/ul/li[@class='']")))
driver.find_elements_by_xpath("//div[h4[text()='Staff Lists that include this Title']]/div[2]/ul/li[not(contains(@class, 'extra'))]")

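This XPath first selects the main div element that contains the h4 element with the text "Staff Lists that include this Title"; all the li items sit under that div. We then query div[2], which holds the relevant li items. The last part of the query selects li elements with an empty class name. As you can see from the page source, there are many hidden li elements with the class='extra' attribute. We don't want those li elements, so we query not(contains(@class, 'extra')) to get the li elements that do not have the extra class name.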
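The effect of the class predicate can be tried without a browser. The following is a rough sketch that evaluates the same XPath with lxml against a hypothetical miniature of the page: the `SAMPLE` markup, its three entries, and the `/list/share/example_*` hrefs are invented for illustration and only mirror the structure quoted in the question.

```python
from lxml import html

# Hypothetical miniature of the page structure; the real page has many more entries.
SAMPLE = """
<div class="ugc_bandage">
  <div class="lists_heading clearfix"><h3>Listed</h3></div>
  <h4>Staff Lists that include this Title</h4>
  <div class="expand clearfix" id="all_lists_expand">
    <ul class="further_list">
      <li class=""><a href="/list/share/example_a">A</a></li>
      <li class=""><a href="/list/share/example_b">B</a></li>
      <li class="extra"><a href="/list/share/example_c">C</a></li>
    </ul>
  </div>
</div>
"""

tree = html.fromstring(SAMPLE)

# Outer div that has the h4 as a child, then its second div child, then the li items.
base = "//div[h4[text()='Staff Lists that include this Title']]/div[2]/ul/li"

all_items = tree.xpath(base)
# Drop every li whose class attribute contains the substring 'extra'.
visible_items = tree.xpath(base + "[not(contains(@class, 'extra'))]")
# Alternative predicate from the wait call: li elements whose class is empty.
empty_class_items = tree.xpath(base + "[@class='']")

print(len(all_items), len(visible_items), len(empty_class_items))
```

Note that contains() is a substring test, so it would also exclude an li whose class merely includes "extra" as part of a longer name. Whether that matters depends on the class names the real page uses.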

If the XPath above does not work, I have also modified the other XPath you posted in your original question:

WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, "//*[@id='rightBar']/div[3]/div/div[2]/ul/li[not(contains(@class, 'extra'))]")))
driver.find_elements_by_xpath("//*[@id='rightBar']/div[3]/div/div[2]/ul/li[not(contains(@class, 'extra'))]")

For the URL you provided, both queries retrieved 5 results.
