Get all elements from a ul class and compare them with a text file (Python Selenium)

Posted 2024-06-26 13:33:47

As the title suggests, I am trying to get all the li elements from a dropdown field and compare those results with the entries in a specific .txt file.

The HTML/DOM contains a list like this:

<div id="ProgramCategoriesAndCodes_chzn" class="chosen-container groups-are-selectable show-selected-in-list chosen-container-multi modified-chzn chosen-container-active" style="width: 291px;">
  <ul class="chosen-choices">
    <li class="search-field"> <input type="text" value=" " class="default" autocomplete="off" style="width: 266px;"></li>
  </ul>
  <div class="chosen-drop" style="left: 0px; width: 291px; top: 29px;">
    <ul class="chosen-results">
      <li id="ProgramCategoriesAndCodes_chzn_g_0" class="group-result active-result child-matched" title="AA - AEROBICS">AA - AEROBICS</li>
      <li id="ProgramCategoriesAndCodes_chzn_o_1" class="active-result group-option" style="" title="AABD - BODYSHAPING WITH CORY EVERSON">AABD - BODYSHAPING WITH CORY EVERSON</li>
      <li id="ProgramCategoriesAndCodes_chzn_o_2" class="active-result group-option" style="" title="AABP - Cory Everson's Gotta Sweat">AABP - Cory Everson's Gotta Sweat</li>
      <li id="ProgramCategoriesAndCodes_chzn_o_3" class="active-result group-option" style="" title="AABS - Bodyshaping">AABS - Bodyshaping</li>
      <li id="ProgramCategoriesAndCodes_chzn_o_4" class="active-result group-option" style="" title="AACF - Crunch Fitness">AACF - Crunch Fitness</li>
      <li id="ProgramCategoriesAndCodes_chzn_o_5" class="active-result group-option" style="" title="AACJ - City Jam">AACJ - City Jam</li>
      <li id="ProgramCategoriesAndCodes_chzn_o_6" class="active-result group-option" style="" title="AADA - GETTING FIT WITH DENISE AUSTIN">AADA - GETTING FIT WITH DENISE AUSTIN</li>
      <li id="ProgramCategoriesAndCodes_chzn_o_7" class="active-result group-option" style="" title="AAFA - Fitness America Tour">AAFA - Fitness America Tour</li>
      <li id="ProgramCategoriesAndCodes_chzn_o_8" class="active-result group-option" style="" title="AAKI - KIANA'S FLEX APPEAL">AAKI - KIANA'S FLEX APPEAL</li>
      <li id="ProgramCategoriesAndCodes_chzn_g_9" class="group-result active-result child-matched" title="ABB - ESPN Radio Baseball">ABB - ESPN Radio Baseball</li>
      <li id="ProgramCategoriesAndCodes_chzn_o_10" class="active-result group-option" style="" title="ABBAS - MLB All Star Game">ABBAS - MLB All Star Game</li>
      <li id="ProgramCategoriesAndCodes_chzn_o_11" class="active-result group-option" style="" title="ABBDC - Series Del Caribe">ABBDC - Series Del Caribe</li>
      <li id="ProgramCategoriesAndCodes_chzn_o_12" class="active-result group-option" style="" title="ABBDS - MLB Division Series">ABBDS - MLB Division Series</li>
      <li id="ProgramCategoriesAndCodes_chzn_o_13" class="active-result group-option" style="" title="ABBLC - MLB League Championships">ABBLC - MLB League Championships</li>
      <li id="ProgramCategoriesAndCodes_chzn_o_14" class="active-result group-option" style="" title="ABBMA - MLB Meet the Stars">ABBMA - MLB Meet the Stars</li>
      <li id="ProgramCategoriesAndCodes_chzn_o_15" class="active-result group-option" style="" title="ABBML - MLB Regular">ABBML - MLB Regular</li>
      <li id="ProgramCategoriesAndCodes_chzn_o_16" class="active-result group-option" style="" title="ABBMS - MLB Specials">ABBMS - MLB Specials</li>
      <li id="ProgramCategoriesAndCodes_chzn_o_17" class="active-result group-option" style="" title="ABBSN - MLB Sunday Night Baseball">ABBSN - MLB Sunday Night Baseball</li>
      <li id="ProgramCategoriesAndCodes_chzn_o_18" class="active-result group-option" style="" title="ABBWB - World Baseball Classic">ABBWB - World Baseball Classic</li>
      <li id="ProgramCategoriesAndCodes_chzn_o_19" class="active-result group-option" style="" title="ABBWS - MLB World Series">ABBWS - MLB World Series</li>
      <li id="ProgramCategoriesAndCodes_chzn_g_20" class="group-result active-result child-matched" title="ABK - Deportes Radio Basketball">ABK - Deportes Radio Basketball</li>
      <li id="ProgramCategoriesAndCodes_chzn_o_21" class="active-result group-option" style="" title="ABKAS - NBA All Star Game">ABKAS - NBA All Star Game</li>
    </ul>
  </div>
</div>

So far, the only thing I have managed to do (since I am a complete beginner with Selenium and Python) is:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get(url)  # url: address of the page containing the dropdown

# Locate the results list inside the dropdown container
elem = driver.find_element(By.XPATH, "//div[@id='ProgramCategoriesAndCodes_chzn']//ul[@class='chosen-results']")

# Print the text of every li inside that list
list_elem = elem.find_elements(By.TAG_NAME, "li")
for li in list_elem:
    text = li.text
    print(text)

I know the print does not really accomplish anything, but I am lost as to how to actually achieve what I originally asked for :(

The actual text file contains information like this:

AEROBICS
              AE - AEROBICS
              AE - BODYBUILDING
              AE - MISCELLANEOUS
AMERICAS CLUB SOCCER
              SOAM - ARGENTINE SOCCER
              SOAM - ARGENTINE SOCCER RPT
              SOAM - BRASILEIRAO SUB 20
              SOAM - BRASILEIRAO SUB 20 RPT
              SOAM - BRAZILIAN CHAMPIONSHIP
              SOAM - BRAZILIAN CHAMPIONSHIP RPT
              SOAM - COPA DO BRAZIL
              SOAM - COPA DO BRAZIL RPT
              SOAM - COPA MEXICO
              SOAM - COPA MEXICO RPT
              SOAM - COPA SAO PAULO JUNIOR
              SOAM - LIGA FPD DE COSTA RICA
              SOAM - LIGA MEXICANA
              SOAM - LIGA MEXICANA RPT
              SOAM - MAJOR LEAGUE SOCCER
              SOAM - MAJOR LEAGUE SOCCER RPT
              SOAM - USA SOCCER

Any help would be greatly appreciated!! Kind regards


1 answer
User
#1 · Posted 2024-06-26 13:33:47

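I assume your problem is how to match against the txt file, since you are already able to extract all the text from the li elements. I suggest reading the txt file into a list like this: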

with open('test.txt', 'r') as f:
    # Read the file into a list of lines, stripping whitespace from the start and end of each
    txt_list = [line.strip() for line in f.read().splitlines()]
    print(txt_list)

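Now you have two lists, one built from the li elements and one built from the txt file. You can compare them however you need using Python's `in` operator or `set`.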
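As a rough sketch of that comparison, the two lists can be turned into sets and compared with set difference and intersection. The function name `compare_items` and the sample data are hypothetical, standing in for the texts scraped from the li elements and the lines read from the txt file.

```python
def compare_items(li_texts, txt_lines):
    """Return (only_on_page, only_in_file, in_both) as sorted lists."""
    # Strip whitespace and drop empty entries before comparing.
    page = {t.strip() for t in li_texts if t.strip()}
    file_ = {t.strip() for t in txt_lines if t.strip()}
    return sorted(page - file_), sorted(file_ - page), sorted(page & file_)


# Hypothetical sample data, not taken from the real page or file.
li_texts = ["AA - AEROBICS", "AABS - Bodyshaping", "AACJ - City Jam"]
txt_lines = ["AEROBICS", "              AA - AEROBICS", "              AACJ - City Jam"]

only_on_page, only_in_file, in_both = compare_items(li_texts, txt_lines)
print("Only on the page:", only_on_page)
print("Only in the file:", only_in_file)
print("In both:", in_both)
```

The comparison here is case-sensitive. If the page and the file differ only in letter case, applying `str.casefold()` to both sides before building the sets may be closer to what is wanted.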

Another approach is to search for the text of each li directly in the contents of the text file, read as a single string:

name = 'AE - AEROBICS'
with open('test.txt', 'r') as f:
    txt_str = f.read()  # read the whole txt file as one string
    print(txt_str.find(name))  # index of the first match, or -1 if not present
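One thing to keep in mind with `str.find` is that it matches substrings, so a name such as 'SOAM - COPA MEXICO' would also be found inside 'SOAM - COPA MEXICO RPT'. If whole-line matches are needed, a minimal sketch could look like the following; the helper name `find_line` and the sample string are hypothetical.

```python
def find_line(name, text):
    """Return the 0-based index of the first line equal to name, or -1."""
    for index, line in enumerate(text.splitlines()):
        # Compare against the stripped line so indentation is ignored.
        if line.strip() == name:
            return index
    return -1


# Hypothetical sample standing in for the contents of the txt file.
sample = "AEROBICS\n    AE - AEROBICS\n    AE - BODYBUILDING\n"

print(find_line("AE - AEROBICS", sample))  # exact line match
print(find_line("AE - BODY", sample))      # partial text, no whole-line match
print(sample.find("AE - BODY"))            # str.find still reports a position
```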
