Scrape the text of a class with Selenium, with whitespace between the different text parts

Published 2024-10-03 02:40:57


I want to use Selenium to scrape all text values of the class "tore-dots" from this website: https://www.fussballdaten.de/vereine/fc-bayern-muenchen/2019/

To do that, I use the following calls:

dots_graph = driver.find_element_by_class_name("tore-dots")
dots_graph.text

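The result is one concatenated string, like: "6121198912101179685765554353454333"

However, the numbers stand for different positions and have up to two digits. How can I scrape the text with a separator, for example so that the different elements end up as separate items in a list instead of being concatenated into one string?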
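To illustrate why the concatenated string cannot simply be split afterwards, here is a minimal sketch that counts how many ways a digit string can be cut into one- or two-digit numbers. The helper name `count_splits` and the rule that a two-digit number must not start with 0 are assumptions made for this illustration, not something taken from the site.

```python
def count_splits(digits: str) -> int:
    """Count the ways to cut `digits` into numbers of one or two digits.

    Assumption: a two-digit chunk must lie in 10..99 (no leading zero).
    """
    # ways[i] = number of valid splits of the first i characters
    ways = [0] * (len(digits) + 1)
    ways[0] = 1
    for i in range(1, len(digits) + 1):
        # The last chunk is a single digit.
        ways[i] += ways[i - 1]
        # The last chunk is a two-digit number.
        if i >= 2 and digits[i - 2] != "0":
            ways[i] += ways[i - 2]
    return ways[len(digits)]


# "612" can be read as 6|1|2, 61|2 or 6|12.
print(count_splits("612"))  # 3
```

Even the first three characters of the string are ambiguous, so the values have to be read element by element rather than recovered from the joined text.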


2 answers

After getting dots_graph, use dots_graph.find_elements_... (note the letter s in elements) to find all <text> elements inside dots_graph as separate elements, then use a for loop to read .text from each <text>.

dots_graph = driver.find_element_by_class_name("tore-dots")

all_items = dots_graph.find_elements_by_tag_name("text")

for item in all_items:
    print(item.text)

dot_vals = [item.text for item in all_items]

Or you can try to get the <text> elements under tore-dots with a single XPath:

# doesn't work with `g` and `text` - maybe because it is inside `<SVG>` 
#all_items = driver.find_elements_by_xpath('//g[@class="tore-dots"]//text')

all_items = driver.find_elements_by_xpath('//*[@class="tore-dots"]//*[name()="text"]')

for item in all_items:
    print(item.text)

dot_vals = [item.text for item in all_items]
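A likely reason the plain `//g[...]//text` form finds nothing is that elements inside `<svg>` belong to the SVG namespace, so an unqualified tag name does not match them. The effect can be reproduced without a browser using the standard library's `xml.etree.ElementTree`. This is only a sketch: the SVG fragment below is made up for illustration and is not the real page markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical SVG fragment; the real page's markup may differ.
SVG = """
<svg xmlns="http://www.w3.org/2000/svg">
  <g class="tore-dots">
    <text>6</text>
    <text>12</text>
    <text>11</text>
  </g>
</svg>
"""

root = ET.fromstring(SVG)

# Unqualified tag names match nothing: every element is in the SVG namespace.
plain = root.findall(".//g[@class='tore-dots']/text")

# Qualifying the names with the namespace works.
ns = {"svg": "http://www.w3.org/2000/svg"}
namespaced = root.findall(".//svg:g[@class='tore-dots']/svg:text", ns)

# "{*}" (Python 3.8+) ignores the namespace, similar in spirit to local-name().
wildcard = root.findall(".//{*}g[@class='tore-dots']/{*}text")

print(plain)                         # []
print([t.text for t in namespaced])  # ['6', '12', '11']
print([t.text for t in wildcard])    # ['6', '12', '11']
```

Each `<text>` element yields its own value, so two-digit numbers such as 12 stay intact.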

Or the same with a CSS selector:

all_items = driver.find_elements_by_css_selector('.tore-dots text')

for item in all_items:
    print(item.text)

dot_vals = [item.text for item in all_items]

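By the way: .text returns an element's text content; it does not select the <text> tag (BeautifulSoup's dotted tag access, such as soup.title, can give that impression).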


EDIT:

Minimal working code

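# Written against the Selenium 3 API: the find_element(s)_by_* helpers used below
# were removed in Selenium 4.3, where find_element(By.CLASS_NAME, ...) etc. replace them.
from selenium import webdriver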

#driver = webdriver.Firefox()
driver = webdriver.Chrome()

driver.get('https://www.fussballdaten.de/vereine/fc-bayern-muenchen/2019/')

# close popup window with message
driver.find_element_by_xpath('//button[@aria-label="Einwilligen"]').click()

print(' - FIND  -')

dots_graph = driver.find_element_by_class_name("tore-dots")
all_items = dots_graph.find_elements_by_tag_name("text")

dot_vals = [item.text for item in all_items]
print(dot_vals)

print(' - XPATH (g, text)  -')

# doesn't work with `g` and `text` - maybe because it is inside `<SVG>` 
all_items = driver.find_elements_by_xpath('//g[@class="tore-dots"]//text')  

dot_vals = [item.text for item in all_items]
print(dot_vals)

print(' - XPATH (*, local-name)  -')

all_items = driver.find_elements_by_xpath('//*[@class="tore-dots"]//*[local-name()="text"]')

dot_vals = [item.text for item in all_items]
print(dot_vals)

print(' - XPATH (*, name)  -')

all_items = driver.find_elements_by_xpath('//*[@class="tore-dots"]//*[name()="text"]')

dot_vals = [item.text for item in all_items]
print(dot_vals)

print(' - CSS  -')

all_items = driver.find_elements_by_css_selector('.tore-dots text')

dot_vals = [item.text for item in all_items]
print(dot_vals)

You can use driver.execute_script to get the text values:

from selenium import webdriver
d = webdriver.Chrome('/Users/jamespetullo/Downloads/chromedriver')
d.get('https://www.fussballdaten.de/vereine/fc-bayern-muenchen/2019/')
dot_vals = d.execute_script('return Array.from(document.querySelectorAll("g.tore-dots text")).map(x => x.innerHTML)')

Output:

['2', '1', '1', '1', '1', '2', '6', '4', '2', '3', '5', '5', '4', '3', '3', '3', '2', '2', '2', '3', '2', '2', '2', '2', '1', '1', '2', '1', '1', '1', '1', '1', '1', '1']
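If numbers rather than strings are needed, the list can be converted afterwards. A minimal sketch, using only the first eight values from the output above; `to_ints` is a hypothetical helper name, and skipping blank entries is an assumption in case some elements carry no text.

```python
def to_ints(values):
    """Convert scraped strings to ints, skipping blank entries."""
    return [int(v) for v in values if v.strip()]


dot_vals = ['2', '1', '1', '1', '1', '2', '6', '4']  # first eight values only
print(to_ints(dot_vals))  # [2, 1, 1, 1, 1, 2, 6, 4]
```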
