Selenium Python: can't extract the text from all span tags

Posted 2024-09-29 23:15:50


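I'm writing a small Python program that plays the 10fastfingers typing test automatically. To do that, I first have to extract all the words I need to type. Each word is stored in its own span tag, like this: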

[Screenshot of the HTML markup not available]

When I run the code, only the first 20-30 words come back with text; every entry after that is an empty string. Why is that? Here is my code:

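```python
import time

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

url = "https://10fastfingers.com/typing-test/english"

# Selenium 4 takes the chromedriver path through a Service object
browser = webdriver.Chrome(service=Service("D:\\Python_Files\\Programs\\chromedriver.exe"))
browser.get(url)
time.sleep(10)  # wait for the word list to load

count = 1
wordlst = []

while True:
    try:
        # XPath positions are 1-based: span[1], span[2], ...
        word = browser.find_element(By.XPATH, f'//*[@id="row1"]/span[{count}]')
        wordlst.append(word.text)
        count += 1
    except NoSuchElementException:
        # no span at this position: the end of the row was reached
        break

print(wordlst)
```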

Output:

['them', 'how', 'said', 'light', 'show', 'seem', 'not', 'two', 'under', 'hear', 'them', 'there', 'about', 'face', 'us', 'change', 'year', 'only', 'leave', 'number', 'found', 'father', 'people', 'house', 'really', 'my', 'spell', 'when', 'look', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']

How can I fix this? Any help would be appreciated. Thanks.


1 answer
User
#1 · posted 2024-09-29 23:15:50

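The empty strings most likely come from how Selenium's `.text` works: it returns only the text that is actually rendered on screen. The page displays only the words that fit in its typing area, so the remaining spans exist in the DOM but are hidden, and `.text` returns '' for them. Parsing the page's HTML avoids this, and you can do it with BeautifulSoup: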

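```python
import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

url = "https://10fastfingers.com/typing-test/english"

browser = webdriver.Chrome(service=Service("D:\\Python_Files\\Programs\\chromedriver.exe"))
browser.get(url)
time.sleep(3)  # give the page time to render the word list
# Parse the full HTML source; on-screen visibility does not matter here
html_soup = BeautifulSoup(browser.page_source, 'html.parser')
div = html_soup.find('div', id='row1')
wordlst = div.get_text().split()  # relies on whitespace between the spans
browser.quit()
print(wordlst)
```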
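If BeautifulSoup is not installed, the same page-source idea can be sketched with the standard library's `html.parser` module. This is a swapped-in technique, not what the answer uses. It assumes the words are `<span>` elements inside `<div id="row1">` (as the XPath in the question implies) and that spans are not nested; `RowWordParser` and `words_from_page_source` are hypothetical names. Collecting each span separately also means it does not depend on whitespace between the spans, which `get_text().split()` does.

```python
from html.parser import HTMLParser


class RowWordParser(HTMLParser):
    """Collect the text of every <span> inside <div id=row_id>.

    Assumes spans are not nested inside each other.
    """

    def __init__(self, row_id="row1"):
        super().__init__()
        self.row_id = row_id
        self.div_depth = 0   # > 0 while inside the target div (counts nested divs)
        self.in_span = False
        self.current = []    # text fragments of the span being read
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            if self.div_depth:
                self.div_depth += 1          # a div nested inside the target div
            elif dict(attrs).get("id") == self.row_id:
                self.div_depth = 1           # entering the target div
        elif tag == "span" and self.div_depth:
            self.in_span = True
            self.current = []

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1
        elif tag == "span" and self.in_span:
            self.in_span = False
            word = "".join(self.current).strip()
            if word:                         # skip spans with no text
                self.words.append(word)

    def handle_data(self, data):
        if self.in_span:
            self.current.append(data)


def words_from_page_source(html, row_id="row1"):
    """Return the span words of div#row_id; CSS visibility is ignored."""
    parser = RowWordParser(row_id)
    parser.feed(html)
    parser.close()
    return parser.words
```

With Selenium this would be called as `wordlst = words_from_page_source(browser.page_source)`. Because it reads raw HTML, a span hidden with CSS is returned like any other.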

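To keep your original approach, get all the spans in a single `find_elements` call and read each one's `innerHTML`. Unlike `.text`, it is taken from the DOM, so hidden spans are included too: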

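```python
import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

url = "https://10fastfingers.com/typing-test/english"
browser = webdriver.Chrome(service=Service("D:\\Python_Files\\Programs\\chromedriver.exe"))
browser.get(url)
time.sleep(6)
# One call returns every span in the row instead of querying them one by one
spans = browser.find_elements(By.XPATH, '//div[@id="row1"]/span')
# innerHTML is read from the DOM, so spans that are not visible are included
wordlst = [x.get_attribute("innerHTML") for x in spans]
browser.quit()
print(wordlst)
```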
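The same fix can be wrapped in a reusable function. This is a sketch, not part of the original answer: it reads the `textContent` DOM property instead of `innerHTML` (so any markup inside a span is dropped), strips surrounding whitespace and skips empty entries. The name `extract_row_words` is hypothetical. It passes the locator strategy as the plain string `"xpath"`, which is the value of Selenium's `By.XPATH`, so the function itself needs no Selenium import.

```python
def extract_row_words(driver, row_id="row1"):
    """Return the text of every <span> in div#row_id, visible or not.

    WebElement.text only returns rendered text, so spans outside the
    visible area come back as ''. The textContent DOM property does not
    depend on visibility.
    """
    # "xpath" is the string value of selenium.webdriver.common.by.By.XPATH
    spans = driver.find_elements("xpath", f'//div[@id="{row_id}"]/span')
    words = []
    for span in spans:
        # get_attribute may return None; treat that as empty text
        text = (span.get_attribute("textContent") or "").strip()
        if text:
            words.append(text)
    return words

# Usage with a real browser (Selenium 4, driver already on the page):
#   wordlst = extract_row_words(browser)
```

Since the function only calls `find_elements` and `get_attribute`, it can be tried without a browser by passing a small stand-in object that provides those two methods.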
