Selenium Python cannot extract the text from all span tags

Posted 2024-09-29 23:15:50

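I am writing a small Python program that plays 10fastfingers automatically. To do that, I first have to extract all the words I need to type. All of these words are stored in span tags inside the element with id "row1".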

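When I run the code, it only extracts the first 20-30 words instead of all of them; the rest come back as empty strings. Why does this happen? Here is my code: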

from selenium import webdriver
import time

url = "https://10fastfingers.com/typing-test/english"

browser = webdriver.Chrome("D:\\Python_Files\\Programs\\chromedriver.exe")

browser.get(url)

time.sleep(10)

count = 1

wordlst = []

while True:
    
    try:
        word = browser.find_element_by_xpath(f'//*[@id="row1"]/span[{count}]')
        wordlst.append(word.text)
        count += 1
        
    except:
        break

print(wordlst)

Output:

['them', 'how', 'said', 'light', 'show', 'seem', 'not', 'two', 'under', 'hear', 'them', 'there', 'about', 'face', 'us', 'change', 'year', 'only', 'leave', 'number', 'found', 'father', 'people', 'house', 'really', 'my', 'spell', 'when', 'look', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']

How can I fix this? Any help would be greatly appreciated. Thanks.

1 answer

Answer #1 · posted 2024-09-29 23:15:50

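You can do this with BeautifulSoup. Selenium's .text property only returns text that is actually rendered on screen, so spans that are not currently visible most likely come back as empty strings. Parsing browser.page_source instead reads every span regardless of visibility: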

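from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import time
from bs4 import BeautifulSoup

url = "https://10fastfingers.com/typing-test/english"

# Selenium 4 takes the driver path through a Service object.
browser = webdriver.Chrome(service=Service("D:\\Python_Files\\Programs\\chromedriver.exe"))
browser.get(url)
time.sleep(3)  # crude wait for the page to finish rendering
# Parse the full page source, so spans that are not visible are included too.
html_soup = BeautifulSoup(browser.page_source, 'html.parser')
div = html_soup.find('div', id='row1')
wordlst = div.get_text().split()
browser.quit()
print(wordlst)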
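To see why parsing the page source returns every word, here is a minimal offline sketch that needs no browser. It uses the standard library's html.parser instead of BeautifulSoup so it runs without extra packages. The markup in SAMPLE_HTML and the names Row1SpanCollector and words_from_source are assumptions made for illustration; the real page's HTML may look different.

```python
from html.parser import HTMLParser

# Hypothetical markup, loosely modelled on the question: every word is a
# <span> inside <div id="row1">. The real page's HTML may differ.
SAMPLE_HTML = """
<div id="row1">
  <span class="highlight">them</span>
  <span>how</span>
  <span style="display: none">said</span>
</div>
<div id="other"><span>ignored</span></div>
"""


class Row1SpanCollector(HTMLParser):
    """Collect the text of every <span> nested inside <div id="row1">."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0    # > 0 while we are inside <div id="row1">
        self.in_span = False  # True while we are inside a span we collect
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            if self.div_depth > 0:
                self.div_depth += 1          # nested div inside row1
            elif dict(attrs).get("id") == "row1":
                self.div_depth = 1           # entering row1
        elif tag == "span" and self.div_depth > 0:
            self.in_span = True
            self.words.append("")            # start a new word

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth > 0:
            self.div_depth -= 1
        elif tag == "span":
            self.in_span = False

    def handle_data(self, data):
        if self.in_span:
            self.words[-1] += data


def words_from_source(page_source):
    """Return the non-empty span texts found under <div id="row1">."""
    parser = Row1SpanCollector()
    parser.feed(page_source)
    parser.close()
    return [w.strip() for w in parser.words if w.strip()]


print(words_from_source(SAMPLE_HTML))
```

An HTML parser never evaluates CSS, so the span styled with display: none is collected like any other. In a real run, browser.page_source would be passed in place of SAMPLE_HTML.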

Or, to stick with your own approach:

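from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time

url = "https://10fastfingers.com/typing-test/english"
# Selenium 4 style: driver path via Service, lookups via find_elements(By...).
browser = webdriver.Chrome(service=Service("D:\\Python_Files\\Programs\\chromedriver.exe"))
browser.get(url)
time.sleep(6)  # crude wait for the page to finish rendering
# A single query fetches every span, so no counting loop is needed.
spans = browser.find_elements(By.XPATH, '//div[@id="row1"]/span')
# innerHTML is read from the DOM, so it also works for spans that are not visible.
wordlst = [x.get_attribute("innerHTML") for x in spans]
browser.quit()
print(wordlst)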
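Because innerHTML returns the markup as serialized by the browser, the strings may carry surrounding whitespace or HTML entities such as &amp;amp;. A small post-processing step can tidy the list before the words are typed. This is only a sketch: clean_words is a hypothetical helper, and the sample input is made up rather than taken from the site.

```python
import html


def clean_words(raw_items):
    """Decode HTML entities, trim whitespace, and drop empty entries."""
    cleaned = []
    for item in raw_items:
        word = html.unescape(item).strip()
        if word:
            cleaned.append(word)
    return cleaned


# Made-up sample of what innerHTML values could look like.
raw = ["them", " how ", "", "rock&amp;roll", "don&#39;t"]
print(clean_words(raw))
```

With the snippet above, this would be called as wordlst = clean_words(wordlst) before printing.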
