How to save (int) values from a web page and store them in an array using Selenium with Python

2 answers

User

Answer 1 · edited 2024-06-28 20:03:45

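Make sure you use find_elements in Selenium (with the "s", so that all matching elements are returned rather than only the first). Based on your sample, you should use the following (Selenium 4 syntax; older releases spelled this driver.find_elements_by_xpath(...), which has since been removed):

from selenium.webdriver.common.by import By

ar = [int(val.text) for val in driver.find_elements(By.XPATH, '//tr/td[3]')]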

Then compute the mean (make sure to import statistics):

print(statistics.mean(ar))

A complete snippet (using lxml):

data = """your_html_data"""

import statistics
from lxml import html  # "import lxml.html" would leave the name "html" undefined below
tree = html.fromstring(data)

# create arrays (two ways of doing it, "ar1" is the one you should use if you work with Selenium)
ar1=[int(val.text) for val in tree.xpath("//tr/td[3]")]
ar2=[int(val) for val in tree.xpath("//tr/td[3]/text()")]

# display the arrays
print(ar1)
print(ar2)

# display the means
print(statistics.mean(ar1))
print(statistics.mean(ar2))
print(tree.xpath("sum(//tr/td[3]) div count(//tr/td[3])"))

The last line is another option: computing the mean directly in XPath.

Output:

[12, 13, 14, 15, 16, 17]
[12, 13, 14, 15, 16, 17]
14.5
14.5
14.5

If you need a more robust XPath, you can use:

//tr/td[count(//th[.="Value"]/preceding-sibling::*)+1]

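Here the position index of the td element is computed from the position of the "Value" header: the number of cells preceding that th, plus one.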
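The same idea, finding the column by its header text instead of hard-coding the index, can be sketched in plain Python. This is only an illustration under assumptions: the SAMPLE table and the column_values helper are hypothetical, and it uses the standard library's xml.etree.ElementTree (which needs well-formed markup) rather than lxml or Selenium.

```python
import statistics
import xml.etree.ElementTree as ET

# Hypothetical sample table; the real page markup is not shown in the question.
SAMPLE = """
<table>
  <tr><th>Id</th><th>Name</th><th>Value</th></tr>
  <tr><td>1</td><td>a</td><td>12</td></tr>
  <tr><td>2</td><td>b</td><td>13</td></tr>
  <tr><td>3</td><td>c</td><td>14</td></tr>
  <tr><td>4</td><td>d</td><td>15</td></tr>
  <tr><td>5</td><td>e</td><td>16</td></tr>
  <tr><td>6</td><td>f</td><td>17</td></tr>
</table>
"""

def column_values(markup, header):
    """Return the integers found in the column whose <th> text equals `header`."""
    root = ET.fromstring(markup)
    # Position of the header cell, counted from 0 (XPath positions count from 1).
    headers = [(th.text or "").strip() for th in root.iter("th")]
    idx = headers.index(header)
    values = []
    for row in root.iter("tr"):
        cells = row.findall("td")
        # The header row has no <td> cells, so it is skipped here.
        if len(cells) > idx:
            values.append(int(cells[idx].text.strip()))
    return values

values = column_values(SAMPLE, "Value")
print(values)
print(statistics.mean(values))
```

If the "Value" column later moves to another position, the lookup still finds it, which is the point of the XPath above.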

User

Answer 2 · edited 2024-06-28 20:03:45

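It depends on exactly what the data scraped from the page looks like. In fact, the biggest difficulty in scraping is data hygiene, and that is what you are dealing with here.

Your approach to getting the elements is correct: open Chrome DevTools, inspect the element you want to scrape, and copy its XPath.

If the element you are scraping is a single string containing several values (I think this is unlikely; the values you want are most likely in separate elements, but you could scrape a div that contains all of them and still read .text, which is a property in Selenium's Python bindings, to get one string holding the values), you can call .split() on the resulting string, which splits it on whitespace.

Then, combined with a list comprehension: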

my_int_array = [int(val) for val in scraped_string.split()]

you get a list of integers.
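Note that int() raises ValueError on any token that is not a plain integer, so stray labels or units in the scraped text will break the comprehension. A slightly more defensive sketch, where parse_ints is a hypothetical helper name and silently skipping bad tokens is an assumption about what you want:

```python
def parse_ints(scraped_string):
    """Return the whitespace-separated tokens that parse as integers, in order."""
    result = []
    for token in scraped_string.split():
        try:
            result.append(int(token))
        except ValueError:
            # Not an integer (for example a label or a unit): skip it.
            continue
    return result

print(parse_ints("12 13\n14  Value 15"))
```

If a bad token should be treated as an error instead, keep the original one-line comprehension and let the exception surface.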

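For the exact case you posted in the question, I would scrape the whole table and then extract the values with BeautifulSoup4. Use Selenium to get the table's innerHTML (in the Python bindings, element.get_attribute("innerHTML")), then parse that HTML with BeautifulSoup (see the BeautifulSoup documentation).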
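A minimal sketch of that approach, assuming the values sit in the third td of each row as in the first answer. The function name third_column_ints and the sample markup are hypothetical, and the Selenium call appears only in a comment because it needs a live browser session.

```python
from bs4 import BeautifulSoup

def third_column_ints(table_html):
    """Parse table markup and return the integers in the third cell of each row."""
    soup = BeautifulSoup(table_html, "html.parser")
    values = []
    for row in soup.find_all("tr"):
        cells = row.find_all("td")
        # Header rows contain <th> cells only, so they have no third <td>.
        if len(cells) >= 3:
            values.append(int(cells[2].get_text(strip=True)))
    return values

# In a Selenium session the markup would come from something like:
#   table_html = driver.find_element(By.XPATH, "//table").get_attribute("innerHTML")
sample_inner_html = (
    "<tbody>"
    "<tr><th>Id</th><th>Name</th><th>Value</th></tr>"
    "<tr><td>1</td><td>a</td><td>12</td></tr>"
    "<tr><td>2</td><td>b</td><td>13</td></tr>"
    "</tbody>"
)
print(third_column_ints(sample_inner_html))
```

Parsing the table once on the Python side avoids one browser round trip per cell, which can matter for large tables.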
