Extracting only one column of data from an HTML table with Python?

1 answer

User

#1 · Posted 2024-06-14 09:18:09

I would suggest reading the entire table and then selecting the column you need. You may lose a little speed, but you gain more in simplicity.

This is easy to do with pandas' read_html function:

from io import StringIO
from urllib.request import urlopen  # Python 3 replacement for urllib2

import pandas as pd

# Download the page and decode the bytes into text.
page1 = urlopen(
    'http://www.basketball-reference.com/players/h/hardeja01/gamelog/2015/').read().decode('utf-8')

# Select the correct table by some attributes, in this case id=pgl_basic.
# The read_html function returns a list of tables.
# In this case we select the first (and only) table with this id.
# Recent pandas versions expect a file-like object rather than a raw string.
stat_table = pd.read_html(StringIO(page1), attrs={'id': 'pgl_basic'})[0]

# Just select the column we need.
point_column = stat_table['PTS']

print(point_column)
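
To try the same idea without a network connection, here is a minimal sketch that runs read_html on an inline HTML string. The table contents are made up for illustration and are not real game-log data; only the id pgl_basic and the PTS column name come from the answer above. It assumes an HTML parser that pandas can use, such as lxml, is installed.

```python
from io import StringIO

import pandas as pd

# Made-up HTML with two tables; only the second has id="pgl_basic".
html = """
<table id="other">
  <tr><th>X</th></tr>
  <tr><td>1</td></tr>
</table>
<table id="pgl_basic">
  <thead><tr><th>Date</th><th>PTS</th></tr></thead>
  <tbody>
    <tr><td>2014-10-28</td><td>32</td></tr>
    <tr><td>2014-10-29</td><td>18</td></tr>
  </tbody>
</table>
"""

# read_html returns a list of DataFrames; attrs narrows it to matching tables.
tables = pd.read_html(StringIO(html), attrs={'id': 'pgl_basic'})
stat_table = tables[0]

# Read the whole table, then keep only the column of interest.
point_column = stat_table['PTS']
print(point_column)
```

The whole table is still parsed; the column selection happens afterwards on the resulting DataFrame.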

If you are not yet familiar with pandas, you can learn more here: http://pandas-docs.github.io/pandas-docs-travis/10min.html

For example, you may want to remove the header rows from the table, or split the table into multiple tables.
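
A rough sketch of both clean-up steps follows. It assumes the header row is repeated inside the table body, so that some rows contain the literal text 'PTS' in the PTS column; that layout is an assumption, so check it against the table you actually get. The small DataFrame below is made-up data standing in for stat_table.

```python
import pandas as pd

# Made-up stand-in for the parsed table, with a header row repeated in the body.
stat_table = pd.DataFrame({
    'Rk': ['1', '2', 'Rk', '3'],
    'PTS': ['32', '18', 'PTS', '25'],
})

# Rows whose PTS cell is the column name itself are repeated headers.
is_header = stat_table['PTS'] == 'PTS'

# 1) Remove the header rows and convert the points to numbers.
cleaned = stat_table[~is_header].reset_index(drop=True)
cleaned['PTS'] = pd.to_numeric(cleaned['PTS'])

# 2) Split the table into several tables, starting a new one after each header row.
block_id = is_header.cumsum()
data_rows = stat_table[~is_header]
parts = [group.reset_index(drop=True)
         for _, group in data_rows.groupby(block_id[~is_header])]

print(cleaned)
print(len(parts), 'sub-tables')
```

The cumulative sum of the header mask gives every data row the number of header rows seen before it, which serves as a grouping key for the split.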
