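Extracting specific columns with Beautiful Soup

2 votes

3 answers

3420 views

Asked 2025-04-18 01:22

I'm slowly learning Python and BeautifulSoup, but I've run into some trouble.

I want to extract the data in the first and fourth columns from the layout below (trimmed down here). http://pastebin.com/bTruubrn

The file is saved locally. So far I have some code pieced together from other similar questions, but I can't get it to work.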

for row in soup.find('table')[0]body.findall('tr'):
    first_column = row.findAll('td')[0].contents
    third_column = row.findAll('td')[3].contents
    print (first_column, third_column)

3 Answers

You may find it simpler to use htql:

import htql
results=htql.query(html_data, "<table>1.<tr> {c1=<td>1:tx; c4=<td>4:tx } ");

Answered 2025-04-18 by Python大师


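Several things are wrong with your code. Take this line, for example:

soup.find('table')[0]body.findall('tr'):

This line is not valid. Apart from the missing dot before body and the misspelled method name (it is findAll, not findall), find returns a single BS object (or None if nothing matches), and you can't index into a single object as if it were a list. findAll, by contrast, returns a list of BS objects, so you have to loop over it (or index it) to reach the individual elements. That is why your for loop doesn't behave as expected.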
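
To make the difference concrete, here is a minimal sketch. The inline HTML is a hypothetical stand-in for the asker's file, and find_all is the current spelling of findAll in bs4:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real file will look different.
html = """
<table>
  <tr><th>a</th><th>b</th></tr>
  <tr><td>1</td><td>2</td></tr>
  <tr><td>3</td><td>4</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

table = soup.find("table")    # a single Tag, or None if nothing matches
rows = soup.find_all("tr")    # a list-like collection of Tags

table_kind = type(table).__name__
row_count = len(rows)
# Skip the header row, then take the first <td> of each remaining row.
first_cells = [row.find_all("td")[0].text for row in rows[1:]]
print(table_kind, row_count, first_cells)
```

Because find gives back one object, the [0] belongs after find_all (or can be dropped entirely when find is used).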

Here is code that should do what you're after:

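from bs4 import BeautifulSoup

# Parse the locally saved file; naming the parser explicitly avoids a warning.
with open('html_file') as html_file:
    soup = BeautifulSoup(html_file, 'html.parser')

table = soup.findAll('table')[0]   # first table in the document
rows = table.findAll('tr')

first_columns = []
fourth_columns = []
for row in rows[1:]:               # skip the header row
    cells = row.findAll('td')
    first_columns.append(cells[0])
    fourth_columns.append(cells[3])  # index 3 is the fourth column

for first, fourth in zip(first_columns, fourth_columns):
    print(first.text, fourth.text)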
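
The loop above assumes that the first row is a header and that every other row has enough cells. If that might not hold for your file, a more defensive variant could look like the sketch below. The function name extract_columns, its indexes parameter, and the sample markup are all hypothetical, introduced only for illustration:

```python
from bs4 import BeautifulSoup

def extract_columns(html, indexes=(0, 3)):
    """Return one tuple of cell texts per row that has enough <td> cells."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    result = []
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) <= max(indexes):
            continue  # header rows (<th> only) and short rows are skipped
        result.append(tuple(cells[i].get_text(strip=True) for i in indexes))
    return result

# Hypothetical sample: a header row, one full row, one short row.
sample = """
<table>
  <tr><th>id</th><th>b</th><th>c</th><th>name</th></tr>
  <tr><td>1</td><td>x</td><td>y</td><td>alpha</td></tr>
  <tr><td>2</td><td>x</td></tr>
</table>
"""
columns = extract_columns(sample)
print(columns)
```

Checking the cell count per row means no header has to be skipped by position, and a malformed row no longer raises IndexError.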

Answered 2025-04-18 by Python大师


Use Beautiful Soup's CSS selector support:

first_column = soup.select('table tr td:nth-of-type(1)')
fourth_column = soup.select('table tr td:nth-of-type(4)')
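
As a usage sketch, the two result lists can be paired up with zip. The sample markup is hypothetical, and the pairing assumes every data row contains both a first and a fourth td; if some rows are shorter, the two lists will fall out of step:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the saved file.
html = """
<table>
  <tr><th>id</th><th>b</th><th>c</th><th>name</th></tr>
  <tr><td>1</td><td>x</td><td>y</td><td>alpha</td></tr>
  <tr><td>2</td><td>x</td><td>y</td><td>beta</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

first_column = soup.select("table tr td:nth-of-type(1)")
fourth_column = soup.select("table tr td:nth-of-type(4)")

# Header cells are <th>, so the td selectors do not match them.
pairs = [
    (a.get_text(strip=True), b.get_text(strip=True))
    for a, b in zip(first_column, fourth_column)
]
print(pairs)
```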

Answered 2025-04-18 by Python大师
