How do I scrape a table from any website and store it as a DataFrame?



I need to scrape a table from <a href="https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M" rel="nofollow noreferrer">https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M</a> and store the data in a Python DataFrame. I have pulled the table, but I cannot select the columns (Postcode, Borough, Neighbourhood).

My table looks like this:

<pre><code><table class="wikitable sortable">
<tbody><tr>
<th>Postcode</th>
<th>Borough</th>
<th>Neighbourhood
</th></tr>
<tr>
<td>M1A</td>
<td>Not assigned</td>
<td>Not assigned
</td></tr>
<tr>
<td>M2A</td>
<td>Not assigned</td>
<td>Not assigned
</td></tr>
<tr>
<td>M3A</td>
<td><a href="/wiki/North_York" title="North York">North York</a></td>
<td><a href="/wiki/Parkwoods" title="Parkwoods">Parkwoods</a>
</td></tr>
<tr>
<td>M4A</td>
<td><a href="/wiki/North_York" title="North York">North York</a></td>
<td><a href="/wiki/Victoria_Village" title="Victoria Village">Victoria Village</a>
</td></tr>
...
</code></pre>

This is the code I am using:

<pre><code>url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
response = requests.get(url)
soup= BeautifulSoup(response.text, "html.parser")
table = soup.find('table', {'class': 'wikitable sortable'})
df = []
for row in table.find_all('tr'):
    columns = row.find_all('td')
    Postcode = row.columns[1].get_text()
    Borough = row.columns[2].get_text()
    Neighbourhood = row.column[3].get_text()
    df.append([Postcode,Borough,Neighbourhood])
</code></pre>

With the code above I get: TypeError: 'NoneType' object is not subscriptable

I searched on Google and learned that I cannot write Postcode = row.columns[1].get_text() because of the inline property of the function.

I also tried some other approaches, but got "IndexError" messages.

It should be simple: I need to iterate over the rows, pick the three columns from each row, and store them in a list. But I am unable to write that in code.

The expected output is:

<pre><code>Postcode   Borough        Neighbourhood
M1A        Not assigned   Not assigned
M2A        Not assigned   Not assigned
M3A        North York     Parkwoods
</code></pre>
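One possible way to do what the question describes is sketched below. It is not taken from the posted answer; it is a minimal example under a few assumptions. In the question's loop, `row.columns` makes BeautifulSoup look for a child tag named `columns`, which does not exist, so the result is `None` and indexing it raises the `TypeError`. The sketch indexes the list returned by `find_all('td')` instead, and skips the header row, which contains only `<th>` cells. The helper name `table_to_dataframe` and the `SAMPLE_HTML` string are made up for illustration, and pandas is assumed to be the intended DataFrame library.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Column names taken from the table header shown in the question.
COLUMNS = ["Postcode", "Borough", "Neighbourhood"]

# Hypothetical sample input: a shortened copy of the table from the question.
SAMPLE_HTML = """
<table class="wikitable sortable">
<tbody><tr>
<th>Postcode</th>
<th>Borough</th>
<th>Neighbourhood
</th></tr>
<tr>
<td>M1A</td>
<td>Not assigned</td>
<td>Not assigned
</td></tr>
<tr>
<td>M2A</td>
<td>Not assigned</td>
<td>Not assigned
</td></tr>
<tr>
<td>M3A</td>
<td><a href="/wiki/North_York" title="North York">North York</a></td>
<td><a href="/wiki/Parkwoods" title="Parkwoods">Parkwoods</a>
</td></tr>
</tbody></table>
"""


def table_to_dataframe(html, columns=COLUMNS):
    """Parse the first table with class 'wikitable' in html into a DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ matches any single CSS class, so "wikitable sortable" also matches.
    table = soup.find("table", class_="wikitable")
    if table is None:
        raise ValueError("no table with class 'wikitable' found")

    rows = []
    for tr in table.find_all("tr"):
        cells = tr.find_all("td")
        # The header row has only <th> cells, so it yields no <td> cells;
        # skip it and any other row with an unexpected number of cells.
        if len(cells) != len(columns):
            continue
        # get_text(strip=True) drops the trailing newline and link markup.
        rows.append([cell.get_text(strip=True) for cell in cells])

    return pd.DataFrame(rows, columns=columns)


if __name__ == "__main__":
    print(table_to_dataframe(SAMPLE_HTML))
```

To run this against the live page, the HTML could be fetched with `requests.get(url).text` and passed to the same function. Note that the live page's markup may differ from the snippet quoted in the question, in which case the selector and column count would need adjusting.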


1 answer

Anonymous, 1 day ago
