How can I read these cells from HTML code with Python web scraping?

Posted 2024-10-01 00:31:53


I want to scrape foreign exchange rate information from this website and then store it in a database: https://www.mnb.hu/arfolyamok

I need this part of the HTML:

<tbody>
    <tr>
        <td class="valute"><b>CHF</b></td>
        <td class="valutename">svájci frank</td>
        <td class="unit">1</td>
        <td class="value">284,38</td>
    </tr>
    <tr>
        <td class="valute"><b>EUR</b></td>
        <td class="valutename">euro</td>
        <td class="unit">1</td>
        <td class="value">308,54</td>
    </tr>
    <tr>
        <td class="valute"><b>USD</b></td>
        <td class="valutename">USA dollár</td>
        <td class="unit">1</td>
        <td class="value">273,94</td>
    </tr>
</tbody>

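I wrote some code for this, but something is wrong with it. How can I fix it, and where should I change it? I only need the "valute", "valutename", "unit" and "value" data. I am using Python 2.7.13 on Windows 7.

The error message I get is: "There's an error in your program: unindent does not match any outer indentation level"

Here is the code: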

import csv
import requests
from BeautifulSoup import BeautifulSoup

url = 'https://www.mnb.hu/arfolyamok'
response = requests.get(url)
html = response.content

soup = BeautifulSoup(html)
table = soup.find('tbody', attrs={'class': 'stripe'})

table = str(soup)
table = table.split("<tbody>")

list_of_rows = []
for row in table[1].findAll('tr')[1:]:
    list_of_cells = []
   for cell in row.findAll('td'):
       text = cell.text.replace('&nbsp;', '')
        list_of_cells.append(text)
   list_of_rows.append(list_of_cells)

print list_of_rows

outfile = open("./inmates.csv", "wb")
writer = csv.writer(outfile)
writer.writerow(["Pénznem", "Devizanév", "Egység", "Forintban kifejezett érték"])
writer.writerows(list_of_rows)

1 answer
User
Answer 1 · posted 2024-10-01 00:31:53

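There is a whitespace problem in your code from line 18 (for cell in row.findAll('td'):) to line 20 (list_of_cells.append(text)): those lines are indented with an inconsistent number of spaces. Here is the fixed code: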

import csv
import requests
from bs4 import BeautifulSoup

url = 'https://www.mnb.hu/arfolyamok'
response = requests.get(url)
html = response.content

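# Name the parser explicitly so bs4 does not have to guess one.
soup = BeautifulSoup(html, "html.parser")
# Keep the parsed <tbody>; a plain string has no findAll() method.
table = soup.find('tbody')

list_of_rows = []
# The <tbody> shown in the question holds only data rows, so none are skipped.
for row in table.findAll('tr'):
    list_of_cells = []
    for cell in row.findAll('td'):
        # Python 2's csv module needs byte strings, so encode each cell.
        text = cell.text.replace('&nbsp;', '').strip().encode('utf-8')
        list_of_cells.append(text)
    list_of_rows.append(list_of_cells)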

print list_of_rows

outfile = open("./inmates.csv", "wb")
writer = csv.writer(outfile)
writer.writerow(["Pénznem", "Devizanév", "Egység", "Forintban kifejezett érték"])
writer.writerows(list_of_rows)
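
The whitespace problem can also be reproduced in isolation. The following is a minimal sketch, written for Python 3 rather than the Python 2.7 used in the question; the two source strings and the check_indentation helper are hypothetical names made up for illustration. It compiles a loop whose inner for drops back to 3 spaces after a 4-space line, and then the same loop with consistent indentation.

```python
# Hypothetical snippet mirroring the question's loop: the body of the outer
# loop is indented 4 spaces, then the inner "for" drops back to 3 spaces.
BROKEN_SOURCE = (
    "for row in rows:\n"
    "    list_of_cells = []\n"
    "   for cell in row:\n"
    "        list_of_cells.append(cell)\n"
)

# Same loop with every nesting level indented by a multiple of 4 spaces.
FIXED_SOURCE = (
    "for row in rows:\n"
    "    list_of_cells = []\n"
    "    for cell in row:\n"
    "        list_of_cells.append(cell)\n"
)


def check_indentation(source):
    """Return None if the source compiles, else the IndentationError message."""
    try:
        # compile() only parses the code; it never runs it, so the
        # undefined name "rows" is not a problem here.
        compile(source, "<example>", "exec")
    except IndentationError as exc:
        return exc.msg
    return None


print(check_indentation(BROKEN_SOURCE))
print(check_indentation(FIXED_SOURCE))
```

The first call reports the indentation problem and the second returns None. An editor set to insert 4 spaces per level, with tabs and spaces never mixed, avoids this class of error.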

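However, once the indentation is fixed, running the script raises a different error, this time about character encoding. It will show: "SyntaxError: Non-ASCII character '\xc3' in file testoasd.py on line 27, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details"

How do you fix it? Simply add the encoding declaration # -*- coding: utf-8 -*- at the very top of the file (line 1). Strictly speaking this is a PEP 263 encoding declaration, not a shebang, and Python 2 only honors it on the first or second line of the file. That should fix it.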
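
As an illustration of where the declaration goes, here is a minimal sketch of the top of such a file. The shebang line is an optional addition that is not in the original script and is shown only to make the difference between the two kinds of line visible; the header list is copied from the question.

```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Line 1 above is a shebang (optional, and ignored on Windows).
# Line 2 is the PEP 263 encoding declaration; it must sit on line 1 or 2.

# With the declaration in place, non-ASCII literals such as these
# Hungarian column names no longer trigger the SyntaxError in Python 2.
header = ["Pénznem", "Devizanév", "Egység", "Forintban kifejezett érték"]
print(len(header))
```

Python 3 assumes UTF-8 source files by default, so the declaration is only required on Python 2, but it is harmless to keep.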

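Edit: I just noticed that you are using BeautifulSoup incorrectly, and that the import is wrong as well. I have changed the import to from bs4 import BeautifulSoup. When creating the BeautifulSoup object you should also specify the parser. So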

soup = BeautifulSoup(html)

becomes:

soup = BeautifulSoup(html, "html.parser")
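
For a version that can be tried without installing anything, the sketch below swaps BeautifulSoup for the standard library's html.parser module and runs against part of the HTML snippet from the question instead of the live page. It assumes Python 3, and it assumes the live page uses the same class names as the snippet, which may not hold. RateTableParser, parse_rates, WANTED and SAMPLE_HTML are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser

# Sample copied from the question; the live page may differ.
SAMPLE_HTML = """
<tbody>
    <tr>
        <td class="valute"><b>CHF</b></td>
        <td class="valutename">svájci frank</td>
        <td class="unit">1</td>
        <td class="value">284,38</td>
    </tr>
    <tr>
        <td class="valute"><b>EUR</b></td>
        <td class="valutename">euro</td>
        <td class="unit">1</td>
        <td class="value">308,54</td>
    </tr>
</tbody>
"""

# The four cell classes the question asks for.
WANTED = ("valute", "valutename", "unit", "value")


class RateTableParser(HTMLParser):
    """Collects the text of the wanted <td> cells, one dict per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # row being built, or None outside a <tr>
        self._field = None  # class of the <td> being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = {}
        elif tag == "td" and self._row is not None:
            css_class = dict(attrs).get("class")
            if css_class in WANTED:
                self._field = css_class
                self._row[css_class] = ""

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than assign.
        if self._field is not None:
            self._row[self._field] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._field = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append({k: v.strip() for k, v in self._row.items()})
            self._row = None


def parse_rates(html):
    """Return a list of dicts keyed by the class names in WANTED."""
    parser = RateTableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


rows = parse_rates(SAMPLE_HTML)
for row in rows:
    print(row["valute"], row["valutename"], row["unit"], row["value"])
```

Keying each cell by its class name, rather than by position, keeps the result readable if the columns are ever reordered. The values stay as strings with a decimal comma (for example "284,38"), so they would still need converting before being stored as numbers in a database.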
