Cannot get BeautifulSoup to recognize columns correctly (Python, XML (Excel web) HTML files)


Posted 2024-07-04 10:53:34



I am working with many files in this format (no HTML styling is used):

<html xmlns:x="urn:schemas-microsoft-com:office:excel"> <head> <meta name="Generator" content="SAS Software Version 9.3, see www.sas.com"> <meta http-equiv="Content-type" content="charset=windows-1252"> </head> <body> <table class="table"> <colgroup> <col> <col> <col> <col> </colgroup> <colgroup> <col> <col> </colgroup> <thead> <tr> <td class="header" rowspan="2" colspan="4" scope="colgroup"> </td> <td class="header" colspan="2" scope="colgroup">SubDistrict</td> </tr> <tr> <td class="header" scope="col">Title1 <br> <br> </td> <td class="header" scope="col">Title2 <br> <br> </td> </tr> </thead> <tbody> <tr> <td class="rowheader" rowspan="12" scope="rowgroup">M1</td> <td class="rowheader" scope="row">1.1</td> <td class="rowheader" scope="row">var1</td> <td class="rowheader" scope="row">TOTAL</td> <td class="data">7</td> <td class="data">7</td> </tr> <tr> etc...

In a browser, they render like this:

I wrote the following with BeautifulSoup, which I am new to:

But my code produces serious column alignment problems:

Any suggestions on how to improve my BeautifulSoup code so that it parses these files correctly? Thanks.


1 Answer

User

Answer 1 · Posted 2024-07-04 10:53:34

If I understand correctly, this is what you want to extract:

You should be able to get it with the following code:

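import pandas as pd
from bs4 import BeautifulSoup


def read_xls(file):
    # The sample file declares charset=windows-1252 in its <meta> tag.
    with open(file, encoding="windows-1252") as f:
        # Name the parser explicitly so bs4 does not have to guess one.
        soup = BeautifulSoup(f.read(), "html.parser")

    tbody = soup.find("tbody")
    data = []
    # Collect the text of every <td> in the table body, row by row.
    for tr in tbody.find_all("tr"):
        for td in tr.find_all("td"):
            data.append(td.text)

    # One flat list of cells, transposed into a single-row DataFrame.
    return pd.DataFrame(data).T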
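
The function above puts every cell into one flat list, so it does not deal with the `rowspan="12"` cell in the sample. That cell appears only in the first `<tr>`, which means the following rows contain one `<td>` fewer, and that is a likely cause of the misaligned columns. Below is a minimal sketch of one way to handle it, not a definitive fix. It assumes the table has no nested tables and that every `rowspan`/`colspan` value is a plain integer. The names `expand_spans` and `cells_from_tbody` are made up for this example and are not part of BeautifulSoup.

```python
def expand_spans(rows):
    """Expand rowspan/colspan cells into a rectangular grid.

    `rows` is a list of table rows; each row is a list of
    (text, rowspan, colspan) tuples in document order.
    """
    grid = []
    pending = {}  # column index -> (text, rows still to fill) from a rowspan above
    for cells in rows:
        out, col = [], 0
        queue = list(cells)
        # Continue while this row has cells left or a carried cell lies at/after `col`.
        while queue or any(c >= col for c in pending):
            if col in pending:
                # This column is occupied by a cell spanning down from an earlier row.
                text, left = pending[col]
                out.append(text)
                if left == 1:
                    del pending[col]
                else:
                    pending[col] = (text, left - 1)
                col += 1
            elif queue:
                text, rowspan, colspan = queue.pop(0)
                # Repeat the value across every column the cell covers.
                for _ in range(colspan):
                    out.append(text)
                    if rowspan > 1:
                        pending[col] = (text, rowspan - 1)
                    col += 1
            else:
                # Gap before a carried cell further right: pad with an empty string.
                out.append("")
                col += 1
        grid.append(out)
    return grid


def cells_from_tbody(tbody):
    """Convert a BeautifulSoup <tbody> Tag into the input format of expand_spans."""
    return [
        [
            (td.get_text(strip=True), int(td.get("rowspan", 1)), int(td.get("colspan", 1)))
            for td in tr.find_all(["td", "th"])
        ]
        for tr in tbody.find_all("tr")
    ]


# Usage with BeautifulSoup (requires the bs4 package; `html_text` is the file content):
#   from bs4 import BeautifulSoup
#   soup = BeautifulSoup(html_text, "html.parser")
#   grid = expand_spans(cells_from_tbody(soup.find("tbody")))
```

Each inner list of `grid` then has the same number of columns, with the spanned value (for example `M1`) repeated on every row it covers, so `pd.DataFrame(grid)` should line up as the browser shows it.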
