Pandas read_html does not read the text correctly

Posted 2024-07-07 07:57:38


I have the following text:

text = """<table class="table table-striped">\n <thead>\n <tr>\n <th data-field="placement">Placement</th>\n <th data-field="production">Production</th>\n <th data-field="application">Eng.Vol.</th>\n <th data-field="body">Body No</th>\n <th data-field="eng">Eng No</th>\n <th data-field="eng">Notes</th>\n </tr>\n <tr>\n <td data-field="placement">Front Stabilizer</td>\n <td data-field="production">Oct 16~</td>\n <td data-field="application">1.5 L</td>\n <td data-field="body">HRW18</td>\n <td data-field="eng">L15BY</td>\n <td data-field="note" class="">\n Pos:Left/Right </td>\n </tr>\n <tr>\n <td data-field="placement">Front Stabilizer</td>\n <td data-field="production">Oct 16~</td>\n <td data-field="application">1.5 L</td>\n <td data-field="body">HRW18 LHD</td>\n <td data-field="eng">L15BY</td>\n <td data-field="note" class="">\n Pos:Left/Right </td>\n </tr>\n <tr>\n <td data-field="placement">Front Stabilizer</td>\n <td data-field="production">Oct 16~</td>\n <td data-field="application">1.5 L</td>\n <td data-field="body">HRW28</td>\n <td data-field="eng">L15BY</td>\n <td data-field="note" class="">\n Pos:Left/Right </td>\n </tr>\n <tr>\n <td data-field="placement">Front Stabilizer</td>\n <td data-field="production">Oct 16~</td>\n <td data-field="application">2.0 L</td>\n <td data-field="body">HRW38 RHD</td>\n <td data-field="eng">R20A9</td>\n <td data-field="note" class="">\n Pos:Left/Right </td>\n </tr>\n </thead>\n </table>"""

This HTML text is properly closed with table tags and has all the required tags. However, pandas does not read it as a table.

Code:

pd.read_html(text)

Output:

[Empty DataFrame
 Columns: [(Placement, Front Stabilizer, Front Stabilizer, Front Stabilizer, Front Stabilizer), (Production, Oct 16~, Oct 16~, Oct 16~, Oct 16~), (Eng.Vol., 1.5 L, 1.5 L, 1.5 L, 2.0 L), (Body No, HRW18, HRW18 LHD, HRW28, HRW38 RHD), (Eng No, L15BY, L15BY, L15BY, R20A9), (Notes, Pos:Left/Right, Pos:Left/Right, Pos:Left/Right, Pos:Left/Right)]
 Index: []]



1 answer
User
#1 · Posted 2024-07-07 07:57:38

Your whole table is wrapped in <thead></thead>, so it is understandable that pandas interprets every row as part of the column header. Let's try:

tmp=pd.read_html(text)[0]

pd.DataFrame(tmp.columns.to_frame().values)

Output (one row per original column):

    0           1                 2                 3                 4
                                                
 0  Placement   Front Stabilizer  Front Stabilizer  Front Stabilizer  Front Stabilizer
 1  Production  Oct 16~           Oct 16~           Oct 16~           Oct 16~
 2  Eng.Vol.    1.5 L             1.5 L             1.5 L             2.0 L
 3  Body No     HRW18             HRW18 LHD         HRW28             HRW38 RHD
 4  Eng No      L15BY             L15BY             L15BY             R20A9
 5  Notes       Pos:Left/Right    Pos:Left/Right    Pos:Left/Right    Pos:Left/Right
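
The frame above still has one row per original column, so it is transposed relative to the HTML table. Below is a minimal sketch of one way to restore the intended layout. It assumes a shortened copy of the question's markup; the names `html`, `rows` and `df` are illustrative and not from the original post. The string is wrapped in `StringIO` because newer pandas versions warn when literal HTML is passed directly.

```python
from io import StringIO

import pandas as pd

# Shortened, hypothetical copy of the question's markup: every row,
# including the data rows, sits inside <thead>.
html = """<table>
 <thead>
  <tr><th>Placement</th><th>Body No</th><th>Eng No</th></tr>
  <tr><td>Front Stabilizer</td><td>HRW18</td><td>L15BY</td></tr>
  <tr><td>Front Stabilizer</td><td>HRW38 RHD</td><td>R20A9</td></tr>
 </thead>
</table>"""

# All rows end up in a MultiIndex header, so the frame itself is empty.
tmp = pd.read_html(StringIO(html))[0]

# Each column label is a tuple (header, row 1 value, row 2 value, ...).
# to_frame() yields one label per row; transposing restores the table shape.
rows = tmp.columns.to_frame(index=False).T.reset_index(drop=True)

# Promote the first row to the header and keep the remaining rows as data.
df = rows.iloc[1:].reset_index(drop=True)
df.columns = rows.iloc[0].tolist()

print(df)
```

With the full `text` from the question, the same steps should give four data rows under the six original headers.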
