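I have an HTML file saved on my local drive. It contains several tables, and I want to extract a few specific tables from the page and export them to CSV. I wrote a small Python script, but it only gives me the full text of the HTML, and I don't know how to extract the data I need from it.
Python code: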
import pandas as pd
url = "table1.html"
tables = pd.read_html(url)[0]
print(tables)
The HTML file is:
<!--?xml version="1.0" encoding="utf-16"?-->
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
<table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse;" width="100%">
<tbody>
<tr>
<td style="border:none; padding: 0px;font-family: Tahoma;font-size: 12px;">
<table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse;" width="100%">
<tbody>
<tr style="height:70px">
<td style="width: 80%;border: none;background-color: #fb9895;color: White;font-weight: bold;font-size: 16px;height: 70px;vertical-align: bottom;padding: 0 0 17px 15px;font-family: Tahoma;">Backup job: MUMHOILNDDB01 Backup 1
<div class="jobDescription" style="margin-top: 5px;font-size: 12px;"> </div>
</td>
<td style="border: none;padding: 0px;font-family: Tahoma;font-size: 12px;background-color: #fb9895;color: White;font-weight: bold;font-size: 16px;height: 70px;vertical-align: bottom;padding: 0 0 17px 15px;font-family: Tahoma;">Error
<div class="jobDescription" style="margin-top: 5px;font-size: 12px;">1 of 1 hosts processed</div>
</td>
</tr>
<tr>
<td colspan="2" style="border: none; padding: 0px;font-family: Tahoma;font-size: 12px;">
<table border="0" cellpadding="0" cellspacing="0" class="inner" style="margin: 0px;border-collapse: collapse;" width="100%">
<tbody>
<tr style="height: 17px;">
<td class="sessionDetails" colspan="9" style="border-style: solid; border-color:#a7a9ac; border-width: 1px 1px 0 1px;height: 35px;background-color: #f3f4f4;font-size: 16px;vertical-align: middle;padding: 5px 0 0 15px;color: #626365; font-family: Tahoma;"><span>Tuesday, August 4, 2020 11:00:17 AM</span></td>
</tr>
<tr style="height: 17px;">
<td nowrap="nowrap" style="width: 1%;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Success</b></td>
<td nowrap="nowrap" style="width:85px;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0</td>
<td nowrap="nowrap" style="width:85px;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Start time</b></td>
<td nowrap="nowrap" style="width:85px;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">11:00:17 AM</td>
<td nowrap="nowrap" style="width:85px;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Total size</b></td>
<td nowrap="nowrap" style="width:85px;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0 B</td>
<td nowrap="nowrap" style="width:85px;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Backup size</b></td>
<td nowrap="nowrap" style="width:85px;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0 B</td>
<td rowspan="3" style="border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;vertical-align: top;"> </td>
</tr>
<tr style="height: 17px;">
<td nowrap="nowrap" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Warning</b></td>
<td style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0</td>
<td nowrap="nowrap" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>End time</b></td>
<td nowrap="nowrap" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">11:00:41 AM</td>
<td nowrap="nowrap" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Data read</b></td>
<td nowrap="nowrap" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0 B</td>
<td nowrap="nowrap" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Dedupe</b></td>
<td nowrap="nowrap" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">1.0x</td>
</tr>
<tr style="height: 17px;">
<td nowrap="nowrap" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Error</b></td>
<td style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">1</td>
<td nowrap="nowrap" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Duration</b></td>
<td nowrap="nowrap" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0:00:24</td>
<td nowrap="nowrap" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Transferred</b></td>
<td nowrap="nowrap" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0 B</td>
<td nowrap="nowrap" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Compression</b></td>
<td nowrap="nowrap" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">1.0x</td>
</tr>
<tr style="height: 17px;">
<td colspan="9" nowrap="nowrap" style="height: 35px;background-color: #f3f4f4;font-size: 16px;vertical-align: middle;padding: 5px 0 0 15px;color: #626365; font-family: Tahoma;border: 1px solid #a7a9ac;">Details</td>
</tr>
<tr class="processObjectsHeader" style="height: 23px">
<td nowrap="nowrap" style="background-color: #e3e3e3;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;border-top: none;font-family: Tahoma;font-size: 12px;"><b>Name</b></td>
<td nowrap="nowrap" style="background-color: #e3e3e3;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;border-top: none;font-family: Tahoma;font-size: 12px;"><b>Status</b></td>
<td nowrap="nowrap" style="background-color: #e3e3e3;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;border-top: none;font-family: Tahoma;font-size: 12px;"><b>Start time</b></td>
<td nowrap="nowrap" style="background-color: #e3e3e3;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;border-top: none;font-family: Tahoma;font-size: 12px;"><b>End time</b></td>
<td nowrap="nowrap" style="background-color: #e3e3e3;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;border-top: none;font-family: Tahoma;font-size: 12px;"><b>Size</b></td>
<td nowrap="nowrap" style="background-color: #e3e3e3;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;border-top: none;font-family: Tahoma;font-size: 12px;"><b>Read</b></td>
<td nowrap="nowrap" style="width:1%;background-color: #e3e3e3;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;border-top: none;font-family: Tahoma;font-size: 12px;"><b>Transferred</b></td>
<td nowrap="nowrap" style="background-color: #e3e3e3;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;border-top: none;font-family: Tahoma;font-size: 12px;"><b>Duration</b></td>
<td style="background-color: #e3e3e3;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;border-top: none;font-family: Tahoma;font-size: 12px;"><b>Details</b></td>
</tr>
<tr style="height: 17px;">
<td nowrap="nowrap" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">MUMHOILNDDB01</td>
<td nowrap="nowrap" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><span style="color: #FF0000;">Error</span></td>
<td nowrap="nowrap" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">11:00:19 AM</td>
<td nowrap="nowrap" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">11:00:41 AM</td>
<td nowrap="nowrap" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0 B</td>
<td nowrap="nowrap" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0 B</td>
<td nowrap="nowrap" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0 B</td>
<td nowrap="nowrap" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0:00:21</td>
<td style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><span class="small_label" style="font-size: 10px;">Backup job has failed<br />
Backup task has been failed<br />
Processing finished with errors at 2020-08-04 11:00:42 GMT</span></td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
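The data shown above occurs several times in the HTML file, so I want to capture every occurrence of these columns. The HTML is fairly complex and I am new to this, so I don't know where to start. Thanks for your help.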
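Note that there are 3 tables here. pd.read_html returns a list of DataFrames, one per table it finds, and the [0] in your script keeps only the first (outermost) one. Keep the whole list instead, index it to pick the table you want (for example tables[2] for the innermost table), and export that DataFrame with its to_csv method.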
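To see why the page yields 3 tables and how one of them can be picked by index and written as CSV, here is a minimal sketch. It swaps in the standard library's html.parser and csv modules in place of pandas so that it runs without extra dependencies. The names TableCollector, rows_from_header, table_to_csv and SAMPLE_HTML are hypothetical, and the sample markup is a heavily trimmed stand-in for the nesting in the question, not the real file.

```python
import csv
import io
from html.parser import HTMLParser

# Trimmed stand-in for the question's file: three tables nested inside each other.
SAMPLE_HTML = """
<table><tbody><tr><td>
  <table><tbody>
    <tr><td>Backup job: MUMHOILNDDB01 Backup 1</td><td>Error</td></tr>
    <tr><td colspan="2">
      <table class="inner"><tbody>
        <tr><td colspan="3">Details</td></tr>
        <tr><td><b>Name</b></td><td><b>Status</b></td><td><b>Duration</b></td></tr>
        <tr><td>MUMHOILNDDB01</td><td><span>Error</span></td><td>0:00:21</td></tr>
      </tbody></table>
    </td></tr>
  </tbody></table>
</td></tr></tbody></table>
"""


class TableCollector(HTMLParser):
    """Collect every <table>, nested ones included, as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []  # one list of rows per table, in document order
        self._stack = []  # tables currently open, innermost last

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            table = {"rows": [], "cell": None}
            self.tables.append(table["rows"])
            self._stack.append(table)
        elif self._stack:
            current = self._stack[-1]
            if tag == "tr":
                current["rows"].append([])
            elif tag in ("td", "th") and current["rows"]:
                current["cell"] = []  # start buffering this cell's text

    def handle_endtag(self, tag):
        if not self._stack:
            return
        current = self._stack[-1]
        if tag in ("td", "th") and current["cell"] is not None:
            # Collapse runs of whitespace and store the finished cell.
            text = " ".join("".join(current["cell"]).split())
            current["rows"][-1].append(text)
            current["cell"] = None
        elif tag == "table":
            self._stack.pop()

    def handle_data(self, data):
        # Text belongs to the innermost open table only, so an outer cell
        # does not swallow the text of a table nested inside it.
        if self._stack and self._stack[-1]["cell"] is not None:
            self._stack[-1]["cell"].append(data)


def rows_from_header(rows, first_header="Name"):
    """Return the row whose first cell is `first_header` and all rows after it."""
    for i, row in enumerate(rows):
        if row and row[0] == first_header:
            return rows[i:]
    return []


def table_to_csv(rows):
    """Serialize rows of cell text to a CSV string."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


collector = TableCollector()
collector.feed(SAMPLE_HTML)
collector.close()
tables = collector.tables

print(len(tables))                    # number of tables found
details = rows_from_header(tables[2])  # innermost table, from the "Name" header down
print(table_to_csv(details))
```

For the real file, the same collector could be fed the text read from table1.html. If the report block repeats, looping over every entry in tables and applying rows_from_header to each would gather all occurrences; writing the result to a file with csv.writer works the same way as writing to the in-memory buffer used here.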