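I have this HTML, and I went to toolsick.com to convert it to JSON. That gave me the JSON below, and I would like to know whether it is possible to produce exactly the same result, but with Python. What can I use? Regular expressions? A library? A loop? I tried a few things without success. It does not have to be JSON, but I think that is the best option because I can then access the values with ['tr'][0]. Thanks, everyone.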
HTML:
<tr>
  <td>
    <span class="theme1">1</span> Charisma
  </td>
  <td>
    <span class="theme1">1</span> Smartness
  </td>
  <td>
    <span class="theme1">1</span> Health
  </td>
</tr>
<tr>
  <td></td>
  <td></td>
  <td>Age:
    <span class="green">20</span>
  </td>
</tr>
<tr>
  <td colspan="3" class="active">Strength:
    <span class="tooltip" data-tip="Lorem ipsum dolor sit amet, consectetur">
      <icon>i-hand</icon> Hand
    </span>
  </td>
</tr>
<tr>
  <td colspan="3" class="inactive">Weakness:
    <span class="tooltip" data-tip="Donec egestas lectus quis">
      <icon>i-feet</icon> Feet
    </span>
  </td>
</tr>
JSON:
{
  "tr": [
    {
      "td": [
        {
          "span": {
            "@class": "theme1",
            "#text": "1"
          },
          "#text": "Charisma"
        },
        {
          "span": {
            "@class": "theme1",
            "#text": "1"
          },
          "#text": "Smartness"
        },
        {
          "span": {
            "@class": "theme1",
            "#text": "1"
          },
          "#text": "Health"
        }
      ]
    },
    {
      "td": [
        "",
        "",
        {
          "span": {
            "@class": "green",
            "#text": "20"
          },
          "#text": "Age:"
        }
      ]
    },
    {
      "td": {
        "@colspan": "3",
        "@class": "active",
        "span": {
          "@class": "tooltip",
          "@data-tip": "Lorem ipsum dolor sit amet, consectetur",
          "icon": "i-hand",
          "#text": "Hand"
        },
        "#text": "Strength:"
      }
    },
    {
      "td": {
        "@colspan": "3",
        "@class": "inactive",
        "span": {
          "@class": "tooltip",
          "@data-tip": "Donec egestas lectus quis",
          "icon": "i-feet",
          "#text": "Feet"
        },
        "#text": "Weakness:"
      }
    }
  ]
}
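There are a few libraries suited to this task, such as html2Json and BeautifulSoup. lxml is another library for parsing this kind of data (see example).
However, using them will not give you the JSON format you want. For the given <tr> ... </tr> elements, their output does not include the metadata, such as the class and data-tip attributes (the "@class" and "@data-tip" keys in your JSON). So the best and simplest option is to keep the JSON format you already have and use it to access the data you want.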
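If you do want to build that structure in Python, the following is a minimal sketch using only the standard library, not a definitive implementation. It assumes the fragment is well-formed XML (every tag closed, attributes quoted), which holds for the HTML in the question but not for HTML in general. The function names `element_to_dict` and `html_to_dict` and the wrapper `<root>` element are hypothetical, introduced only for illustration. The `@` and `#text` keys follow the layout of the JSON above, which resembles the convention used by the third-party xmltodict package.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert one element to a dict (or a plain string if it holds only text)."""
    # Attributes become "@name" keys, as in the JSON from the question.
    node = {"@" + name: value for name, value in elem.attrib.items()}
    # Text can sit before the first child (elem.text) or after any child (child.tail).
    pieces = [elem.text or ""]
    for child in elem:
        value = element_to_dict(child)
        if child.tag not in node:
            node[child.tag] = value
        elif isinstance(node[child.tag], list):
            node[child.tag].append(value)
        else:
            # Second occurrence of the same tag: switch from a single value to a list.
            node[child.tag] = [node[child.tag], value]
        pieces.append(child.tail or "")
    text = " ".join(p.strip() for p in pieces if p.strip())
    if not node:
        return text  # no attributes or children, so "" for an empty <td></td>
    if text:
        node["#text"] = text
    return node


def html_to_dict(fragment):
    """Parse a well-formed fragment; the <root> wrapper is an assumption of this sketch."""
    root = ET.fromstring("<root>" + fragment + "</root>")
    return element_to_dict(root)


# Two rows taken from the HTML in the question.
sample = """
<tr>
<td></td>
<td></td>
<td>Age:
<span class="green">20</span>
</td>
</tr>
<tr>
<td colspan="3" class="active">Strength:
<span class="tooltip" data-tip="Lorem ipsum dolor sit amet, consectetur">
<icon>i-hand</icon> Hand
</span>
</td>
</tr>
"""

data = html_to_dict(sample)
print(json.dumps(data, indent=2))

# "td" is a list when a row has several cells and a dict when it has only one.
age = data["tr"][0]["td"][2]["span"]["#text"]
strength = data["tr"][1]["td"]["span"]["#text"]
```

Note that the shape of each value depends on the input: a row with one cell yields a dict under "td", while a row with several cells yields a list, so code that reads the result may need to check the type before indexing. For HTML that is not well-formed, a dedicated HTML parser such as BeautifulSoup or lxml would be needed in place of `xml.etree.ElementTree`.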