Is it possible to convert HTML to JSON with Python, the way this website (toolslick) does?

Posted 2024-09-29 19:18:45

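I have this HTML, and I went to toolslick.com to convert it to JSON. That gave me the JSON below, and I would like to know whether it is possible to produce exactly the same result with Python. What can I use? Regular expressions? A library? A loop? I tried a few things without success. It doesn't have to be JSON, but I think that is the best option because I can access the values with ['tr'][0]. Thanks, everyone.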

HTML:
<tr>
    <td>
        <span class="theme1">1</span> Charisma
    </td>
    <td>
        <span class="theme1">1</span> Smartness
    </td>
    <td>
        <span class="theme1">1</span> Health
    </td>
</tr>
<tr>
    <td></td>
    <td></td>
    <td>Age: 
        <span class="green">20</span>
    </td>
</tr>
<tr>
    <td colspan="3" class="active">Strength: 
        <span class="tooltip" data-tip="Lorem ipsum dolor sit amet, consectetur">
            <icon>i-hand</icon> Hand
        </span>
    </td>
</tr>
<tr>
    <td colspan="3" class="inactive">Weakness: 
        <span class="tooltip" data-tip="Donec egestas lectus quis">
            <icon>i-feet</icon> Feet
        </span>
    </td>
</tr>

JSON:
{
  "tr": [
    {
      "td": [
        {
          "span": {
            "@class": "theme1",
            "#text": "1"
          },
          "#text": "Charisma"
        },
        {
          "span": {
            "@class": "theme1",
            "#text": "1"
          },
          "#text": "Smartness"
        },
        {
          "span": {
            "@class": "theme1",
            "#text": "1"
          },
          "#text": "Health"
        }
      ]
    },
    {
      "td": [
        "",
        "",
        {
          "span": {
            "@class": "green",
            "#text": "20"
          },
          "#text": "Age:"
        }
      ]
    },
    {
      "td": {
        "@colspan": "3",
        "@class": "active",
        "span": {
          "@class": "tooltip",
          "@data-tip": "Lorem ipsum dolor sit amet, consectetur",
          "icon": "i-hand",
          "#text": "Hand"
        },
        "#text": "Strength:"
      }
    },
    {
      "td": {
        "@colspan": "3",
        "@class": "inactive",
        "span": {
          "@class": "tooltip",
          "@data-tip": "Donec egestas lectus quis",
          "icon": "i-feet",
          "#text": "Feet"
        },
        "#text": "Weakness:"
      }
    }
  ]
}

Tags: text, json, data, html, tr, class, td, icon
1 answer
User
#1 · Posted 2024-09-29 19:18:45

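There are a few libraries suited to this task, for example html2Json and BeautifulSoup.

LXML is another library that can parse this kind of data.

However, none of them will give you the JSON layout you want. For <tr> ... </tr> elements, the output may instead look something like this: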

{
    "Status": "Active",
    "Card name": "NAMEn",
    "Account holder": "NAME",
    "Card number": "1234",
    "Card balance": "$18.30"
}

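As you can see, this does not include metadata such as class, @data-tip and so on. So the best and simplest option is to keep the JSON you already have and use it to access the data you want.

For example: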

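import json

# Your data: the JSON text from the converter (shortened here for illustration)
json_text = '{"tr": [{"td": ["", ""]}]}'

# json.loads parses a string; json.load would expect an open file object instead
json_dict = json.loads(json_text)

# Now you can use it like a dictionary, for example:
print(json_dict["tr"][0])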
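
For readers who still want to produce the layout from the question directly in Python, here is a minimal sketch that uses only the standard library's html.parser. The names TreeBuilder, node_to_obj and html_to_dict are made up for this illustration. The conversion rules (attributes become "@name" keys, text becomes "#text", repeated child tags become a list, an element with no attributes or children becomes a plain string) are assumptions inferred from the JSON shown in the question, not toolslick's actual algorithm.

```python
import json
from html.parser import HTMLParser

# Tags that never have a closing tag; assumed list, extend as needed.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class TreeBuilder(HTMLParser):
    """Collects the parsed HTML into a simple tree of dict nodes."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": None, "attrs": {}, "children": [], "text": []}
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "attrs": dict(attrs), "children": [], "text": []}
        self.stack[-1]["children"].append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; stray closing tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Keep only non-blank text, with surrounding whitespace removed.
        if data.strip():
            self.stack[-1]["text"].append(data.strip())


def node_to_obj(node):
    """Turn a tree node into '@attr' / child-tag / '#text' keys."""
    obj = {"@" + name: value for name, value in node["attrs"].items()}
    for child in node["children"]:
        value = node_to_obj(child)
        tag = child["tag"]
        if tag in obj:
            # Second occurrence of a tag: promote the entry to a list.
            if not isinstance(obj[tag], list):
                obj[tag] = [obj[tag]]
            obj[tag].append(value)
        else:
            obj[tag] = value
    text = " ".join(node["text"])
    if not obj:
        return text  # no attributes and no children: plain string
    if text:
        obj["#text"] = text
    return obj


def html_to_dict(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return node_to_obj(builder.root)


sample = """
<tr>
    <td><span class="theme1">1</span> Charisma</td>
    <td></td>
</tr>
<tr>
    <td colspan="3" class="active">Strength:
        <span class="tooltip" data-tip="Lorem ipsum"><icon>i-hand</icon> Hand</span>
    </td>
</tr>
"""
print(json.dumps(html_to_dict(sample), indent=2))
```

Note that, as in the question's JSON, "td" is a list when a row has several cells and a single object when it has one, so code that reads the result has to handle both cases. Text that is split around a child element is joined with a space in this sketch, which may differ from what the website does.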
