Decoding Unicode escapes in HTML with Python

Posted 2024-06-28 19:12:31


I want to unescape/decode this HTML:

\u003Cdiv class=\u0022col-sm-6 col-md-4 col-lg-3 p-b-35 product-tile-search\u0022\u003E\n        \u003C!-- Block2 --\u003E\n        \u003Cdiv class=\u0022block2\u0022\u003E\n            \u003Cdiv class=\u0022block2-pic hov-img0\u0022\u003E\n                \u003Ca href=\u0022https:\/\/abc.com\/cotton-tiered-smocked-dress-by-coco\/p\/46285\u0022\u003E\n                    \u003Cimg src=\u0022https:\/\/objectstorage-1.oraclecloud.com\/n\/abccom\/b\/cdn\/o\/products\/400-600\/CC0000006752--1--1597741927.jpeg\u0022 alt=\u0022IMG-PRODUCT\u0022\u003E\n                \u003C\/a\u003E\n                                \u003Cdiv class=\u0022product_tag\u0022\u003E\n 

This is what I tried:

response.text.replace('"','').encode('utf-8').decode( 'unicode-escape' )

But the result is not what I expected:

<a href="https:\\/\\/abc.com\\/puffed-sleeve-dress-\\/p\\/79515"\n                       class="stext-104 cl4 hov-cl1 trans-04 js-name-b2 p-b-6">\n  <\\/span>\n

The output still contains backslashes in the URLs and in the closing HTML tags. Is there any way to decode them as well? This site does it properly.


1 answer
User
#1 · Posted 2024-06-28 19:12:31

You can do this with Python 3.8:

strubgs ='\u003Cdiv class=\u0022col-sm-6 col-md-4 col-lg-3 p-b-35 product-tile-search\u0022\u003E\n        \u003C!  Block2  \u003E\n        \u003Cdiv class=\u0022block2\u0022\u003E\n            \u003Cdiv class=\u0022block2-pic hov-img0\u0022\u003E\n                \u003Ca href=\u0022https:\/\/abc.com\/cotton-tiered-smocked-dress-by-coco\/p\/46285\u0022\u003E\n                    \u003Cimg src=\u0022https:\/\/objectstorage-1.oraclecloud.com\/n\/abccom\/b\/cdn\/o\/products\/400-600\/CC0000006752 1 1597741927.jpeg\u0022 alt=\u0022IMG-PRODUCT\u0022\u003E\n                \u003C\/a\u003E\n                                \u003Cdiv class=\u0022product_tag\u0022\u003E\n '
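import html

# Python already decodes the \uXXXX escapes when it parses the string literal above.
# html.unescape() only converts HTML entities such as &lt;, so the "\/" sequences stay as-is.
print(html.unescape(strubgs))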

This produces the following output (the \/ sequences are left unchanged):

<div class="col-sm-6 col-md-4 col-lg-3 p-b-35 product-tile-search">
        <!  Block2  >
        <div class="block2">
            <div class="block2-pic hov-img0">
                <a href="https:\/\/abc.com\/cotton-tiered-smocked-dress-by-coco\/p\/46285">
                    <img src="https:\/\/objectstorage-1.oraclecloud.com\/n\/abccom\/b\/cdn\/o\/products\/400-600\/CC0000006752 1 1597741927.jpeg" alt="IMG-PRODUCT">
                <\/a>
                                <div class="product_tag">
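
The remaining \/ sequences are JSON-style escapes for a forward slash, which Python string literals and the unicode-escape codec do not recognise. One possible way to remove them is to parse the text as a JSON string, as in the minimal sketch below. The helper name decode_js_escaped and the shortened sample are hypothetical, and the sketch assumes the fragment contains no raw double quotes or literal control characters (in the question's data, quotes arrive as \u0022).

```python
import json


def decode_js_escaped(fragment: str) -> str:
    # Hypothetical helper: wrap the fragment in double quotes so it becomes a
    # JSON string literal, then let the JSON parser decode every escape
    # (unicode escapes, newline escapes and escaped slashes) in one pass.
    # Assumption: the fragment has no unescaped double quotes inside it.
    return json.loads('"' + fragment + '"')


# Shortened sample in the same shape as the question's data.
# The raw string keeps the backslashes exactly as the server sent them.
raw = r'\u003Ca href=\u0022https:\/\/abc.com\/p\/46285\u0022\u003E\n\u003C\/a\u003E'

decoded = decode_js_escaped(raw)
print(decoded)
# <a href="https://abc.com/p/46285">
# </a>
```

If the response body as a whole is valid JSON, calling json.loads(response.text) on the full body should decode these escapes without the quote-wrapping step.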
