How do I replace escaped unicode characters with the correct unicode characters?

'https://www.jobtestprep.co.uk/media/24543/xnumber-series-big-1.png,qanchor\\u003dcenter,amode\\u003dcrop,awidth\\u003d473,aheight\\u003d352,arnd\\u003d131255524960000000.pagespeed.ic.YolXsWmhs0.png'

2 Answers

User

Answer 1 · edited 2024-09-27 07:35:31

You can do it like this:

>>> url = (
...    'https://www.jobtestprep.co.uk/media/24543/xnumber-series-'
...    'big-1.png,qanchor\\u003dcenter,amode\\u003dcrop,awidth\\u003d473,'
...    'aheight\\u003d352,arnd\\u003d131255524960000000.pagespeed.ic.YolXsWmhs0.png'
... )
>>> url = url.encode('utf-8').decode('unicode_escape')
>>> print(url)
https://www.jobtestprep.co.uk/media/24543/xnumber-series-big-1.png,qanchor=center,amode=crop,awidth=473,aheight=352,arnd=131255524960000000.pagespeed.ic.YolXsWmhs0.png
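One caveat about this approach: `unicode_escape` decodes bytes as Latin-1, so any real non-ASCII character already in the string is mangled by the UTF-8 round trip. The URL in the question is pure ASCII, so it is unaffected. The sketch below uses a made-up string (not from the question) to show the effect, along with one commonly used workaround; treat it as an illustration under those assumptions, not a general-purpose decoder.

```python
# Hypothetical sample: a real non-ASCII character plus an escaped '='.
text = 'café\\u003d1'

# The UTF-8 round trip turns 'é' into two Latin-1 characters.
mangled = text.encode('utf-8').decode('unicode_escape')

# Encoding as Latin-1 with backslashreplace keeps characters up to U+00FF
# as single bytes and turns anything higher into backslash escapes, which
# unicode_escape then decodes back.
safe = text.encode('latin-1', 'backslashreplace').decode('unicode_escape')

print(mangled)  # cafÃ©=1
print(safe)     # café=1
```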

User

Answer 2 · edited 2024-09-27 07:35:31

Your regex extracts JSON strings from the web page:

searched_results = re.findall(r"(?<=,\"ou\":\")[^\s]+[\w](?=\",\"ow\")", results_source)

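Those " characters you strip off are actually important. The \uxxxx escape syntax here is specific to JSON (and JavaScript) syntax; it is closely related to Python's own escape syntax, but different (not by much, but it matters when you have non-BMP code points).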
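To make the non-BMP difference concrete, here is a minimal sketch. The sample escape sequence is an assumption chosen for illustration (the surrogate pair for U+1F600); it does not come from the question.

```python
import json

# Hypothetical sample: JSON encodes U+1F600 as a surrogate pair.
escaped = r'\ud83d\ude00'

# The JSON decoder combines the pair into a single code point.
via_json = json.loads('"' + escaped + '"')

# Python's unicode_escape codec decodes each \uXXXX on its own,
# leaving two lone surrogates.
via_python = escaped.encode('ascii').decode('unicode_escape')

print(len(via_json))    # 1
print(len(via_python))  # 2
```

The second result is not a valid character sequence and cannot be encoded to UTF-8, which is why decoding as JSON is the safer choice here.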

If you keep the quotes in each match, you can simply decode the matches as JSON, for example by passing each one to json.loads.
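A minimal sketch of that idea follows. The sample `results_source` string and the adjusted pattern are assumptions made for illustration; the pattern is a variant of the one above that moves the surrounding quotes inside the match, and it is not necessarily the exact expression the answer originally used.

```python
import json
import re

# Hypothetical page fragment containing one JSON object.
results_source = (
    r'{"id":"x","ou":"https://example.com/a.png,qanchor\u003dcenter","ow":473}'
)

# Keep the double quotes inside the match so that each match is a
# complete JSON string literal.
pattern = r'(?<=,"ou":)"[^\s"]+"(?=,"ow")'

# json.loads decodes the \uXXXX escapes according to JSON rules.
searched_results = [json.loads(m) for m in re.findall(pattern, results_source)]

print(searched_results)  # ['https://example.com/a.png,qanchor=center']
```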

A better approach is to use an HTML library to parse the page. With BeautifulSoup, you can get the data with:

import json
from bs4 import BeautifulSoup

soup = BeautifulSoup(results_source, 'html.parser')
search_results = [json.loads(t.text)['ou'] for t in soup.select('.rg_meta')]

This loads the text content of each <div class="rg_meta" ...> element as JSON data and extracts the ou key from each resulting dictionary. No regular expressions needed.
