Extracting a JSON value using only regex

Posted 2024-09-30 16:26:28


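I have a description field embedded in JSON, and I cannot use the json library to parse this data.

I am using {0,23} to try to extract the first 23 characters of the string. How can I extract the entire value associated with description?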

    import re

    description = "'\description\" : \"this is a tesdt \n another test\" "

    re.findall(r'description(?:\w+){0,23}', description, re.IGNORECASE)

With the code above, only ['description'] is returned.


2 answers
# First just creating some test JSON

import json

data = {
    'items': [
        {
            'description': 'A "good" thing',

            # This is ignored because I'm assuming we only want the exact key 'description'
            'full_description': 'Not a good thing'
        },
        {
            # \\/ produces the same two characters as \/ without the invalid-escape warning
            'description': 'Test some slashes: \\ \\\\ \" // \\/ \n\r',
        },
    ]
}

j = json.dumps(data)

print(j)

# The actual code

import re

pattern = r'"description"\s*:\s*("(?:\\"|[^"])*?")'
descriptions = [

    # I'm using json.loads just to parse the matched string to interpret
    # escapes properly. If this is not acceptable then ast.literal_eval
    # will probably also work
    json.loads(d)
    for d in re.findall(pattern, j)]

# Testing that it works

assert descriptions == [item['description'] for item in data['items']]
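
One edge case worth checking: with the alternation `\\"|[^"]`, a value that ends in an escaped backslash (the JSON text `\\"`) may be read as an escaped quote, so the match can run past the closing quote. The sketch below is one possible variant, not the answer's original pattern: it consumes every backslash together with the character after it. The names `DESCRIPTION_RE` and `extract_descriptions` are hypothetical, chosen only for illustration.

```python
import json
import re

# Hypothetical pattern: \\. consumes a backslash plus the escaped character as a
# pair, and [^"\\] consumes any other character that is not a quote or backslash.
DESCRIPTION_RE = re.compile(r'"description"\s*:\s*("(?:\\.|[^"\\])*")')


def extract_descriptions(text):
    """Return the decoded value of every exact "description" key found in text."""
    # json.loads is applied only to the matched string literal, to decode escapes.
    return [json.loads(raw) for raw in DESCRIPTION_RE.findall(text)]


sample = json.dumps({
    'items': [
        # The value ends with a single backslash, serialized as \\ before the quote.
        {'description': 'ends with a backslash \\', 'full_description': 'ignored'},
        {'description': 'A "good" thing\nsecond line'},
    ]
})

found = extract_descriptions(sample)
print(found)
```

As in the code above, `full_description` is skipped because the pattern requires a quote immediately before `description`.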

You can try this code:

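import re

description = "description\" : \"this is a tesdt \n another test\" "

# .*? replaces .{0,23}? so that values longer than 23 characters still match
result = re.findall(r'(?<=description")(?:\s*\:\s*)(".*?(?=")")', description, re.IGNORECASE | re.DOTALL)[0]

print(result)

The result is:

"this is a tesdt 
 another test"

which is essentially:

\"this is a tesdt \n another test\"

This is what you asked for in the comments.


Explanation:

(?<=description") is a positive lookbehind; it tells the regex that the match must be preceded by description".
(?:\s*\:\s*) is a non-capturing group; it tells the regex that description" is followed by zero or more whitespace characters, a colon (:), and zero or more whitespace characters.
(".*?(?=")") is the match that is actually wanted: a double quote ("), as few characters as possible up to the next double quote, and the closing double quote ("). With re.DOTALL the dot also matches the newline inside the value. The {0,23} bound from the question is dropped because the sample value is 30 characters long, so a 23-character limit can never reach the closing quote. Note that the match stops at the first double quote, so a value containing an escaped quote (\") would be cut short.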
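
To see the effect of the quantifier on the sample string, here is a minimal sketch that runs a bounded and an unbounded version of the capture group side by side. The helper `value_after_description` and the variable names `bounded` and `unbounded` are assumptions made for this illustration; they do not appear in the question or the answers.

```python
import re

# Same sample text as in the question, without the stray leading characters.
text = 'description" : "this is a tesdt \n another test" '


def value_after_description(pattern, source):
    """Return the first captured value, or None when the pattern does not match."""
    match = re.search(pattern, source, re.IGNORECASE | re.DOTALL)
    return match.group(1) if match else None


# At most 23 characters are allowed between the two quotes.
bounded = r'(?<=description")(?:\s*:\s*)(".{0,23}?")'
# Any number of characters, as few as possible, up to the next quote.
unbounded = r'(?<=description")(?:\s*:\s*)(".*?")'

print(value_after_description(bounded, text))    # no match: the value has 30 characters
print(value_after_description(unbounded, text))  # the quoted value, newline included
```

Using `re.search` with a `None` check instead of indexing `re.findall(...)[0]` avoids an `IndexError` when nothing matches.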
