Escaping double quotes in JSON

Posted 2024-09-30 01:27:44


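I know this title looks like a popular one here, but a quick skim of those questions shows that they usually involve an asker dealing with a single, isolated piece of JSON.

In my case, a " is either used to denote inches or wraps a phrase to mark some kind of nickname. Either way, it appears inside a value string of a JS object, and that string is already wrapped in double quotes.

Below is an example of the JS object string that is giving me trouble (I use a regex to double-quote the keys and strip the extra whitespace, but this is the string as scraped):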

'{\n\t\t\n\t\t\t\t\t\n\t\n\n\t\n\n\t\n\n\t\t\n\n\t\t \n\n\n\n 
"16241885":{title: "Nosefrida Fridababy Windi Gas & Colic Relief", isIneligible: false, isDiscontinued: false, isLowInventory: false, isAllowed: true}
\n\n\t\n\n\t\t\n\t\t\t, \n\t\t\n\t\n\n\t\t\n\n\t\t \n\n\n\n 
"8650356":{title: "Babyganics Face- Hand & Baby Wipes- Fragrance Free- 100 Count", isIneligible: false, isDiscontinued: false, isLowInventory: false, isAllowed: true}
    \n\n\t\n\n\t\t\n\t\t\t, \n\t\t\n\t\n\n\t\t\n\n\t\t \n\n\n\n 
"16249889":{title: "Nosefrida Nasal Aspirator Replacement Filters", isIneligible: false, isDiscontinued: false, isLowInventory: false, isAllowed: true}
    \n\n\t\n\n\t\t\n\t\t\t, \n\t\t\n\t\n\n\t\t\n\n\t\t \n\n\n\n 
"8650355":{title: "Babyganics Face- Hand & Baby Wipes- Fragrance Free- 40 Count", isIneligible: false, isDiscontinued: false, isLowInventory: false, isAllowed: true}
    \n\n\t\n\n\t\t\n\t\t\t, \n\t\t\n\t\n\n\t\t\n\n\t\t \n\n\n\n 
"15490928":{title: "BabyGanics Newborn Ultra Absorbent Jumbo Size Diapers - 36 Count", isIneligible: false, isDiscontinued: false, isLowInventory: false, isAllowed: true}
    \n\n\t\n\n\t\t\n\t\t\t, \n\t\t\n\t\n\n\t\t\n\n\t\t \n\n\n\n 
"14712536":{title: "Marvel Superhero Bandages", isIneligible: false, isDiscontinued: false, isLowInventory: false, isAllowed: true}
    \n\n\t\n\n\t\t\n\t\t\t, \n\t\t\n\t\n\n\t\t\n\n\t\t \n\n\n\n 
"16263505":{title: "Nosefrida "The Snotsucker" Nasal Aspirator", isIneligible: false, isDiscontinued: false, isLowInventory: false, isAllowed: true}
    \n\n\t\n\n\t\t\n\t\t\t, \n\t\t\n\t\n\n\t\t\n\n\t\t \n\n\n\n 
"14848093":{title: "Zarbee\'s Children\'s Cough Syrup - Grape", isIneligible: false, isDiscontinued: false, isLowInventory: false, isAllowed: true}
    \n\n\t\n\n\t\t\n\t \n\n\t\t\n\t}'

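I have tried json.dumps on the string, but that just double-escapes it and then needs a double json.loads, which puts me back at square one. I have also tried a regex like this: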

^{pr2}$

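While this seemed the most promising, the sub appears to re-insert the double quotes without the escape character, and it also does not substitute the first group (I have tried both with and without a raw string in the sub call). So for the problem section above (substring shown below):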

 "16263505":{title: "Nosefrida "The Snotsucker" Nasal Aspirator"

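the pattern does not put group 1 back and, for some reason, leaves a stray single quote in the result (below is the substring after the failed regex processing):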

"16263505":{title: "The Snotsucker"' Nasal Aspirator"

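Either way, json.loads complains about the unescaped ".

Edit 1: My regex does pull out the unescaped quotes, but substituting with it does not behave as expected. I am probably doing something silly and need a fresh pair of eyes.

Example function output with print statements: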
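A minimal sketch of the two symptoms described above, using a shortened, hypothetical sample rather than the full scraped string: json.loads rejects the unescaped inner quotes, and json.dumps only wraps the whole text in one more JSON string literal, so decoding it once gives back the same broken text.

```python
import json

# Hypothetical, shortened sample with an unescaped quoted phrase inside a value.
broken = '{"16263505": {"title": "Nosefrida "The Snotsucker" Nasal Aspirator"}}'

try:
    json.loads(broken)
    parsed_ok = True
except json.JSONDecodeError as exc:
    # The parser stops at the first inner quote, which ends the string early.
    parsed_ok = False
    print('json.loads failed:', exc)

# json.dumps treats the text as a plain Python str and encodes it as ONE JSON string,
# escaping every quote, including the structural ones.
encoded = json.dumps(broken)

# Decoding once only undoes that encoding; the result is the original broken text.
decoded_once = json.loads(encoded)
print(decoded_once == broken)
```

This is why the dumps/loads round trip cannot help here: it never distinguishes the quotes that delimit values from the quotes that belong inside them.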

low_inventory = response.xpath(
                '//script[contains(., "islistEligibility") or contains(., "ishlistEligibility")]/text()'
                ).re_first(r'(?s)(?<=registryWislistEligibilityMap)(?:\s*=\s*)(\{.+\})')

In [453]: for m in double_quotes_in_json.finditer(low_inventory):
     ...:     groups_matched = len(m.groups())
     ...:     print('groups: ', m.groups())
     ...:     entire_match = m.group()
     ...:     print('entire match: ', m.group())
     ...:     if groups_matched == 3:
     ...:             # we only matched a single double quote
     ...:             subbed_match = double_quotes_in_json.sub(r'$1\\$2$3', entire_match)
     ...:             print('subbed3: ', subbed_match)
     ...:             jsn_string = re.sub(entire_match, subbed_match, jsn_string)
     ...:     elif groups_matched == 4:
     ...:             subbed_match = double_quotes_in_json.sub(r'$1\\$2$3\\\$4', entire_match)
     ...:             print('subbed4: ', subbed_match)
     ...:             jsn_string = re.sub(entire_match, subbed_match, jsn_string)
     ...: print(jsn_string)
     ...: 
groups:  (' "Nosefrida ', '"', 'The Snotsucker', '"')
entire match:   "Nosefrida "The Snotsucker"
subbed4:   "Nosefrida "The Snotsucker"
{  "16241885":{"title": "Nosefrida Fridababy Windi Gas &amp; Colic Relief", "isIneligible": false, "isDiscontinued": false, "isLowInventory": false, "isAllowed": true},   "8650356":{"title": "Babyganics Face- Hand &amp; Baby Wipes- Fragrance Free- 100 Count", "isIneligible": false, "isDiscontinued": false, "isLowInventory": false, "isAllowed": true},   "16249889":{"title": "Nosefrida Nasal Aspirator Replacement Filters", "isIneligible": false, "isDiscontinued": false, "isLowInventory": false, "isAllowed": true},   "8650355":{"title": "Babyganics Face- Hand &amp; Baby Wipes- Fragrance Free- 40 Count", "isIneligible": false, "isDiscontinued": false, "isLowInventory": false, "isAllowed": true},   "15490928":{"title": "BabyGanics Newborn Ultra Absorbent Jumbo Size Diapers - 36 Count", "isIneligible": false, "isDiscontinued": false, "isLowInventory": false, "isAllowed": true},   "14712536":{"title": "Marvel Superhero Bandages", "isIneligible": false, "isDiscontinued": false, "isLowInventory": false, "isAllowed": true},   "16263505":{"title": "The Snotsucker"' Nasal Aspirator", "isIneligible": false, "isDiscontinued": false, "isLowInventory": false, "isAllowed": true},   "14848093":{"title": "Zarbee's Children's Cough Syrup - Grape", "isIneligible": false, "isDiscontinued": false, "isLowInventory": false, "isAllowed": true} }

1 answer

Answer 1 · Posted 2024-09-30 01:27:44

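For some reason, using Python's built-in replace function gives the expected result, while re.sub does not escape the double quotes correctly. (That was with group references in either a raw string with a single escape or a regular string with a double escape.) Anyway, here is the working function. If anyone has insight into why replace works and re.sub does not, I would love to know.

(old code commented out)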

import re

double_quotes_in_json = re.compile(r'(?<=:)(\s*")([^"]*)(")([^"]*)(")?(?=[^"]*",|"\s*\})')


def escape_double_quotes(jsn_string, pattern=double_quotes_in_json):
    for match in pattern.finditer(jsn_string):
        # current pattern only matches 1 instance of either one double quote in JSON value string
        # (presumably signifying inches) or 1 instance of phrase wrapped in double quotes
        # for something like nicknames
        # match.groups() always returns all 5 groups, with None for an optional group that
        # did not participate, so count only the groups that actually matched (4 or 5)
        groups = match.groups()
        num_groups_matched = sum(g is not None for g in groups)
        entire_match = match.group()
        print('groups: ', match.groups())
        print('entire: ', entire_match)
        if num_groups_matched == 4:
            # we only matched one double quote
            # subbed_match = pattern.sub('$1$2\\$3$4', entire_match)
            # jsn_string = re.sub(entire_match, subbed_match, jsn_string)
            target = ''.join(groups[1:4])
            replaced = target.replace('"', '\\"')
            print(replaced)
            jsn_string = jsn_string.replace(target, replaced)
        elif num_groups_matched == 5:
            # we matched a phrase wrapped in double quotes
            # subbed_match = pattern.sub('$1$2\\$3$4\\$5', entire_match)
            # jsn_string = re.sub(entire_match, subbed_match, jsn_string)
            target = ''.join(groups[1:])
            replaced = target.replace('"', '\\"')
            print(replaced)
            jsn_string = jsn_string.replace(target, replaced)
    return jsn_string
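On the open question of why re.sub misbehaved, here is a hedged sketch of two likely contributors; it is a guess based on the code shown in this thread, not a confirmed diagnosis. First, a pattern that relies on lookarounds cannot match the isolated match text, because the surrounding context (the `:` before it and the `",` after it) is gone. Second, Python's re module writes group references as `\1` or `\g<1>`; `$1` is inserted literally. The sample strings and the `simple` pattern are made up for illustration.

```python
import re

# Pattern copied from the answer above; the sample is a shortened, hypothetical excerpt.
double_quotes_in_json = re.compile(r'(?<=:)(\s*")([^"]*)(")([^"]*)(")?(?=[^"]*",|"\s*\})')
sample = '"title": "Nosefrida "The Snotsucker" Nasal Aspirator", "isAllowed": true'

match = double_quotes_in_json.search(sample)
entire_match = match.group()

# 1) Lost context: the lookbehind needs a ':' before the match, which is not part of
#    entire_match, so the pattern finds nothing and sub() returns the text unchanged.
unchanged = double_quotes_in_json.sub('REPLACED', entire_match)
print(unchanged == entire_match)

# 2) Group reference syntax: "$1" is plain text in a Python replacement template.
simple = re.compile(r'(\w+) "(\w+)"')  # hypothetical helper pattern
dollar_style = simple.sub(r'$1 \\"$2\\"', 'say "hi"')
python_style = simple.sub(r'\1 \\"\2\\"', 'say "hi"')
print(dollar_style)
print(python_style)
```

str.replace sidesteps both issues because it works on literal text and needs neither context nor a replacement template.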

Edit #1 (a.k.a. after sleeping on it):

^{pr2}$

Thanks to @deceze for the help.
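For comparison, a single-pass variant can be sketched by giving re.sub a function as the replacement, so the pattern runs against the full string and keeps its lookaround context. This is an illustrative alternative, not the code from the edit above; `_escape_inner`, `escape_double_quotes_single_pass`, and the sample string are hypothetical names and data.

```python
import json
import re

# Same pattern as in the answer above.
double_quotes_in_json = re.compile(r'(?<=:)(\s*")([^"]*)(")([^"]*)(")?(?=[^"]*",|"\s*\})')


def _escape_inner(match):
    """Rebuild the match, escaping every quote except the opening one in group 1."""
    opening, before, quote1, middle, quote2 = match.groups()
    # quote2 is None when only a single stray quote (e.g. inches) was matched.
    inner = before + quote1 + middle + (quote2 or '')
    # The return value of a replacement function is inserted as-is (no template parsing).
    return opening + inner.replace('"', '\\"')


def escape_double_quotes_single_pass(jsn_string):
    return double_quotes_in_json.sub(_escape_inner, jsn_string)


sample = '{"1": {"title": "Nosefrida "The Snotsucker" Nasal Aspirator", "isAllowed": true}}'
fixed = escape_double_quotes_single_pass(sample)
data = json.loads(fixed)
print(data['1']['title'])
```

Like the original function, this handles only one stray quote or one quoted phrase per value; titles with more inner quotes would need a different pattern.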