Python regex: extracting strings from multi-line curly-brace blocks

Posted 2024-10-02 16:26:52

I have a string like this. How can I create a dictionary with the first tag as the key and everything after it as the value?

test_string = """###Some Comment 
First-tags : 
{
  "tag1": {
    "tagKey1": "tagValue1",
    "tagKey2": "tagValue2"
  },
  "tag2": {
    "tagKey1": "tagValue1",
    "tagKey2": "tagValue2"
  }
  so on .....
} 
"""

For example, the key would be First-tags and the value would be:

{
  "tag1": {
    "tagKey1": "tagValue1",
    "tagKey2": "tagValue2"
  },
  "tag2": {
    "tagKey1": "tagValue1",
    "tagKey2": "tagValue2"
  }
  so on .....
} 

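[Edit: the string data is in a file. The problem is to read from the file and create a dictionary where each key is a comment and each value is the JSON data.]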

For example, the file would contain:

###Some Comment 
    First-tags : 
    {
      "tag1": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      },
      "tag2": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      }
      so on .....
    } 


###2nd Comment 
    Second-tags : 
    {
      "tag1": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      },
      "tag2": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      }
      so on .....
    } 

###Some other Comment 
    someother-tags : 
    {
      "tag1": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      },
      "tag2": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      }
      so on .....
    } 

2 answers

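You can use this regular expression. It captures in group 1 the last run of word characters (including -) before the :, and in group 2 everything that follows, up to the next comment (###) or the end of the string: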

([\w-]+)\s*:\s*(.*?)(?=\s*###|$)
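To make the parts of that pattern easier to follow, here is the same expression written with re.VERBOSE, as a sketch. The name TAG_BLOCK and the short sample string are only for illustration. Note that in verbose mode a bare # starts a comment, so the ### has to be escaped.

```python
import re

# Same pattern as above, one piece per line. re.S lets "." match newlines.
TAG_BLOCK = re.compile(
    r"""
    ([\w-]+)          # group 1: the tag name, e.g. First-tags
    \s*:\s*           # the colon, with optional whitespace on either side
    (.*?)             # group 2: everything after it, as little as possible
    (?=\s*\#{3}|$)    # stop before the next ### comment or at the end of the string
    """,
    re.S | re.X,
)

sample = '###A\nFirst-tags :\n{"k": 1}\n###B\nSecond-tags :\n{"k": 2}'
pairs = TAG_BLOCK.findall(sample)
print(pairs)
```

findall returns one (tag, block) tuple per match, so the result can be passed straight to dict() if the tag names are unique.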

You can then build the dictionary by iterating over every match in the string and reading the two groups:

import re

test_string = """
###Some Comment 
    First-tags : 
    {
      "tag1": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      },
      "tag2": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      }
      so on .....
    } 


###2nd Comment 
    Second-tags : 
    {
      "tag1": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      },
      "tag2": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      }
      so on .....
    } 

###Some other Comment 
    someother-tags : 
    {
      "tag1": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      },
      "tag2": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      }
      so on .....
    }
"""
res = {}
for match in re.finditer(r'([\w-]+)\s*:\s*(.*?)(?=\s*###|$)', test_string, re.S):
    res[match.group(1)] = match.group(2)

print(res)

Output:

{
 'First-tags': '{\n      "tag1": {\n        "tagKey1": "tagValue1",\n        "tagKey2": "tagValue2"\n      },\n      "tag2": {\n        "tagKey1": "tagValue1",\n        "tagKey2": "tagValue2"\n      }\n      so on .....\n    }',
 'Second-tags': '{\n      "tag1": {\n        "tagKey1": "tagValue1",\n        "tagKey2": "tagValue2"\n      },\n      "tag2": {\n        "tagKey1": "tagValue1",\n        "tagKey2": "tagValue2"\n      }\n      so on .....\n    }',
 'someother-tags': '{\n      "tag1": {\n        "tagKey1": "tagValue1",\n        "tagKey2": "tagValue2"\n      },\n      "tag2": {\n        "tagKey1": "tagValue1",\n        "tagKey2": "tagValue2"\n      }\n      so on .....\n    }'
}
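The values above are still plain strings. If each block is valid JSON (which assumes the "so on ....." placeholder is not literally present in the data), they can be turned into nested dictionaries with json.loads. A minimal sketch under that assumption, with a shortened sample text:

```python
import json
import re

text = """
###Some Comment
    First-tags :
    {
      "tag1": {"tagKey1": "tagValue1", "tagKey2": "tagValue2"}
    }

###2nd Comment
    Second-tags :
    {
      "tag2": {"tagKey1": "tagValue1"}
    }
"""

pattern = re.compile(r'([\w-]+)\s*:\s*(.*?)(?=\s*###|$)', re.S)

parsed = {}
for match in pattern.finditer(text):
    # group 2 holds the raw JSON text; json.loads raises
    # json.JSONDecodeError if a block is not valid JSON
    parsed[match.group(1)] = json.loads(match.group(2))

print(parsed["First-tags"]["tag1"]["tagKey2"])
```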

Update

If you also want to capture the comment, you can use the following code:

res = {}
for match in re.finditer(r'###([^\n]+)\s*([\w-]+)\s*:\s*(.*?)(?=\s*###|$)', test_string, re.S):
    res[match.group(1)] = { match.group(2) : match.group(3) }

print(res)

Output:

{
 'Some Comment ': {
   'First-tags': '{\n      "tag1": {\n        "tagKey1": "tagValue1",\n        "tagKey2": "tagValue2"\n      },\n      "tag2": {\n        "tagKey1": "tagValue1",\n        "tagKey2": "tagValue2"\n      }\n      so on .....\n    }'
 },
'2nd Comment ': {
   'Second-tags': '{\n      "tag1": {\n        "tagKey1": "tagValue1",\n        "tagKey2": "tagValue2"\n      },\n      "tag2": {\n        "tagKey1": "tagValue1",\n        "tagKey2": "tagValue2"\n      }\n      so on .....\n    }'
 },
 'Some other Comment ': {
  'someother-tags': '{\n      "tag1": {\n        "tagKey1": "tagValue1",\n        "tagKey2": "tagValue2"\n      },\n      "tag2": {\n        "tagKey1": "tagValue1",\n        "tagKey2": "tagValue2"\n      }\n      so on .....\n    }'
 }
}
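The edit to the question asks for the comment as the key and the JSON data as the value, read from a file. One way this might look is sketched below. The function name load_commented_blocks and the file name tags.txt are made up for the example, and it assumes every block is valid JSON. The comment is stripped so the keys do not carry the trailing space seen in the output above.

```python
import json
import re
import tempfile
from pathlib import Path

# Group 1: the comment text after ###; group 2: the raw JSON block.
# The tag name (e.g. First-tags) is matched but not captured.
BLOCK = re.compile(r'###([^\n]+)\s*[\w-]+\s*:\s*(.*?)(?=\s*###|$)', re.S)


def load_commented_blocks(path):
    """Map each ### comment (whitespace-stripped) to its parsed JSON block."""
    text = Path(path).read_text(encoding="utf-8")
    return {m.group(1).strip(): json.loads(m.group(2)) for m in BLOCK.finditer(text)}


# Demo with a temporary file standing in for the real input file.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "tags.txt"
    demo_path.write_text(
        '###Some Comment \n    First-tags : \n    {"tag1": {"tagKey1": "tagValue1"}} \n\n'
        '###2nd Comment \n    Second-tags : \n    {"tag2": {"tagKey1": "tagValue1"}} \n',
        encoding="utf-8",
    )
    result = load_commented_blocks(demo_path)

print(result)
```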

So here I try to convert the string to JSON.

For that to work, the string must contain JSON and nothing else.

So I find the first { and take the string from there on:

import json

my_str = '''
First-tags : 
{
  "tag1": {
    "tagKey1": "tagValue1",
    "tagKey2": "tagValue2"
  },
  "tag2": {
    "tagKey1": "tagValue1",
    "tagKey2": "tagValue2"
  }
  }
  '''
# find the first {
i = my_str.index('{')
my_str = my_str[i:]  # trim the string so that only the JSON object is left
my_dict = json.loads(my_str)  # json.loads already returns a dict for a JSON object
print(my_dict)

If needed, you can also find the end of the JSON and trim the string there (look for the last }).
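A small sketch of that trimming step, assuming the string contains exactly one JSON object with unrelated text on both sides (the sample text is invented):

```python
import json

raw = '''
First-tags :
{
  "tag1": {"tagKey1": "tagValue1"}
}
trailing text that is not JSON
'''

start = raw.index('{')   # first opening brace
end = raw.rindex('}')    # last closing brace
my_dict = json.loads(raw[start:end + 1])
print(my_dict)
```

This only holds up for a single object. With several blocks in one string, the slice from the first { to the last } would span all of them and fail to parse.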

Updated solution, following the edit to the question:

import json

my_str = '''
###Some Comment 
    First-tags : 
    {
      "tag1": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      },
      "tag2": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      }
    } 


###2nd Comment 
    Second-tags : 
    {
      "tag1": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      },
      "tag2": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      }
    } 

###Some other Comment 
    someother-tags : 
    {
      "tag1": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      },
      "tag2": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      }
    } 
'''
data = []
bal = 0
start = end = 0
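for i, v in enumerate(my_str):
    if v == '{':
        if bal == 0:
            start = i  # opening brace of a new top-level block
        bal += 1
    elif v == '}':
        bal -= 1
        end = i
    if start != end and bal == 0:  # a complete {...} block has just closed
        new_str = my_str[start:end + 1]
        print(new_str)
        my_dict = json.loads(new_str)  # json.loads already returns a dict here
        data.append(my_dict)
        start = end = i + 1
print(data)

Output (final line; each extracted block is also printed along the way):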
[{'tag1': {'tagKey1': 'tagValue1', 'tagKey2': 'tagValue2'}, 'tag2': {'tagKey1': 'tagValue1', 'tagKey2': 'tagValue2'}}, {'tag1': {'tagKey1': 'tagValue1', 'tagKey2': 'tagValue2'}, 'tag2': {'tagKey1': 'tagValue1', 'tagKey2': 'tagValue2'}}, {'tag1': {'tagKey1': 'tagValue1', 'tagKey2': 'tagValue2'}, 'tag2': {'tagKey1': 'tagValue1', 'tagKey2': 'tagValue2'}}]
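Counting braces by hand breaks if a { or } appears inside a JSON string value. As an alternative sketch, not part of the answer above, the standard library's json.JSONDecoder.raw_decode can parse one object starting at a given index and report where it ended. The helper name iter_json_objects is made up for this example, and it assumes the first { after each parsed object starts the next JSON block.

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON object found in text, in order."""
    decoder = json.JSONDecoder()
    pos = text.find('{')
    while pos != -1:
        # raw_decode returns the parsed object and the index just past it;
        # it raises json.JSONDecodeError if the text at pos is not valid JSON
        obj, end = decoder.raw_decode(text, pos)
        yield obj
        pos = text.find('{', end)


sample = '''
###Some Comment
    First-tags :
    {"tag1": {"note": "a } inside a string"}}

###2nd Comment
    Second-tags :
    {"tag2": {"tagKey1": "tagValue1"}}
'''
data = list(iter_json_objects(sample))
print(data)
```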
