Regex cannot retrieve every JSON record when the last one lacks the !! terminator at the end of the string

Posted 2024-06-26 14:16:54


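Each record in this JSON string is terminated by !!, except the last one, so ({.*?}!!) does not retrieve all of the records. With ({.*?}) I do get one match per record, but each match is cut off at the first closing brace instead of containing the complete value.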

JSON

x = {'d':'AAAAA@5##{"pp-0":[{"pp-1": 1000, "pp-3": 1003},{"pp-4": 1004, "pp-7": 1007},{"pp-8": 1008, "pp-11": 1011},{"pp-12": 1012,"pp-17": 1015}],"pp-17": 1015,"pp-17": 1015}!!AAAAA@6##{"pp-0":[{"pp-1": 1000, "pp-3": 1003},{"pp-4": 1004, "pp-7": 1007},{"pp-8": 1008, "pp-11": 1011},{"pp-12": 1012,"pp-17": 1015}],"pp-17": 1015,"pp-17": 1015}!!AAAAA@7##{"pp-0":[{"pp-1": 1000, "pp-3": 1003},{"pp-4": 1004, "pp-7": 1007},{"pp-8": 1008, "pp-11": 1011},{"pp-12": 1012,"pp-17": 1015}],"pp-17": 1015,"pp-17": 1015}'}

The same data with line breaks added for readability

x = {'d':'AAAAA@5##{"pp-0":[{"pp-1": 1000, "pp-3": 1003},{"pp-4": 1004, "pp-7": 1007},{"pp-8": 1008, "pp-11": 1011},{"pp-12": 1012,"pp-17": 1015}],"pp-17": 1015,"pp-17": 1015}
     !!AAAAA@6##{"pp-0":[{"pp-1": 1000, "pp-3": 1003},{"pp-4": 1004, "pp-7": 1007},{"pp-8": 1008, "pp-11": 1011},{"pp-12": 1012,"pp-17": 1015}],"pp-17": 1015,"pp-17": 1015}
     !!AAAAA@7##{"pp-0":[{"pp-1": 1000, "pp-3": 1003},{"pp-4": 1004, "pp-7": 1007},{"pp-8": 1008, "pp-11": 1011},{"pp-12": 1012,"pp-17": 1015}],"pp-17": 1015,"pp-17": 1015}'}

Python code

re.findall(r"(AAAAA@\d+##)({.*?})", x['d'])

Result

 [('AAAAA@5##', '{"pp-0":[{"pp-1": 1000, "pp-3": 1003}'),
 ('AAAAA@6##', '{"pp-0":[{"pp-1": 1000, "pp-3": 1003}'),
 ('AAAAA@7##', '{"pp-0":[{"pp-1": 1000, "pp-3": 1003}')]

When I use the following code instead

re.findall(r"(AAAAA@\d+##)({.*?}!!)", x['d'])

Second result

 [('AAAAA@5##',
 '{"pp-0":[{"pp-1": 1000, "pp-3": 1003},{"pp-4": 1004, "pp-7": 1007},{"pp-8": 1008, "pp-11": 
   1011},{"pp-12": 1012,"pp-17": 1015}],"pp-17": 1015,"pp-17": 1015}!!'),
 ('AAAAA@6##',
 '{"pp-0":[{"pp-1": 1000, "pp-3": 1003},{"pp-4": 1004, "pp-7": 1007},{"pp-8": 1008, "pp-11": 
 1011},{"pp-12": 1012,"pp-17": 1015}],"pp-17": 1015,"pp-17": 1015}!!')]

Only two records are returned, because the third record is not followed by !!

My expected result

  [('AAAAA@5##',
  '{"pp-0":[{"pp-1": 1000, "pp-3": 1003},{"pp-4": 1004, "pp-7": 1007},{"pp-8": 1008, "pp-11": 
   1011},{"pp-12": 1012,"pp-17": 1015}],"pp-17": 1015,"pp-17": 1015}!!'),
 ('AAAAA@6##',
 '{"pp-0":[{"pp-1": 1000, "pp-3": 1003},{"pp-4": 1004, "pp-7": 1007},{"pp-8": 1008, "pp-11": 
 1011},{"pp-12": 1012,"pp-17": 1015}],"pp-17": 1015,"pp-17": 1015}!!'),
   ('AAAAA@7##',
 '{"pp-0":[{"pp-1": 1000, "pp-3": 1003},{"pp-4": 1004, "pp-7": 1007},{"pp-8": 1008, "pp-11": 
 1011},{"pp-12": 1012,"pp-17": 1015}],"pp-17": 1015,"pp-17": 1015}')]

1 answer
User
#1 · Posted 2024-06-26 14:16:54
import re

x = {'d':'AAAAA@5##{"pp-0":[{"pp-1": 1000, "pp-3": 1003},{"pp-4": 1004, "pp-7": 1007},{"pp-8": 1008, "pp-11": 1011},{"pp-12": 1012,"pp-17": 1015}],"pp-17": 1015,"pp-17": 1015}!!AAAAA@6##{"pp-0":[{"pp-1": 1000, "pp-3": 1003},{"pp-4": 1004, "pp-7": 1007},{"pp-8": 1008, "pp-11": 1011},{"pp-12": 1012,"pp-17": 1015}],"pp-17": 1015,"pp-17": 1015}!!AAAAA@7##{"pp-0":[{"pp-1": 1000, "pp-3": 1003},{"pp-4": 1004, "pp-7": 1007},{"pp-8": 1008, "pp-11": 1011},{"pp-12": 1012,"pp-17": 1015}],"pp-17": 1015,"pp-17": 1015}'}

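# Split the string on the "!!" terminator with str.split(),
# then capture two groups from each piece:
#   group 1: (.*##)  the record prefix, e.g. AAAAA@5##
#   group 2: ({.*})  the JSON payload, greedy up to the last "}"
# Note: a piece without a match (e.g. after a trailing "!!") makes [0] raise IndexError.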

[re.findall(r'(.*##)({.*})', i)[0] for i in x['d'].split('!!')]
[('AAAAA@5##',
  '{"pp-0":[{"pp-1": 1000, "pp-3": 1003},{"pp-4": 1004, "pp-7": 1007},{"pp-8": 1008, "pp-11": 1011},{"pp-12": 1012,"pp-17": 1015}],"pp-17": 1015,"pp-17": 1015}'),
 ('AAAAA@6##',
  '{"pp-0":[{"pp-1": 1000, "pp-3": 1003},{"pp-4": 1004, "pp-7": 1007},{"pp-8": 1008, "pp-11": 1011},{"pp-12": 1012,"pp-17": 1015}],"pp-17": 1015,"pp-17": 1015}'),
 ('AAAAA@7##',
  '{"pp-0":[{"pp-1": 1000, "pp-3": 1003},{"pp-4": 1004, "pp-7": 1007},{"pp-8": 1008, "pp-11": 1011},{"pp-12": 1012,"pp-17": 1015}],"pp-17": 1015,"pp-17": 1015}')]
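
If the payloads need to be used as data rather than as strings, the split approach above can be taken one step further by parsing each piece with json.loads. The following is only a sketch: the helper name split_records and the shortened sample string are made up for illustration, and it assumes that "!!" and "##" never occur inside a JSON value.

```python
import json

# Shortened sample in the same layout as x['d'] (values are illustrative).
data = ('AAAAA@5##{"pp-0":[{"pp-1": 1000}],"pp-17": 1015}!!'
        'AAAAA@6##{"pp-0":[{"pp-4": 1004}],"pp-17": 1015}')


def split_records(raw, terminator="!!", marker="##"):
    """Return (prefix, parsed_json) pairs; hypothetical helper for this sketch."""
    records = []
    for chunk in raw.split(terminator):
        if not chunk:
            # A trailing terminator leaves an empty piece; skip it.
            continue
        prefix, sep, payload = chunk.partition(marker)
        if not sep:
            # No "##" marker in this piece, so it is not a record.
            continue
        records.append((prefix + sep, json.loads(payload)))
    return records


records = split_records(data)
for prefix, obj in records:
    print(prefix, obj["pp-17"])
```

Keep in mind that json.loads keeps only the last value for a repeated key, so the duplicated "pp-17" entries in the original data collapse into one.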

Or use a regular expression alone (this relies on the JSON payloads never containing a ! character)

re.findall(r'([^!]+##)({[^!]+})', x['d'])

Find an explanation of the regular expression here
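
Another option, closer to the pattern in the question, is to keep the lazy {.*?} but require that the closing brace is followed either by !! or by the end of the string, using a lookahead. This is a sketch on a shortened, made-up sample, not the only way to write it; it assumes the sequence }!! never appears inside a JSON string value.

```python
import re

# Shortened sample with the same layout as x['d']; the last record has no "!!"
# and one value contains a single "!" on purpose.
data = ('AAAAA@5##{"a":[{"b": 1},{"c": 2}],"d": 3}!!'
        'AAAAA@6##{"a":[{"b": 4}],"d": 5}!!'
        'AAAAA@7##{"a":[{"b": 6}],"d": "done!"}')

# (?=!!|$) is a lookahead: it checks for "!!" or end of string without
# consuming it, so the terminator stays out of the captured group.
pattern = re.compile(r'(AAAAA@\d+##)(\{.*?\})(?=!!|$)')

matches = pattern.findall(data)
for prefix, payload in matches:
    print(prefix, payload)
```

Because the lookahead does not consume the terminator, the captured payloads do not end in !!. Unlike the [^!]+ version, a lone ! inside a value does not cut the match short.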
