How to extract JSON objects from an unstructured data file

Posted 2024-09-27 21:34:34

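I need some advice. I have a text file containing information that I need to extract and save as a JSON file. The file is not formally structured, but it is organized in blocks. Please see the content below: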

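How can I do this? I just don't know where to start. My idea is to look for "Type : Router", but how do I iterate over each block and pick out only the details of the P-2-P links? Thanks for any advice.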

Type      : Router
  Ls id     : 1.1.1.2
  Adv rtr   : 1.1.1.2  
  Ls age    : 201 
  Len       : 84   
  Link count: 5
   * Link ID: 1.1.1.2    
     Data   : 255.255.255.255 
     Link Type: StubNet      
     Metric : 1 
     Priority : Medium
   * Link ID: 1.1.1.4    
     Data   : 192.168.100.34  
     Link Type: P-2-P        
     Metric : 1
   * Link ID: 192.168.100.33  
     Data   : 255.255.255.255 
     Link Type: StubNet      
     Metric : 1 
     Priority : Medium
   * Link ID: 1.1.1.1    
     Data   : 192.168.100.53  
     Link Type: P-2-P        
     Metric : 1
   * Link ID: 192.168.100.54  
     Data   : 255.255.255.255 
     Link Type: StubNet      
     Metric : 1 
     Priority : Medium

  Type      : Router
  Ls id     : 1.1.1.1
  Adv rtr   : 1.1.1.1  
  Ls age    : 1699 
  Len       : 96 
  Options   :  ASBR  E  
  seq#      : 80008d72 
  chksum    : 0x16fc
  Link count: 6
   * Link ID: 1.1.1.1    
     Data   : 255.255.255.255 
     Link Type: StubNet      
     Metric : 1 
     Priority : Medium
   * Link ID: 1.1.1.1    
     Data   : 255.255.255.255 
     Link Type: StubNet      
     Metric : 12 
     Priority : Medium
   * Link ID: 1.1.1.3    
     Data   : 192.168.100.26  
     Link Type: P-2-P        
     Metric : 10
   * Link ID: 192.168.100.25  
     Data   : 255.255.255.255 
     Link Type: StubNet      
     Metric : 10 
     Priority : Medium
   * Link ID: 1.1.1.2    
     Data   : 192.168.100.54  
     Link Type: P-2-P        
     Metric : 10
   * Link ID: 192.168.100.53  
     Data   : 255.255.255.255 
     Link Type: StubNet      
     Metric : 10 
     Priority : Medium

Extract only from blocks that have Type: Router. Within each such block, the information to capture is:

(1)Ls id  : 1.1.1.2
and under Link count, capture only the entries whose Link Type is P-2-P:
(a)Link ID: 1.1.1.4
(b)Data   : 192.168.100.34
(c)Link Type: P-2-P
(d)Metric : 1

(a)Link ID: 1.1.1.1
(b)Data   : 192.168.100.53
(c)Link Type: P-2-P
(d)Metric : 1

Then, for the other Type: Router block, capture:
(2)Ls id  : 1.1.1.1
and under Link count, capture only the entries whose Link Type is P-2-P:
(a)Link ID: 1.1.1.3
(b)Data   : 192.168.100.26
(c)Link Type: P-2-P
(d)Metric : 10

(a)Link ID: 1.1.1.2
(b)Data   : 192.168.100.54
(c)Link Type: P-2-P
(d)Metric : 10

**There is another Link Type (StubNet), but I am only interested in capturing the entries whose Link Type is P-2-P.**

The expected JSON is:

{
  "oppf": [
    {
      "Sid": "1.1.1.2",
      "Did": "1.1.1.4",
      "Sport": "192.168.100.34",
      "Netype": "P-2-P",
      "Metric": "1"
    },
    {
      "Sid": "1.1.1.2",
      "Did": "1.1.1.1",
      "Sport": "192.168.100.53",
      "Netype": "P-2-P",
      "Metric": "1"
    },
    {
      "Sid": "1.1.1.1",
      "Did": "1.1.1.3",
      "Sport": "192.168.100.26",
      "Netype": "P-2-P",
      "Metric": "10"
    },
    {
      "Sid": "1.1.1.1",
      "Did": "1.1.1.2",
      "Sport": "192.168.100.54",
      "Netype": "P-2-P",
      "Metric": "10"
    }
  ]
}

2 Answers

To get only the P-2-P entries:

import json

data = "..."  # the text dump from the question

result = {}
entries = []
for block in data.split("\n\n"):
    if block:
        # Text before the first "*" is the router header; each later part is one link.
        parts = block.split("*")
        for line in parts[0].split("\n"):
            if "Ls id" in line:
                ls_id, ip = line.split(": ")
                ls_id = ls_id.strip()
                ip = ip.strip()
        for link in parts[1:]:
            if "P-2-P" in link:
                # Start each entry with the router's "Ls id" field.
                temp = {ls_id: ip}
                for item in link.split("\n"):
                    try:
                        key, value = item.split(": ")
                        temp[key.strip()] = value.strip()
                    except ValueError:
                        # Skip lines that are not "key: value" pairs.
                        pass
                entries.append(temp)
result["oppf"] = entries
print(json.dumps(result, indent=2))
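
The loop above keeps the dump's own field names ("Ls id", "Link ID", "Data", ...), while the question asks for the keys Sid, Did, Sport, Netype and Metric. A minimal sketch of the same split-on-blank-line, split-on-"*" idea with the keys renamed could look like this. The names KEY_MAP and extract_p2p and the shortened sample text are illustrative assumptions, not part of the original answer.

```python
import json

# Assumed mapping from the dump's field names to the keys the question expects.
KEY_MAP = {"Link ID": "Did", "Data": "Sport", "Link Type": "Netype", "Metric": "Metric"}


def extract_p2p(text):
    """Return {"oppf": [...]} with one entry per P-2-P link of each router block."""
    entries = []
    for block in text.split("\n\n"):
        # Text before the first "*" is the router header; each later part is one link.
        header, *links = block.split("*")
        ls_id = None
        for line in header.splitlines():
            key, sep, value = line.partition(":")
            if sep and key.strip() == "Ls id":
                ls_id = value.strip()
        for link in links:
            fields = {}
            for line in link.splitlines():
                key, sep, value = line.partition(":")
                if sep:  # ignore lines that carry no "key: value" pair
                    fields[key.strip()] = value.strip()
            if fields.get("Link Type") != "P-2-P":
                continue
            entry = {"Sid": ls_id}
            entry.update({new: fields.get(old) for old, new in KEY_MAP.items()})
            entries.append(entry)
    return {"oppf": entries}


# Shortened, hypothetical sample in the same layout as the question's dump.
sample = """  Type      : Router
  Ls id     : 1.1.1.2
  Link count: 2
   * Link ID: 1.1.1.2
     Data   : 255.255.255.255
     Link Type: StubNet
     Metric : 1
     Priority : Medium
   * Link ID: 1.1.1.4
     Data   : 192.168.100.34
     Link Type: P-2-P
     Metric : 1"""

result = extract_p2p(sample)
print(json.dumps(result, indent=2))
```

Comparing the parsed "Link Type" value instead of searching the raw text for "P-2-P" avoids matching the string if it ever appears in another field.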

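To me, this is quite well structured. It uses different indentation levels to identify sub-items, "*" to mark the start of a new dictionary, and blank lines to separate routers. It also has ":" to split each line into a key and a value.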

data = '''  Type      : Router
  Ls id     : 1.1.1.2
  Adv rtr   : 1.1.1.2  
  Ls age    : 201 
  Len       : 84   
  Link count: 5
   * Link ID: 1.1.1.2    
     Data   : 255.255.255.255 
     Link Type: StubNet      
     Metric : 1 
     Priority : Medium
   * Link ID: 1.1.1.4    
     Data   : 192.168.100.34  
     Link Type: P-2-P        
     Metric : 1
   * Link ID: 192.168.100.33  
     Data   : 255.255.255.255 
     Link Type: StubNet      
     Metric : 1 
     Priority : Medium
   * Link ID: 1.1.1.1    
     Data   : 192.168.100.53  
     Link Type: P-2-P        
     Metric : 1
   * Link ID: 192.168.100.54  
     Data   : 255.255.255.255 
     Link Type: StubNet      
     Metric : 1 
     Priority : Medium

  Type      : Router
  Ls id     : 1.1.1.1
  Adv rtr   : 1.1.1.1  
  Ls age    : 1699 
  Len       : 96 
  Options   :  ASBR  E  
  seq#      : 80008d72 
  chksum    : 0x16fc
  Link count: 6
   * Link ID: 1.1.1.1    
     Data   : 255.255.255.255 
     Link Type: StubNet      
     Metric : 1 
     Priority : Medium
   * Link ID: 1.1.1.1    
     Data   : 255.255.255.255 
     Link Type: StubNet      
     Metric : 12 
     Priority : Medium
   * Link ID: 1.1.1.3    
     Data   : 192.168.100.26  
     Link Type: P-2-P        
     Metric : 10
   * Link ID: 192.168.100.25  
     Data   : 255.255.255.255 
     Link Type: StubNet      
     Metric : 10 
     Priority : Medium
   * Link ID: 1.1.1.2    
     Data   : 192.168.100.54  
     Link Type: P-2-P        
     Metric : 10
   * Link ID: 192.168.100.53  
     Data   : 255.255.255.255 
     Link Type: StubNet      
     Metric : 10 
     Priority : Medium'''

results = []
group = {}
group['items'] = []
subgroup = None

for line in data.split('\n'):
    if not line.strip():
        # A blank line ends the current router block; flush the pending link first.
        if subgroup:
            group['items'].append(subgroup)
        results.append(group)
        group = {}
        group['items'] = []
        subgroup = None
    elif not line.startswith('   '):
        # Router-level field (indented by two spaces).
        key, val = line.split(':', 1)
        key = key.strip()
        val = val.strip()
        group[key] = val
    else:
        # Link-level field; '*' marks the start of a new link entry.
        if '*' in line:
            if subgroup:
                group['items'].append(subgroup)
            subgroup = {}
        key, val = line.split(':', 1)
        key = key.replace('*', '').strip()
        val = val.strip()
        subgroup[key] = val

# Flush the last link and the last router block.
if subgroup:
    group['items'].append(subgroup)
results.append(group)

print(results)

To display it nicely:

import json
print(json.dumps(results, indent=2))

Result:

[
  {
    "items": [
      {
        "Link ID": "1.1.1.2",
        "Data": "255.255.255.255",
        "Link Type": "StubNet",
        "Metric": "1",
        "Priority": "Medium"
      },
      {
        "Link ID": "1.1.1.4",
        "Data": "192.168.100.34",
        "Link Type": "P-2-P",
        "Metric": "1"
      },
      {
        "Link ID": "192.168.100.33",
        "Data": "255.255.255.255",
        "Link Type": "StubNet",
        "Metric": "1",
        "Priority": "Medium"
      },
      {
        "Link ID": "1.1.1.1",
        "Data": "192.168.100.53",
        "Link Type": "P-2-P",
        "Metric": "1"
      },
      {
        "Link ID": "192.168.100.54",
        "Data": "255.255.255.255",
        "Link Type": "StubNet",
        "Metric": "1",
        "Priority": "Medium"
      }
    ],
    "Type": "Router",
    "Ls id": "1.1.1.2",
    "Adv rtr": "1.1.1.2",
    "Ls age": "201",
    "Len": "84",
    "Link count": "5"
  },
  {
    "items": [
      {
        "Link ID": "1.1.1.1",
        "Data": "255.255.255.255",
        "Link Type": "StubNet",
        "Metric": "1",
        "Priority": "Medium"
      },
      {
        "Link ID": "1.1.1.1",
        "Data": "255.255.255.255",
        "Link Type": "StubNet",
        "Metric": "12",
        "Priority": "Medium"
      },
      {
        "Link ID": "1.1.1.3",
        "Data": "192.168.100.26",
        "Link Type": "P-2-P",
        "Metric": "10"
      },
      {
        "Link ID": "192.168.100.25",
        "Data": "255.255.255.255",
        "Link Type": "StubNet",
        "Metric": "10",
        "Priority": "Medium"
      },
      {
        "Link ID": "1.1.1.2",
        "Data": "192.168.100.54",
        "Link Type": "P-2-P",
        "Metric": "10"
      },
      {
        "Link ID": "192.168.100.53",
        "Data": "255.255.255.255",
        "Link Type": "StubNet",
        "Metric": "10",
        "Priority": "Medium"
      }
    ],
    "Type": "Router",
    "Ls id": "1.1.1.1",
    "Adv rtr": "1.1.1.1",
    "Ls age": "1699",
    "Len": "96",
    "Options": "ASBR  E",
    "seq#": "80008d72",
    "chksum": "0x16fc",
    "Link count": "6"
  }
]

Now that you have a Python structure, you can pull out whatever you need from it.
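
As one possible way to finish the job, the sketch below filters a parsed structure of that shape down to the P-2-P links and reshapes it into the "oppf" layout from the question. The function names to_oppf and save_json and the cut-down sample_results list are assumptions made for illustration; the output file path is left to the caller.

```python
import json


def to_oppf(results):
    """Flatten parsed router blocks into the question's "oppf" list (P-2-P links only)."""
    oppf = []
    for group in results:
        for item in group.get("items", []):
            if item.get("Link Type") != "P-2-P":
                continue
            oppf.append({
                "Sid": group.get("Ls id"),
                "Did": item.get("Link ID"),
                "Sport": item.get("Data"),
                "Netype": item.get("Link Type"),
                "Metric": item.get("Metric"),
            })
    return {"oppf": oppf}


def save_json(payload, path):
    """Write the payload to a JSON file at a path chosen by the caller."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(payload, fh, indent=2)


# Cut-down, hypothetical stand-in for the parsed results list.
sample_results = [
    {
        "items": [
            {"Link ID": "1.1.1.2", "Data": "255.255.255.255",
             "Link Type": "StubNet", "Metric": "1", "Priority": "Medium"},
            {"Link ID": "1.1.1.4", "Data": "192.168.100.34",
             "Link Type": "P-2-P", "Metric": "1"},
        ],
        "Type": "Router",
        "Ls id": "1.1.1.2",
    },
]

payload = to_oppf(sample_results)
print(json.dumps(payload, indent=2))
```

Calling save_json(payload, some_path) would then store the same object as a JSON file, which is what the question ultimately asks for.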
