Extracting elements from a JSON string in Python

Posted 2024-06-30 15:27:04


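I have a JSON string from which I can extract some components, such as formatted_address, lat, and lng, but I cannot extract the values (long_name) of the other components, such as intersection, political, country, administrative_area_level_1, administrative_area_level_2, administrative_area_level_3, administrative_area_level_4, administrative_area_level_5, colloquial_area, locality, ward, neighborhood, premise, subpremise, and so on. I would like a data table like this: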

formatted_address             px_val      py_val      political              country  administrative_area_level_1  ..
Satya Niwas, Kanti Nagar..    19.1096591  72.8674712  Kanti Nagar,JB Nagar   India    maharashtra                  ..
82, Bamanpuri, Ajit Nagar..   19.109749   72.867249   Bamanpuri              India    maharashtra                  ..
    .
    . 
    .

Here is a sample JSON string:

[sample JSON not preserved in the source]

Here is my code:

    import json
    import pandas as pd

    line = "json_str"  # placeholder: the actual JSON string goes here
    json_st = json.loads(line)
    country = []
    political = []

    for json_str in json_st:
        # One list of long_name values per component type of interest
        address_fields = {
            'intersection': [],
            'political': [],
            'country': []
        }
        if isinstance(json_st, dict):
            first_address_components = json_st['results']
            #format_add = json_st['results'][0]
        else:
            first_address_components = json_st[0]['address_components']
        for item in first_address_components:
            for field_key in address_fields.keys():
                #address_fields[field_key].append(str(format_add['formatted_address']))
                if field_key in item['types']:
                    address_fields[field_key].append(item['long_name'])

        # Collapse each list into a single comma-separated string
        address_fields = {key: ', '.join(values) for key, values in address_fields.items()}
        country.append(address_fields['country'])
        political.append(address_fields['political'])

It raises an error:

json_st['results']['address_components']
Traceback (most recent call last):

  File "<ipython-input-94-315fa8711f9d>", line 1, in <module>
    json_st['results']['address_components']

TypeError: list indices must be integers or slices, not str

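I get the first three columns of the expected output, but I cannot extract the other columns. Any suggestions on this problem would be helpful.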

Thanks,

Domnick


3 answers

This is a rather broad question...

To help you get started:

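# d is the parsed JSON response (a dict with a 'results' list)
record_path = ['address_components']

meta = [
    'formatted_address',
    ['geometry', 'location', 'lat'],
    ['geometry', 'location', 'lng'],
]

# pd.json_normalize is the top-level name since pandas 1.0;
# older versions expose it as pd.io.json.json_normalize
x = pd.json_normalize(d['results'], record_path, meta)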

Result:

[output table not preserved in the source]
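As a rough illustration of what this call produces, here is a small sketch. The dictionary `d` below is a hypothetical two-result stand-in, not the asker's data; only the key names are taken from the question. It assumes pandas 1.0 or later, where `json_normalize` is available as `pd.json_normalize`.

```python
import pandas as pd

# Hypothetical two-result stand-in for the parsed response `d`.
d = {
    "results": [
        {
            "formatted_address": "Address A",
            "geometry": {"location": {"lat": 1.0, "lng": 2.0}},
            "address_components": [
                {"long_name": "Area A", "short_name": "Area A", "types": ["political"]},
                {"long_name": "India", "short_name": "IN", "types": ["country", "political"]},
            ],
        },
        {
            "formatted_address": "Address B",
            "geometry": {"location": {"lat": 3.0, "lng": 4.0}},
            "address_components": [
                {"long_name": "India", "short_name": "IN", "types": ["country", "political"]},
            ],
        },
    ]
}

# One output row per address component; the meta fields are repeated on each row.
long_table = pd.json_normalize(
    d["results"],
    record_path=["address_components"],
    meta=[
        "formatted_address",
        ["geometry", "location", "lat"],
        ["geometry", "location", "lng"],
    ],
)

print(long_table.columns.tolist())
print(len(long_table))
```

Note that this is a long format: each address appears once per component, so getting one row per address still requires grouping or pivoting on the component types.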

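You need to understand the schema of your data.

json_st['results']['address_components']

fails because json_st['results'] is an array (a Python list), so it has to be indexed with an integer before 'address_components' can be looked up.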
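To make the list-versus-dict point concrete, here is a minimal sketch. The `sample` dictionary is a hypothetical, heavily trimmed stand-in for the response; only the key names `results`, `address_components`, `long_name`, and `types` come from the question.

```python
# Hypothetical, trimmed-down stand-in for the parsed response.
sample = {
    "results": [
        {
            "formatted_address": "Example Building, Example Road",
            "address_components": [
                {"long_name": "Example Area", "types": ["political", "sublocality"]},
                {"long_name": "India", "types": ["country", "political"]},
            ],
        }
    ]
}

results = sample["results"]
print(type(results).__name__)  # list

# Indexing the list with a string reproduces the error from the question.
error_message = None
try:
    results["address_components"]
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Pick one result by integer index first, then look up its components.
first_components = results[0]["address_components"]
long_names = [component["long_name"] for component in first_components]
print(long_names)
```

To handle every result rather than just the first, loop over `results` instead of indexing with `[0]`.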

You can inspect the structure at http://jsoneditoronline.org

Here is some sample code:

# data is the parsed JSON response; print() calls use Python 3 syntax
for result in data['results']:
    print(type(result))                      # each result is a dict
    for address_component in result['address_components']:
        print(type(address_component))       # each component is a dict too
        print(address_component['long_name'])
        print(address_component['short_name'])
        for _type in address_component['types']:
            print(_type)

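I would go with json_normalize. I tried to come up with a one-line answer, but I don't think that is possible. For example (here I have only done it for px_val and py_val; you can do something similar for the other columns):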

# In pandas 1.0+ json_normalize is importable from the top-level package
from pandas import json_normalize

import pandas as pd
import json

with open('dat.json') as f:
    data = json.load(f)

# Flatten each result; nested dict keys become dotted column names,
# while the address_components list is kept as a single column
result = json_normalize(data['results'])

result['px_val'] = result['geometry.location.lat']
result['py_val'] = result['geometry.location.lng']

print(result[['formatted_address', 'px_val', 'py_val']])

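I also tried a solution for the political column, though I'm certainly not very proud of it:

[code not preserved in the source]

Output of result['political']: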

0    Kanti Nagar,J B Nagar,Andheri East,Mumbai,Mumb...
1    Bamanpuri,Ajit Nagar,J B Nagar,Andheri East,Mu...
2    Kanti Nagar,J B Nagar,Andheri East,Mumbai,Mumb...
3    Bamanpuri,Kanti Nagar,J B Nagar,Andheri East,M...
4    Bamanpuri,J.B. Nagar,J B Nagar,Andheri East,Mu...
5    Bamanpuri,Ajit Nagar,J B Nagar,Andheri East,Mu...
6    Ajit Nagar,J B Nagar,Andheri East,Mumbai,Mumba...
7    Bamanpuri,J B Nagar,Andheri East,Mumbai,Mumbai...
8    J B Nagar,Andheri East,Mumbai,Mumbai Suburban,...
9    Andheri East,Mumbai,Mumbai Suburban,Maharashtr...
Name: political, dtype: object

We can turn this into a function:

import numpy as np

def get_cols(st):
    # For each result, join the long_name of every component whose types include st
    pol = []
    for i in result['address_components'].apply(json_normalize):
        pol.append(','.join(i.apply(lambda x: x['long_name'] if st in x['types'] else np.nan, axis=1).dropna()))
    return pol

result['political'] = get_cols('political')
# This will assign the new column political with data. 
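The same idea can be applied to several component types at once without pandas. The following is only a sketch: `results_to_rows`, `wanted_types`, and `sample_results` are hypothetical names and data introduced for illustration, and joining multiple matches with a comma is an assumption based on the expected table in the question.

```python
from collections import defaultdict

def results_to_rows(results, wanted_types):
    """Build one flat dict per result, with one key per wanted component type."""
    rows = []
    for result in results:
        location = result["geometry"]["location"]
        # Collect long_name values under every type a component is tagged with.
        names_by_type = defaultdict(list)
        for component in result["address_components"]:
            for component_type in component["types"]:
                names_by_type[component_type].append(component["long_name"])
        row = {
            "formatted_address": result["formatted_address"],
            "px_val": location["lat"],
            "py_val": location["lng"],
        }
        for wanted in wanted_types:
            # Join multiple matches; a type that is absent becomes an empty string.
            row[wanted] = ",".join(names_by_type.get(wanted, []))
        rows.append(row)
    return rows

# Hypothetical single-result sample using the key names from the question.
sample_results = [
    {
        "formatted_address": "Address A",
        "geometry": {"location": {"lat": 1.0, "lng": 2.0}},
        "address_components": [
            {"long_name": "Area A", "types": ["political", "sublocality"]},
            {"long_name": "Maharashtra", "types": ["administrative_area_level_1", "political"]},
            {"long_name": "India", "types": ["country", "political"]},
        ],
    }
]

rows = results_to_rows(
    sample_results,
    ["political", "country", "administrative_area_level_1", "intersection"],
)
print(rows[0])
```

Passing `rows` to `pd.DataFrame(rows)` should then give one row per address with one column per requested type.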
