Get only a specific key's value from a JSON-formatted dataframe column

2 answers

User

#1 · edited 2024-10-01 00:25:43

from ast import literal_eval

import pandas as pd

Let's start with this:

# Quote every bare null so that literal_eval can parse the strings later
data = df_merge['PDH_Value'].to_dict()
data = {k: v.replace('null', '"null"') for k, v in data.items()}
df_merge['PDH_Value'] = pd.Series(data)

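Explanation:

In the code above, we convert df_merge['PDH_Value'] to a dictionary whose values are strings.
Then we replace null with "null" in each of those strings. Without this step the strings cannot be converted to real dicts, because literal_eval() does not accept a bare null.
Finally we build a Series from the dictionary and assign it back to df_merge['PDH_Value'].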
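
To see why the replacement is needed, here is a minimal sketch. The sample string is made up for illustration and is not taken from the question's data. literal_eval() only accepts Python literals, and a bare null is not one:

```python
from ast import literal_eval

# Hypothetical sample value, shaped like the JSON strings discussed above
raw = '{"roth": null, "catchup": "false"}'

try:
    literal_eval(raw)
    parsed_directly = True
except (ValueError, SyntaxError):
    # A bare `null` is a name, not a Python literal, so literal_eval rejects it
    parsed_directly = False

# After quoting, every value in the string is a plain string literal
fixed = raw.replace('null', '"null"')
parsed = literal_eval(fixed)

print(parsed_directly)
print(parsed)
```

Keep in mind that a plain replace() touches every occurrence of the substring null, including ones inside keys or inside values that are already quoted.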

Then:

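# Keep values that look like dicts; replace everything else with a placeholder dict string
df_merge['PDH_Value'] = df_merge['PDH_Value'].where(df_merge['PDH_Value'].str.startswith('{'), "{'catchup':'None'}")

# Parse each string into a real dict
df_merge['PDH_Value'] = df_merge['PDH_Value'].astype(str).map(lambda x: literal_eval(x) if x != 'nan' else float('NaN'))

# Keep only the value stored under 'catchup'
df_merge['PDH_Value'] = df_merge['PDH_Value'].map(lambda x: x['catchup'])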

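Explanation:

The values in df_merge['PDH_Value'] are still strings, so we check whether each one starts with {. If it does, we leave it unchanged; if it does not, we replace it with "{'catchup':'None'}". In other words, we replace the empty string '' with "{'catchup':'None'}", since you are only interested in 'catchup'.
astype(str) makes sure everything is a string, and map() then passes each value to literal_eval(), so the strings in df_merge['PDH_Value'] become actual dictionaries.
Now that the values are actual dictionaries, we use map() to get the value of the 'catchup' key. Note that x['catchup'] raises a KeyError for any dictionary that has no 'catchup' key; x.get('catchup') returns None in that case.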
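
The step above can be tried on a small Series. This is only a sketch under assumptions: the sample values are invented, the nulls are taken to be quoted already, and dict.get() is swapped in for x['catchup'] so that a row without the key does not stop the run:

```python
from ast import literal_eval

import pandas as pd

# Hypothetical sample: one normal row, one empty string, one dict without 'catchup'
s = pd.Series([
    '{"roth": "null", "catchup": "false"}',
    '',
    '{"roth": "true", "pretax": "true"}',
])

# Keep strings that look like dicts; replace everything else with a placeholder
s = s.where(s.str.startswith('{'), "{'catchup': 'None'}")

# Turn each string into a real dict
dicts = s.map(literal_eval)

# .get() yields a missing value instead of raising KeyError when the key is absent
catchup = dicts.map(lambda d: d.get('catchup'))

print(catchup)
```

The second row ends up as the string 'None' (from the placeholder), while the third row ends up as a real missing value, so any later mapping has to handle both.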

Finally:

df_merge['PDH_Value'] = df_merge['PDH_Value'].str.title().map({'True': 'yes', 'False': 'no', 'None': float('nan')})

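Explanation:

Because we use str.title(), true and false become True and False (first letter uppercase, the rest lowercase).
Since you only need yes and no, if you are sure that all values in the dicts are lowercase you can drop str.title() from the line above, so it becomes: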

df_merge['PDH_Value'] = df_merge['PDH_Value'].map({'true': 'yes', 'false': 'no', 'None': float('nan')})

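Finally, we map the values with map(), which works much like a replacement (you could also use replace() instead of map()): true becomes yes, false becomes no, and None becomes NaN.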
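
One difference between the two is worth knowing before swapping them. The sketch below uses invented values, including an unexpected 'maybe', to show how each method treats a value that is not in the dictionary:

```python
import pandas as pd

# Hypothetical values as they might look after extracting 'catchup'
s = pd.Series(['true', 'false', 'None', 'maybe'])

mapping = {'true': 'yes', 'false': 'no', 'None': float('nan')}

# map(): values missing from the dict (here 'maybe') become NaN
mapped = s.map(mapping)

# replace(): values missing from the dict are left unchanged
replaced = s.replace(mapping)

print(mapped)
print(replaced)
```

So map() is the stricter choice when only yes, no and NaN should survive, while replace() keeps anything it does not recognise.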

User

#2 · edited 2024-10-01 00:25:43

My approach is similar to @Anurag Dabas's, but uses json.loads:

import pandas as pd
import numpy as np
import json

# build sample data
# besides JSON objects, the column may hold empty strings (not valid JSON) and the JSON literal 'null'
json_data = {
    "PDH_Value": [
        '{"roth": null, "pretax": null, "catchup": "false", "aftertax": null}', 
        '{"roth": null, "pretax": "true", "catchup": "true", "aftertax": null}',
        '',
        '{"roth": null, "pretax": "true", "catchup": "true", "aftertax": null}',
        '{"roth": "true", "pretax": "true", "catchup": "true", "aftertax": "true"}',
        '{"roth": "true", "pretax": "true", "aftertax": "true"}',
        'null'
    ]}
df_merge = pd.DataFrame(json_data)

# replace empty strings or whitespace strings with NaN...
df_merge['PDH_Value'] = df_merge['PDH_Value'].replace(r'^\s*$', np.nan, regex=True)

# replace NaN values with a valid JSON object whose "catchup" is null
df_merge['PDH_Value'] = df_merge['PDH_Value'].fillna('{"catchup": null}')

# parse json values in the columns
df_merge['PDH_Value'] = df_merge['PDH_Value'].apply(json.loads)

# select only the "catchup" property if `x` is a dict that has it
df_merge['PDH_Value'] = df_merge['PDH_Value'].apply(lambda x: x['catchup'] if isinstance(x, dict) and 'catchup' in x else None)

print(df_merge)

>>>          PDH_Value
>>>   0      false
>>>   1      true
>>>   2      None
>>>   3      true
>>>   4      true
>>>   5      None
>>>   6      None
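
The same idea can also be folded into one function. The following is a sketch only: extract_key is a hypothetical helper name, not something from either answer, and it additionally treats strings that are not valid JSON as missing:

```python
import json


def extract_key(raw, key='catchup'):
    """Return the value under `key` if `raw` is a JSON object string, else None.

    Hypothetical helper for illustration; not part of the answers above.
    """
    if not isinstance(raw, str) or not raw.strip():
        return None  # NaN, None, empty or whitespace-only input
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return None  # not valid JSON
    if isinstance(parsed, dict):
        return parsed.get(key)
    return None  # valid JSON, but not an object (e.g. 'null')


# Made-up sample values covering the cases discussed above
samples = [
    '{"roth": null, "catchup": "false"}',
    '',
    '{"roth": "true", "pretax": "true"}',
    'null',
    'not json',
]
results = [extract_key(s) for s in samples]
print(results)  # ['false', None, None, None, None]
```

Assuming the column looks like the sample data, it could be applied with df_merge['PDH_Value'].apply(extract_key) in place of the separate replace, fillna and json.loads steps.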
