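Thanks in advance for your help.
My Python code reads a JSON input file, loads the data into a DataFrame, masks or changes the DataFrame columns specified by a configuration, and finally writes a JSON output file.
read json into data frame --> mask/change the df column --> generate json
Input JSON: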
[
{
"BinLogFilename": "mysql.log",
"Type": "UPDATE",
"Table": "users",
"ServerId": 1,
"BinLogPosition": 2111
},
{
"BinLogFilename": "mysql.log",
"Type": "UPDATE",
"Table": "users",
"ServerId": null,
"BinLogPosition": 2111
},
...
]
When I load the JSON above into a DataFrame, the "ServerId" column ends up with float values, because it is null in several of the input JSON objects.
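The float promotion can be reproduced with a minimal sketch. It assumes the file is loaded with `pd.read_json`, which the question does not state; the two-record string `raw` is a shortened, hypothetical stand-in for the real input.

```python
import io

import pandas as pd

# Hypothetical, shortened stand-in for the input file: one record has a null ServerId.
raw = (
    '[{"ServerId": 1, "BinLogPosition": 2111},'
    ' {"ServerId": null, "BinLogPosition": 2111}]'
)

df = pd.read_json(io.StringIO(raw))

# The null becomes NaN, and NaN is a float, so the whole column is stored as float.
print(df.dtypes)
print(df["ServerId"])
```

A column without nulls, such as "BinLogPosition" here, keeps an integer dtype.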
The main logic replaces "ServerId" with a different (fake) number, but the output contains floats.
Output JSON:
[
{
"BinLogFilename": "mysql.log",
"Type": "UPDATE",
"Table": "users",
"ServerId": 5627.0,
"BinLogPosition": 2111
},
{
"BinLogFilename": "mysql.log",
"Type": "UPDATE",
"Table": "users",
"ServerId": null,
"BinLogPosition": 2111
},
....
]
Masking logic:
df['ServerId'] = [fake.pyint() if not(pd.isna(df['ServerId'][index])) else np.nan for index in range(len(df['ServerId']))]
The problem is that "ServerId" in the output should contain only integers, but unfortunately it contains floats:
df['ServerId']
0 9590.0
1 NaN
2 1779.0
3 1303.0
4 NaN
I found an answer to this problem, which is to use "Int64":
df['ServerId'] = df['ServerId'].astype('Int64')
0 8920
1 <NA>
2 9148
3 2434
4 <NA>
However, with "Int64" the NaN values are converted to <NA>, and when writing back to JSON I get the following error:
TypeError: Object of type NAType is not JSON serializable
with gzip.open(outputFile, 'w') as outfile:
    outfile.write(json.dumps(json_objects_list).encode('utf-8'))
Is it possible to keep NaN after converting to the "Int64" dtype? If not, how can I fix the error?
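One possible way around the error, as a sketch rather than a definitive fix: the nullable "Int64" dtype uses `pd.NA` as its missing-value marker, so the marker can be swapped for `None` just before serialization, and `json.dumps` then writes it as `null`. The sketch assumes the list passed to `json.dumps` is built from the DataFrame's records, which the question does not show. The toy DataFrame and the names `records` and `payload` are illustrative.

```python
import json

import numpy as np
import pandas as pd

# Toy frame standing in for the masked data (values taken from the example output).
df = pd.DataFrame({"ServerId": [5627.0, np.nan], "BinLogPosition": [2111, 2111]})
df["ServerId"] = df["ServerId"].astype("Int64")  # 5627 and <NA>

# Cast to object so each cell is a plain Python value, then replace every
# missing marker (pd.NA or NaN) with None, which json.dumps can serialize.
records = [
    {key: (None if pd.isna(value) else value) for key, value in row.items()}
    for row in df.astype(object).to_dict(orient="records")
]

payload = json.dumps(records)
print(payload)
```

The per-cell `pd.isna` check assumes every cell is a scalar, as in the records above; nested lists or dicts in a cell would need separate handling.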