Importing dict-like data into pandas

Published 2024-09-27 00:18:44


I have many data files written in a dict-like format, one record per line:

{"score": [0.9995803236961365, 0.00041968212462961674], "key": "Am2mVTMbhd0y", "label": "0"}
{"score": [0.9997120499610901, 0.00028794570243917406], "key": "AmG8StB8hM2k", "label": "0"}
{"score": [0.8841496109962463, 0.11585044860839844], "key": "Alt137zv2nY6", "label": "0"}
{"score": [0.9999467134475708, 5.334055458661169e-05], "key": "AmGdF7cY4X22", "label": "0"}

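I want to import them into pandas with the columns 'key', 'label', and 'score', where the two score values must end up in separate columns. I tried importing the file as a dict, but got: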

ValueError: too many values to unpack

Any suggestions on how to solve this?


2 Answers
import pandas as pd

# Put the records in a list of dicts
data = [
    {"score": [0.9995803236961365, 0.00041968212462961674], "key": "Am2mVTMbhd0y", "label": "0"},
    {"score": [0.9997120499610901, 0.00028794570243917406], "key": "AmG8StB8hM2k", "label": "0"},
    {"score": [0.8841496109962463, 0.11585044860839844], "key": "Alt137zv2nY6", "label": "0"},
    {"score": [0.9999467134475708, 5.334055458661169e-05], "key": "AmGdF7cY4X22", "label": "0"},
]
# Create the DataFrame; 'score' stays a single column of lists
df = pd.DataFrame(data)
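
The snippet above hardcodes the records. If they live in a file, one way to build the same list is to parse each line with json.loads. This is only a minimal sketch: the io.StringIO object stands in for an open file, and load_records is a hypothetical helper name, not something from pandas.

```python
import io
import json

import pandas as pd

# Stand-in for an open data file (assumption); with a real file use open(path) instead.
raw = io.StringIO(
    '{"score": [0.9995803236961365, 0.00041968212462961674], "key": "Am2mVTMbhd0y", "label": "0"}\n'
    '{"score": [0.9997120499610901, 0.00028794570243917406], "key": "AmG8StB8hM2k", "label": "0"}\n'
)


def load_records(handle):
    """Parse one JSON object per non-empty line into a list of dicts."""
    return [json.loads(line) for line in handle if line.strip()]


records = load_records(raw)
df_records = pd.DataFrame(records)  # 'score' is still one column holding lists
print(df_records)
```

Each line is parsed on its own because the file as a whole is not a single JSON document or a single dict, only a sequence of them.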

I think you need the parameter lines=True in read_json, which reads the file as one JSON object per line:

df = pd.read_json('file.json', lines=True)
print(df)
            key  label                                         score
0  Am2mVTMbhd0y      0   [0.999580323696136, 0.00041968212462900004]
1  AmG8StB8hM2k      0  [0.9997120499610901, 0.00028794570243900004]
2  Alt137zv2nY6      0     [0.8841496109962461, 0.11585044860839801]
3  AmGdF7cY4X22      0    [0.99994671344757, 5.3340554586611695e-05]

print(type(df['score'].iat[0]))
<class 'list'>

To convert the lists to columns, use the DataFrame constructor together with concat:

# drop(columns=...) replaces the positional axis argument, which pandas 2.0 no longer accepts
df = pd.concat([df.drop(columns='score'),
                pd.DataFrame(df['score'].values.tolist()).add_prefix('score')], axis=1)
print(df)
            key  label    score0    score1
0  Am2mVTMbhd0y      0  0.999580  0.000420
1  AmG8StB8hM2k      0  0.999712  0.000288
2  Alt137zv2nY6      0  0.884150  0.115850
3  AmGdF7cY4X22      0  0.999947  0.000053
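
For anyone who wants to try this without a data file, here is a self-contained sketch of the same two steps. It assumes the sample lines from the question are wrapped in io.StringIO in place of 'file.json', and it uses join instead of concat, which is a stylistic variation rather than a different technique.

```python
import io

import pandas as pd

# Sample lines copied from the question; io.StringIO stands in for the file (assumption).
sample = io.StringIO(
    '{"score": [0.9995803236961365, 0.00041968212462961674], "key": "Am2mVTMbhd0y", "label": "0"}\n'
    '{"score": [0.9997120499610901, 0.00028794570243917406], "key": "AmG8StB8hM2k", "label": "0"}\n'
    '{"score": [0.8841496109962463, 0.11585044860839844], "key": "Alt137zv2nY6", "label": "0"}\n'
    '{"score": [0.9999467134475708, 5.334055458661169e-05], "key": "AmGdF7cY4X22", "label": "0"}\n'
)

# lines=True tells read_json to treat every line as a separate JSON object.
df = pd.read_json(sample, lines=True)

# Expand each two-element list into its own columns, reusing df's index so rows line up.
scores = pd.DataFrame(df["score"].tolist(), index=df.index).add_prefix("score")

# Replace the list column with the expanded score0 / score1 columns.
result = df.drop(columns="score").join(scores)
print(result)
```

Passing index=df.index when building scores keeps the rows aligned even if df has been filtered or re-indexed beforehand.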
