Creating a DataFrame from JSON keys

I have a JSON file generated by YouTube's iframe API, and I want to load this JSON data into a DataFrame where each JSON key becomes a column and each record becomes a new row.

Normally I would loop over the rows of the JSON, but this particular JSON looks like this:

[  
   "{\"timemillis\":1563467467703,\"date\":\"18.7.2019\",\"time\":\"18:31:07,703\",\"videoId\":\"0HJx2JhQKQk\",\"startSecond\":\"0\",\"stopSecond\":\"90\",\"playerStateNumeric\":1,\"playerStateVerbose\":\"Playing\",\"curTimeFormatted\":\"0:02\",\"totalTimeFormatted\":\"9:46\",\"playoutLevelPercent\":0.3,\"bufferLevelPercent\":1.4,\"qual\":\"large\",\"qualLevels\":[\"hd720\",\"large\",\"medium\",\"small\",\"tiny\",\"auto\"],\"playbackRate\":1,\"playbackRates\":[0.25,0.5,0.75,1,1.25,1.5,1.75,2],\"playerErrorNumeric\":\"\",\"playerErrorVerbose\":\"\"}",
   "{\"timemillis\":1563467468705,\"date\":\"18.7.2019\",\"time\":\"18:31:08,705\",\"videoId\":\"0HJx2JhQKQk\",\"startSecond\":\"0\",\"stopSecond\":\"90\",\"playerStateNumeric\":1,\"playerStateVerbose\":\"Playing\",\"curTimeFormatted\":\"0:03\",\"totalTimeFormatted\":\"9:46\",\"playoutLevelPercent\":0.5,\"bufferLevelPercent\":1.4,\"qual\":\"large\",\"qualLevels\":[\"hd720\",\"large\",\"medium\",\"small\",\"tiny\",\"auto\"],\"playbackRate\":1,\"playbackRates\":[0.25,0.5,0.75,1,1.25,1.5,1.75,2],\"playerErrorNumeric\":\"\",\"playerErrorVerbose\":\"\"}"

]

In this JSON, not every key is written on its own line. How can I extract the keys and represent them as columns in this case?


Tags: json, date, time, large, playing, videoid, startsecond, timemillis
3 Answers

The simplest way is to use json_normalize from pandas.

import json
import pandas as pd   # json_normalize now lives at the top level of pandas

input_dict = [
   "{\"timemillis\":1563467467703,\"date\":\"18.7.2019\",\"time\":\"18:31:07,703\",\"videoId\":\"0HJx2JhQKQk\",\"startSecond\":\"0\",\"stopSecond\":\"90\",\"playerStateNumeric\":1,\"playerStateVerbose\":\"Playing\",\"curTimeFormatted\":\"0:02\",\"totalTimeFormatted\":\"9:46\",\"playoutLevelPercent\":0.3,\"bufferLevelPercent\":1.4,\"qual\":\"large\",\"qualLevels\":[\"hd720\",\"large\",\"medium\",\"small\",\"tiny\",\"auto\"],\"playbackRate\":1,\"playbackRates\":[0.25,0.5,0.75,1,1.25,1.5,1.75,2],\"playerErrorNumeric\":\"\",\"playerErrorVerbose\":\"\"}",
   "{\"timemillis\":1563467468705,\"date\":\"18.7.2019\",\"time\":\"18:31:08,705\",\"videoId\":\"0HJx2JhQKQk\",\"startSecond\":\"0\",\"stopSecond\":\"90\",\"playerStateNumeric\":1,\"playerStateVerbose\":\"Playing\",\"curTimeFormatted\":\"0:03\",\"totalTimeFormatted\":\"9:46\",\"playoutLevelPercent\":0.5,\"bufferLevelPercent\":1.4,\"qual\":\"large\",\"qualLevels\":[\"hd720\",\"large\",\"medium\",\"small\",\"tiny\",\"auto\"],\"playbackRate\":1,\"playbackRates\":[0.25,0.5,0.75,1,1.25,1.5,1.75,2],\"playerErrorNumeric\":\"\",\"playerErrorVerbose\":\"\"}"

]
# Each element of input_dict is a JSON-encoded string, so decode it into a dict first.
input_json = [json.loads(j) for j in input_dict]

# Flatten the list of dicts: one row per record, one column per key.
df = pd.json_normalize(input_json)
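
If the records live in a file rather than in a literal list, a minimal sketch of the same idea (assuming a hypothetical file name events.json that holds exactly the JSON array shown in the question) would be:

import json
import pandas as pd

# "events.json" is a hypothetical name; the file is assumed to contain the JSON
# array of JSON-encoded strings shown in the question.
with open("events.json", encoding="utf-8") as fh:
    raw = json.load(fh)                    # list of JSON strings

records = [json.loads(s) for s in raw]     # decode each string into a dict
df = pd.json_normalize(records)            # one row per record, one column per key
print(df[["timemillis", "videoId", "curTimeFormatted"]])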

I think you are asking to split out the keys and values, with the keys as columns and the values as rows.
Here is my approach; please always include your expected output in your question.

ChainMap flattens the dicts into a single mapping of keys and values, which is fairly self-explanatory; note that when the same key appears in several dicts, the value from the first dict wins, so the merge keeps only one record's values (see the small illustration below).
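
For context, a tiny sketch with two shortened records shows how ChainMap resolves duplicate keys: only the first dict's values survive the merge.

from collections import ChainMap

a = {"timemillis": 1563467467703, "curTimeFormatted": "0:02"}
b = {"timemillis": 1563467468705, "curTimeFormatted": "0:03"}

merged = dict(ChainMap(a, b))
print(merged)   # {'timemillis': 1563467467703, 'curTimeFormatted': '0:02'} -- first dict wins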

data = ["{\"timemillis\":1563467467703,\"date\":\"18.7.2019\",\"time\":\"18:31:07,703\",\"videoId\":\"0HJx2JhQKQk\",\"startSecond\":\"0\",\"stopSecond\":\"90\",\"playerStateNumeric\":1,\"playerStateVerbose\":\"Playing\",\"curTimeFormatted\":\"0:02\",\"totalTimeFormatted\":\"9:46\",\"playoutLevelPercent\":0.3,\"bufferLevelPercent\":1.4,\"qual\":\"large\",\"qualLevels\":[\"hd720\",\"large\",\"medium\",\"small\",\"tiny\",\"auto\"],\"playbackRate\":1,\"playbackRates\":[0.25,0.5,0.75,1,1.25,1.5,1.75,2],\"playerErrorNumeric\":\"\",\"playerErrorVerbose\":\"\"}","{\"timemillis\":1563467468705,\"date\":\"18.7.2019\",\"time\":\"18:31:08,705\",\"videoId\":\"0HJx2JhQKQk\",\"startSecond\":\"0\",\"stopSecond\":\"90\",\"playerStateNumeric\":1,\"playerStateVerbose\":\"Playing\",\"curTimeFormatted\":\"0:03\",\"totalTimeFormatted\":\"9:46\",\"playoutLevelPercent\":0.5,\"bufferLevelPercent\":1.4,\"qual\":\"large\",\"qualLevels\":[\"hd720\",\"large\",\"medium\",\"small\",\"tiny\",\"auto\"],\"playbackRate\":1,\"playbackRates\":[0.25,0.5,0.75,1,1.25,1.5,1.75,2],\"playerErrorNumeric\":\"\",\"playerErrorVerbose\":\"\"}"]

import json
from collections import ChainMap

import pandas as pd

data = [json.loads(i) for i in data]   # decode each JSON string into a dict
data = dict(ChainMap(*data))           # merge the dicts; the first dict's values win for duplicate keys

keys = []
vals = []

# Split the merged dict into parallel lists of keys and values.
for k, v in data.items():
    keys.append(k)
    vals.append(v)

# Build a two-row frame (keys, values), transpose it, then promote the key row to the header.
data = pd.DataFrame(zip(keys, vals)).T
new_header = data.iloc[0]   # the first row holds the keys
data = data[1:]             # drop that row from the body
data.columns = new_header   # use the keys as the column names


#startSecond    playbackRates   playbackRate    qual    totalTimeFormatted  timemillis  playerStateNumeric  playerStateVerbose  playerErrorNumeric  date    time    stopSecond  bufferLevelPercent  playerErrorVerbose  qualLevels  videoId curTimeFormatted    playoutLevelPercent
  #0             [0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2]   1   large   9:46    1563467467703   1   Playing     18.7.2019   18:31:07,703    90  1.4     [hd720, large, medium, small, tiny, auto]   0HJx2JhQKQk 0:02    0.3
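
Note that because ChainMap keeps only the first record's values, the result above has a single row. If the goal is one row per record instead, a simpler sketch (starting again from the original data list of JSON strings, before it is overwritten) is to skip the merge and pass the decoded dicts straight to the DataFrame constructor:

import json
import pandas as pd

records = [json.loads(i) for i in data]   # data: the original list of JSON strings
df = pd.DataFrame(records)                # keys become columns, each record becomes a row
print(df[["timemillis", "curTimeFormatted", "playoutLevelPercent"]])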

A Pythonic solution is to use the keys() and values() APIs of Python dicts.

It should look something like this:

import json

ls = [
   "{\"timemillis\":1563467467703,\"date\":\"18.7.2019\",\"time\":\"18:31:07,703\",\"videoId\":\"0HJx2JhQKQk\",\"startSecond\":\"0\",\"stopSecond\":\"90\",\"playerStateNumeric\":1,\"playerStateVerbose\":\"Playing\",\"curTimeFormatted\":\"0:02\",\"totalTimeFormatted\":\"9:46\",\"playoutLevelPercent\":0.3,\"bufferLevelPercent\":1.4,\"qual\":\"large\",\"qualLevels\":[\"hd720\",\"large\",\"medium\",\"small\",\"tiny\",\"auto\"],\"playbackRate\":1,\"playbackRates\":[0.25,0.5,0.75,1,1.25,1.5,1.75,2],\"playerErrorNumeric\":\"\",\"playerErrorVerbose\":\"\"}",
   "{\"timemillis\":1563467468705,\"date\":\"18.7.2019\",\"time\":\"18:31:08,705\",\"videoId\":\"0HJx2JhQKQk\",\"startSecond\":\"0\",\"stopSecond\":\"90\",\"playerStateNumeric\":1,\"playerStateVerbose\":\"Playing\",\"curTimeFormatted\":\"0:03\",\"totalTimeFormatted\":\"9:46\",\"playoutLevelPercent\":0.5,\"bufferLevelPercent\":1.4,\"qual\":\"large\",\"qualLevels\":[\"hd720\",\"large\",\"medium\",\"small\",\"tiny\",\"auto\"],\"playbackRate\":1,\"playbackRates\":[0.25,0.5,0.75,1,1.25,1.5,1.75,2],\"playerErrorNumeric\":\"\",\"playerErrorVerbose\":\"\"}"

]
ls = [json.loads(j) for j in ls]  # decode each JSON string into a dict

keys = [j.keys() for j in ls] # this will get you all the keys
vals = [j.values() for j in ls] # this will get the values and then you can do something with it 

print(keys)
print(vals)
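
To get from those keys and values to the DataFrame the question asks for, one possible continuation (assuming every record shares the same keys, as they do here) is:

import pandas as pd

# Use the first record's keys as the column order and every record's values as the rows.
columns = list(keys[0])
rows = [list(v) for v in vals]
df = pd.DataFrame(rows, columns=columns)
print(df.shape)   # (2, 18): two records, one column per key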
