Filling a DataFrame quickly

data.individus.head(5)
Out[25]:
0    [{'dateDeNaissance': 1954-09-14 00:00:00, 'enc...
1    [{'dateDeNaissance': 1984-09-14 00:00:00, 'enc...
2    [{'enceinte': False, 'dateDeNaissance': 1981-0...
3    [{'dateDeNaissance': 1989-09-14 00:00:00, 'enc...
4    [{'enceinte': False, 'dateDeNaissance': 1989-0...
Name: individus, dtype: object

t_individus.ix[:, ['dateDeNaissance', 'enceinte']].head()
Out[14]:
       dateDeNaissance enceinte
0  1954-09-14 00:00:00    False
1  1984-09-14 00:00:00    False
2  1981-09-14 00:00:00    False
3  1989-09-14 00:00:00    False
4  1989-09-14 00:00:00    False

serie = data.foo  # 110199 lines
keys = get_all_possible_keys(serie)  # 48 keys (process time: 0.8s)
table = pd.DataFrame(columns=list(keys))
for i in serie:
    df = pd.DataFrame(list(i.items()))
    df = df.transpose()
    df.columns = df.iloc[0]
    df = df.reindex(df.index.drop(0))
    table = pd.concat([table, df], axis=0)

serie = data.foo
keys = get_all_possible_keys(serie)
len_serie = len(serie)
# -- Pre-allocate memory by declaring size
table = pd.DataFrame(np.nan, index=range(0, len_serie), columns=list(keys))
# -- Fill row by row
k = 0
for i in serie:
    table.loc[k] = pd.Series(i[0])
    k += 1

2 Answers

User
Answer 1 · edited 2024-05-03 04:35:22

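This is almost the same idea as @James's, but in your case you have a Series of lists of dicts, which you first want to convert into a list of dicts or a Series of dicts: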
In [12]: s
Out[12]:
0    [{'a': 'aaa', 'b': 'bbb', 'c': 'ccc'}]
1       [{'a': 'a1', 'b': 'b1', 'c': 'c1'}]
dtype: object

In [13]: pd.DataFrame(s.sum())
Out[13]:
     a    b    c
0  aaa  bbb  ccc
1   a1   b1   c1

In [14]: s.sum()
Out[14]:
[{'a': 'aaa', 'b': 'bbb', 'c': 'ccc'},
 {'a': 'a1', 'b': 'b1', 'c': 'c1'}]
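Using .tolist() directly, each one-element list stays a row, so the dicts end up in a single column instead of being expanded: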
In [15]: pd.DataFrame(s.tolist())
Out[15]:
                                      0
0  {'a': 'aaa', 'b': 'bbb', 'c': 'ccc'}
1     {'a': 'a1', 'b': 'b1', 'c': 'c1'}
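As a possible alternative to s.sum(), which flattens the Series by concatenating the lists one after another and may get slow on a long Series, here is a minimal sketch that flattens in a single pass with itertools.chain.from_iterable. The sample data is hypothetical and only mirrors the Series s shown above; this is not necessarily what the answerer had in mind.

```python
from itertools import chain

import pandas as pd

# Hypothetical sample data mirroring the Series `s` shown above:
# every element is a list holding a single dict.
s = pd.Series([
    [{'a': 'aaa', 'b': 'bbb', 'c': 'ccc'}],
    [{'a': 'a1', 'b': 'b1', 'c': 'c1'}],
])

# Flatten the Series of lists into one flat list of dicts in a single pass.
records = list(chain.from_iterable(s))

# Each dict becomes one row; the dict keys become the columns.
flat = pd.DataFrame(records)
print(flat)
```

If every list is known to hold exactly one dict, `[lst[0] for lst in s]` would produce the same list of records.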

User
Answer 2 · edited 2024-05-03 04:35:22

I have found in the past that building a DataFrame from a list of dicts is surprisingly fast. My simple suggestion is:
dataframe = pandas.DataFrame(data.foo.tolist())
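To illustrate why this can replace the key collection and row-by-row filling from the question, here is a small sketch on hypothetical records whose keys differ from row to row. It assumes each element is a plain dict; with a Series of one-element lists, as in the question, the lists would need to be unwrapped first.

```python
import pandas as pd

# Hypothetical records: the dicts do not all share the same keys.
records = [
    {'a': 1, 'b': 2},
    {'a': 3},
    {'b': 4, 'c': 5},
]

# The constructor takes the union of all keys as columns
# and fills the entries a record does not have with NaN.
table = pd.DataFrame(records)
print(table)
```

The columns are derived from the dicts themselves, so no separate pass to collect the keys or to pre-allocate the table is needed in this sketch.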
