Get the first value from a list of dicts in a pandas DataFrame column

id   photos
001  [{'medium':'https:blablabla1', 'xl':'something1', 's':'anotherthing1'}, {'medium':'https:blablabla2', 'xl':'something2', 's':'anotherthing2'}, {'medium':'https:blablabla3', 'xl':'something3', 's':'anotherthing3'}]
002  [{'medium':'https:blablabla4', 'xl':'something4', 's':'anotherthing4'}, {'medium':'https:blablabla5', 'xl':'something5', 's':'anotherthing5'}, {'medium':'https:blablabla6', 'xl':'something6', 's':'anotherthing6'}]
003  [{'medium':'https:blablabla7', 'xl':'something7', 's':'anotherthing7'}, {'medium':'https:blablabla8', 'xl':'something8', 's':'anotherthing8'}, {'medium':'https:blablabla9', 'xl':'something9', 's':'anotherthing9'}]

dicts_list = [{'medium':'https:blablabla1', 'xl':'something1', 's':'anotherthing1'},
              {'medium':'https:blablabla2', 'xl':'something2', 's':'anotherthing2'},
              {'medium':'https:blablabla3', 'xl':'something3', 's':'anotherthing3'}]

# Access the first value of the first dict in a list
list(dicts_list[0].values())[0]
# output: 'https:blablabla1'

2 Answers

User

Answer 1 · edited 2024-06-28 00:15:43

You can use the apply function on each row, as follows:

df['image_url'] = df.apply(lambda row: row.photos[0]['medium'], axis=1)

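Output: the new image_url column holds the 'medium' value of the first dict in each row (https:blablabla1, https:blablabla4 and https:blablabla7 for the sample data).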

Now, if you no longer need the photos column, you can simply drop it.
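For completeness, here is a minimal end-to-end sketch of this answer, including the drop step. The sample frame is a shortened copy of the question's data, and it assumes every photos list is non-empty and every first dict has a 'medium' key.

```python
import pandas as pd

# Shortened sample data mirroring the question
# (ids kept as strings to preserve the leading zeros).
df = pd.DataFrame({
    "id": ["001", "002"],
    "photos": [
        [{"medium": "https:blablabla1", "xl": "something1", "s": "anotherthing1"},
         {"medium": "https:blablabla2", "xl": "something2", "s": "anotherthing2"}],
        [{"medium": "https:blablabla4", "xl": "something4", "s": "anotherthing4"},
         {"medium": "https:blablabla5", "xl": "something5", "s": "anotherthing5"}],
    ],
})

# Take the 'medium' entry of the first dict in each row's list.
df["image_url"] = df.apply(lambda row: row.photos[0]["medium"], axis=1)

# Drop the original column once the URL has been extracted.
df = df.drop(columns=["photos"])
print(df)
```

Note that drop returns a new DataFrame here, so the result is assigned back to df.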

User

Answer 2 · edited 2024-06-28 00:15:43

Here is one approach. Suppose your column or Series holds lists of dicts, like this:

>>> import pandas as pd
>>> s = pd.Series([[{'medium':'https:blablabla1',
...   'xl':'something1',
...   's':'anotherthing1'},
... {'medium':'https:blablabla2',
...   'xl':'something2',
...   's':'anotherthing2'},
... {'medium':'https:blablabla3',
...   'xl':'something3',
...   's':'anotherthing3'}],
... [{'medium':'https:blablabla4',
...   'xl':'something4',
...   's':'anotherthing4'},
... {'medium':'https:blablabla5',
...   'xl':'something5',
...   's':'anotherthing5'},
... {'medium':'https:blablabla6',
...   'xl':'something6',
...   's':'anotherthing6'}],
... [{'medium':'https:blablabla7',
...   'xl':'something7',
...   's':'anotherthing7'},
... {'medium':'https:blablabla8',
...   'xl':'something8',
...   's':'anotherthing8'},
... {'medium':'https:blablabla9',
...   'xl':'something9',
...   's':'anotherthing9'}]])
>>> s
0    [{'medium': 'https:blablabla1', 'xl': 'somethi...
1    [{'medium': 'https:blablabla4', 'xl': 'somethi...
2    [{'medium': 'https:blablabla7', 'xl': 'somethi...
dtype: object
>>> s.apply(pd.Series)[0].apply(pd.Series).medium
0    https:blablabla1
1    https:blablabla4
2    https:blablabla7
Name: medium, dtype: object
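The one-liner above chains four operations, which can be hard to read at a glance. The sketch below splits it into named steps on a shortened copy of the sample Series; the intermediate variable names (expanded, first_dicts, fields) are only illustrative.

```python
import pandas as pd

# Shortened sample Series: each element is a list of dicts.
s = pd.Series([
    [{"medium": "https:blablabla1", "xl": "something1"},
     {"medium": "https:blablabla2", "xl": "something2"}],
    [{"medium": "https:blablabla4", "xl": "something4"},
     {"medium": "https:blablabla5", "xl": "something5"}],
])

# Step 1: expand each list into columns 0, 1, ... (one dict per cell).
expanded = s.apply(pd.Series)

# Step 2: keep column 0, i.e. the first dict of every row.
first_dicts = expanded[0]

# Step 3: expand each dict into columns named after its keys.
fields = first_dicts.apply(pd.Series)

# Step 4: select the 'medium' column.
medium = fields["medium"]
print(medium)
```

Each apply(pd.Series) call builds one Series per row, which is the main reason this approach can get slow on large inputs.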

I am not sure whether there is a more elegant solution, but I hope this helps!

Edit

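As a side note, I know that heavy use of apply is frowned upon in the pandas community. Especially if you have very large DataFrames, you will see some performance issues.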

I really cannot think of a vectorized solution. But if your dataset is not too large, I think this should do the trick.
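One possible alternative, if the repeated apply calls turn out to be a bottleneck, is a plain list comprehension. This is not a vectorized solution either, but it avoids building the intermediate DataFrames. The helper name first_medium is hypothetical, and returning None for an empty list or a missing key is an assumption about how such rows should be treated.

```python
import pandas as pd

# Shortened sample Series; the last row is empty to exercise the guard.
s = pd.Series([
    [{"medium": "https:blablabla1", "xl": "something1"},
     {"medium": "https:blablabla2", "xl": "something2"}],
    [{"medium": "https:blablabla4", "xl": "something4"}],
    [],
])


def first_medium(photos):
    """Hypothetical helper: 'medium' of the first dict, or None if unavailable."""
    if not photos:
        return None
    return photos[0].get("medium")


# Iterate over the lists directly in Python and rebuild a Series on the same index.
medium = pd.Series([first_medium(p) for p in s], index=s.index, name="medium")
print(medium)
```

Whether this is actually faster depends on the data, so it is worth timing both versions on a realistic sample before switching.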
