How do I extract data from a text field in a pandas DataFrame?

import pandas as pd

df = pd.DataFrame(
    [
        [43, {"tags": ["webcom", "start", "temp", "webcomfoto", "dance"], "image": ["https://image.com/Kqk.jpg"]}],
        [83, {"tags": ["yourself", "start", ""], "image": ["https://images.com/test.jpg"]}],
        [76, {"tags": ["en", "webcom"], "links": ["http://webcom.webcomdb.com", "http://webcom.webcomstats.com"], "users": ["otole"]}],
        [77, {"tags": ["webcomznakomstvo", "webcomzhiznx", "webcomistoriya", "webcomosebe", "webcomfotografiya"], "image": ["https://images.com/nt4wzguoh/y_a3d735b4.jpg", "https://images.com/sucb0u24x/b1sd_Naju.jpg"]}],
        [81, {"tags": ["webcomfotografiya"], "users": ["myself", "boattva"], "links": ["https://webcom.com/nk"]}],
    ],
    columns=["_id", "tags"],
)

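3 Answers

User

Answer 1 · edited 2024-10-03 11:13:48

You can use the str accessor to look up the dictionary key, then value_counts to count how many posts have each number of tags: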

df.tags.str['tags'].str.len().value_counts()\
  .rename('Posts')\
  .rename_axis('Tags')\
  .reset_index()

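Output (rows with equal counts may appear in a different order depending on the pandas version):

   Tags  Posts
0     5      2
1     3      1
2     2      1
3     1      1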
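To see what each link in that chain does, it may help to run the steps one at a time. The sketch below uses a small hypothetical DataFrame named demo (not the question's data); the intermediate variable names are also just for illustration.

```python
import pandas as pd

# Hypothetical sample data (not the question's DataFrame): each cell holds a dict.
demo = pd.DataFrame({
    "_id": [1, 2, 3],
    "tags": [
        {"tags": ["a", "b"], "image": ["x.jpg"]},
        {"tags": ["c"]},
        {"tags": ["d", "e"], "users": ["u"]},
    ],
})

# Step 1: .str[key] indexes into each element; for dicts this looks up the key.
tag_lists = demo["tags"].str["tags"]

# Step 2: .str.len() returns the length of each list.
tag_counts = tag_lists.str.len()

# Step 3: value_counts() tallies how many rows share each length.
posts_per_count = tag_counts.value_counts()

print(tag_counts.tolist())
print(posts_per_count.to_dict())
```

Each step returns an ordinary Series, so the intermediate results can be inspected before the final rename and reset_index calls.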

User

Answer 2 · edited 2024-10-03 11:13:48

Sticking with collections.Counter, here is one way:

from collections import Counter
from operator import itemgetter

c = Counter(map(len, map(itemgetter('tags'), df['tags'])))

res = pd.DataFrame.from_dict(c, orient='index').reset_index()
res.columns = ['Tags', 'Posts']

print(res)

   Tags  Posts
0     5      2
1     3      1
2     2      1
3     1      1
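The nested map calls can be hard to read at first. As a rough illustration without pandas, the sketch below applies the same idea to a plain list of hypothetical records standing in for the values of df['tags']; the names records, get_tags, and lengths are made up for this example.

```python
from collections import Counter
from operator import itemgetter

# Hypothetical records standing in for the values of df['tags'].
records = [
    {"tags": ["a", "b"]},
    {"tags": ["c"]},
    {"tags": ["d", "e"]},
]

# itemgetter("tags") builds a callable equivalent to: lambda d: d["tags"]
get_tags = itemgetter("tags")

# Inner map pulls out each tag list, outer map measures its length.
lengths = list(map(len, map(get_tags, records)))

# Counter maps each tag count to the number of posts that have it.
counts = Counter(lengths)

print(lengths)
print(counts.most_common())
```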

User

Answer 3 · edited 2024-10-03 11:13:48

The problem is that the data in the tags column are strings, not dictionaries.

So a first step is needed:

import ast

df['tags'] = df['tags'].apply(ast.literal_eval)

Then apply the original answer; it works well even when the dictionaries contain multiple fields.
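If it is unclear whether a column holds strings or real dictionaries, a small guard can make the conversion safe to run in both cases, since ast.literal_eval is meant for strings and fails on a value that is already a dict. The following is only a sketch on hypothetical data; to_dict is a helper invented for this example, not part of pandas or ast.

```python
import ast
import pandas as pd

# Hypothetical column mixing a stringified dict and a real dict.
s = pd.Series([
    "{'tags': ['a', 'b'], 'image': ['x.jpg']}",
    {"tags": ["c"]},
])

def to_dict(value):
    """Parse strings with ast.literal_eval; pass other values through unchanged."""
    return ast.literal_eval(value) if isinstance(value, str) else value

parsed = s.apply(to_dict)

print(parsed.apply(type).tolist())
print(parsed.str["tags"].str.len().tolist())
```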

Verification:

[The sample DataFrame definition used here is missing from the source.]

# convert the column to strings to verify the solution
df['tags'] = df['tags'].astype(str)

print(df['tags'].apply(type))
0    <class 'str'>
1    <class 'str'>
2    <class 'str'>
3    <class 'str'>
4    <class 'str'>
Name: tags, dtype: object

# convert back to dictionaries
df['tags'] = df['tags'].apply(ast.literal_eval)

print(df['tags'].apply(type))
0    <class 'dict'>
1    <class 'dict'>
2    <class 'dict'>
3    <class 'dict'>
4    <class 'dict'>
Name: tags, dtype: object

from collections import Counter

# count how many posts have each number of tags
c = Counter([len(x['tags']) for x in df['tags']])

df = pd.DataFrame({'Number of posts': list(c.values()), 'Number of tags': list(c.keys())})
print(df)
   Number of posts  Number of tags
0                1               0
1                1               3
2                1               2
3                1               5
4                1               1
