Mapping JSON to lists in a DataFrame column

[ { "id": 1, "name": "Karate", }, { "id": 2, "name": "Paintball", }, { "id": 3, "name": "Rugby", }, { "id": 4, "name": "Squash", }, { "id": 5, "name": "Softball", }, { "id": 6, "name": "Swimiming", }, { "id": 7, "name": "Weighlifting", }, { "id": 8, "name": "Table Tennis", }, { "id": 9, "name": "Tenpin Bowling", } ]

id   sports
111  ['Softball', 'Table Tennis', 'Rafting']
222  ['Rugby', 'Tenpin Bowling', 'Squash']
333  ['Weighlifting', 'Tennis', 'Swimiming']
444  ['Softball', 'Table Tennis', 'Paintball']
555  ['Rugby', 'Tenpin Bowling', 'Squash']
666  ['Weighlifting', 'Karate', 'Swimiming']
777  ['Softball', 'Table Tennis', 'Soccer']
888  ['Basketball', 'Tenpin Bowling', 'Squash']
999  ['Weighlifting', 'Karate', 'Swimiming']

id   sports
111  [5, 8]
222  [3, 9, 4]
333  [7, 6]
444  [5, 8, 2]
555  [3, 9, 4]
666  [7, 1, 6]
777  [5, 8]
888  [9, 4]
999  [7, 1, 6]

3 Answers

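User

Answer 1 · edited 2024-10-01 13:33:28

This is similar to your previous question. I modified my earlier answer to handle this case, as well as NaN and non-list elements. Let's call the JSON string l_str.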

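import pandas as pd
from io import StringIO

# l_str holds the JSON string; build a name -> id lookup from it
df_map = pd.read_json(StringIO(l_str))
d = dict(zip(df_map['name'], df_map['id']))
# map names to ids; NaN and other non-list cells are kept unchanged
df['sports'] = [[d.get(y) for y in x if y in d] if isinstance(x, list) else x
                for x in df.sports]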

Out[51]:
    id     sports
0  111     [5, 8]
1  222  [3, 9, 4]
2  333     [7, 6]
3  444  [5, 8, 2]
4  555  [3, 9, 4]
5  666  [7, 1, 6]
6  777     [5, 8]
7  888     [9, 4]
8  999  [7, 1, 6]
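
To illustrate the NaN and non-list handling mentioned in this answer, here is a minimal self-contained sketch. The sample frame and the shortened lookup dict d are made up for illustration and are not the asker's full data; the idea is only that list cells get mapped while every other cell passes through untouched.

```python
import pandas as pd

# Hypothetical sample data: a small name -> id lookup and a sports
# column that mixes lists with a missing value.
d = {"Softball": 5, "Table Tennis": 8, "Rugby": 3}
df = pd.DataFrame({
    "id": [111, 222, 333],
    "sports": [["Softball", "Table Tennis", "Rafting"], float("nan"), ["Rugby"]],
})

# Map each list to ids, dropping names that are not in the lookup;
# cells that are not lists (such as NaN) are left as they are.
df["sports"] = [
    [d[name] for name in cell if name in d] if isinstance(cell, list) else cell
    for cell in df["sports"]
]
result = df["sports"].tolist()
```

Because the comprehension yields exactly one value per row, the new column always has the same length as the frame, which a filtering `if` at the end of the comprehension would not guarantee.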

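User

Answer 2 · edited 2024-10-01 13:33:28

If the list of dicts with the sport codes is in a file named test.json, load it into data.
- If the list of dicts is already loaded, skip the file-loading part and replace data with the variable name you are using.
This answer assumes the values in the sports column are lists, not strings.
- If the sports column contains strings, use df.sports = df.sports.apply(literal_eval)
To replace the sports column with the codes, use df['sports'] = instead of df['codes'] =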

from ast import literal_eval
import pandas as pd


# if the list of dicts is in a file, load it with the following
with open('test.json', 'r') as f:
    data = literal_eval(f.read())

# data is the object now holding the list of dicts
# convert data to a dict
dd = {d['name']: d['id'] for d in data}

# add a codes column for the sports in dd
df['codes'] = df.sports.apply(lambda x: [dd.get(v) for v in x if v in dd])

# display df
    id                                sports      codes
0  111     [Softball, Table Tennis, Rafting]     [5, 8]
1  222       [Rugby, Tenpin Bowling, Squash]  [3, 9, 4]
2  333     [Weighlifting, Tennis, Swimiming]     [7, 6]
3  444   [Softball, Table Tennis, Paintball]  [5, 8, 2]
4  555       [Rugby, Tenpin Bowling, Squash]  [3, 9, 4]
5  666     [Weighlifting, Karate, Swimiming]  [7, 1, 6]
6  777      [Softball, Table Tennis, Soccer]     [5, 8]
7  888  [Basketball, Tenpin Bowling, Squash]     [9, 4]
8  999     [Weighlifting, Karate, Swimiming]  [7, 1, 6]
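
For the case where the sports column holds strings that merely look like lists (for example after a CSV round trip), the literal_eval conversion mentioned in this answer could be sketched as follows. The two-row frame and the shortened dd lookup are hypothetical stand-ins for the real data.

```python
from ast import literal_eval
import pandas as pd

# Hypothetical frame whose sports cells are strings, not lists
df = pd.DataFrame({
    "id": [111, 222],
    "sports": ["['Softball', 'Table Tennis', 'Rafting']", "['Rugby', 'Squash']"],
})
# Shortened, made-up version of the name -> id lookup
dd = {"Softball": 5, "Table Tennis": 8, "Rugby": 3, "Squash": 4}

# Parse each string into a real Python list
df["sports"] = df["sports"].apply(literal_eval)

# Then map names to codes, skipping names that are absent from dd
df["codes"] = df["sports"].apply(lambda x: [dd[v] for v in x if v in dd])
codes = df["codes"].tolist()
```

Without the literal_eval step, iterating over a cell would walk the string character by character, so no sport name would ever match a key in dd.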

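User

Answer 3 · edited 2024-10-01 13:33:28

First initialize a DataFrame from the JSON data and use set_index and to_dict to create a mappings dictionary from it, then use that mappings dictionary to map each sport in the lists to its corresponding id: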

mappings = pd.read_json(data).set_index('name')['id'].to_dict()
df['sports'] = [[mappings[key] for key in lst if key in mappings] for lst in df['sports']]

Alternatively, you can use Series.explode together with Series.map, but this approach is generally slower:

mappings = pd.read_json(data).set_index('name')['id']
df['sports'] = (
    df['sports'].explode()
    .map(mappings).dropna().astype(int).groupby(level=0).agg(list)
)

Result:

# print(df)
    id     sports
0  111     [5, 8]
1  222  [3, 9, 4]
2  333     [7, 6]
3  444  [5, 8, 2]
4  555  [3, 9, 4]
5  666  [7, 1, 6]
6  777     [5, 8]
7  888     [9, 4]
8  999  [7, 1, 6]
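
One detail of the explode variant that the sample data does not exercise: a row in which no sport has a known id disappears after dropna, so assigning the grouped result back leaves NaN in that row. The sketch below uses a made-up three-row frame and a shortened mappings Series to show this, plus one possible way (an assumption, not part of the answer) to turn those gaps into empty lists.

```python
import pandas as pd

# Hypothetical, shortened name -> id mapping and sample frame
mappings = pd.Series({"Softball": 5, "Rugby": 3})
df = pd.DataFrame({
    "id": [111, 222, 333],
    "sports": [["Softball", "Rafting"], ["Soccer"], ["Rugby", "Softball"]],
})

mapped = (
    df["sports"].explode()
    .map(mappings).dropna().astype(int).groupby(level=0).agg(list)
)

# Row 1 has no known sport, so index 1 is absent from `mapped`;
# assignment aligns on the index and leaves NaN there.
df["codes"] = mapped

# Optional: replace the missing cells with empty lists
df["codes"] = df["codes"].apply(lambda x: x if isinstance(x, list) else [])
codes = df["codes"].tolist()
```

The list-comprehension variant does not have this gap, since it produces an empty list for such rows directly.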
