Mapping JSON onto the lists in a DataFrame column

Posted 2024-10-01 13:33:28


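I'm trying to convert the lists of strings in a DataFrame column into lists of integers, replacing each string with its associated id.

I need this in order to map a list of sports by id, as shown below. Some sports are not in the JSON. In that case, the element has to be dropped from the integer-list column of the desired dataframe.

This is the JSON I have to map: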

[
   {
      "id": 1,
      "name": "Karate"
   },
   {
      "id": 2,
      "name": "Paintball"
   },
   {
      "id": 3,
      "name": "Rugby"
   },
   {
      "id": 4,
      "name": "Squash"
   },
   {
      "id": 5,
      "name": "Softball"
   },
   {
      "id": 6,
      "name": "Swimiming"
   },
   {
      "id": 7,
      "name": "Weighlifting"
   },
   {
      "id": 8,
      "name": "Table Tennis"
   },
   {
      "id": 9,
      "name": "Tenpin Bowling"
   }
]

This is the dataframe I have, which contains sports that are not in the JSON:

id        sports             
111       ['Softball', 'Table Tennis', 'Rafting']                     
222       ['Rugby', 'Tenpin Bowling','Squash'] 
333       ['Weighlifting', 'Tennis', 'Swimiming'] 
444       ['Softball', 'Table Tennis', 'Paintball']
555       ['Rugby', 'Tenpin Bowling','Squash']
666       ['Weighlifting', 'Karate', 'Swimiming']
777       ['Softball', 'Table Tennis', 'Soccer'] 
888       ['Basketball', 'Tenpin Bowling','Squash']
999       ['Weighlifting', 'Karate', 'Swimiming']

This is the dataframe I need, without the sports that cannot be mapped through the JSON:

id        sports             
111       [5, 8]                     
222       [3, 9, 4] 
333       [7, 6] 
444       [5, 8, 2]
555       [3, 9, 4]
666       [7, 1, 6] 
777       [5, 8] 
888       [9, 4]
999       [7, 1, 6]

Is there a way to do this?

Thanks in advance.


3 Answers

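This is similar to your previous question. I modified my earlier answer to handle this case as well as NaN and non-list elements. Let's call the JSON string l_str.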

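import pandas as pd
from io import StringIO

# l_str holds the JSON text; StringIO avoids the literal-string deprecation in pandas 2.1+
df_map = pd.read_json(StringIO(l_str))
d = dict(zip(df_map.name, df_map.id))
# map each list to ids, dropping unknown sports; NaN and other non-list values pass through
df['sports'] = [[d[y] for y in x if y in d] if isinstance(x, list) else x for x in df.sports]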

Out[51]:
    id     sports
0  111     [5, 8]
1  222  [3, 9, 4]
2  333     [7, 6]
3  444  [5, 8, 2]
4  555  [3, 9, 4]
5  666  [7, 1, 6]
6  777     [5, 8]
7  888     [9, 4]
8  999  [7, 1, 6]
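
To check the NaN handling described in the answer above without the full data set, here is a minimal sketch. The helper `map_sports`, the dictionary `name_to_id`, and the three-row frame `df_demo` are hypothetical names made up for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical subset of the name -> id mapping taken from the question's JSON
name_to_id = {"Softball": 5, "Table Tennis": 8, "Rugby": 3}


def map_sports(value, mapping):
    """Map a list of sport names to ids, dropping unknown names.

    Anything that is not a list (for example NaN) is returned unchanged.
    """
    if isinstance(value, list):
        return [mapping[name] for name in value if name in mapping]
    return value


# Small demo frame: one normal row, one NaN row, one row with an unknown sport
df_demo = pd.DataFrame({
    "id": [111, 222, 333],
    "sports": [["Softball", "Table Tennis", "Rafting"], np.nan, ["Rugby", "Soccer"]],
})

# One output value per input row, so the lengths always match
df_demo["sports"] = [map_sports(v, name_to_id) for v in df_demo["sports"]]
print(df_demo)
```

Because the list comprehension yields exactly one value per row, the assignment cannot fail with a length mismatch, which is what would happen if non-list rows were filtered out instead of passed through.
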
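  • If the list of dicts with the sport codes is in the file test.json, load it into data
    • If the list of dicts is already loaded, skip the file-loading part and replace data with the variable name you are using
  • This answer assumes the values in the sports column are lists, not strings
    • If the sports column contains strings, use df.sports = df.sports.apply(literal_eval)
  • To replace the sports column with the codes, use df['sports'] = instead of df['codes'] =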
from ast import literal_eval
import pandas as pd


# if the list of dicts is in a file, load it with the following
with open('test.json', 'r') as f:
    data = literal_eval(f.read())

# data is the object now holding the list of dicts
# convert data to a dict
dd = {d['name']: d['id'] for d in data}

# add a codes column for the sports in dd
df['codes'] = df.sports.apply(lambda x: [dd.get(v) for v in x if v in dd])

# display df
    id                                sports      codes
0  111     [Softball, Table Tennis, Rafting]     [5, 8]
1  222       [Rugby, Tenpin Bowling, Squash]  [3, 9, 4]
2  333     [Weighlifting, Tennis, Swimiming]     [7, 6]
3  444   [Softball, Table Tennis, Paintball]  [5, 8, 2]
4  555       [Rugby, Tenpin Bowling, Squash]  [3, 9, 4]
5  666     [Weighlifting, Karate, Swimiming]  [7, 1, 6]
6  777      [Softball, Table Tennis, Soccer]     [5, 8]
7  888  [Basketball, Tenpin Bowling, Squash]     [9, 4]
8  999     [Weighlifting, Karate, Swimiming]  [7, 1, 6]
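
The note above about string-valued columns can be sketched as follows. This assumes the sports column was read as text (for example from a CSV), so each cell is the string representation of a list. The frame `df_str` and the shortened dictionary `dd` are hypothetical stand-ins for illustration.

```python
from ast import literal_eval

import pandas as pd

# Hypothetical subset of the name -> id dict built in the answer above
dd = {"Softball": 5, "Table Tennis": 8}

# Each cell is a string that merely looks like a list
df_str = pd.DataFrame({
    "id": [111, 777],
    "sports": [
        "['Softball', 'Table Tennis', 'Rafting']",
        "['Softball', 'Table Tennis', 'Soccer']",
    ],
})

# Turn the strings into real Python lists first
df_str["sports"] = df_str["sports"].apply(literal_eval)

# Then map names to codes, dropping sports that are missing from dd
df_str["codes"] = df_str["sports"].apply(lambda x: [dd[v] for v in x if v in dd])
print(df_str)
```

Skipping the literal_eval step would make the lambda iterate over the characters of each string, so every row would end up with an empty list.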

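First initialize a dataframe from the JSON data and use set_index and to_dict to build a mappings dictionary from it, then use that dictionary to map each sport in the lists to its corresponding id: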

mappings = pd.read_json(data).set_index('name')['id'].to_dict()
df['sports'] = [[mappings[key] for key in lst if key in mappings] for lst in df['sports']]

Alternatively, you can use Series.explode together with Series.map, but this approach is generally slower:

mappings = pd.read_json(data).set_index('name')['id']
df['sports'] = (
    df['sports'].explode()
    .map(mappings).dropna().astype(int).groupby(level=0).agg(list)
)

Result:

# print(df)
    id     sports
0  111     [5, 8]
1  222  [3, 9, 4]
2  333     [7, 6]
3  444  [5, 8, 2]
4  555  [3, 9, 4]
5  666  [7, 1, 6]
6  777     [5, 8]
7  888     [9, 4]
8  999  [7, 1, 6]
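
One edge case of the explode variant is not covered by the sample data: a row where none of the sports can be mapped disappears after dropna, so assigning the grouped result back leaves NaN in that row. The sketch below shows one possible way to restore such rows as empty lists. The frame `df_x`, the shortened `mappings` Series, and the `codes` column are hypothetical and only serve the illustration.

```python
import pandas as pd

# Hypothetical subset of the name -> id Series from the answer above
mappings = pd.Series({"Rugby": 3, "Squash": 4})

# The second row contains only a sport that is missing from the mapping
df_x = pd.DataFrame({
    "id": [222, 333],
    "sports": [["Rugby", "Squash", "Golf"], ["Tennis"]],
})

codes = (
    df_x["sports"].explode()          # one sport per row, original index repeated
    .map(mappings).dropna()           # unknown sports become NaN and are dropped
    .astype(int)
    .groupby(level=0).agg(list)       # regroup by the original row index
)

# Rows without any mapped sport are absent from `codes`; fall back to an empty list
df_x["codes"] = [codes.get(i, []) for i in df_x.index]
print(df_x)
```

The list-comprehension variant from the same answer does not need this step, since it produces an empty list for such rows directly.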
