A function that does not have to return a value? / A fast way to loop through a df?

[{'orig_name': '1,00 kg Kalbsbraten ', 'orig_amount': '1.00', 'orig_unit': 'kg', 'amount': 0.25, 'unit': 'g', 'splitted_ingredient': 'Kalbsbraten', 'splitted_slized_ingredient': 'Kalbsbraten', 'further_specification': '', 'alternatives': '', 'matched_ingredient_id': 'U030100', 'matched_ingredient_st': 'Kalb Hackfleisch roh', 'calorie': 148, 'protein': 19.726, 'carb': 0.0, 'fat': 7.713}, {'orig_name': '1,00 Zwiebel(n) ', 'orig_amount': '1.00', 'orig_unit': 'Anzahl', 'amount': 9.0, 'unit': 'g', 'splitted_ingredient': 'Zwiebel(n)', ... ]

for index, row in df.iterrows():
    extracted_ingredient = ""
    for ingredient in row["parsed_ingredients"]:
        extracted_ingredient = ingredient["matched_ingredient_st"]
        if not extracted_ingredient == "None":
            df.loc[index, extracted_ingredient] = 1

def ingredient_extraction(content, dataframe=df):
    for newrow in content:
        for entry in newrow:
            if not entry["matched_ingredient_st"] == "None":
                df[entry["matched_ingredient_st"]] = 1

df.apply(ingredient_extraction(df["parsed_ingredients"], df), axis=1)

2 Answers

User

#1 · edited 2024-07-04 17:13:35

Just a rough outline of an idea.

Suppose you have a DataFrame like this:

recipe_id | parsed_ingredients
               
 1        | [{...}, {...}, ...]
 2        | [{...}, {...}, ...]
 3        | [{...}, {...}, ...]

Use the explode method to expand the DataFrame so that each row holds a single ingredient dictionary:

df = df.explode('parsed_ingredients')
df.head()

recipe_id | parsed_ingredients
               
 1        | {...}
 1        | {...}
     ...
 2        | {...}
 2        | {...}
     ...
 3        | {...}
 3        | {...}
     ...
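A minimal runnable sketch of the explode step, assuming a pandas version that provides DataFrame.explode (0.25 or later). The recipe IDs and ingredient names are made-up placeholders, not data from the question.

```python
import pandas as pd

# Hypothetical toy data: two recipes, each holding a list of ingredient dicts.
recipes = pd.DataFrame({
    "recipe_id": [1, 2],
    "parsed_ingredients": [
        [{"matched_ingredient_st": "ingredient_a"},
         {"matched_ingredient_st": "ingredient_b"}],
        [{"matched_ingredient_st": "ingredient_b"}],
    ],
})

# explode() emits one row per list element and repeats the other columns.
exploded = recipes.explode("parsed_ingredients")

print(exploded["recipe_id"].tolist())  # [1, 1, 2]
print(exploded.index.tolist())         # [0, 0, 1] (index labels repeat too)
```

Because the original index labels are repeated, it can be worth calling reset_index(drop=True) before any step that expects a unique index.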

Now extract matched_ingredient_st from each dictionary:

df['matched_ingredient_st'] = df['parsed_ingredients'].apply(lambda x: x['matched_ingredient_st'])
df['match'] = 1 # Added for the next step
df.head()

recipe_id | parsed_ingredients | matched_ingredient_st | match
                               
 1        | {...}              | ingredient_a          | 1
 1        | {...}              | ingredient_b          | 1
     ...
 2        | {...}              | ingredient_b          | 1
 2        | {...}              | ingredient_d          | 1
     ...
 3        | {...}              | ingredient_c          | 1
 3        | {...}              | ingredient_d          | 1
     ...
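The loop in the question skips entries whose match is the literal string "None", which the extraction step above does not do. A small sketch of adding that filter, using placeholder data already in exploded form:

```python
import pandas as pd

# Hypothetical exploded frame: one ingredient dict per row.
exploded = pd.DataFrame({
    "recipe_id": [1, 1, 2],
    "parsed_ingredients": [
        {"matched_ingredient_st": "ingredient_a"},
        {"matched_ingredient_st": "None"},  # unmatched entry, as in the question
        {"matched_ingredient_st": "ingredient_b"},
    ],
})

exploded["matched_ingredient_st"] = exploded["parsed_ingredients"].apply(
    lambda d: d["matched_ingredient_st"]
)

# Keep only real matches, mirroring the `"None"` check in the question's loop.
matched = exploded[exploded["matched_ingredient_st"] != "None"].copy()
matched["match"] = 1

print(matched[["recipe_id", "matched_ingredient_st", "match"]])
```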

Now you can use the built-in pivot method to bring the DataFrame back to one row per recipe, as in the original dataset:

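# Column names must match exactly (no trailing spaces).
# pivot leaves NaN where a recipe lacks an ingredient, so fill with 0.
# Note: pivot raises ValueError if a recipe lists the same ingredient twice.
df = df.pivot(index='recipe_id', columns='matched_ingredient_st', values='match').fillna(0).astype(int)
df.head()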

recipe_id | ingredient_a | ingredient_b | ingredient_c | ingredient_d

 1        |       1      |       1      |       0      |       0
 2        |       0      |       1      |       0      |       1
 3        |       0      |       0      |       1      |       1

I have not actually run this in Python, but the logic and the approach are there.
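Since the outline above was not run, here is a runnable end-to-end sketch of the same idea. It swaps pivot for pd.crosstab, which is a substitution on my part rather than what the answer proposes, because crosstab tolerates a recipe that lists the same ingredient twice. The data is a made-up placeholder.

```python
import pandas as pd

# Hypothetical data: recipe 2 repeats an ingredient, recipe 3 has an unmatched one.
recipes = pd.DataFrame({
    "recipe_id": [1, 2, 3],
    "parsed_ingredients": [
        [{"matched_ingredient_st": "ingredient_a"},
         {"matched_ingredient_st": "ingredient_b"}],
        [{"matched_ingredient_st": "ingredient_b"},
         {"matched_ingredient_st": "ingredient_d"},
         {"matched_ingredient_st": "ingredient_b"}],
        [{"matched_ingredient_st": "ingredient_c"},
         {"matched_ingredient_st": "None"}],
    ],
})

# One row per ingredient dict, then pull out the matched name.
long_df = recipes.explode("parsed_ingredients")
long_df["matched_ingredient_st"] = long_df["parsed_ingredients"].apply(
    lambda d: d["matched_ingredient_st"]
)

# Drop unmatched entries and give the frame a unique index again.
long_df = long_df[long_df["matched_ingredient_st"] != "None"].reset_index(drop=True)

# crosstab counts occurrences per (recipe, ingredient); clip turns counts into 0/1.
indicators = pd.crosstab(
    long_df["recipe_id"], long_df["matched_ingredient_st"]
).clip(upper=1)

print(indicators)
```

The result has one row per recipe_id and one 0/1 column per ingredient; it can be joined back onto the original frame with recipes.join(indicators, on="recipe_id") if the other columns are needed.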

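User

#2 · edited 2024-07-04 17:13:35

First build a list of dictionaries holding the values for each row, pass it to the DataFrame constructor, and finally join the result to the original: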

import pandas as pd

# Sample data: one list of ingredient dicts per recipe
L = [[{'matched_ingredient_id': 'U030100',
       'matched_ingredient_st': 'Kalb'},
      {'matched_ingredient_id': 'U030100',
       'matched_ingredient_st': 'Ka',
       'splitted_ingredient': 'Zwiebel(n)'}],
     [{'matched_ingredient_id': 'U030100',
       'matched_ingredient_st': 'roh'},
      {'matched_ingredient_id': 'U030100',
       'matched_ingredient_st': 'K',
       'splitted_ingredient': 'Zwiebel'}]]

df = pd.DataFrame({'parsed_ingredients': L})

# One dict per row: {ingredient name: 1}, skipping unmatched "None" entries
L = [{y['matched_ingredient_st']: 1 for y in x if y['matched_ingredient_st'] != 'None'}
     for x in df['parsed_ingredients']]


df1 = pd.DataFrame(L, index=df.index).fillna(0).astype(int)
print (df1)
   Kalb  Ka  roh  K
0     1   1    0  0
1     0   0    1  1

df = df.join(df1)
print(df)
                                  parsed_ingredients  Kalb  Ka  roh  K
0  [{'matched_ingredient_id': 'U030100', 'matched...     1   1    0  0
1  [{'matched_ingredient_id': 'U030100', 'matched...     0   0    1  1
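On the title question, the same steps can be wrapped in a function that returns a new frame instead of mutating a global df. This is only a sketch: the function name add_ingredient_flags and its column parameter are hypothetical, and the sample data is made up.

```python
import pandas as pd

def add_ingredient_flags(frame, column="parsed_ingredients"):
    """Return a copy of `frame` with one 0/1 column per matched ingredient."""
    rows = [
        {d["matched_ingredient_st"]: 1
         for d in ingredients
         if d["matched_ingredient_st"] != "None"}
        for ingredients in frame[column]
    ]
    # Missing keys become NaN in the constructor; turn them into 0.
    flags = pd.DataFrame(rows, index=frame.index).fillna(0).astype(int)
    return frame.join(flags)

# Hypothetical sample data
recipes = pd.DataFrame({"parsed_ingredients": [
    [{"matched_ingredient_st": "Kalb"}, {"matched_ingredient_st": "None"}],
    [{"matched_ingredient_st": "roh"}, {"matched_ingredient_st": "Kalb"}],
]})

result = add_ingredient_flags(recipes)
print(result[["Kalb", "roh"]])
```

Returning the joined frame keeps the input untouched, so the call site decides whether to overwrite it (recipes = add_ingredient_flags(recipes)). Building all indicator columns in one constructor call also avoids growing the frame cell by cell, as the df.loc assignment inside iterrows does.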
