<p>This is just a rough outline of an idea.</p>
<p>Suppose you have a DataFrame like this:</p>
<pre><code>recipe_id | parsed_ingredients
1 | [{...}, {...}, ...]
2 | [{...}, {...}, ...]
3 | [{...}, {...}, ...]
</code></pre>
<p>Use the <code>explode</code> method to expand the DataFrame so that each row holds a single ingredient dictionary:</p>
<pre><code>df = df.explode('parsed_ingredients')
df.head()
recipe_id | parsed_ingredients
1 | {...}
1 | {...}
...
2 | {...}
2 | {...}
...
3 | {...}
3 | {...}
...
</code></pre>
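<p>For reference, here is a small self-contained sketch of the explode step. The sample rows are made up for illustration (the real dictionaries presumably carry more keys), and it assumes pandas 0.25 or later, where <code>DataFrame.explode</code> was introduced.</p>
<pre><code>import pandas as pd

# Hypothetical sample data: one list of ingredient dicts per recipe
df = pd.DataFrame({
    "recipe_id": [1, 2],
    "parsed_ingredients": [
        [{"matched_ingredient_st": "ingredient_a"},
         {"matched_ingredient_st": "ingredient_b"}],
        [{"matched_ingredient_st": "ingredient_b"}],
    ],
})

# One row per list element; recipe_id is repeated for each element
exploded = df.explode("parsed_ingredients")

# explode also repeats the original index label, so rebuild a clean index
exploded = exploded.reset_index(drop=True)
print(exploded)
</code></pre>
<p>Note that a recipe with an empty list comes out of <code>explode</code> as a single row holding NaN, which matters for the extraction step.</p>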
<p>Now extract <code>matched_ingredient_st</code> from each dictionary:</p>
<pre><code>df['matched_ingredient_st'] = df['parsed_ingredients'].apply(lambda x: x['matched_ingredient_st'])
df['match'] = 1 # Added for the next step
df.head()
recipe_id | parsed_ingredients | matched_ingredient_st | match
1 | {...} | ingredient_a | 1
1 | {...} | ingredient_b | 1
...
2 | {...} | ingredient_b | 1
2 | {...} | ingredient_d | 1
...
3 | {...} | ingredient_c | 1
3 | {...} | ingredient_d | 1
...
</code></pre>
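<p>The lambda above raises an error if a row is not a dictionary (for example NaN from an empty list) or if the key is missing. A more defensive sketch of the same extraction might look like this; the helper name <code>get_matched</code> and the sample rows are hypothetical.</p>
<pre><code>import pandas as pd

# Hypothetical exploded data, including one row with no dictionary
df = pd.DataFrame({
    "recipe_id": [1, 1, 2],
    "parsed_ingredients": [
        {"matched_ingredient_st": "ingredient_a"},
        {"matched_ingredient_st": "ingredient_b"},
        None,  # e.g. a recipe whose ingredient list was empty
    ],
})

def get_matched(item):
    # Return the matched name, or None when the entry is not a dict
    # or does not contain the key
    if isinstance(item, dict):
        return item.get("matched_ingredient_st")
    return None

df["matched_ingredient_st"] = df["parsed_ingredients"].apply(get_matched)

# Drop rows without a match, then add the indicator column for the pivot
df = df.dropna(subset=["matched_ingredient_st"]).assign(match=1)
print(df[["recipe_id", "matched_ingredient_st", "match"]])
</code></pre>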
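<p>Now you can use the built-in <code>pivot</code> method to reshape the DataFrame back into a layout similar to the original dataset, with one row per recipe:</p>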
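<pre><code># pivot leaves NaN where a recipe lacks an ingredient, so fill those cells with 0
df = df.pivot(index='recipe_id', columns='matched_ingredient_st', values='match').fillna(0).astype(int)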
df.head()
recipe_id | ingredient_a | ingredient_b | ingredient_c | ingredient_d
1 | 1 | 1 | 0 | 0
2 | 0 | 1 | 0 | 1
3 | 0 | 0 | 1 | 1
</code></pre>
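<p>One caveat: <code>pivot</code> raises a <code>ValueError</code> if the same recipe lists the same ingredient more than once. As a possible alternative (a swapped-in technique, not the method described above), <code>pd.crosstab</code> counts occurrences and tolerates duplicates. The sample data below is invented for illustration.</p>
<pre><code>import pandas as pd

# Hypothetical long-format data; recipe 1 lists ingredient_a twice
long_df = pd.DataFrame({
    "recipe_id": [1, 1, 1, 2, 2, 3, 3],
    "matched_ingredient_st": [
        "ingredient_a", "ingredient_b", "ingredient_a",
        "ingredient_b", "ingredient_d",
        "ingredient_c", "ingredient_d",
    ],
})

# crosstab counts each (recipe, ingredient) pair and fills absent pairs with 0;
# clip caps the counts at 1 so the result is a presence indicator
wide = pd.crosstab(
    long_df["recipe_id"], long_df["matched_ingredient_st"]
).clip(upper=1)
print(wide)
</code></pre>
<p>This variant also makes the helper <code>match</code> column unnecessary, since the counting is done by <code>crosstab</code> itself.</p>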
<p>I have not actually run this in Python, but the logic and the methods are all there.</p>