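<p>First, <code>str.split</code> and <code>explode</code> the gems column, then <code>reset_index</code> so the original index is kept as a column. Next, for each column of df2, <code>merge</code> it with the exploded gems, <code>groupby</code> the original index, and perform the <code>count</code> and the aggregation with <code>join</code> as needed.
<code>pd.concat</code> the result of each column and join it to the original df1. Finally, <code>fillna</code> the count columns with 0, as shown in the expected output.</p>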
<pre><code>import pandas as pd

# one row per gem, used in the merge; reset_index keeps df1's index as column 'index'
df_ = df1['gems'].str.split(', ').explode().reset_index()
res = (
    df1.join(  # can join to df1 because we kept the original index values
        pd.concat(
            [df_.merge(df2[[col]], left_on='gems', right_on=col)
                .groupby('index')  # original index in df1
                [col].agg(**{col: 'count',  # do each aggregation
                             f'{col}Group': lambda x: ', '.join(x)})
             for col in df2.columns],  # do it for each column of df2
            axis=1))
    .fillna({col: 0 for col in df2.columns})  # fill the count columns with 0
)
print(res)
      values                gems   1C  1CGroup  02C  02CGroup  34C  34CGroup
0    Cricket  A1K, A2M, JA3, AN4  1.0      A1K  0.0       NaN  0.0       NaN
1     Soccer     B1, A1, Bn2, B3  1.0       B1  2.0   Bn2, B3  1.0        B3
2   Football             CD1, A1  0.0      NaN  0.0       NaN  1.0       CD1
3     Tennis            KWS, KQM  0.0      NaN  0.0       NaN  0.0       NaN
4  Badminton             JP, CVK  0.0      NaN  0.0       NaN  0.0       NaN
5      Chess              KF, GF  0.0      NaN  0.0       NaN  1.0        KF
</code></pre>
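<p>To try the steps one at a time, here is a minimal sketch. The input frames are not shown in this answer, so <code>df1</code> and <code>df2</code> below are hypothetical, reconstructed from the printed output; in particular, the exact contents of <code>df2</code> and its <code>None</code> padding are assumptions. The <code>dropna()</code> call is an added guard for that padding and is not part of the code above.</p>

```python
import pandas as pd

# Hypothetical sample data, reconstructed from the printed output (an assumption).
df1 = pd.DataFrame({
    'values': ['Cricket', 'Soccer', 'Football', 'Tennis', 'Badminton', 'Chess'],
    'gems': ['A1K, A2M, JA3, AN4', 'B1, A1, Bn2, B3', 'CD1, A1',
             'KWS, KQM', 'JP, CVK', 'KF, GF'],
})
df2 = pd.DataFrame({
    '1C': ['A1K', 'B1', None],    # None pads the shorter columns (assumption)
    '02C': ['Bn2', 'B3', None],
    '34C': ['B3', 'CD1', 'KF'],
})

# Step 1: one row per gem; reset_index() stores df1's row label in column 'index'.
df_ = df1['gems'].str.split(', ').explode().reset_index()

# Step 2: for each df2 column, keep only the matching gems and aggregate
# them per original df1 row (count plus a comma-joined list).
parts = []
for col in df2.columns:
    matched = df_.merge(df2[[col]].dropna(), left_on='gems', right_on=col)
    parts.append(
        matched.groupby('index')[col].agg(
            **{col: 'count', f'{col}Group': lambda s: ', '.join(s)}
        )
    )

# Step 3: align the per-column results side by side, join them back onto df1
# by index, and fill only the count columns with 0.
res = df1.join(pd.concat(parts, axis=1)).fillna({col: 0 for col in df2.columns})
print(res)
```

<p>Inspecting <code>df_</code> and each element of <code>parts</code> before the final join shows which rows of df1 had matches in a given df2 column; rows with no match are absent from the grouped result and only reappear as <code>NaN</code> after the join.</p>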