<p><strong>Take the Cartesian product of the rows and check each pair</strong></p>
<p>The code is documented with inline comments.</p>
<pre><code>import pandas as pd

df1 = pd.DataFrame(
    {
        'ColumnA': ['id1', 'id2'],
        'ColumnB': [['a','b','c'], ['a','d','e']],
    }
)
df2 = pd.DataFrame(
    {
        'ColumnA': ['id3'],
        'ColumnB': [['a','b','c','x','y', 'z']],
    }
)
# Take the Cartesian product of both DataFrames via a temporary join key
df1['k'] = 0
df2['k'] = 0
df = pd.merge(df1, df2, on='k').drop(columns='k')
# Count the distinct elements shared by the two lists in each row pair
df['overlap'] = df.apply(lambda x: len(set(x['ColumnB_x']).intersection(
    set(x['ColumnB_y']))), axis=1)
# Keep only the pairs whose overlap length is greater than 2
df = df[df['overlap'] > 2]
print(df)
</code></pre>
<p>Output:</p>
<pre><code>  ColumnA_x  ColumnB_x ColumnA_y           ColumnB_y  overlap
0       id1  [a, b, c]       id3  [a, b, c, x, y, z]        3
</code></pre>
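<p>The same idea can be sketched without the temporary key column, assuming pandas 1.2 or newer, where <code>merge</code> accepts <code>how="cross"</code>. The helper name <code>overlapping_pairs</code> and its parameters <code>col</code> and <code>min_overlap</code> are hypothetical and only serve this illustration; they are not part of the original answer.</p>

```python
import pandas as pd


def overlapping_pairs(left, right, col="ColumnB", min_overlap=3):
    """Return row pairs whose list columns share at least min_overlap items.

    Hypothetical helper for illustration; assumes pandas >= 1.2.
    """
    # how="cross" builds the Cartesian product directly, so neither
    # input frame is modified and no helper key needs to be dropped.
    pairs = left.merge(right, how="cross", suffixes=("_x", "_y"))
    # Count the distinct elements shared by the two lists in each pair.
    pairs["overlap"] = [
        len(set(a) & set(b))
        for a, b in zip(pairs[f"{col}_x"], pairs[f"{col}_y"])
    ]
    return pairs[pairs["overlap"] >= min_overlap]


df1 = pd.DataFrame(
    {
        "ColumnA": ["id1", "id2"],
        "ColumnB": [["a", "b", "c"], ["a", "d", "e"]],
    }
)
df2 = pd.DataFrame(
    {
        "ColumnA": ["id3"],
        "ColumnB": [["a", "b", "c", "x", "y", "z"]],
    }
)

result = overlapping_pairs(df1, df2)
print(result)
```

<p>Either way the Cartesian product has <code>len(df1) * len(df2)</code> rows, so this approach is only practical when both frames are reasonably small.</p>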