<p>If you are using pandas 1.2.0 or later (released on December 26, 2020), the Cartesian product (cross join) can be simplified as follows:</p>
<pre><code>df = df1.merge(df2, how='cross')  # simplified cross join for pandas >= 1.2.0
</code></pre>
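<p>For older pandas versions, where <code>how='cross'</code> is not available, a common workaround is to merge on a constant temporary key. The following is a minimal sketch of that idea, not part of the original answer; the helper name <code>cross_join_legacy</code> and the column name <code>_key</code> are hypothetical and chosen only for illustration.</p>

```python
import pandas as pd

df1 = pd.DataFrame({"ColumnA1": ["id1", "id2"]})
df2 = pd.DataFrame({"ColumnA2": ["id3", "id4"]})


def cross_join_legacy(left, right):
    """Cartesian product of two frames without how='cross' (hypothetical helper)."""
    # Every row gets the same temporary key, so an ordinary merge on that key
    # pairs each row of `left` with each row of `right`.
    return (
        left.assign(_key=1)
        .merge(right.assign(_key=1), on="_key")
        .drop(columns="_key")
    )


result = cross_join_legacy(df1, df2)
print(result)
```

<p>The sketch assumes neither frame already has a column named <code>_key</code>; pick a different name if that could clash.</p>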
<p>Also, <strong>if performance (execution time) is a concern, prefer <code>list(map(...))</code> over the slower <code>apply(..., axis=1)</code></strong>.</p>
<p>Using <code>apply(..., axis=1)</code>:</p>
<pre><code>%%timeit
df['overlap'] = df.apply(
    lambda x: len(set(x['ColumnB1']).intersection(set(x['ColumnB2']))),
    axis=1)
800 µs ± 59.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
</code></pre>
<p>Using <code>list(map(...))</code>:</p>
<pre><code>%%timeit
df['overlap'] = list(map(lambda x, y: len(set(x).intersection(set(y))), df['ColumnB1'], df['ColumnB2']))
217 µs ± 25.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
</code></pre>
<p>Note that <strong><code>list(map(...))</code> is more than 3 times faster here</strong> (217 µs versus 800 µs per loop).</p>
<p>The complete code, for reference:</p>
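<p>The <code>%%timeit</code> magic only works in IPython or Jupyter. Outside a notebook, the same comparison can be approximated with the standard-library <code>timeit</code> module. This is a rough sketch: the function names <code>overlap_apply</code> and <code>overlap_map</code> are made up for illustration, the two-row frame is a stand-in for the real data, and the absolute timings will differ by machine and pandas version.</p>

```python
import timeit

import pandas as pd

# Small stand-in frame with two list-valued columns.
df = pd.DataFrame({
    "ColumnB1": [["a", "b", "c"], ["a", "d", "e"]],
    "ColumnB2": [["a", "b", "c", "x", "y", "z"], ["d", "e", "f", "p", "q", "r"]],
})


def overlap_apply(frame):
    # Row-wise apply: pandas builds a Series object for every row.
    return frame.apply(
        lambda row: len(set(row["ColumnB1"]).intersection(row["ColumnB2"])),
        axis=1,
    ).tolist()


def overlap_map(frame):
    # Plain Python iteration over the two columns in parallel.
    return list(map(
        lambda x, y: len(set(x).intersection(y)),
        frame["ColumnB1"],
        frame["ColumnB2"],
    ))


# Both approaches must produce the same counts before timing them.
assert overlap_apply(df) == overlap_map(df)

for fn in (overlap_apply, overlap_map):
    seconds = timeit.timeit(lambda: fn(df), number=1000)
    print(f"{fn.__name__}: {seconds / 1000 * 1e6:.1f} µs per call")
```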
<pre><code>import pandas as pd

# Two small frames whose list-valued columns are compared pairwise
data = {'ColumnA1': ['id1', 'id2'], 'ColumnB1': [['a', 'b', 'c'], ['a', 'd', 'e']]}
df1 = pd.DataFrame(data)
data = {'ColumnA2': ['id3', 'id4'], 'ColumnB2': [['a', 'b', 'c', 'x', 'y', 'z'], ['d', 'e', 'f', 'p', 'q', 'r']]}
df2 = pd.DataFrame(data)

df = df1.merge(df2, how='cross')  # cross join; requires pandas >= 1.2.0
# Count the elements shared by the two lists in each row
df['overlap'] = list(map(lambda x, y: len(set(x).intersection(set(y))), df['ColumnB1'], df['ColumnB2']))
df = df[df['overlap'] >= 2]  # keep only pairs that share at least two elements
print(df)
</code></pre>