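<p>Use a nested list comprehension over <code>s_list</code>, a Series of sets built from the lists. Inside the comprehension, use the <code>intersection</code> operation (<code>&amp;</code>) to find the overlap and take the length of each result. Finally, build a DataFrame from those counts and divide each row by the length of the corresponding list in <code>df.list_of_value</code>.</p>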
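<pre><code>import pandas as pd

# Convert each list to a set
s_list = df.list_of_value.map(set)
# overlap[i][j] = number of distinct values shared by rows i and j
overlap = [[len(s1 & s) for s1 in s_list] for s in s_list]
# Divide row i by the length of list i (column vector broadcasts across columns)
df_final = pd.DataFrame(overlap) / df.list_of_value.str.len().to_numpy()[:,None]

Out[76]:
          0         1         2         3
0  1.000000  0.666667  1.000000  1.000000
1  0.666667  1.000000  0.666667  0.666667
2  1.000000  0.666667  1.000000  1.000000
3  1.000000  0.666667  1.000000  1.000000
</code></pre>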
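<p>To make a single cell of the matrix above concrete, here is a minimal sketch in plain Python. The two lists <code>row_i</code> and <code>row_j</code> are hypothetical and chosen only for illustration; they are not taken from the question.</p>

```python
# Hypothetical rows, chosen only for illustration
row_i = ["a", "b", "c"]
row_j = ["d", "b", "c"]

# Distinct values the two rows share
shared = set(row_i) & set(row_j)

# Cell (i, j): shared count divided by the length of row i's list
ratio_ij = len(shared) / len(row_i)

print(sorted(shared), round(ratio_ij, 6))  # ['b', 'c'] 0.666667
```

<p>Because the denominator is the length of row i's list, the matrix is only symmetric when all lists have the same length.</p>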
<hr/>
<p>If the lists can contain duplicate values, use <code>collections.Counter</code> instead of <code>set</code>. For this example I changed the sample data for id=0 to <code>['a','a','c']</code> and for id=1 to <code>['d','b','a']</code>.</p>
<pre><code>sample df:
id  list_of_value
0   ['a','a','c']  #changed
1   ['d','b','a']  #changed
2   ['a','b','c']
3   ['a','b','c']

from collections import Counter

# Counter keeps how many times each value occurs in a list
c_list = df.list_of_value.map(Counter)
# c1 & c keeps the minimum count per key; summing the counts gives the overlap size
c_overlap = [[sum((c1 & c).values()) for c1 in c_list] for c in c_list]
# Divide row i by the length of list i
df_final = pd.DataFrame(c_overlap) / df.list_of_value.str.len().to_numpy()[:,None]

Out[208]:
          0         1         2         3
0  1.000000  0.333333  0.666667  0.666667
1  0.333333  1.000000  0.666667  0.666667
2  0.666667  0.666667  1.000000  1.000000
3  0.666667  0.666667  1.000000  1.000000
</code></pre>
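<p>A short sketch of why <code>Counter</code> matters here, using the id=0 and id=2 lists from the modified sample. The variable names are mine and only illustrative. With <code>set</code>, the repeated <code>'a'</code> is dropped, so a list compared with itself would score below 1.</p>

```python
from collections import Counter

# id=0 and id=2 from the modified sample data
row0 = ["a", "a", "c"]
row2 = ["a", "b", "c"]

# set drops the repeated 'a', so the self-overlap of row0 is 2 / 3
set_self = len(set(row0) & set(row0)) / len(row0)

# Counter intersection keeps the minimum count per key: {'a': 2, 'c': 1}
counter_self = sum((Counter(row0) & Counter(row0)).values()) / len(row0)

# Against row2 the minimum counts are {'a': 1, 'c': 1}, giving 2 / 3
counter_cross = sum((Counter(row0) & Counter(row2)).values()) / len(row0)

print(round(set_self, 6), counter_self, round(counter_cross, 6))
# 0.666667 1.0 0.666667
```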