<p>Use <code>stack</code> + <code>pandas.Series.str.get_dummies</code>:</p>
<pre><code># sum(level=0) was removed in pandas 2.0: group on index level 0 instead,
# and pass axis to any() by keyword
df.assign(
    D=df.stack().str.get_dummies(';').groupby(level=0).sum().gt(1).any(axis=1).astype(int)
)

               A                B              C  D
0   mom;dad;son;      sister;son;  yes;no;maybe;  1
1           dad;  daughter;niece;       no;snow;  0
2       son;dad;     cat;son;dad;  tree;dad;son;  1
3  daughter;mom;           niece;       referee;  0
4  dad;daughter;             cat;           dad;  1
</code></pre>
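<p>For anyone who wants to run this end to end, here is a minimal self-contained sketch. The frame is rebuilt by hand from the output shown above, and the names <code>flag</code> and <code>result</code> are only illustrative. It assumes a recent pandas, where <code>groupby(level=0).sum()</code> stands in for the older <code>sum(level=0)</code>.</p>

```python
import pandas as pd

# Sample frame reconstructed from the output table above.
df = pd.DataFrame({
    "A": ["mom;dad;son;", "dad;", "son;dad;", "daughter;mom;", "dad;daughter;"],
    "B": ["sister;son;", "daughter;niece;", "cat;son;dad;", "niece;", "cat;"],
    "C": ["yes;no;maybe;", "no;snow;", "tree;dad;son;", "referee;", "dad;"],
})

flag = (
    df.stack()                    # one long Series indexed by (row, column)
      .str.get_dummies(";")       # one indicator column per distinct word
      .groupby(level=0).sum()     # per-row totals across A, B, C
      .gt(1)                      # True where a word sits in more than one column
      .any(axis=1)                # True if any word in the row repeats
      .astype(int)
)

result = df.assign(D=flag)
print(result)
```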
<hr/>
<h2>Details</h2>
<p>Notice that when we stack and get dummies, the intermediate result looks like this:</p>
<pre><code>     cat  dad  daughter  maybe  mom  niece  no  referee  sister  snow  son  tree  yes
0 A    0    1         0      0    1      0   0        0       0     0    1     0    0
  B    0    0         0      0    0      0   0        0       1     0    1     0    0
  C    0    0         0      1    0      0   1        0       0     0    0     0    1
1 A    0    1         0      0    0      0   0        0       0     0    0     0    0
  B    0    0         1      0    0      1   0        0       0     0    0     0    0
  C    0    0         0      0    0      0   1        0       0     1    0     0    0
2 A    0    1         0      0    0      0   0        0       0     0    1     0    0
  B    1    1         0      0    0      0   0        0       0     0    1     0    0
  C    0    1         0      0    0      0   0        0       0     0    1     1    0
3 A    0    0         1      0    1      0   0        0       0     0    0     0    0
  B    0    0         0      0    0      1   0        0       0     0    0     0    0
  C    0    0         0      0    0      0   0        1       0     0    0     0    0
4 A    0    1         1      0    0      0   0        0       0     0    0     0    0
  B    1    0         0      0    0      0   0        0       0     0    0     0    0
  C    0    1         0      0    0      0   0        0       0     0    0     0    0
</code></pre>
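<p>The original columns are now embedded in the second level of the index. So I sum over the first level to see how many times each word shows up in each original row.</p>
<p>That sum looks like:</p>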
<pre><code>   cat  dad  daughter  maybe  mom  niece  no  referee  sister  snow  son  tree  yes
0    0    1         0      1    1      0   1        0       1     0    2     0    1
1    0    1         1      0    0      1   1        0       0     1    0     0    0
2    1    3         0      0    0      0   0        0       0     0    3     1    0
3    0    0         1      0    1      1   0        1       0     0    0     0    0
4    1    2         1      0    0      0   0        0       0     0    0     0    0
</code></pre>
<p>Notice that we catch <code>'son'</code> in the first row (index <code>0</code>), <code>'dad'</code> and <code>'son'</code> in the third row (index <code>2</code>), and so on.</p>
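<p>The stacking and summing steps can be inspected one at a time. This is only a sketch on the first two rows of the sample data. The variable names <code>stacked</code>, <code>dummies</code> and <code>counts</code> are mine, and <code>groupby(level=0).sum()</code> is used as the current spelling of summing over the first index level.</p>

```python
import pandas as pd

# First two rows of the sample data, enough to see the shape of each step.
df = pd.DataFrame({
    "A": ["mom;dad;son;", "dad;"],
    "B": ["sister;son;", "daughter;niece;"],
    "C": ["yes;no;maybe;", "no;snow;"],
})

stacked = df.stack()                      # Series with a (row, column) MultiIndex
dummies = stacked.str.get_dummies(";")    # 0/1 indicator per word, one line per cell
counts = dummies.groupby(level=0).sum()   # collapse A/B/C back to one line per row

print(dummies)
print(counts)
```

<p>Because each cell contributes at most <code>1</code> per word, a total above <code>1</code> can only come from the word appearing in more than one column.</p>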
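<p>If a word shows up in more than one column (hence <code>gt(1)</code>), then I want to flag the row with a <code>1</code> (hence <code>any</code> across axis <code>1</code>, followed by <code>astype(int)</code>).</p>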
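<p>As a small illustration of that last step, the sketch below applies it to a hand-copied excerpt of the totals (rows 0 to 2, columns <code>cat</code>, <code>dad</code> and <code>son</code>). The <code>repeated_words</code> lookup is an optional extra that is not part of the answer above. It just reuses the same boolean mask to show which words triggered the flag.</p>

```python
import pandas as pd

# Hand-copied excerpt of the per-row totals (rows 0-2, three of the columns).
counts = pd.DataFrame(
    {"cat": [0, 0, 1], "dad": [1, 1, 3], "son": [2, 0, 3]}
)

repeated = counts.gt(1)                    # True where a word is in 2+ columns
flag = repeated.any(axis=1).astype(int)    # 1 if any word repeats in that row

# Optional extra (hypothetical helper, not in the original answer):
# list the words that repeat in each row.
repeated_words = {
    row: [word for word, hit in mask.items() if hit]
    for row, mask in repeated.iterrows()
}

print(flag.tolist())
print(repeated_words)
```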