<p>Another approach, using <code>findall</code> and <code>MultiLabelBinarizer</code>:</p>
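<pre><code>import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
# For each row, collect every keyword from `categories` found in Description
f = df['Description'].str.findall('|'.join(categories))
# One 0/1 column per keyword; index=df.index keeps the rows aligned for join()
out = df.join(pd.DataFrame(mlb.fit_transform(f), columns=mlb.classes_, index=df.index))
</code></pre>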
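<p>The snippet above assumes <code>df</code> and <code>categories</code> already exist. Below is a minimal runnable sketch of the same approach. The sample rows are reconstructed from the printed output at the end of this answer, and the exact contents of <code>categories</code> are an assumption.</p>

<pre><code>import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical sample data, reconstructed from the printed output in this answer
df = pd.DataFrame({
    "Item": ["R2G1", "G23A", "P001"],
    "Description": ["RED, BLUE, SHIRT", "YELLOW SHIRT", "BLUE, PINK SKIRT"],
})
# Assumed list of keywords to look for
categories = ["RED", "BLUE", "YELLOW", "PINK", "SHIRT", "SKIRT"]

# Alternation regex such as "RED|BLUE|YELLOW|PINK|SHIRT|SKIRT"
pattern = "|".join(categories)

# findall returns, for each row, the list of keywords found in that row
found = df["Description"].str.findall(pattern)

mlb = MultiLabelBinarizer()
dummies = pd.DataFrame(
    mlb.fit_transform(found),  # one 0/1 column per distinct keyword
    columns=mlb.classes_,      # the class labels, sorted alphabetically
    index=df.index,            # align with df so join() matches rows
)
out = df.join(dummies)
print(out)
</code></pre>

<p>When no <code>classes</code> argument is given, <code>MultiLabelBinarizer</code> stores its labels in sorted order in <code>classes_</code>, which is why the new columns come out alphabetically.</p>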
<p>A slower but simpler version: after <code>findall</code>, join each list back into a <code>|</code>-separated string, then call <code>Series.str.get_dummies</code>:</p>
<pre><code>out = df.join(df['Description'].str.findall('|'.join(categories))
              .str.join('|')       # list to a 'RED|BLUE|SHIRT'-style string
              .str.get_dummies())  # split on '|' into one 0/1 column per keyword
</code></pre>
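<p>To make that chain easier to follow, here is a sketch that runs the same three steps one at a time and prints each intermediate result. The sample data is hypothetical (reconstructed from the output shown in this answer) and <code>categories</code> is assumed. The <code>str.join('|')</code> step is needed because <code>str.get_dummies</code> splits strings on a separator and does not accept lists directly.</p>

<pre><code>import pandas as pd

# Hypothetical sample data, reconstructed from the printed output in this answer
df = pd.DataFrame({
    "Item": ["R2G1", "G23A", "P001"],
    "Description": ["RED, BLUE, SHIRT", "YELLOW SHIRT", "BLUE, PINK SKIRT"],
})
# Assumed list of keywords to look for
categories = ["RED", "BLUE", "YELLOW", "PINK", "SHIRT", "SKIRT"]

# Step 1: findall gives a Series of Python lists
found = df["Description"].str.findall("|".join(categories))
print(found.tolist())
# [['RED', 'BLUE', 'SHIRT'], ['YELLOW', 'SHIRT'], ['BLUE', 'PINK', 'SKIRT']]

# Step 2: get_dummies works on delimited strings, so join each list with "|"
joined = found.str.join("|")
print(joined.tolist())
# ['RED|BLUE|SHIRT', 'YELLOW|SHIRT', 'BLUE|PINK|SKIRT']

# Step 3: split on "|" (the default sep) into one 0/1 column per keyword
dummies = joined.str.get_dummies()
out = df.join(dummies)
print(out)
</code></pre>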
<pre><code>print(out)
   Item       Description  BLUE  PINK  RED  SHIRT  SKIRT  YELLOW
0  R2G1  RED, BLUE, SHIRT     1     0    1      1      0       0
1  G23A      YELLOW SHIRT     0     0    0      1      0       1
2  P001  BLUE, PINK SKIRT     1     1    0      0      1       0
</code></pre>
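<p>One caveat applies to both versions: <code>'|'.join(categories)</code> is a plain regex alternation, so a keyword also matches inside a longer word, and any regex metacharacters in <code>categories</code> are treated as regex syntax. The sketch below escapes each keyword with <code>re.escape</code> and wraps the alternation in <code>\b</code> word boundaries. The data is hypothetical (the <code>X900</code> / <code>REDWOOD SKIRT</code> row is invented to show the problem); whether whole-word matching is what you want depends on your data.</p>

<pre><code>import re
import pandas as pd

# Hypothetical data: "REDWOOD" contains "RED" as a substring
df = pd.DataFrame({
    "Item": ["R2G1", "X900"],
    "Description": ["RED, BLUE, SHIRT", "REDWOOD SKIRT"],
})
categories = ["RED", "BLUE", "YELLOW", "PINK", "SHIRT", "SKIRT"]

# Plain alternation, as in the answer: also matches "RED" inside "REDWOOD"
loose = "|".join(categories)
# Escaped keywords inside a non-capturing group, bounded by \b on both sides,
# so only whole words match (findall returns the full match for (?:...))
strict = r"\b(?:" + "|".join(map(re.escape, categories)) + r")\b"

loose_hits = df["Description"].str.findall(loose)
strict_hits = df["Description"].str.findall(strict)
print(loose_hits.tolist())   # [['RED', 'BLUE', 'SHIRT'], ['RED', 'SKIRT']]
print(strict_hits.tolist())  # [['RED', 'BLUE', 'SHIRT'], ['SKIRT']]

# The stricter matches plug into the same get_dummies chain
out = df.join(strict_hits.str.join("|").str.get_dummies())
print(out)
</code></pre>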