<p>Here's an approach using sklearn's <a href="https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MultiLabelBinarizer.html" rel="nofollow noreferrer"><strong><code>MultiLabelBinarizer</code></strong></a>, based on the <a href="https://stackoverflow.com/a/51420716/9840637">fast binarizer method</a> linked by <a href="https://stackoverflow.com/users/9840637/anky">anky</a>:</p>
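<pre class="lang-py prettyprint-override"><code>import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Sample data: each department entry is a list of quoted label strings
df = pd.DataFrame({'customer_id':{0:11,1:23,2:25,3:45}, 'department':{0:["'nail'","'men_skincare'"], 1:["'nail'","'fragrance'"], 2:[''], 3:["'skincare'","'men_fragrance'"]}})

mlb = MultiLabelBinarizer()
# One 0/1 column per label; strip the quotes for clean column names, then drop
# the '' column created by empty entries (errors='ignore' in case there is none)
df = df.join(pd.DataFrame(
    mlb.fit_transform(df.department),
    columns=[c.strip("'") for c in mlb.classes_],
    index=df.index,
)).drop(columns='', errors='ignore')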
# customer_id department fragrance men_fragrance men_skincare nail skincare
# 0 11 ['nail', 'men_skincare'] 0 0 1 1 0
# 1 23 ['nail', 'fragrance'] 1 0 0 1 0
# 2 25 [] 0 0 0 0 0
# 3 45 ['skincare', 'men_fragrance'] 0 1 0 0 1
</code></pre>
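<p>To see why the <code>strip("'")</code> and <code>drop(columns='')</code> steps are needed, here is a minimal look at what <code>MultiLabelBinarizer</code> learns from the raw lists. The variable names <code>departments</code> and <code>matrix</code> are just illustrative; <code>classes_</code> is the sorted set of every label seen, so the empty string sorts first and every other label keeps its surrounding quotes:</p>
<pre class="lang-py prettyprint-override"><code>from sklearn.preprocessing import MultiLabelBinarizer

# Same department lists as the sample frame (the quotes are part of each string)
departments = [["'nail'", "'men_skincare'"], ["'nail'", "'fragrance'"],
               [''], ["'skincare'", "'men_fragrance'"]]

mlb = MultiLabelBinarizer()
matrix = mlb.fit_transform(departments)  # dense 0/1 array, one column per class

print(list(mlb.classes_))
# ['', "'fragrance'", "'men_fragrance'", "'men_skincare'", "'nail'", "'skincare'"]
print(matrix)
# [[0 0 0 1 1 0]
#  [0 1 0 0 1 0]
#  [1 0 0 0 0 0]
#  [0 0 1 0 0 1]]
</code></pre>
<p>The row for the empty entry <code>['']</code> only has a 1 in the <code>''</code> column, which is why dropping that column leaves it as all zeros.</p>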
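<p><strong>Note:</strong> This assumes the <code>department</code> column in your actual data contains real Python lists, not list-like strings. If they are actually strings (i.e. <code>type(df.department[0])</code> returns <code>str</code>), convert them first:</p>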
<pre class="lang-py prettyprint-override"><code>df.department = df.department.str.strip('[]').str.split(r'\s*,\s*')
</code></pre>
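<p>As a sketch of the string case end to end, assuming a hypothetical frame where <code>department</code> holds list-like strings (for example after a round trip through CSV), the conversion produces the same quoted items as the list example, and an empty <code>"[]"</code> becomes <code>['']</code>, so the binarizer step can be reused unchanged:</p>
<pre class="lang-py prettyprint-override"><code>import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical sample: department stored as list-like *strings*
df = pd.DataFrame({
    'customer_id': [11, 23, 25, 45],
    'department': ["['nail', 'men_skincare']", "['nail', 'fragrance']",
                   "[]", "['skincare', 'men_fragrance']"],
})

# Strip the brackets, then split on commas (with optional surrounding spaces).
# Items keep their quotes ("'nail'") and "[]" becomes ['']
df.department = df.department.str.strip('[]').str.split(r'\s*,\s*')

mlb = MultiLabelBinarizer()
dummies = pd.DataFrame(
    mlb.fit_transform(df.department),
    columns=[c.strip("'") for c in mlb.classes_],  # remove the quotes
    index=df.index,
)
# Drop the '' column produced by the empty entry
df = df.join(dummies).drop(columns='', errors='ignore')
print(df)
</code></pre>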