<p>I think a better solution is to use <a href="http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pop.html" rel="nofollow noreferrer"><code>pop</code></a> together with <a href="http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.join.html" rel="nofollow noreferrer"><code>str.join</code></a> and <a href="http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.get_dummies.html" rel="nofollow noreferrer"><code>str.get_dummies</code></a>:</p>
<pre><code>df = df.join(df.pop('code').str.join('|').str.get_dummies())
print(df)
       year  gvkey  EDUC  ENVR  HEALTH  JUST  LAB  TAX
index
0      1998  15686     0     1       1     0    0    1
1      2005  15372     1     0       1     1    0    1
2      2001  27486     0     0       1     0    1    1
3      2008  84967     0     0       1     1    1    0
</code></pre>
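<p>For anyone who wants to run the one-liner step by step, here is a minimal sketch. The input frame is an assumption, reconstructed from the printed output above (the original data is not shown here), and the intermediate names <code>joined</code>, <code>dummies</code> and <code>result</code> are only illustrative.</p>

```python
import pandas as pd

# Assumed input: reconstructed from the printed output, not taken from the question.
df = pd.DataFrame({
    'year': [1998, 2005, 2001, 2008],
    'gvkey': [15686, 15372, 27486, 84967],
    'code': [['ENVR', 'HEALTH', 'TAX'],
             ['EDUC', 'HEALTH', 'JUST', 'TAX'],
             ['HEALTH', 'LAB', 'TAX'],
             ['HEALTH', 'JUST', 'LAB']],
})
df.index.name = 'index'

# Step 1: turn each list into one '|'-separated string, e.g. 'ENVR|HEALTH|TAX'.
joined = df['code'].str.join('|')

# Step 2: split on '|' (the default separator) and build one indicator column per label.
dummies = joined.str.get_dummies()

# Step 3: drop the list column and attach the indicators by index.
result = df.drop(columns='code').join(dummies)
print(result)
```

<p>Unlike the <code>pop</code> version, this sketch leaves <code>df</code> untouched, which may be handier while experimenting.</p>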
<p>If performance matters, use <a href="http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MultiLabelBinarizer.html" rel="nofollow noreferrer"><code>MultiLabelBinarizer</code></a>:</p>
<pre><code>import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
# index=df.index keeps the indicator rows aligned with df for the join
df1 = pd.DataFrame(mlb.fit_transform(df.pop('code')), columns=mlb.classes_, index=df.index)
df = df.join(df1)
print(df)
       year  gvkey  EDUC  ENVR  HEALTH  JUST  LAB  TAX
index
0      1998  15686     0     1       1     0    0    1
1      2005  15372     1     0       1     1    0    1
2      2001  27486     0     0       1     0    1    1
3      2008  84967     0     0       1     1    1    0
</code></pre>
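<p>Your solution is possible, <a href="https://stackoverflow.com/questions/35491274/pandas-split-column-of-lists-into-multiple-columns/35491399#35491399">but slow</a>, so it is better avoided. Note also that <code>sum</code> gives correct indicators only when the values in each list are unique; the general solution needs <code>max</code>:</p>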
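<pre><code># max(level=0) was removed in pandas 2.0, so group by the outer index level instead;
# dtype=int keeps 0/1 output on versions where get_dummies defaults to bool
df = df.join(pd.get_dummies(df.pop('code').apply(pd.Series).stack(), dtype=int).groupby(level=0).max())
print(df)
       year  gvkey  EDUC  ENVR  HEALTH  JUST  LAB  TAX
index
0      1998  15686     0     1       1     0    0    1
1      2005  15372     1     0       1     1    0    1
2      2001  27486     0     0       1     0    1    1
3      2008  84967     0     0       1     1    1    0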
</code></pre>
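<p>To see why <code>max</code> is the safer aggregation, here is a small sketch with a hypothetical column in which one list repeats a label. It uses <code>explode</code> in place of <code>apply(pd.Series).stack()</code> to get one row per label; that swap is my own simplification, not part of the solution above.</p>

```python
import pandas as pd

# Hypothetical data: the first row lists 'TAX' twice.
codes = pd.Series([['TAX', 'TAX', 'LAB'], ['LAB']])

# One row per (original row, label) pair; the original index is repeated.
long = codes.explode()
dummies = pd.get_dummies(long, dtype=int)

# Collapse back to one row per original index, once with each aggregation.
by_sum = dummies.groupby(level=0).sum()
by_max = dummies.groupby(level=0).max()

print(by_sum)  # counts occurrences, so a repeated label exceeds 1
print(by_max)  # stays a 0/1 indicator regardless of repeats
```

<p>With unique labels per row both aggregations agree, which is why <code>sum</code> appears to work on the sample data.</p>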