Answers to: Convert a list of variables into dummies in a wider DataFrame



1 answer

Anonymous, 1 day ago


I think a better solution is to use <a href="http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pop.html" rel="nofollow noreferrer"><code>pop</code></a> together with <a href="http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.join.html" rel="nofollow noreferrer"><code>str.join</code></a> and <a href="http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.get_dummies.html" rel="nofollow noreferrer"><code>str.get_dummies</code></a>:

<pre><code>import pandas as pd

# Remove the list column, join each list into a '|'-separated string,
# then expand that string into one 0/1 indicator column per code.
df = df.join(df.pop('code').str.join('|').str.get_dummies())
print(df)
       year  gvkey  EDUC  ENVR  HEALTH  JUST  LAB  TAX
index
0      1998  15686     0     1       1     0    0    1
1      2005  15372     1     0       1     1    0    1
2      2001  27486     0     0       1     0    1    1
3      2008  84967     0     0       1     1    1    0
</code></pre>

If performance matters, use <a href="http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MultiLabelBinarizer.html" rel="nofollow noreferrer"><code>MultiLabelBinarizer</code></a>:

<pre><code>import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
# Pass index=df.index so the join below aligns even when df does not
# have a default 0..n-1 index.
df1 = pd.DataFrame(mlb.fit_transform(df.pop('code')),
                   columns=mlb.classes_,
                   index=df.index)
df = df.join(df1)
print(df)
       year  gvkey  EDUC  ENVR  HEALTH  JUST  LAB  TAX
index
0      1998  15686     0     1       1     0    0    1
1      2005  15372     1     0       1     1    0    1
2      2001  27486     0     0       1     0    1    1
3      2008  84967     0     0       1     1    1    0
</code></pre>

Your solution is possible, <a href="https://stackoverflow.com/questions/35491274/pandas-split-column-of-lists-into-multiple-columns/35491399#35491399">but slow</a>, so it is better to avoid it. Also, <code>sum</code> only gives correct 0/1 indicators when the values in each list are unique; the general solution needs <code>max</code>:

<pre><code>import pandas as pd

# groupby(level=0).max() replaces the older .max(level=0), which was
# removed in pandas 2.0; dtype=int keeps 0/1 output instead of booleans.
df = df.join(pd.get_dummies(df.pop('code').apply(pd.Series).stack(), dtype=int)
               .groupby(level=0).max())
print(df)
       year  gvkey  EDUC  ENVR  HEALTH  JUST  LAB  TAX
index
0      1998  15686     0     1       1     0    0    1
1      2005  15372     1     0       1     1    0    1
2      2001  27486     0     0       1     0    1    1
3      2008  84967     0     0       1     1    1    0
</code></pre>
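The answer never shows the input frame, so here is a minimal, self-contained sketch that may help to reproduce it. The sample data is an assumption, reconstructed from the printed output (the order of codes inside each list is made up), and `expand_codes` is a hypothetical helper name, not something from pandas. The second half illustrates the `sum` versus `max` remark using a list with a repeated code.

```python
import pandas as pd

# Hypothetical input, reconstructed from the output shown in the answer.
df = pd.DataFrame(
    {
        "year": [1998, 2005, 2001, 2008],
        "gvkey": [15686, 15372, 27486, 84967],
        "code": [
            ["HEALTH", "ENVR", "TAX"],
            ["EDUC", "TAX", "HEALTH", "JUST"],
            ["LAB", "TAX", "HEALTH"],
            ["HEALTH", "JUST", "LAB"],
        ],
    }
)


def expand_codes(frame, column="code"):
    """Return a copy of `frame` with the list column replaced by 0/1 columns."""
    out = frame.copy()  # leave the caller's frame untouched
    # Join each list into 'A|B|C', then split on '|' into indicator columns.
    dummies = out.pop(column).str.join("|").str.get_dummies()
    return out.join(dummies)


wide = expand_codes(df)
print(wide)

# Why `max` rather than `sum`: a list that repeats a code.
dup = pd.Series([["TAX", "TAX", "LAB"]])
stacked = pd.get_dummies(dup.explode(), dtype=int)  # one row per list element
by_sum = stacked.groupby(level=0).sum()  # counts occurrences, TAX becomes 2
by_max = stacked.groupby(level=0).max()  # stays a 0/1 indicator
print(by_sum)
print(by_max)
```

Note that `str.get_dummies` splits on `|` by default, so this approach assumes no code contains that character.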
