Posted 2024-07-07 09:03:51
User
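My DataFrame:

    Index  letters
    0      A
    1      B
    2      D
    3      Z

In Python, I want a one-hot encoded DataFrame of the letters column above that also includes elements not present in that column, like this:

    Index  A  B  C  D  E  K  Z
    0      1  0  0  0  0  0  0
    1      0  1  0  0  0  0  0
    2      0  0  0  1  0  0  0
    3      0  0  0  0  0  0  1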
Use `merge`:
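
    import pandas as pd

    df = pd.DataFrame({'Letters': ['A', 'B', 'D', 'Z']})
    all_letters = ['A', 'B', 'C', 'D', 'E', 'K', 'Z']

    # One-hot encode the full alphabet once, then attach the key column.
    # dtype=int keeps 0/1 output on pandas >= 2.0, where the default is bool.
    s = pd.get_dummies(all_letters, dtype=int)
    s['Letters'] = all_letters

    # Join the encoded rows onto the data by letter.
    df2 = df.merge(s, on='Letters')
    df2
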
which gives:

| | Letters | A | B | C | D | E | K | Z |
|--:|:--|:--|:--|:--|:--|:--|:--|:--|
| 0 | A | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | B | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 2 | D | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 3 | Z | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
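
A possible variation on the merge answer, offered as a sketch rather than as the original author's method: the default inner join drops any row whose letter is missing from the lookup table. Passing `how="left"` keeps every row of `df` in its original order, and `validate="many_to_one"` makes pandas raise if a letter appears twice in the lookup. The name `lookup` is introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({"Letters": ["A", "B", "D", "Z"]})
all_letters = ["A", "B", "C", "D", "E", "K", "Z"]

# Lookup table: one row per possible letter, one-hot encoded.
lookup = pd.get_dummies(all_letters, dtype=int)
lookup["Letters"] = all_letters

# how="left" keeps all rows of df in order; validate guards against
# duplicate keys on the lookup side.
df2 = df.merge(lookup, on="Letters", how="left", validate="many_to_one")
print(df2)
```

With a left join, a letter that is absent from `all_letters` would produce a row of NaN values instead of silently disappearing, which makes such gaps easier to spot.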
Use `get_dummies` for this:
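
    df = pd.get_dummies(df)
    df.columns = df.columns.str.replace('letters_', '')
    print(df)

       Index  A  B  D  Z
    0      0  1  0  0  0
    1      1  0  1  0  0
    2      2  0  0  1  0
    3      3  0  0  0  1

Note that this produces columns only for the letters that actually occur in the data (A, B, D, Z); the missing ones (C, E, K) still have to be added separately.
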
Alternatively, reindex the `get_dummies` result against the full list of categories:

    import pandas as pd

    df = pd.DataFrame(["A", "A", "C", "C", "E", "F", "G"], columns=['letters'])
    all_cats = ["A", "B", "C", "D", "E", "F", "G"]

    # reindex adds the missing category columns and fills them with 0
    ohe = pd.get_dummies(df['letters'], sparse=True).reindex(all_cats, axis=1, fill_value=0)

    >>> ohe
       A  B  C  D  E  F  G
    0  1  0  0  0  0  0  0
    1  1  0  0  0  0  0  0
    2  0  0  1  0  0  0  0
    3  0  0  1  0  0  0  0
    4  0  0  0  0  1  0  0
    5  0  0  0  0  0  1  0
    6  0  0  0  0  0  0  1
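
One more option, which does not come from the answers above, is to declare the full set of categories on the column before encoding. This is a minimal sketch that reuses the question's data and assumes the complete list of letters is known in advance. `get_dummies` then emits a column for every declared category, including those with no rows.

```python
import pandas as pd

df = pd.DataFrame({"letters": ["A", "B", "D", "Z"]})
all_letters = ["A", "B", "C", "D", "E", "K", "Z"]

# Fix the category set up front so absent letters still get a column.
cat_dtype = pd.CategoricalDtype(categories=all_letters)
ohe = pd.get_dummies(df["letters"].astype(cat_dtype), dtype=int)
print(ohe)
```

Compared with `reindex`, this keeps the list of valid letters attached to the column itself. A value outside `all_letters` becomes NaN during the `astype` step and therefore encodes as an all-zero row.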