From two tabular files:
file1.txt
name1 house1
name2 house1
name3 house1
name4 house2
name5 house2
name6 house2
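and file2.txt (tab-separated, with a name column and a transport column)
I want to use the information from these two files to create a presence/absence matrix like this: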
car motorcycle bike boat skate
house1 1 0 1 0 1
house2 1 1 0 1 0
This is my code:
import pandas as pd
with open('file1.txt', 'r') as file1:
    col_names = ['name', 'house']
    df1 = pd.read_csv(file1, sep='\t', header=None, names=col_names)
with open('file2.txt', 'r') as file2:
    col_names = ['name', 'transport']
    df2 = pd.read_csv(file2, sep='\t', header=None, names=col_names)
# include the values from df1 into the df2 creating a new column
df2['house'] = df2['name'].map(df1.set_index('name')['house'])
g = df2.groupby('house')['transport'].apply(list).reset_index()
g.join(pd.get_dummies(g['transport'].apply(pd.Series).stack()).sum(level=0)).drop('transport', 1)
print(g)
With this, I get the following output:
house transport
0 house1 [car, bike, skate]
1 house2 [car, motorcycle, boat]
Here is one way to do it.
Setup
Solution
Result
Explanation
There are three steps:
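1. Add a house column to df2, mapped from df1.
2. Use pd.get_dummies to expand the transport column into dummy columns.
3. Group by house and aggregate the dummy columns so that each house ends up as a single row.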
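The steps above can be sketched roughly as follows. This is a minimal illustration, not necessarily the exact code of the original answer. The contents of file2.txt are an assumption, chosen only to be consistent with the output shown in the question, and the files are simulated with io.StringIO so the snippet runs on its own. Using max as the aggregation is also an assumption; it marks a transport as present if any member of the house has it.

```python
import io

import pandas as pd

# file1.txt contents as given in the question (tab-separated: name, house).
file1 = io.StringIO(
    "name1\thouse1\nname2\thouse1\nname3\thouse1\n"
    "name4\thouse2\nname5\thouse2\nname6\thouse2\n"
)

# ASSUMED file2.txt contents (tab-separated: name, transport). The real file
# is not shown; these rows merely agree with the output in the question.
file2 = io.StringIO(
    "name1\tcar\nname2\tbike\nname3\tskate\n"
    "name4\tcar\nname5\tmotorcycle\nname6\tboat\n"
)

df1 = pd.read_csv(file1, sep="\t", header=None, names=["name", "house"])
df2 = pd.read_csv(file2, sep="\t", header=None, names=["name", "transport"])

# Step 1: add a house column to df2 by looking each name up in df1.
df2["house"] = df2["name"].map(df1.set_index("name")["house"])

# Step 2: expand the transport column into one indicator column per value.
dummies = pd.get_dummies(df2["transport"])

# Step 3: collapse to one row per house; max() yields a presence flag,
# and astype(int) turns it into 0/1.
matrix = dummies.groupby(df2["house"]).max().astype(int)

print(matrix)
```

The columns will most likely come out in alphabetical order rather than the order shown in the question; selecting them explicitly, for example matrix[['car', 'motorcycle', 'bike', 'boat', 'skate']], restores the desired order.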