Optimizing the performance of converting values to 0s and 1s

  Employee  Location1  Location2  Location3  Title1  Title2  Title3
0        1          1          0          0       1       0       0
1        2          0          1          0       1       0       0
2        3          0          0          1       0       1       0
3        4          1          0          0       0       0       1
4        5          1          0          0       0       1       0

import pandas as pd

df = pd.DataFrame.from_dict({'Employee': ['1','2','3','4','5'],
                             'Location': ['Location1', 'Location2','Location3','Location1','Location1'],
                             'Title': ['Title1','Title1','Title2','Title3','Title2']
                            })

df_tr = df['Employee']  # temporary employee ids

# transposing the data, which takes ages:
df_newcols = {}
for column in list(df)[1:]:
    newcols = df[column].unique()
    for key in newcols:
        temp_ar = []
        for value in df[column]:
            if key == value:
                temp_ar.append(1)
            else:
                temp_ar.append(0)
        df_newcols[key] = temp_ar
print(df_newcols)

# adding transposed to the temp df
df_temp = pd.DataFrame.from_dict(df_newcols)

# merging with df with employee ids
new_df = pd.concat([df_tr, df_temp], axis=1)

3 Answers

User

Answer 1 · edited 2024-09-25 12:34:25

Another solution, using pd.get_dummies:

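# dtype=int keeps the indicator columns as 0/1 (pandas 2.0+ returns booleans by default)
print( pd.concat([df['Employee'],
                  pd.get_dummies(df['Location'], dtype=int),
                  pd.get_dummies(df['Title'], dtype=int)], axis=1) )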

Prints:

  Employee  Location1  Location2  Location3  Title1  Title2  Title3
0        1          1          0          0       1       0       0
1        2          0          1          0       1       0       0
2        3          0          0          1       0       1       0
3        4          1          0          0       0       0       1
4        5          1          0          0       0       1       0
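
As a variation on the answer above, pd.get_dummies can also be called once on the whole frame instead of once per column. This is a minimal sketch that assumes the question's df. The columns, prefix, prefix_sep and dtype arguments are my choice rather than the answerer's, and the empty prefix is only safe here because Location and Title share no values:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'Employee': ['1', '2', '3', '4', '5'],
    'Location': ['Location1', 'Location2', 'Location3', 'Location1', 'Location1'],
    'Title': ['Title1', 'Title1', 'Title2', 'Title3', 'Title2'],
})

# Encode both columns in one call. Empty prefix/prefix_sep keep the bare
# value names as column headers; dtype=int yields 0/1 instead of booleans.
encoded = pd.get_dummies(df, columns=['Location', 'Title'],
                         prefix='', prefix_sep='', dtype=int)
print(encoded)
```

If two source columns could contain the same value, keeping the default prefixes (for example Location_Location1) would avoid duplicate column names.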

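User

Answer 2 · edited 2024-09-25 12:34:25

You should try to rely more on apply and on pandas' own methods. Writing explicit for loops over the values in pandas is a very bad idea: it will ruin your performance.

One possible solution is the following: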

import pandas as pd


# read the file
emp=pd.read_csv("employee_huge.txt", sep=" ")


# generate unique lists containing LocationX and TitleX
lnewcols_location=set(emp["Location"].to_list())
lnewcols_title=set(emp["Title"].to_list())


# a function to compare a cell (like "Location1") to a string that is the name of the column
# like "Location2".  If they match return 1, otherwise 0
def same_as_col(acell, col):
    return 1 if acell == col else 0


# generate all the LocationN columns with 1 or 0 if there is a match
for i in lnewcols_location:
    emp[i]=emp["Location"].apply(same_as_col, col=i)

# generate all the TitleN columns with 1 or 0 if there is a match
for i in lnewcols_title:
    emp[i]=emp["Title"].apply(same_as_col, col=i)

# removing Location and Title columns
emp=emp.drop(["Location", "Title"], axis=1)

Finally, I generated a file named employee_huge.txt. Its contents are formatted as follows:

Employee Location Title
0 Location4 Title1
1 Location1 Title3
2 Location1 Title2
3 Location1 Title4
4 Location4 Title1
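
The per-cell apply in this answer can also be written as a whole-column comparison, which avoids calling a Python function for every row. Below is a rough sketch on a small in-memory frame that stands in for the generated file (it holds only the five sample rows shown above, not the full file). Sorting the unique values is my addition, to make the column order deterministic:

```python
import pandas as pd

# Stand-in for the generated file: only the five sample rows shown above.
emp = pd.DataFrame({
    "Employee": [0, 1, 2, 3, 4],
    "Location": ["Location4", "Location1", "Location1", "Location1", "Location4"],
    "Title": ["Title1", "Title3", "Title2", "Title4", "Title1"],
})

for source in ["Location", "Title"]:
    for value in sorted(emp[source].unique()):
        # Comparing the whole Series at once gives a boolean Series;
        # astype(int) turns True/False into 1/0.
        emp[value] = (emp[source] == value).astype(int)

# Remove the original text columns, as in the answer.
emp = emp.drop(columns=["Location", "Title"])
print(emp)
```

The loop still runs once per distinct value, but each iteration works on the entire column rather than on one cell at a time.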

User

Answer 3 · edited 2024-09-25 12:34:25

This should do it:

# helper column of ones that pivot_table aggregates into the indicator values
df["_dummy"]=1
df2=pd.concat([
    df.pivot_table(index="Employee", columns="Location", values="_dummy", aggfunc="max"),
    df.pivot_table(index="Employee", columns="Title", values="_dummy", aggfunc="max")
], axis=1).fillna(0).astype(int).reset_index(drop=False)

Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.pivot_table.html
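
A related approach, not the one from this answer, is pd.crosstab, which builds the same kind of table without a helper column. This is a hedged sketch on the question's sample data. The clip(upper=1) step is my assumption about how to handle an employee appearing more than once with the same value:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Employee": ["1", "2", "3", "4", "5"],
    "Location": ["Location1", "Location2", "Location3", "Location1", "Location1"],
    "Title": ["Title1", "Title1", "Title2", "Title3", "Title2"],
})

# crosstab counts occurrences of each (Employee, value) pair;
# clip(upper=1) caps repeated pairs at 1 so the result stays a 0/1 indicator.
loc = pd.crosstab(df["Employee"], df["Location"]).clip(upper=1)
title = pd.crosstab(df["Employee"], df["Title"]).clip(upper=1)

# Join the two tables on the Employee index, then make Employee a column again.
df2 = pd.concat([loc, title], axis=1).reset_index()
print(df2)
```

Because missing pairs are counted as 0, no fillna or astype step is needed here.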
