Create a DataFrame column with a lambda, based on which other columns have non-zero values

   Movie  Action  Fantasy  Vestern          genres
0    One       1        0        1  Action|Vestern
1    Two       0        0        1         Vestern
2  Three       1        1        0  Action|Fantasy

import pandas as pd
import numpy as np

df = pd.DataFrame({"Movie":['One','Two','Three'],
                   "Action":[1,0,1],
                   "Fantasy":[0,0,1],
                   "Vestern":[1,1,0]})
print(df)

2 answers

User

Answer 1 · edited 2024-10-02 20:37:36

import pandas as pd
import numpy as np

df = pd.DataFrame({"Movie":['One','Two','Three'],
                   "Action":[1,0,1],
                   "Fantasy":[0,0,1],
                   "Vestern":[1,1,0]})

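# Every column except the first ("Movie") is a genre flag
cols = df.columns.tolist()[1:]

# For each row, join the names of the columns whose flag is non-zero
df['genres'] = df.apply(lambda x: "|".join(str(z) for z in [i for i in cols if x[i] != 0]), axis=1)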
print(df)

Output

   Movie  Action  Fantasy  Vestern          genres
0    One       1        0        1  Action|Vestern
1    Two       0        0        1         Vestern
2  Three       1        1        0  Action|Fantasy

User

Answer 2 · edited 2024-10-02 20:37:36

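For better performance, you can use DataFrame.dot: take all columns except the first, multiply them by their column names with '|' appended, and then remove the trailing | with str.rstrip: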

df['new'] = df.iloc[:, 1:].dot(df.columns[1:] + '|').str.rstrip('|')
print(df)
   Movie  Action  Fantasy  Vestern             new
0    One       1        0        1  Action|Vestern
1    Two       0        0        1         Vestern
2  Three       1        1        0  Action|Fantasy
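
To see why the dot product yields these strings, here is a plain-Python sketch of what is computed for each row: every 0/1 flag is multiplied by its label (an int times a str repeats the string), and the pieces are concatenated. The names `labels`, `flags` and `row_dot` are illustrative only and are not part of pandas; the data simply mirrors the example frame.

```python
# Plain-Python illustration of the per-row computation; no pandas required.
labels = ["Action|", "Fantasy|", "Vestern|"]  # column names with '|' appended
flags = [
    [1, 0, 1],  # One
    [0, 0, 1],  # Two
    [1, 1, 0],  # Three
]

def row_dot(row, labels):
    """Concatenate flag * label for each column, then strip the trailing '|'."""
    out = ""
    for flag, label in zip(row, labels):
        out += flag * label  # 1 * 'Action|' == 'Action|', 0 * 'Action|' == ''
    return out.rstrip("|")

genres = [row_dot(row, labels) for row in flags]
print(genres)  # ['Action|Vestern', 'Vestern', 'Action|Fantasy']
```

This relies on the flags being exactly 0 or 1; a flag of 2 would repeat the label twice.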

Alternatively, use a list comprehension to join the values, skipping the empty strings:

arr = df.iloc[:, 1:].values * df.columns[1:].values
df['new'] = ['|'.join(y for y in x if y) for x in arr]
print(df)
   Movie  Action  Fantasy  Vestern             new
0    One       1        0        1  Action|Vestern
1    Two       0        0        1         Vestern
2  Three       1        1        0  Action|Fantasy
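
The intermediate array is the key to this variant, so the sketch below reproduces it with plain lists: multiplying each flag by its column name gives either the name or an empty string, and the comprehension then drops the empty strings before joining. The variable names `columns`, `rows` and `joined` are made up for this illustration.

```python
columns = ["Action", "Fantasy", "Vestern"]
rows = [[1, 0, 1], [0, 0, 1], [1, 1, 0]]

# Elementwise flag * name, analogous to multiplying the values by the column names
arr = [[flag * name for flag, name in zip(row, columns)] for row in rows]
print(arr[0])  # ['Action', '', 'Vestern']

# Keep only the non-empty strings and join them with '|'
joined = ["|".join(y for y in x if y) for x in arr]
print(joined)  # ['Action|Vestern', 'Vestern', 'Action|Fantasy']
```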

Performance:

In [54]: %timeit (jez1(df.copy()))
25.2 ms ± 2.31 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [55]: %timeit (jez2(df.copy()))
61.4 ms ± 769 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [56]: %timeit (csm(df.copy()))
1.46 s ± 35.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)



Setup used for the timings:

df = pd.DataFrame({"Movie":['One','Two','Three'],
                   "Action":[1,0,1],
                   "Fantasy":[0,0,1],
                   "Vestern":[1,1,0]})

# Repeat the 3 rows 10,000 times to get 30k rows
df = pd.concat([df] * 10000, ignore_index=True)

def csm(df):
    cols = df.columns.tolist()[1:]
    df['genres'] = df.apply(lambda x: "|".join(str(z) for z in [i for i in cols if x[i] !=0]) ,axis=1)
    return df

def jez1(df):
    df['new'] = df.iloc[:, 1:].dot(df.columns[1:] + '|').str.rstrip('|')
    return df

def jez2(df):
    arr = df.iloc[:, 1:].values * df.columns[1:].values
    df['new'] = ['|'.join(y for y in x if y) for x in arr]
    return df
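
The timings above come from IPython's %timeit. Outside IPython, a rough equivalent can be sketched with the standard-library timeit module. The helper `best_time` and the stand-in workload `join_flags` are hypothetical names introduced here so the sketch runs without pandas; they are not part of the original benchmark.

```python
import timeit

def best_time(func, make_input, repeat=3, number=1):
    """Return the best observed time in seconds for one call of func(make_input())."""
    timer = timeit.Timer(lambda: func(make_input()))
    return min(timer.repeat(repeat=repeat, number=number)) / number

# Stand-in workload (assumption): joins genre names for plain lists of flags
def join_flags(rows):
    names = ["Action", "Fantasy", "Vestern"]
    return ["|".join(n for f, n in zip(row, names) if f) for row in rows]

data = [[1, 0, 1], [0, 0, 1], [1, 1, 0]] * 1000
elapsed = best_time(join_flags, lambda: list(data))
print(f"{elapsed:.6f} s per call")
```

With the functions and frame from the setup defined, the same helper could be called as best_time(jez1, df.copy). As in the %timeit calls, the measured time includes making the copy.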
