Answers to: Create a column in a DataFrame using a lambda, based on other columns with non-null values



1 answer

Anonymous, 1 day ago


For better performance, use <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dot.html" rel="nofollow noreferrer"><code>DataFrame.dot</code></a>: take all columns except the first, multiply them by the column names (again without the first) with a <code>|</code> separator appended, and then remove the trailing <code>|</code> with <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.rstrip.html" rel="nofollow noreferrer"><code>str.rstrip</code></a>:

<pre><code>df['new'] = df.iloc[:, 1:].dot(df.columns[1:] + '|').str.rstrip('|')
print (df)
   Movie  Action  Fantasy  Vestern             new
0    One       1        0        1  Action|Vestern
1    Two       0        0        1         Vestern
2  Three       1        1        0  Action|Fantasy
</code></pre>

Alternatively, use a list comprehension to join all values that are not empty strings:

<pre><code>arr = df.iloc[:, 1:].values * df.columns[1:].values
df['new'] = ['|'.join(y for y in x if y) for x in arr]
print (df)
   Movie  Action  Fantasy  Vestern             new
0    One       1        0        1  Action|Vestern
1    Two       0        0        1         Vestern
2  Three       1        1        0  Action|Fantasy
</code></pre>

Performance:

<pre><code>In [54]: %timeit (jez1(df.copy()))
25.2 ms ± 2.31 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [55]: %timeit (jez2(df.copy()))
61.4 ms ± 769 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [56]: %timeit (csm(df.copy()))
1.46 s ± 35.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# Setup used for the timings above
import pandas as pd

df = pd.DataFrame({"Movie":['One','Two','Three'],
                   "Action":[1,0,1],
                   "Fantasy":[0,0,1],
                   "Vestern":[1,1,0]})
#print(df)

#30k rows
df = pd.concat([df] * 10000, ignore_index=True)

# Row-wise apply with a lambda (the approach from the question)
def csm(df):
    cols = df.columns.tolist()[1:]
    df['genres'] = df.apply(lambda x: "|".join(str(z) for z in [i for i in cols if x[i] != 0]), axis=1)
    return df

# Dot product of the 0/1 flags with the column names
def jez1(df):
    df['new'] = df.iloc[:, 1:].dot(df.columns[1:] + '|').str.rstrip('|')
    return df

# Elementwise multiplication plus a list comprehension
def jez2(df):
    arr = df.iloc[:, 1:].values * df.columns[1:].values
    df['new'] = ['|'.join(y for y in x if y) for x in arr]
    return df
</code></pre>
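In case it is not obvious why a dot product with strings works: Python defines `int * str` as repetition, so a flag of 1 keeps the column name and a flag of 0 yields an empty string, and the summation step of the dot product becomes string concatenation. Below is a minimal pure-Python sketch of that per-row mechanism, using the sample data from the answer. The helper name `join_flagged` is made up for illustration and is not part of pandas.

```python
# Pure-Python sketch of what the dot product does for each row (no pandas needed).
columns = ["Action", "Fantasy", "Vestern"]
rows = [
    [1, 0, 1],  # One
    [0, 0, 1],  # Two
    [1, 1, 0],  # Three
]

def join_flagged(flags, names, sep="|"):
    """Hypothetical helper: emulates row.dot(names + sep) followed by rstrip(sep)."""
    # flag * (name + sep) is the name plus separator when flag == 1
    # and the empty string when flag == 0; joining the pieces plays
    # the role of the summation in the dot product.
    joined = "".join(flag * (name + sep) for flag, name in zip(flags, names))
    # Drop the separator left over after the last kept name.
    return joined.rstrip(sep)

labels = [join_flagged(row, columns) for row in rows]
print(labels)  # ['Action|Vestern', 'Vestern', 'Action|Fantasy']
```

Note that this only behaves as intended for 0/1 flags: a value of 2 would repeat the column name twice, so other non-zero values would presumably need to be converted to 0/1 first.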
