Drop columns based on the values in another DataFrame's columns

import pandas as pd

d = {'foo': [100, 111, 222], 'bar': [333, 444, 555],
     'foo2': [110, 101, 222], 'bar2': [333, 444, 555],
     'foo3': [100, 111, 222], 'bar3': [333, 444, 555]}
df_A = pd.DataFrame(d)

d = {'ReqCol_A': ['foo', 'foo2'], 'bar': [333, 444],
     'foo2': [100, 111], 'bar2': [333, 444],
     'ReqCol_B': ['bar3', ''], 'bar3': [333, 444]}
df_b = pd.DataFrame(d)

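2 answers

User

#1 · edited 2024-10-02 14:30:35

Try using filter to pick only the columns of df_b whose names contain 'ReqCol', then stack to flatten their values into a list, and use that list to select columns from df_A: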

import numpy as np
df_A[df_b.filter(like='ReqCol').replace('', np.nan).stack().tolist()]

Output:

   foo  bar3  foo2
0  100   333   110
1  111   444   101
2  222   555   222
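
The one-liner above chains several steps, so it may help to see them separately. The following is a minimal sketch rather than the answerer's own code: it rebuilds the question's df_A and df_b, and it adds an explicit dropna() after stack(), which is an assumption made here so that the result does not depend on how a given pandas version's stack() treats missing values.

```python
import numpy as np
import pandas as pd

df_A = pd.DataFrame({'foo': [100, 111, 222], 'bar': [333, 444, 555],
                     'foo2': [110, 101, 222], 'bar2': [333, 444, 555],
                     'foo3': [100, 111, 222], 'bar3': [333, 444, 555]})
df_b = pd.DataFrame({'ReqCol_A': ['foo', 'foo2'], 'bar': [333, 444],
                     'foo2': [100, 111], 'bar2': [333, 444],
                     'ReqCol_B': ['bar3', ''], 'bar3': [333, 444]})

# Step 1: keep only the columns of df_b whose name contains "ReqCol".
req = df_b.filter(like='ReqCol')

# Step 2: turn empty strings into NaN so they can be discarded.
req = req.replace('', np.nan)

# Step 3: flatten into a single Series of column names.
# The explicit dropna() (an addition in this sketch) removes the NaN placeholders.
names = req.stack().dropna().tolist()

# Step 4: select the named columns from df_A.
result = df_A[names]
print(names)
print(result)
```

Note that the selected columns come out in the order in which the names appear in df_b, not in df_A's original column order.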

User

#2 · edited 2024-10-02 14:30:35

Solution:

# Collect the unique values from df_b's ReqCol_A and ReqCol_B columns
# (this may still include unwanted entries such as the empty string)
features = set(df_b.ReqCol_A.unique()) | set(df_b.ReqCol_B.unique())

# Intersect with df_A's column names so that only names that exist in df_A remain
target_features = set(df_A.columns) & features

# Get the output (pandas 2.0+ rejects a set as an indexer, so convert it to a list;
# the resulting column order is arbitrary because sets are unordered)
df_A.loc[:, list(target_features)]
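
Because sets are unordered, the set-based selection can return the columns in any order. As a variation (a sketch under the assumption that every entry in the ReqCol columns is a string, not something the answer itself proposes), a boolean mask over df_A.columns keeps df_A's original column order, and the same result can be written as an explicit drop, which matches the wording of the question title. The names wanted, kept and dropped are illustrative.

```python
import pandas as pd

df_A = pd.DataFrame({'foo': [100, 111, 222], 'bar': [333, 444, 555],
                     'foo2': [110, 101, 222], 'bar2': [333, 444, 555],
                     'foo3': [100, 111, 222], 'bar3': [333, 444, 555]})
df_b = pd.DataFrame({'ReqCol_A': ['foo', 'foo2'], 'bar': [333, 444],
                     'foo2': [100, 111], 'bar2': [333, 444],
                     'ReqCol_B': ['bar3', ''], 'bar3': [333, 444]})

# Names requested in df_b; empty strings are discarded up front
# (assumes the entries are strings).
wanted = {name for col in ('ReqCol_A', 'ReqCol_B') for name in df_b[col] if name}

# A boolean mask over df_A's columns preserves the original column order.
kept = df_A.loc[:, df_A.columns.isin(wanted)]

# The same result expressed as dropping every column that was not requested.
dropped = df_A.drop(columns=[c for c in df_A.columns if c not in wanted])

print(list(kept.columns))
```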

Performance comparison

This method:

%%timeit
features = set(df_b.ReqCol_A.unique()) | set(df_b.ReqCol_B.unique())
target_features = set(df_A.columns) & features
df_A.loc[:, list(target_features)]
875 µs ± 22.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

The other answer (using filter):

%%timeit 
df_A[df_b.filter(like='ReqCol').replace('', np.nan).stack().tolist()]
2.14 ms ± 51.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

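On this small example, the set-based method is roughly 2.4 times faster than the filter-based one (875 µs vs. 2.14 ms per loop).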
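
The %%timeit magic only works inside IPython or Jupyter. To repeat the comparison from a plain script, something like the following sketch with the standard-library timeit module could be used. The function names and the run count of 200 are arbitrary choices made here, and the absolute numbers will differ by machine and pandas version, so no timings are quoted.

```python
import timeit

import numpy as np
import pandas as pd

df_A = pd.DataFrame({'foo': [100, 111, 222], 'bar': [333, 444, 555],
                     'foo2': [110, 101, 222], 'bar2': [333, 444, 555],
                     'foo3': [100, 111, 222], 'bar3': [333, 444, 555]})
df_b = pd.DataFrame({'ReqCol_A': ['foo', 'foo2'], 'bar': [333, 444],
                     'foo2': [100, 111], 'bar2': [333, 444],
                     'ReqCol_B': ['bar3', ''], 'bar3': [333, 444]})


def select_with_sets():
    # Set-based approach: union of requested names, intersected with df_A's columns.
    features = set(df_b.ReqCol_A.unique()) | set(df_b.ReqCol_B.unique())
    target_features = set(df_A.columns) & features
    return df_A.loc[:, list(target_features)]


def select_with_stack():
    # filter/stack approach; dropna() is added so NaN placeholders never reach df_A[...].
    names = df_b.filter(like='ReqCol').replace('', np.nan).stack().dropna().tolist()
    return df_A[names]


runs = 200  # arbitrary; raise it for more stable numbers
timings = {}
for fn in (select_with_sets, select_with_stack):
    seconds = timeit.timeit(fn, number=runs)
    timings[fn.__name__] = seconds / runs
    print(f"{fn.__name__}: {timings[fn.__name__] * 1e6:.0f} µs per call")
```

With only two rows in df_b, both timings are dominated by pandas overhead rather than by the data, so the gap may look different on larger inputs.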
