擅长:python、mysql、java
<p>解决方案:</p>
<pre><code># retrieve all the unique elements from your df_b columns (ReqCol_A and ReqCol_B) let it also include nan and other unwanted features
features = set(df_b.ReqCol_A.unique()) | set(df_b.ReqCol_B.unique())
# Taking intersection with df_A column names and fetching the names which need to be targeted
target_features = set(df_A.columns) & features
# Get the Output
df_A.loc[:,target_features]
</code></pre>
<hr/>
<p>性能比较</p>
<p>给定方法:</p>
<pre><code>%%timeit
features = set(df_b.ReqCol_A.unique()) | set(df_b.ReqCol_B.unique())
target_features = set(df_A.columns) & features
df_A.loc[:,target_features]
875 µs ± 22.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
</code></pre>
<p>第二个答案(使用过滤器):</p>
<pre><code>%%timeit
df_A[df_b.filter(like='ReqCol').replace('', np.nan).stack().tolist()]
2.14 ms ± 51.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
</code></pre>
<p>显然,给定的方法比其他方法快得多。你知道吗</p>