<p>I would remove the unwanted IDs before splitting:</p>
<p>Data:</p>
<pre><code>In [269]: df
Out[269]:
                 product_id
transaction_id
1                       P01
2                   P01,P02
3               P01,P02,P09
4                   P01,P03
5               P01,P03,P05
6               P01,P03,P07
7               P01,P03,P08
8                   P01,P04
9               P01,P04,P05
10              P01,P04,P08
</code></pre>
<p>Solution:</p>
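<p>Below is a minimal sketch of filtering before splitting. It assumes the IDs to drop are <code>P04</code> and <code>P08</code>, as in the rest of this answer. A regular expression removes each unwanted ID as a whole comma-separated token, and only then is the string split. The names <code>UNWANTED</code> and <code>drop_then_split</code> are hypothetical.</p>
<pre><code>import re
from typing import List

# Hypothetical set of IDs to drop (P04 and P08, as in the demo below).
UNWANTED = ("P04", "P08")

# Match an unwanted ID only as a whole token: preceded by the start of the
# string or a comma, and followed by a comma or the end of the string.
_pattern = re.compile(
    r"(?:^|,)(?:" + "|".join(map(re.escape, UNWANTED)) + r")(?=,|$)"
)

def drop_then_split(row: str) -> List[str]:
    """Remove unwanted IDs from a comma-separated string, then split it."""
    # Removing a leading token leaves a stray leading comma, so strip it.
    cleaned = _pattern.sub("", row).lstrip(",")
    return cleaned.split(",") if cleaned else []
</code></pre>
<p>It can be applied with <code>testing_df['product_id'].apply(drop_then_split)</code>.</p>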
<p><strong>Or</strong> you can change:</p>
<pre><code>testing_df['product_id'] = testing_df['product_id'].apply(lambda row: row.split(','))
</code></pre>
<p>to:</p>
<pre><code>testing_df['product_id'] = testing_df['product_id'].apply(lambda row: list(set(row.split(','))- set(['P04','P08'])))
</code></pre>
<p>Demo:</p>
<pre><code>In [280]: df.product_id.apply(lambda row: list(set(row.split(','))- set(['P04','P08'])))
Out[280]:
transaction_id
1              [P01]
2         [P01, P02]
3    [P09, P01, P02]
4         [P01, P03]
5    [P01, P03, P05]
6    [P07, P01, P03]
7         [P01, P03]
8              [P01]
9         [P01, P05]
10             [P01]
Name: product_id, dtype: object
</code></pre>
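<p>The set difference does not preserve the original order of the IDs. In the demo, row 3 comes back as <code>[P09, P01, P02]</code>. It also collapses duplicate IDs. If order matters, a list comprehension keeps it. Below is a minimal sketch; the helper name <code>split_and_drop</code> and its default <code>unwanted</code> tuple are hypothetical.</p>
<pre><code>from typing import Iterable, List

def split_and_drop(row: str, unwanted: Iterable[str] = ("P04", "P08")) -> List[str]:
    """Split a comma-separated string and drop unwanted IDs, keeping the original order."""
    drop = set(unwanted)  # a set makes each membership test O(1)
    return [pid for pid in row.split(",") if pid not in drop]
</code></pre>
<p>It is used the same way: <code>testing_df['product_id'] = testing_df['product_id'].apply(split_and_drop)</code>.</p>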