<p>Here is some sample data:</p>
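<pre><code>In [1]:
import numpy as np
import pandas as pd

# Both columns end up as strings, because np.transpose first builds one mixed-type array
df = pd.DataFrame(np.transpose([np.random.choice(['ebt','other'], (10)),
                                np.random.rand(10)]), columns=['paymenttypeid','other'])
df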
Out[1]:
paymenttypeid other
0 other 0.3130770966143612
1 other 0.5167434068096931
2 ebt 0.7606898392115471
3 ebt 0.9424572692382547
4 ebt 0.624282017575857
5 ebt 0.8584841824784487
6 other 0.5017083765654611
7 other 0.025994123211164233
8 ebt 0.07045354449612984
9 ebt 0.11976351556850084
</code></pre>
<p>Let's convert it to a Dask dataframe:</p>
^{pr2}$
<p>And use <code>apply</code> (on the series) to assign the new column:</p>
<pre><code>In [3]:
data['ebt_dummy'] = data.paymenttypeid.apply(lambda x: 1 if x == 'ebt' else 0, meta=('paymenttypeid', 'str'))
data.compute()
Out[3]:
paymenttypeid other ebt_dummy
0 other 0.3130770966143612 0
1 other 0.5167434068096931 0
2 ebt 0.7606898392115471 1
3 ebt 0.9424572692382547 1
4 ebt 0.624282017575857 1
5 ebt 0.8584841824784487 1
6 other 0.5017083765654611 0
7 other 0.025994123211164233 0
8 ebt 0.07045354449612984 1
9 ebt 0.11976351556850084 1
</code></pre>
<hr/>
<p><strong>Update:</strong></p>
<p>The <code>meta</code> you passed seems to be the problem, since this works:</p>
<pre><code>data = data.map_partitions(lambda df: df.assign(
ebt_dummy = np.where((df["paymenttypeid"]=='ebt'), 1, 0)))
data.compute()
</code></pre>
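<p>In my example, if I want to specify <code>meta</code>, I have to pass the dtypes of the current <code>data</code>, not the ones I expect after the assignment. Note that the Dask documentation describes <code>meta</code> as an empty frame matching the column names and dtypes of the output, so treat this as an observation from my setup rather than a general rule:</p>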
<pre><code>data.map_partitions(lambda df: df.assign(
ebt_dummy = np.where((df["paymenttypeid"]=='ebt'), 1, 0)),
meta={'paymenttypeid': 'str', 'other': 'float64'})
</code></pre>